1. Rupp KM, Hect JL, Harford EE, Holt LL, Ghuman AS, Abel TJ. A hierarchy of processing complexity and timescales for natural sounds in the human auditory cortex. Proc Natl Acad Sci U S A 2025; 122:e2412243122. PMID: 40294254; PMCID: PMC12067213; DOI: 10.1073/pnas.2412243122.
Abstract
Efficient behavior is supported by humans' ability to rapidly recognize acoustically distinct sounds as members of a common category. Within the auditory cortex, critical unanswered questions remain regarding the organization and dynamics of sound categorization. We performed intracerebral recordings during epilepsy surgery evaluation as 20 patient-participants listened to natural sounds. We then built encoding models to predict neural responses using sound representations extracted from different layers within a deep neural network (DNN) pretrained to categorize sounds from acoustics. This approach yielded accurate models of neural responses throughout the auditory cortex. The complexity of a cortical site's representation (measured by the depth of the DNN layer that produced the best model) was closely related to its anatomical location, with shallow, middle, and deep layers associated with core (primary auditory cortex), lateral belt, and parabelt regions, respectively. Smoothly varying gradients of representational complexity existed within these regions, with complexity increasing along a posteromedial-to-anterolateral direction in core and lateral belt and along posterior-to-anterior and dorsal-to-ventral dimensions in parabelt. We then characterized the time (relative to sound onset) when feature representations emerged; this measure of temporal dynamics increased across the auditory hierarchy. Finally, we found separable effects of region and temporal dynamics on representational complexity: sites that took longer to begin encoding stimulus features had higher representational complexity independent of region, and downstream regions encoded more complex features independent of temporal dynamics. These findings suggest that hierarchies of timescales and complexity represent a functional organizational principle of the auditory stream underlying our ability to rapidly categorize sounds.
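To make the layer-wise encoding-model logic concrete, the sketch below fits a ridge regression from each DNN layer's activations to a single recording site's responses and takes the depth of the best-predicting layer as a complexity index. All arrays are random placeholders (a hypothetical layer_feats list and neural vector); the study's actual DNN, feature extraction, and cross-validation scheme are more involved.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_sounds, n_layers = 165, 8
# Hypothetical time-averaged activations of each DNN layer for every sound.
layer_feats = [rng.normal(size=(n_sounds, 64 * (l + 1))) for l in range(n_layers)]
neural = rng.normal(size=n_sounds)  # one recording site's response per sound

scores = []
for X in layer_feats:
    ridge = RidgeCV(alphas=np.logspace(-2, 4, 13))    # regularized linear encoding model
    pred = cross_val_predict(ridge, X, neural, cv=5)  # held-out predictions
    scores.append(np.corrcoef(pred, neural)[0, 1])    # prediction accuracy per layer

best_layer = int(np.argmax(scores))  # depth of best layer ~ representational complexity
print(f"best-predicting layer: {best_layer}, r = {scores[best_layer]:.2f}")
```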
Affiliation(s)
- Kyle M. Rupp
- Department of Neurological Surgery, University of Pittsburgh, PA 15213
- Jasmine L. Hect
- Department of Neurological Surgery, University of Pittsburgh, PA 15213
- Emily E. Harford
- Department of Neurological Surgery, University of Pittsburgh, PA 15213
- Lori L. Holt
- Department of Psychology, The University of Texas at Austin, TX 78712
- Taylor J. Abel
- Department of Neurological Surgery, University of Pittsburgh, PA 15213
- Department of Bioengineering, University of Pittsburgh, PA 15261
2. Karunathilake IMD, Brodbeck C, Bhattasali S, Resnik P, Simon JZ. Neural Dynamics of the Processing of Speech Features: Evidence for a Progression of Features from Acoustic to Sentential Processing. J Neurosci 2025; 45:e1143242025. PMID: 39809543; PMCID: PMC11905352; DOI: 10.1523/jneurosci.1143-24.2025.
Abstract
When we listen to speech, our brain's neurophysiological responses "track" its acoustic features, but it is less well understood how these auditory responses are enhanced by linguistic content. Here, we recorded magnetoencephalography responses while subjects of both sexes listened to four types of continuous speechlike passages: speech envelope-modulated noise, English-like nonwords, scrambled words, and a narrative passage. Temporal response function (TRF) analysis provides strong neural evidence for the emergent features of speech processing in the cortex, from acoustics to higher-level linguistics, as incremental steps in neural speech processing. Critically, we show a stepwise hierarchical progression of progressively higher-order features over time, reflected in both bottom-up (early) and top-down (late) processing stages. Linguistically driven top-down mechanisms take the form of late N400-like responses, suggesting a central role of predictive coding mechanisms at multiple levels. As expected, the neural processing of lower-level acoustic feature responses is bilateral or right lateralized, with left lateralization emerging only for lexicosemantic features. Finally, our results identify potential neural markers, linguistic-level late responses, derived from TRF components modulated by linguistic content, suggesting that these markers are indicative of speech comprehension rather than mere speech perception.
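At its core, the temporal response function (TRF) analysis used here is a regularized regression of the neural signal onto time-lagged copies of a stimulus feature. The sketch below estimates a single-feature TRF from simulated data with a hypothetical speech envelope; the study itself fits multivariate TRFs with acoustic and linguistic predictors and evaluates them with cross-validation.

```python
import numpy as np
from numpy.linalg import solve

fs = 100                                    # Hz; downsampled neural signal
rng = np.random.default_rng(1)
envelope = rng.random(60 * fs)              # hypothetical 60 s speech envelope
lags = np.arange(0, int(0.5 * fs))          # 0-500 ms lags

# Design matrix with one column per time-lagged copy of the stimulus feature.
X = np.column_stack([np.roll(envelope, lag) for lag in lags])
X[: lags.max()] = 0                         # discard wrapped-around samples

true_trf = np.exp(-((lags - 10) ** 2) / 20.0)   # toy kernel peaking at 100 ms
meg = X @ true_trf + rng.normal(scale=0.5, size=envelope.size)

lam = 1e2                                   # ridge parameter (cross-validated in practice)
trf = solve(X.T @ X + lam * np.eye(lags.size), X.T @ meg)
print("estimated TRF peak at lag (ms):", 1000 * lags[np.argmax(trf)] / fs)
```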
Affiliation(s)
- Christian Brodbeck
- Department of Computing and Software, McMaster University, Hamilton, Ontario L8S 4L8, Canada
- Shohini Bhattasali
- Department of Language Studies, University of Toronto, Scarborough, Ontario M1C 1A4, Canada
- Philip Resnik
- Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland, College Park, Maryland 20742
- Jonathan Z Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland 20742
- Department of Biology, University of Maryland, College Park, Maryland 20742
- Institute for Systems Research, University of Maryland, College Park, Maryland 20742
3. Maruyama H, Motoyoshi I. Two-stage spectral space and the perceptual properties of sound textures. J Acoust Soc Am 2025; 157:2067-2076. PMID: 40130951; DOI: 10.1121/10.0036219.
Abstract
Textural sounds, such as wind, flowing water, and footsteps, are ubiquitous in the natural environment. Recent studies have shown that the perception of auditory textures can be described and synthesized from multiple classes of time-averaged statistics, or from the linear spectra and energy spectra of input sounds. These findings raise the possibility that explicit perceptual properties of a textural sound, such as heaviness and complexity, could be predicted from this two-stage spectral representation. In the present study, ratings were collected for 17 different perceptual properties of 325 real-world sounds, and the relationship between the ratings and the two-stage spectral characteristics was investigated. The analysis showed that the ratings for each property were strongly and systematically correlated with specific frequency bands in the two-stage spectral space. A subsequent experiment further demonstrated that manipulating power at critical frequency bands significantly alters the perceived properties of natural sounds in the predicted direction. The results suggest that the perceptual impression of a sound texture depends strongly on the power distribution across first- and second-order acoustic filters in the early auditory system.
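Conceptually, the two-stage spectral space can be approximated by measuring energy in audio-frequency bands (first stage) and then energy in modulation-frequency bands of each band's envelope (second stage). The sketch below does this with crude Butterworth filterbanks on synthetic noise; the band definitions and filter shapes are placeholder assumptions, not the model used in the paper.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 16000
rng = np.random.default_rng(2)
sound = rng.normal(size=2 * fs)                      # hypothetical 2 s texture

audio_bands = [(100, 300), (300, 900), (900, 2700), (2700, 7900)]  # first-stage bands (Hz)
mod_bands = [(0.5, 2), (2, 8), (8, 32)]                            # second-stage bands (Hz)

first_order, second_order = [], []
for lo, hi in audio_bands:
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    band = sosfiltfilt(sos, sound)
    env = np.abs(hilbert(band))[::40]                # band envelope, downsampled to 400 Hz
    first_order.append(np.mean(band ** 2))           # first-order (linear) energy
    for mlo, mhi in mod_bands:
        msos = butter(2, [mlo, mhi], btype="bandpass", fs=400, output="sos")
        second_order.append(np.mean(sosfiltfilt(msos, env) ** 2))  # second-order energy

print(len(first_order), "first-order and", len(second_order), "second-order features")
```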
Affiliation(s)
- Hironori Maruyama
- Graduate School of Arts and Sciences, The University of Tokyo, Meguro-ku, Tokyo 153-8902, Japan
- Japan Society for the Promotion of Science (JSPS), Chiyoda-ku, Tokyo 102-0083, Japan
- Isamu Motoyoshi
- Graduate School of Arts and Sciences, The University of Tokyo, Meguro-ku, Tokyo 153-8902, Japan
4. Russell K, Kearns CA, Walker MB, Knoeckel CS, Ribera AB, Doll CA, Appel B. Bdnf-Ntrk2 Signaling Promotes but is not Essential for Spinal Cord Myelination in Larval Zebrafish. bioRxiv [Preprint] 2025:2025.02.19.639062. PMID: 40027741; PMCID: PMC11870533; DOI: 10.1101/2025.02.19.639062.
Abstract
Myelin, a specialized membrane produced by oligodendroglial cells in the central nervous system, wraps axons to enhance conduction velocity and maintain axon health. Not all axons are myelinated, and not all myelinated axons are uniformly wrapped along their lengths. Several lines of evidence indicate that neuronal activity can influence myelination; however, the cellular and molecular mechanisms that mediate communication between axons and oligodendrocytes remain poorly understood. Prior research showed that the neurotrophic growth factor Bdnf and its receptor Ntrk2 promote myelination in rodents, raising the possibility that Bdnf and Ntrk2 convey myelin-promoting signals from neurons to oligodendrocytes. We explored this possibility using a combination of gene expression analyses, gene function tests, and myelin sheath formation assays in zebrafish larvae. Altogether, our data indicate that, although not essential for myelination, Bdnf-Ntrk2 signaling contributes to the timely formation of myelin in the developing zebrafish spinal cord.
Affiliation(s)
- Kristen Russell
- Department of Pediatrics, Section of Developmental Biology, University of Colorado Anschutz Medical Campus, Aurora, Colorado 80045, USA
- Christina A. Kearns
- Department of Pediatrics, Section of Developmental Biology, University of Colorado Anschutz Medical Campus, Aurora, Colorado 80045, USA
- Macie B. Walker
- Department of Physiology and Biophysics, University of Colorado Anschutz Medical Campus, Aurora, Colorado 80045, USA
- Christopher S. Knoeckel
- Department of Physiology and Biophysics, University of Colorado Anschutz Medical Campus, Aurora, Colorado 80045, USA
- Angeles B. Ribera
- Department of Physiology and Biophysics, University of Colorado Anschutz Medical Campus, Aurora, Colorado 80045, USA
- Caleb A. Doll
- Department of Pediatrics, Section of Developmental Biology, University of Colorado Anschutz Medical Campus, Aurora, Colorado 80045, USA
- Bruce Appel
- Department of Pediatrics, Section of Developmental Biology, University of Colorado Anschutz Medical Campus, Aurora, Colorado 80045, USA
5. Karunathilake ID, Brodbeck C, Bhattasali S, Resnik P, Simon JZ. Neural Dynamics of the Processing of Speech Features: Evidence for a Progression of Features from Acoustic to Sentential Processing. bioRxiv [Preprint] 2024:2024.02.02.578603. PMID: 38352332; PMCID: PMC10862830; DOI: 10.1101/2024.02.02.578603.
Abstract
When we listen to speech, our brain's neurophysiological responses "track" its acoustic features, but it is less well understood how these auditory responses are enhanced by linguistic content. Here, we recorded magnetoencephalography (MEG) responses while subjects listened to four types of continuous-speech-like passages: speech-envelope modulated noise, English-like non-words, scrambled words, and a narrative passage. Temporal response function (TRF) analysis provides strong neural evidence for the emergent features of speech processing in cortex, from acoustics to higher-level linguistics, as incremental steps in neural speech processing. Critically, we show a stepwise hierarchical progression of progressively higher order features over time, reflected in both bottom-up (early) and top-down (late) processing stages. Linguistically driven top-down mechanisms take the form of late N400-like responses, suggesting a central role of predictive coding mechanisms at multiple levels. As expected, the neural processing of lower-level acoustic feature responses is bilateral or right lateralized, with left lateralization emerging only for lexical-semantic features. Finally, our results identify potential neural markers, linguistic level late responses, derived from TRF components modulated by linguistic content, suggesting that these markers are indicative of speech comprehension rather than mere speech perception.
Affiliation(s)
- Christian Brodbeck
- Department of Computing and Software, McMaster University, Hamilton, ON, Canada
- Shohini Bhattasali
- Department of Language Studies, University of Toronto, Scarborough, Canada
- Philip Resnik
- Department of Linguistics, and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742
- Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742
- Department of Biology, University of Maryland, College Park, MD, USA
- Institute for Systems Research, University of Maryland, College Park, MD 20742
6. Cosper SH, Männel C, Mueller JL. Auditory associative word learning in adults: The effects of musical experience and stimulus ordering. Brain Cogn 2024; 180:106207. PMID: 39053199; DOI: 10.1016/j.bandc.2024.106207.
Abstract
Evidence for sequential associative word learning in the auditory domain has been identified in infants, whereas adults have shown difficulties. To better understand which factors may facilitate adult auditory associative word learning, we assessed the role of auditory expertise as a learner-related property and stimulus order as a stimulus-related manipulation in the association of auditory objects and novel labels. In the first experiment we tested auditorily trained musicians against athletes (a high-level control group); in the second experiment we manipulated stimulus ordering, contrasting object-label with label-object presentation. Learning was evaluated from event-related potentials (ERPs) during training and subsequent testing phases using a cluster-based permutation approach, as well as from accuracy-judgement responses during the test. For musicians, results revealed a late positive component in the ERP during testing, but neither an N400 (400-800 ms) nor a behavioral effect at test, while athletes showed no effect of learning. Moreover, the object-label-ordering group exhibited only emerging association effects during training, whereas the label-object-ordering group showed a trend-level late ERP effect (800-1200 ms) during the test as well as above-chance accuracy-judgement scores. Thus, our results suggest that the learner-related property of auditory expertise and the stimulus-related manipulation of stimulus ordering modulate auditory associative word learning in adults.
Affiliation(s)
- Samuel H Cosper
- Chair of Lifespan Developmental Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
- Claudia Männel
- Department of Audiology and Phoniatrics, Charité-Universitätsmedizin Berlin, Berlin, Germany; Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Jutta L Mueller
- Department of Linguistics, University of Vienna, Vienna, Austria
7. Regev TI, Casto C, Hosseini EA, Adamek M, Ritaccio AL, Willie JT, Brunner P, Fedorenko E. Neural populations in the language network differ in the size of their temporal receptive windows. Nat Hum Behav 2024; 8:1924-1942. PMID: 39187713; DOI: 10.1038/s41562-024-01944-2.
Abstract
Despite long knowing what brain areas support language comprehension, our knowledge of the neural computations that these frontal and temporal regions implement remains limited. One important unresolved question concerns functional differences among the neural populations that comprise the language network. Here we leveraged the high spatiotemporal resolution of human intracranial recordings (n = 22) to examine responses to sentences and linguistically degraded conditions. We discovered three response profiles that differ in their temporal dynamics. These profiles appear to reflect different temporal receptive windows, with average windows of about 1, 4 and 6 words, respectively. Neural populations exhibiting these profiles are interleaved across the language network, which suggests that all language regions have direct access to distinct, multiscale representations of linguistic input-a property that may be critical for the efficiency and robustness of language processing.
Affiliation(s)
- Tamar I Regev
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Colton Casto
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Program in Speech and Hearing Bioscience and Technology (SHBT), Harvard University, Boston, MA, USA
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA, USA
- Eghbal A Hosseini
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Markus Adamek
- National Center for Adaptive Neurotechnologies, Albany, NY, USA
- Department of Neurosurgery, Washington University School of Medicine, St Louis, MO, USA
- Jon T Willie
- National Center for Adaptive Neurotechnologies, Albany, NY, USA
- Department of Neurosurgery, Washington University School of Medicine, St Louis, MO, USA
- Peter Brunner
- National Center for Adaptive Neurotechnologies, Albany, NY, USA
- Department of Neurosurgery, Washington University School of Medicine, St Louis, MO, USA
- Department of Neurology, Albany Medical College, Albany, NY, USA
- Evelina Fedorenko
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Program in Speech and Hearing Bioscience and Technology (SHBT), Harvard University, Boston, MA, USA
8. Norman-Haignere SV, Keshishian MK, Devinsky O, Doyle W, McKhann GM, Schevon CA, Flinker A, Mesgarani N. Temporal integration in human auditory cortex is predominantly yoked to absolute time, not structure duration. bioRxiv [Preprint] 2024:2024.09.23.614358. PMID: 39386565; PMCID: PMC11463558; DOI: 10.1101/2024.09.23.614358.
Abstract
Sound structures such as phonemes and words have highly variable durations. Thus, there is a fundamental difference between integrating across absolute time (e.g., 100 ms) vs. sound structure (e.g., phonemes). Auditory and cognitive models have traditionally cast neural integration in terms of time and structure, respectively, but the extent to which cortical computations reflect time or structure remains unknown. To answer this question, we rescaled the duration of all speech structures using time stretching/compression and measured integration windows in the human auditory cortex using a new experimental/computational method applied to spatiotemporally precise intracranial recordings. We observed significantly longer integration windows for stretched speech, but this lengthening was very small (~5%) relative to the change in structure durations, even in non-primary regions strongly implicated in speech-specific processing. These findings demonstrate that time-yoked computations dominate throughout the human auditory cortex, placing important constraints on neurocomputational models of structure processing.
Affiliation(s)
- Sam V Norman-Haignere
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center
- Department of Neuroscience, University of Rochester Medical Center
- Department of Brain and Cognitive Sciences, University of Rochester
- Department of Biomedical Engineering, University of Rochester
- Zuckerman Institute for Mind Brain and Behavior, Columbia University
- Menoua K. Keshishian
- Zuckerman Institute for Mind Brain and Behavior, Columbia University
- Department of Electrical Engineering, Columbia University
- Orrin Devinsky
- Department of Neurology, NYU Langone Medical Center
- Comprehensive Epilepsy Center, NYU Langone Medical Center
- Werner Doyle
- Comprehensive Epilepsy Center, NYU Langone Medical Center
- Department of Neurosurgery, NYU Langone Medical Center
- Guy M. McKhann
- Department of Neurological Surgery, Columbia University Irving Medical Center
- Adeen Flinker
- Department of Neurology, NYU Langone Medical Center
- Comprehensive Epilepsy Center, NYU Langone Medical Center
- Department of Biomedical Engineering, NYU Tandon School of Engineering
- Nima Mesgarani
- Zuckerman Institute for Mind Brain and Behavior, Columbia University
- Department of Electrical Engineering, Columbia University
9. Tuckute G, Kanwisher N, Fedorenko E. Language in Brains, Minds, and Machines. Annu Rev Neurosci 2024; 47:277-301. PMID: 38669478; DOI: 10.1146/annurev-neuro-120623-101142.
Abstract
It has long been argued that only humans could produce and understand language. But now, for the first time, artificial language models (LMs) achieve this feat. Here we survey the new purchase LMs are providing on the question of how language is implemented in the brain. We discuss why, a priori, LMs might be expected to share similarities with the human language system. We then summarize evidence that LMs represent linguistic information similarly enough to humans to enable relatively accurate brain encoding and decoding during language processing. Finally, we examine which LM properties-their architecture, task performance, or training-are critical for capturing human neural responses to language and review studies using LMs as in silico model organisms for testing hypotheses about language. These ongoing investigations bring us closer to understanding the representations and processes that underlie our ability to comprehend sentences and express thoughts in language.
Affiliation(s)
- Greta Tuckute
- Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Nancy Kanwisher
- Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
10. Teng X, Larrouy-Maestri P, Poeppel D. Segmenting and Predicting Musical Phrase Structure Exploits Neural Gain Modulation and Phase Precession. J Neurosci 2024; 44:e1331232024. PMID: 38926087; PMCID: PMC11270514; DOI: 10.1523/jneurosci.1331-23.2024.
Abstract
Music, like spoken language, is often characterized by hierarchically organized structure. Previous experiments have shown neural tracking of notes and beats, but little work touches on the more abstract question: how does the brain establish high-level musical structures in real time? We presented Bach chorales to participants (20 females and 9 males) undergoing electroencephalogram (EEG) recording to investigate how the brain tracks musical phrases. We removed the main temporal cues to phrasal structures, so that listeners could only rely on harmonic information to parse a continuous musical stream. Phrasal structures were disrupted by locally or globally reversing the harmonic progression, so that our observations on the original music could be controlled and compared. We first replicated the findings on neural tracking of musical notes and beats, substantiating the positive correlation between musical training and neural tracking. Critically, we discovered a neural signature in the frequency range ∼0.1 Hz (modulations of EEG power) that reliably tracks musical phrasal structure. Next, we developed an approach to quantify the phrasal phase precession of the EEG power, revealing that phrase tracking is indeed an operation of active segmentation involving predictive processes. We demonstrate that the brain establishes complex musical structures online over long timescales (>5 s) and actively segments continuous music streams in a manner comparable to language processing. These two neural signatures, phrase tracking and phrasal phase precession, provide new conceptual and technical tools to study the processes underpinning high-level structure building using noninvasive recording techniques.
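A simplified version of the key measurement, slow (~0.1 Hz) modulations of EEG power that could track phrase structure, is sketched below on simulated data: band-limit the signal, take its power envelope, and inspect the spectrum of that envelope. This is only a schematic of one step; the study's pipeline uses real EEG locked to Bach chorales, control conditions with reversed harmonic progressions, and a phase-precession analysis.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, welch

fs = 250
t = np.arange(0, 300, 1 / fs)                        # 5 minutes of simulated "EEG"
rng = np.random.default_rng(3)
phrase_rate = 0.1                                    # Hz: roughly one phrase per 10 s
eeg = rng.normal(size=t.size) * (1 + 0.3 * np.sin(2 * np.pi * phrase_rate * t))

sos = butter(4, [1, 30], btype="bandpass", fs=fs, output="sos")
power = np.abs(hilbert(sosfiltfilt(sos, eeg))) ** 2  # broadband power envelope

f, pxx = welch(power, fs=fs, nperseg=60 * fs)        # 60 s windows -> ~0.017 Hz resolution
slow = (f >= 0.05) & (f <= 0.2)                      # search window around the phrase rate
print(f"power-modulation peak near {f[slow][np.argmax(pxx[slow])]:.2f} Hz")
```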
Affiliation(s)
- Xiangbin Teng
- Department of Psychology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Pauline Larrouy-Maestri
- Music Department, Max-Planck-Institute for Empirical Aesthetics, Frankfurt 60322, Germany
- Center for Language, Music, and Emotion (CLaME), New York, New York 10003
- David Poeppel
- Center for Language, Music, and Emotion (CLaME), New York, New York 10003
- Department of Psychology, New York University, New York, New York 10003
- Ernst Struengmann Institute for Neuroscience, Frankfurt 60528, Germany
- Music and Audio Research Laboratory (MARL), New York, New York 11201
11. Ozernov-Palchik O, O’Brien AM, Jiachen Lee E, Richardson H, Romeo R, Lipkin B, Small H, Capella J, Nieto-Castañón A, Saxe R, Gabrieli JDE, Fedorenko E. Precision fMRI reveals that the language network exhibits adult-like left-hemispheric lateralization by 4 years of age. bioRxiv [Preprint] 2024:2024.05.15.594172. PMID: 38798360; PMCID: PMC11118489; DOI: 10.1101/2024.05.15.594172.
Abstract
Left hemisphere damage in adulthood often leads to linguistic deficits, but many cases of early damage leave linguistic processing preserved, and a functional language system can develop in the right hemisphere. To explain this early apparent equipotentiality of the two hemispheres for language, some have proposed that the language system is bilateral during early development and only becomes left-lateralized with age. We examined language lateralization using functional magnetic resonance imaging with two large pediatric cohorts (total n=273 children ages 4-16; n=107 adults). Strong, adult-level left-hemispheric lateralization (in activation volume and response magnitude) was evident by age 4. Thus, although the right hemisphere can take over language function in some cases of early brain damage, and although some features of the language system do show protracted development (magnitude of language response and strength of inter-regional correlations in the language network), the left-hemisphere bias for language is robustly present by 4 years of age. These results call for alternative accounts of early equipotentiality of the two hemispheres for language.
Affiliation(s)
- Ola Ozernov-Palchik
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Amanda M. O’Brien
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA 02138, United States
- Elizabeth Jiachen Lee
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- Hilary Richardson
- School of Philosophy, Psychology, and Language Sciences, University of Edinburgh, Edinburgh, EH8 9JZ, United Kingdom
- Rachel Romeo
- Department of Human Development and Quantitative Methodology, University of Maryland, College Park, MD 20742, United States
- Benjamin Lipkin
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- Hannah Small
- Department of Cognitive Science, Johns Hopkins University, Baltimore, MD 21218, United States
- Jimmy Capella
- Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Rebecca Saxe
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- John D. E. Gabrieli
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- Evelina Fedorenko
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
12. Shain C, Kean H, Casto C, Lipkin B, Affourtit J, Siegelman M, Mollica F, Fedorenko E. Distributed Sensitivity to Syntax and Semantics throughout the Language Network. J Cogn Neurosci 2024; 36:1427-1471. PMID: 38683732; DOI: 10.1162/jocn_a_02164.
Abstract
Human language is expressive because it is compositional: The meaning of a sentence (semantics) can be inferred from its structure (syntax). It is commonly believed that language syntax and semantics are processed by distinct brain regions. Here, we revisit this claim using precision fMRI methods to capture separation or overlap of function in the brains of individual participants. Contrary to prior claims, we find distributed sensitivity to both syntax and semantics throughout a broad frontotemporal brain network. Our results join a growing body of evidence for an integrated network for language in the human brain within which internal specialization is primarily a matter of degree rather than kind, in contrast with influential proposals that advocate distinct specialization of different brain areas for different types of linguistic functions.
Affiliation(s)
- Hope Kean
- Massachusetts Institute of Technology
13. Fedorenko E, Piantadosi ST, Gibson EAF. Language is primarily a tool for communication rather than thought. Nature 2024; 630:575-586. PMID: 38898296; DOI: 10.1038/s41586-024-07522-w.
Abstract
Language is a defining characteristic of our species, but the function, or functions, that it serves has been debated for centuries. Here we bring recent evidence from neuroscience and allied disciplines to argue that in modern humans, language is a tool for communication, contrary to a prominent view that we use language for thinking. We begin by introducing the brain network that supports linguistic ability in humans. We then review evidence for a double dissociation between language and thought, and discuss several properties of language that suggest that it is optimized for communication. We conclude that although the emergence of language has unquestionably transformed human culture, language does not appear to be a prerequisite for complex thought, including symbolic thought. Instead, language is a powerful tool for the transmission of cultural knowledge; it plausibly co-evolved with our thinking and reasoning capacities, and only reflects, rather than gives rise to, the signature sophistication of human cognition.
Affiliation(s)
- Evelina Fedorenko
- Massachusetts Institute of Technology, Cambridge, MA, USA
- Speech and Hearing in Bioscience and Technology Program at Harvard University, Boston, MA, USA
14. Rupp KM, Hect JL, Harford EE, Holt LL, Ghuman AS, Abel TJ. A hierarchy of processing complexity and timescales for natural sounds in human auditory cortex. bioRxiv [Preprint] 2024:2024.05.24.595822. PMID: 38826304; PMCID: PMC11142240; DOI: 10.1101/2024.05.24.595822.
Abstract
Efficient behavior is supported by humans' ability to rapidly recognize acoustically distinct sounds as members of a common category. Within auditory cortex, there are critical unanswered questions regarding the organization and dynamics of sound categorization. Here, we performed intracerebral recordings in the context of epilepsy surgery as 20 patient-participants listened to natural sounds. We built encoding models to predict neural responses using features of these sounds extracted from different layers within a sound-categorization deep neural network (DNN). This approach yielded highly accurate models of neural responses throughout auditory cortex. The complexity of a cortical site's representation (measured by the depth of the DNN layer that produced the best model) was closely related to its anatomical location, with shallow, middle, and deep layers of the DNN associated with core (primary auditory cortex), lateral belt, and parabelt regions, respectively. Smoothly varying gradients of representational complexity also existed within these regions, with complexity increasing along a posteromedial-to-anterolateral direction in core and lateral belt, and along posterior-to-anterior and dorsal-to-ventral dimensions in parabelt. When we estimated the time window over which each recording site integrates information, we found shorter integration windows in core relative to lateral belt and parabelt. Lastly, we found a relationship between the length of the integration window and the complexity of information processing within core (but not lateral belt or parabelt). These findings suggest hierarchies of timescales and processing complexity, and their interrelationship, represent a functional organizational principle of the auditory stream that underlies our perception of complex, abstract auditory information.
Affiliation(s)
- Kyle M. Rupp
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Jasmine L. Hect
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Emily E. Harford
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Lori L. Holt
- Department of Psychology, The University of Texas at Austin, Austin, Texas, United States of America
- Avniel Singh Ghuman
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Taylor J. Abel
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Department of Bioengineering, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
15. Fedorenko E, Ivanova AA, Regev TI. The language network as a natural kind within the broader landscape of the human brain. Nat Rev Neurosci 2024; 25:289-312. PMID: 38609551; DOI: 10.1038/s41583-024-00802-4.
Abstract
Language behaviour is complex, but neuroscientific evidence disentangles it into distinct components supported by dedicated brain areas or networks. In this Review, we describe the 'core' language network, which includes left-hemisphere frontal and temporal areas, and show that it is strongly interconnected, independent of input and output modalities, causally important for language and language-selective. We discuss evidence that this language network plausibly stores language knowledge and supports core linguistic computations related to accessing words and constructions from memory and combining them to interpret (decode) or generate (encode) linguistic messages. We emphasize that the language network works closely with, but is distinct from, both lower-level - perceptual and motor - mechanisms and higher-level systems of knowledge and reasoning. The perceptual and motor mechanisms process linguistic signals, but, in contrast to the language network, are sensitive only to these signals' surface properties, not their meanings; the systems of knowledge and reasoning (such as the system that supports social reasoning) are sometimes engaged during language use but are not language-selective. This Review lays a foundation both for in-depth investigations of these different components of the language processing pipeline and for probing inter-component interactions.
Affiliation(s)
- Evelina Fedorenko
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- The Program in Speech and Hearing in Bioscience and Technology, Harvard University, Cambridge, MA, USA
- Anna A Ivanova
- School of Psychology, Georgia Institute of Technology, Atlanta, GA, USA
- Tamar I Regev
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
16. Chang A, Teng X, Assaneo MF, Poeppel D. The human auditory system uses amplitude modulation to distinguish music from speech. PLoS Biol 2024; 22:e3002631. PMID: 38805517; PMCID: PMC11132470; DOI: 10.1371/journal.pbio.3002631.
Abstract
Music and speech are complex and distinct auditory signals that are both foundational to the human experience. The mechanisms underpinning each domain are widely investigated. However, what perceptual mechanism transforms a sound into music or speech and what basic acoustic information is required to distinguish between them remain open questions. Here, we hypothesized that a sound's amplitude modulation (AM), an essential temporal acoustic feature driving the auditory system across processing levels, is critical for distinguishing music and speech. Specifically, in contrast to paradigms using naturalistic acoustic signals (that can be challenging to interpret), we used a noise-probing approach to untangle the auditory mechanism: If AM rate and regularity are critical for perceptually distinguishing music and speech, judging artificially noise-synthesized ambiguous audio signals should align with their AM parameters. Across 4 experiments (N = 335), signals with a higher peak AM frequency tend to be judged as speech, and those with a lower peak as music. Interestingly, this principle is consistently used by all listeners for speech judgments, but only by musically sophisticated listeners for music. In addition, signals with more regular AM are judged as music over speech, and this feature is more critical for music judgment, regardless of musical sophistication. The data suggest that the auditory system can rely on a low-level acoustic property as basic as AM to distinguish music from speech, a simple principle that provokes both neurophysiological and evolutionary experiments and speculations.
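The central acoustic measurement, the peak frequency of a sound's amplitude-modulation spectrum, can be estimated roughly as below. The input is synthetic noise with 4 Hz AM (a speech-like rate); the paper's stimuli and spectral estimator differ, so this is a sketch of the idea only.

```python
import numpy as np
from scipy.signal import hilbert, welch

fs = 16000
t = np.arange(0, 4, 1 / fs)
rng = np.random.default_rng(4)
sound = rng.normal(size=t.size) * (1 + 0.8 * np.sin(2 * np.pi * 4 * t))  # 4 Hz AM noise

envelope = np.abs(hilbert(sound))            # amplitude envelope
envelope -= envelope.mean()                  # remove DC before taking the spectrum
f, pxx = welch(envelope, fs=fs, nperseg=fs)  # ~1 Hz frequency resolution
am = (f >= 1) & (f <= 20)                    # plausible AM range for speech/music
print(f"peak AM frequency: {f[am][np.argmax(pxx[am])]:.1f} Hz")
```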
Affiliation(s)
- Andrew Chang
- Department of Psychology, New York University, New York, New York, United States of America
- Xiangbin Teng
- Department of Psychology, Chinese University of Hong Kong, Hong Kong SAR, China
- M. Florencia Assaneo
- Instituto de Neurobiología, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, México
- David Poeppel
- Department of Psychology, New York University, New York, New York, United States of America
- Ernst Struengmann Institute for Neuroscience, Frankfurt am Main, Germany
- Center for Language, Music, and Emotion (CLaME), New York University, New York, New York, United States of America
- Music and Audio Research Lab (MARL), New York University, New York, New York, United States of America
17. Kim SG, De Martino F, Overath T. Linguistic modulation of the neural encoding of phonemes. Cereb Cortex 2024; 34:bhae155. PMID: 38687241; PMCID: PMC11059272; DOI: 10.1093/cercor/bhae155.
Abstract
Speech comprehension entails the neural mapping of the acoustic speech signal onto learned linguistic units. This acousto-linguistic transformation is bi-directional, whereby higher-level linguistic processes (e.g. semantics) modulate the acoustic analysis of individual linguistic units. Here, we investigated the cortical topography and linguistic modulation of the most fundamental linguistic unit, the phoneme. We presented natural speech and "phoneme quilts" (pseudo-randomly shuffled phonemes) in either a familiar (English) or unfamiliar (Korean) language to native English speakers while recording functional magnetic resonance imaging data. This allowed us to dissociate the contribution of acoustic vs. linguistic processes toward phoneme analysis. We show that (i) the acoustic analysis of phonemes is modulated by linguistic analysis and (ii) that this modulation incorporates both acoustic and phonetic information. These results suggest that the linguistic modulation of cortical sensitivity to phoneme classes minimizes prediction error during natural speech perception, thereby aiding speech comprehension in challenging listening situations.
Affiliation(s)
- Seung-Goo Kim
- Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, Frankfurt am Main 60322, Germany
- Federico De Martino
- Faculty of Psychology and Neuroscience, University of Maastricht, Universiteitssingel 40, 6229 ER Maastricht, Netherlands
- Tobias Overath
- Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Duke Institute for Brain Sciences, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Center for Cognitive Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
18. Jain S, Vo VA, Wehbe L, Huth AG. Computational Language Modeling and the Promise of In Silico Experimentation. Neurobiol Lang (Camb) 2024; 5:80-106. PMID: 38645624; PMCID: PMC11025654; DOI: 10.1162/nol_a_00101.
Abstract
Language neuroscience currently relies on two major experimental paradigms: controlled experiments using carefully hand-designed stimuli, and natural stimulus experiments. These approaches have complementary advantages which allow them to address distinct aspects of the neurobiology of language, but each approach also comes with drawbacks. Here we discuss a third paradigm-in silico experimentation using deep learning-based encoding models-that has been enabled by recent advances in cognitive computational neuroscience. This paradigm promises to combine the interpretability of controlled experiments with the generalizability and broad scope of natural stimulus experiments. We show four examples of simulating language neuroscience experiments in silico and then discuss both the advantages and caveats of this approach.
Affiliation(s)
- Shailee Jain
- Department of Computer Science, University of Texas at Austin, Austin, TX, USA
- Vy A. Vo
- Brain-Inspired Computing Lab, Intel Labs, Hillsboro, OR, USA
- Leila Wehbe
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Alexander G. Huth
- Department of Computer Science, University of Texas at Austin, Austin, TX, USA
- Department of Neuroscience, University of Texas at Austin, Austin, TX, USA
19. McMullin MA, Kumar R, Higgins NC, Gygi B, Elhilali M, Snyder JS. Preliminary Evidence for Global Properties in Human Listeners During Natural Auditory Scene Perception. Open Mind (Camb) 2024; 8:333-365. PMID: 38571530; PMCID: PMC10990578; DOI: 10.1162/opmi_a_00131.
Abstract
Theories of auditory and visual scene analysis suggest the perception of scenes relies on the identification and segregation of objects within it, resembling a detail-oriented processing style. However, a more global process may occur while analyzing scenes, which has been evidenced in the visual domain. It is our understanding that a similar line of research has not been explored in the auditory domain; therefore, we evaluated the contributions of high-level global and low-level acoustic information to auditory scene perception. An additional aim was to increase the field's ecological validity by using and making available a new collection of high-quality auditory scenes. Participants rated scenes on 8 global properties (e.g., open vs. enclosed) and an acoustic analysis evaluated which low-level features predicted the ratings. We submitted the acoustic measures and average ratings of the global properties to separate exploratory factor analyses (EFAs). The EFA of the acoustic measures revealed a seven-factor structure explaining 57% of the variance in the data, while the EFA of the global property measures revealed a two-factor structure explaining 64% of the variance in the data. Regression analyses revealed each global property was predicted by at least one acoustic variable (R2 = 0.33-0.87). These findings were extended using deep neural network models where we examined correlations between human ratings of global properties and deep embeddings of two computational models: an object-based model and a scene-based model. The results support that participants' ratings are more strongly explained by a global analysis of the scene setting, though the relationship between scene perception and auditory perception is multifaceted, with differing correlation patterns evident between the two models. Taken together, our results provide evidence for the ability to perceive auditory scenes from a global perspective. Some of the acoustic measures predicted ratings of global scene perception, suggesting representations of auditory objects may be transformed through many stages of processing in the ventral auditory stream, similar to what has been proposed in the ventral visual stream. These findings and the open availability of our scene collection will make future studies on perception, attention, and memory for natural auditory scenes possible.
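The pipeline described above, reducing many acoustic measures to a handful of factors and then predicting global-property ratings from them, can be sketched in a few lines. The data are random placeholders, and sklearn's FactorAnalysis is an unrotated stand-in for the exploratory factor analyses and rotation used in the paper.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n_scenes, n_measures = 200, 12
acoustic = rng.normal(size=(n_scenes, n_measures))          # acoustic measures per scene
openness = 0.8 * acoustic[:, 0] + rng.normal(scale=0.5, size=n_scenes)  # toy rating

fa = FactorAnalysis(n_components=7, random_state=0)         # 7-factor acoustic space
factors = fa.fit_transform(acoustic)

reg = LinearRegression().fit(factors, openness)             # predict the rating from factors
print(f"R^2 for 'openness' from acoustic factors: {reg.score(factors, openness):.2f}")
```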
Affiliation(s)
- Rohit Kumar
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Nathan C. Higgins
- Department of Communication Sciences & Disorders, University of South Florida, Tampa, FL, USA
- Brian Gygi
- East Bay Institute for Research and Education, Martinez, CA, USA
- Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Joel S. Snyder
- Department of Psychology, University of Nevada, Las Vegas, Las Vegas, NV, USA
20. Crespo-Bojorque P, Cauvet E, Pallier C, Toro JM. Recognizing structure in novel tunes: differences between human and rats. Anim Cogn 2024; 27:17. PMID: 38429431; PMCID: PMC10907461; DOI: 10.1007/s10071-024-01848-8.
Abstract
A central feature in music is the hierarchical organization of its components. Musical pieces are not a simple concatenation of chords, but are characterized by rhythmic and harmonic structures. Here, we explore if sensitivity to music structure might emerge in the absence of any experience with musical stimuli. For this, we tested if rats detect the difference between structured and unstructured musical excerpts and compared their performance with that of humans. Structured melodies were excerpts of Mozart's sonatas. Unstructured melodies were created by the recombination of fragments of different sonatas. We trained listeners (both human participants and Long-Evans rats) with a set of structured and unstructured excerpts, and tested them with completely novel excerpts they had not heard before. After hundreds of training trials, rats were able to tell apart novel structured from unstructured melodies. Human listeners required only a few trials to reach better performance than rats. Interestingly, such performance was increased in humans when tonality changes were included, while it decreased to chance in rats. Our results suggest that, with enough training, rats might learn to discriminate acoustic differences differentiating hierarchical music structures from unstructured excerpts. More importantly, the results point toward species-specific adaptations on how tonality is processed.
Affiliation(s)
- Elodie Cauvet
- Cognitive Neuroimaging Unit, INSERM, CEA, CNRS, Université Paris-Saclay, NeuroSpin Center, Gif-Sur-Yvette, France
- DIS Study Abroad in Scandinavia, Stockholm, Sweden
- Christophe Pallier
- Cognitive Neuroimaging Unit, INSERM, CEA, CNRS, Université Paris-Saclay, NeuroSpin Center, Gif-Sur-Yvette, France
- Juan M Toro
- Universitat Pompeu Fabra, C. Ramon Trias Fargas, 25-27, CP. 08005, Barcelona, Spain
- Institució Catalana de Recerca I Estudis Avançats (ICREA), Barcelona, Spain
21. Regev TI, Kim HS, Chen X, Affourtit J, Schipper AE, Bergen L, Mahowald K, Fedorenko E. High-level language brain regions process sublexical regularities. Cereb Cortex 2024; 34:bhae077. PMID: 38494886; PMCID: PMC11486690; DOI: 10.1093/cercor/bhae077.
Abstract
A network of left frontal and temporal brain regions supports language processing. This "core" language network stores our knowledge of words and constructions as well as constraints on how those combine to form sentences. However, our linguistic knowledge additionally includes information about phonemes and how they combine to form phonemic clusters, syllables, and words. Are phoneme combinatorics also represented in these language regions? Across five functional magnetic resonance imaging experiments, we investigated the sensitivity of high-level language processing brain regions to sublexical linguistic regularities by examining responses to diverse nonwords-sequences of phonemes that do not constitute real words (e.g. punes, silory, flope). We establish robust responses in the language network to visually (experiment 1a, n = 605) and auditorily (experiments 1b, n = 12, and 1c, n = 13) presented nonwords. In experiment 2 (n = 16), we find stronger responses to nonwords that are more well-formed, i.e. obey the phoneme-combinatorial constraints of English. Finally, in experiment 3 (n = 14), we provide suggestive evidence that the responses in experiments 1 and 2 are not due to the activation of real words that share some phonology with the nonwords. The results suggest that sublexical regularities are stored and processed within the same fronto-temporal network that supports lexical and syntactic processes.
Affiliation(s)
- Tamar I Regev
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Hee So Kim
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Xuanyi Chen
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Department of Cognitive Sciences, Rice University, Houston, TX 77005, United States
- Josef Affourtit
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Abigail E Schipper
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- Leon Bergen
- Department of Linguistics, University of California San Diego, San Diego, CA 92093, United States
- Kyle Mahowald
- Department of Linguistics, University of Texas at Austin, Austin, TX 78712, United States
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- The Harvard Program in Speech and Hearing Bioscience and Technology, Boston, MA 02115, United States
22. Malik-Moraleda S, Jouravlev O, Taliaferro M, Mineroff Z, Cucu T, Mahowald K, Blank IA, Fedorenko E. Functional characterization of the language network of polyglots and hyperpolyglots with precision fMRI. Cereb Cortex 2024; 34:bhae049. PMID: 38466812; PMCID: PMC10928488; DOI: 10.1093/cercor/bhae049.
Abstract
How do polyglots (individuals who speak five or more languages) process their languages, and what can this population tell us about the language system? Using fMRI, we identified the language network in each of 34 polyglots (including 16 hyperpolyglots with knowledge of 10+ languages) and examined its response to the native language, non-native languages of varying proficiency, and unfamiliar languages. All language conditions engaged all areas of the language network relative to a control condition. Languages that participants rated as higher proficiency elicited stronger responses, except for the native language, which elicited a response similar to or lower than that to a non-native language of similar proficiency. Furthermore, unfamiliar languages that were typologically related to the participants' high-to-moderate-proficiency languages elicited a stronger response than unfamiliar unrelated languages. The results suggest that the language network's response magnitude scales with the degree of engagement of linguistic computations (e.g., related to lexical access and syntactic-structure building). We also replicated a prior finding of weaker responses to the native language in polyglots than in non-polyglot bilinguals. These results contribute to our understanding of how multiple languages coexist within a single brain and provide new evidence that the language network responds more strongly to stimuli that more fully engage linguistic computations.
Collapse
Affiliation(s)
- Saima Malik-Moraleda
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Boston, MA 02114, United States
| | - Olessia Jouravlev
- Department of Cognitive Science, Carleton University, Ottawa K1S 5B6, Canada
| | - Maya Taliaferro
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
| | - Zachary Mineroff
- Eberly Center, Carnegie Mellon University, Pittsburgh, PA 15289, United States
| | - Theodore Cucu
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15289, United States
| | - Kyle Mahowald
- Department of Linguistics, The University of Texas at Austin, Austin, TX 78712, United States
| | - Idan A Blank
- Department of Psychology, University of California Los Angeles, Los Angeles, CA 90095, United States
| | - Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Boston, MA 02114, United States
| |
Collapse
|
23
|
Zoefel B, Kösem A. Neural tracking of continuous acoustics: properties, speech-specificity and open questions. Eur J Neurosci 2024; 59:394-414. [PMID: 38151889 DOI: 10.1111/ejn.16221] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2023] [Revised: 11/17/2023] [Accepted: 11/22/2023] [Indexed: 12/29/2023]
Abstract
Human speech is a particularly relevant acoustic stimulus for our species, due to its role in information transmission during communication. Speech is inherently a dynamic signal, and a recent line of research has focused on neural activity that follows the temporal structure of speech. We review findings that characterise neural dynamics in the processing of continuous acoustics and that allow us to compare these dynamics with temporal aspects of human speech. We highlight properties and constraints that both neural and speech dynamics have, suggesting that auditory neural systems are optimised to process human speech. We then discuss the speech-specificity of neural dynamics and their potential mechanistic origins and summarise open questions in the field.
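One concrete way to see what "neural tracking of continuous acoustics" means operationally is to correlate a sound's amplitude envelope with a neural recording across a range of time lags. The sketch below uses simulated signals and generic parameters (sampling rate, response delay, lag window); it illustrates the general analysis family the review discusses, not any specific study's pipeline.

```python
# Illustration with simulated data: lagged correlation between a speech
# amplitude envelope and a neural signal as a simple tracking measure.
import numpy as np
from scipy.signal import hilbert

fs = 100                                  # Hz; assumed analysis rate
t = np.arange(0, 60, 1 / fs)

# Simulated ~4 Hz (syllable-rate) envelope and a delayed, noisy neural copy
envelope = np.abs(hilbert(np.sin(2 * np.pi * 4 * t) + 0.3 * np.random.randn(t.size)))
lag_samples = int(0.1 * fs)               # assumed 100 ms neural delay
neural = np.concatenate([np.zeros(lag_samples), envelope[:-lag_samples]])
neural = neural + 0.5 * np.random.randn(t.size)

def lagged_corr(env, sig, max_lag):
    """Pearson r between env and sig for neural lags 0..max_lag samples."""
    return [np.corrcoef(env[:env.size - L], sig[L:])[0, 1]
            for L in range(max_lag + 1)]

r = lagged_corr(envelope, neural, max_lag=int(0.3 * fs))
best = int(np.argmax(r))
print(f"peak tracking r = {max(r):.2f} at {1000 * best / fs:.0f} ms lag")
```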
Collapse
Affiliation(s)
- Benedikt Zoefel
- Centre de Recherche Cerveau et Cognition (CerCo), CNRS UMR 5549, Toulouse, France
- Université de Toulouse III Paul Sabatier, Toulouse, France
| | - Anne Kösem
- Lyon Neuroscience Research Center (CRNL), INSERM U1028, Bron, France
| |
Collapse
|
24
|
Malik-Moraleda S, Jouravlev O, Taliaferro M, Mineroff Z, Cucu T, Mahowald K, Blank IA, Fedorenko E. Functional characterization of the language network of polyglots and hyperpolyglots with precision fMRI. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.01.19.524657. [PMID: 36711949 PMCID: PMC9882290 DOI: 10.1101/2023.01.19.524657] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]
Abstract
How do polyglots-individuals who speak five or more languages-process their languages, and what can this population tell us about the language system? Using fMRI, we identified the language network in each of 34 polyglots (including 16 hyperpolyglots with knowledge of 10+ languages) and examined its response to the native language, non-native languages of varying proficiency, and unfamiliar languages. All language conditions engaged all areas of the language network relative to a control condition. Languages that participants rated as higher-proficiency elicited stronger responses, except for the native language, which elicited a similar or lower response than a non-native language of similar proficiency. Furthermore, unfamiliar languages that were typologically related to the participants' high-to-moderate-proficiency languages elicited a stronger response than unfamiliar unrelated languages. The results suggest that the language network's response magnitude scales with the degree of engagement of linguistic computations (e.g., related to lexical access and syntactic-structure building). We also replicated a prior finding of weaker responses to native language in polyglots than non-polyglot bilinguals. These results contribute to our understanding of how multiple languages co-exist within a single brain and provide new evidence that the language network responds more strongly to stimuli that more fully engage linguistic computations.
Collapse
Affiliation(s)
- Saima Malik-Moraleda
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Boston, MA 02114
| | - Olessia Jouravlev
- Department of Cognitive Science, Carleton University, Ottawa, Canada, K1S 5B6
| | - Maya Taliaferro
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
| | | | - Theodore Cucu
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15289
| | - Kyle Mahowald
- Department of Linguistics, The University of Texas at Austin, Austin, TX 78712
| | - Idan A. Blank
- Department of Psychology, University of California Los Angeles, CA 90095
| | - Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Boston, MA 02114
| |
Collapse
|
25
|
Barchet AV, Henry MJ, Pelofi C, Rimmele JM. Auditory-motor synchronization and perception suggest partially distinct time scales in speech and music. COMMUNICATIONS PSYCHOLOGY 2024; 2:2. [PMID: 39242963 PMCID: PMC11332030 DOI: 10.1038/s44271-023-00053-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/04/2023] [Accepted: 12/19/2023] [Indexed: 09/09/2024]
Abstract
Speech and music might involve specific cognitive rhythmic timing mechanisms related to differences in the dominant rhythmic structure. We investigate the influence of different motor effectors on rate-specific processing in both domains. A perception and a synchronization task involving syllable and piano tone sequences and motor effectors typically associated with speech (whispering) and music (finger-tapping) were tested at slow (~2 Hz) and fast rates (~4.5 Hz). Although synchronization performance was generally better at slow rates, the motor effectors exhibited specific rate preferences. Finger-tapping was advantaged compared to whispering at slow but not at faster rates, with synchronization being effector-dependent at slow, but highly correlated at faster rates. Perception of speech and music was better at different rates and predicted by a fast general and a slow finger-tapping synchronization component. Our data suggests partially independent rhythmic timing mechanisms for speech and music, possibly related to a differential recruitment of cortical motor circuitry.
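As a generic illustration of how synchronization performance in tasks like these is often quantified (simulated tap times, not the study's data or exact metric), each produced event can be converted to a phase within the stimulus cycle and summarized by the resultant vector length:

```python
# Synchronization consistency as a phase-locking measure on simulated taps.
import numpy as np

rate_hz = 2.0                                   # slow condition (~2 Hz)
period = 1.0 / rate_hz
tap_times = np.arange(0, 20, period)            # 20 s of nominally periodic taps
tap_times = tap_times + np.random.normal(0, 0.03, tap_times.size)  # motor jitter

phases = 2 * np.pi * ((tap_times % period) / period)   # phase of each tap in the cycle
plv = np.abs(np.mean(np.exp(1j * phases)))              # 1 = perfect sync, 0 = none
print(f"synchronization consistency: {plv:.2f}")
```

Comparing such a consistency score across effectors (whispering vs. finger-tapping) and rates (~2 Hz vs. ~4.5 Hz) is the kind of contrast the abstract describes.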
Collapse
Affiliation(s)
- Alice Vivien Barchet
- Department of Cognitive Neuropsychology, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany.
| | - Molly J Henry
- Research Group 'Neural and Environmental Rhythms', Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Department of Psychology, Toronto Metropolitan University, Toronto, Canada
| | - Claire Pelofi
- Music and Audio Research Laboratory, New York University, New York, NY, USA
- Max Planck NYU Center for Language, Music, and Emotion, New York, NY, USA
| | - Johanna M Rimmele
- Department of Cognitive Neuropsychology, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany.
- Max Planck NYU Center for Language, Music, and Emotion, New York, NY, USA.
| |
Collapse
|
26
|
Kim G, Kim DK, Jeong H. Spontaneous emergence of rudimentary music detectors in deep neural networks. Nat Commun 2024; 15:148. [PMID: 38168097 PMCID: PMC10761941 DOI: 10.1038/s41467-023-44516-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2021] [Accepted: 12/15/2023] [Indexed: 01/05/2024] Open
Abstract
Music exists in almost every society, has universal acoustic features, and is processed by distinct neural circuits in humans even with no experience of musical training. However, it remains unclear how these innate characteristics emerge and what functions they serve. Here, using an artificial deep neural network that models the auditory information processing of the brain, we show that units tuned to music can spontaneously emerge by learning natural sound detection, even without learning music. The music-selective units encoded the temporal structure of music in multiple timescales, following the population-level response characteristics observed in the brain. We found that the process of generalization is critical for the emergence of music-selectivity and that music-selectivity can work as a functional basis for the generalization of natural sound, thereby elucidating its origin. These findings suggest that evolutionary adaptation to process natural sounds can provide an initial blueprint for our sense of music.
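A common first step in analyses of this kind is a per-unit selectivity index contrasting responses to music versus other natural sounds. The sketch below applies such an index to simulated unit activations; it is not the paper's network, stimuli, or selection criterion, and the threshold is arbitrary.

```python
# Toy music-selectivity index over simulated network unit responses.
import numpy as np

rng = np.random.default_rng(0)
n_units, n_music, n_other = 256, 50, 150
resp_music = rng.gamma(2.0, 1.0, (n_units, n_music))    # responses to music clips
resp_other = rng.gamma(2.0, 1.0, (n_units, n_other))    # responses to other sounds
resp_music[:20] *= 3.0                                   # pretend 20 units prefer music

mu_m, mu_o = resp_music.mean(axis=1), resp_other.mean(axis=1)
selectivity = (mu_m - mu_o) / (mu_m + mu_o)              # index in [-1, 1]
music_units = np.where(selectivity > 0.3)[0]             # arbitrary threshold
print(f"{music_units.size} putative music-selective units")
```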
Collapse
Affiliation(s)
- Gwangsu Kim
- Department of Physics, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Korea
| | - Dong-Kyum Kim
- Department of Physics, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Korea
| | - Hawoong Jeong
- Department of Physics, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Korea.
- Center for Complex Systems, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Korea.
| |
Collapse
|
27
|
Olson HA, Chen EM, Lydic KO, Saxe RR. Left-Hemisphere Cortical Language Regions Respond Equally to Observed Dialogue and Monologue. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2023; 4:575-610. [PMID: 38144236 PMCID: PMC10745132 DOI: 10.1162/nol_a_00123] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/01/2023] [Accepted: 09/20/2023] [Indexed: 12/26/2023]
Abstract
Much of the language we encounter in our everyday lives comes in the form of conversation, yet the majority of research on the neural basis of language comprehension has used input from only one speaker at a time. Twenty adults were scanned while passively observing audiovisual conversations using functional magnetic resonance imaging. In a block-design task, participants watched 20 s videos of puppets speaking either to another puppet (the dialogue condition) or directly to the viewer (the monologue condition), while the audio was either comprehensible (played forward) or incomprehensible (played backward). Individually functionally localized left-hemisphere language regions responded more to comprehensible than incomprehensible speech but did not respond differently to dialogue than monologue. In a second task, participants watched videos (1-3 min each) of two puppets conversing with each other, in which one puppet was comprehensible while the other's speech was reversed. All participants saw the same visual input but were randomly assigned which character's speech was comprehensible. In left-hemisphere cortical language regions, the time course of activity was correlated only among participants who heard the same character speaking comprehensibly, despite identical visual input across all participants. For comparison, some individually localized theory of mind regions and right-hemisphere homologues of language regions responded more to dialogue than monologue in the first task, and in the second task, activity in some regions was correlated across all participants regardless of which character was speaking comprehensibly. Together, these results suggest that canonical left-hemisphere cortical language regions are not sensitive to differences between observed dialogue and monologue.
Collapse
|
28
|
Tuckute G, Feather J, Boebinger D, McDermott JH. Many but not all deep neural network audio models capture brain responses and exhibit correspondence between model stages and brain regions. PLoS Biol 2023; 21:e3002366. [PMID: 38091351 PMCID: PMC10718467 DOI: 10.1371/journal.pbio.3002366] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2022] [Accepted: 10/06/2023] [Indexed: 12/18/2023] Open
Abstract
Models that predict brain responses to stimuli provide one measure of understanding of a sensory system and have many potential applications in science and engineering. Deep artificial neural networks have emerged as the leading such predictive models of the visual system but are less explored in audition. Prior work provided examples of audio-trained neural networks that produced good predictions of auditory cortical fMRI responses and exhibited correspondence between model stages and brain regions, but left it unclear whether these results generalize to other neural network models and, thus, how to further improve models in this domain. We evaluated model-brain correspondence for publicly available audio neural network models along with in-house models trained on 4 different tasks. Most tested models outpredicted standard spectrotemporal filter-bank models of auditory cortex and exhibited systematic model-brain correspondence: Middle stages best predicted primary auditory cortex, while deep stages best predicted non-primary cortex. However, some state-of-the-art models produced substantially worse brain predictions. Models trained to recognize speech in background noise produced better brain predictions than models trained to recognize speech in quiet, potentially because hearing in noise imposes constraints on biological auditory representations. The training task influenced the prediction quality for specific cortical tuning properties, with best overall predictions resulting from models trained on multiple tasks. The results generally support the promise of deep neural networks as models of audition, though they also indicate that current models do not explain auditory cortical responses in their entirety.
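The model-brain comparison described here follows a standard encoding-model recipe: regress measured responses onto a model stage's activations and score predictions on held-out sounds. The sketch below reproduces only that generic recipe on simulated data; the dimensions, ridge penalty, and train/test split are arbitrary assumptions, not the study's settings.

```python
# Generic encoding-model sketch: ridge regression from model activations
# to simulated voxel responses, scored by held-out prediction accuracy.
import numpy as np

rng = np.random.default_rng(1)
n_sounds, n_features, n_voxels = 165, 512, 100
X = rng.standard_normal((n_sounds, n_features))              # model-stage activations
W_true = 0.1 * rng.standard_normal((n_features, n_voxels))
Y = X @ W_true + rng.standard_normal((n_sounds, n_voxels))   # simulated "fMRI" responses

train, test = np.arange(0, 120), np.arange(120, 165)
lam = 10.0                                                   # ridge penalty (assumption)
W = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(n_features),
                    X[train].T @ Y[train])
pred = X[test] @ W

# Model-brain correspondence: correlation of predicted vs. measured responses
r = [np.corrcoef(pred[:, v], Y[test][:, v])[0, 1] for v in range(n_voxels)]
print(f"median held-out r = {np.median(r):.2f}")
```

Repeating this fit for each model stage and each cortical region is what yields the stage-to-region correspondence the abstract reports.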
Collapse
Affiliation(s)
- Greta Tuckute
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
| | - Jenelle Feather
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
| | - Dana Boebinger
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, Massachusetts, United States of America
- University of Rochester Medical Center, Rochester, New York, United States of America
| | - Josh H. McDermott
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, Massachusetts, United States of America
| |
Collapse
|
29
|
Feather J, Leclerc G, Mądry A, McDermott JH. Model metamers reveal divergent invariances between biological and artificial neural networks. Nat Neurosci 2023; 26:2017-2034. [PMID: 37845543 PMCID: PMC10620097 DOI: 10.1038/s41593-023-01442-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2022] [Accepted: 08/29/2023] [Indexed: 10/18/2023]
Abstract
Deep neural network models of sensory systems are often proposed to learn representational transformations with invariances like those in the brain. To reveal these invariances, we generated 'model metamers', stimuli whose activations within a model stage are matched to those of a natural stimulus. Metamers for state-of-the-art supervised and unsupervised neural network models of vision and audition were often completely unrecognizable to humans when generated from late model stages, suggesting differences between model and human invariances. Targeted model changes improved human recognizability of model metamers but did not eliminate the overall human-model discrepancy. The human recognizability of a model's metamers was well predicted by their recognizability by other models, suggesting that models contain idiosyncratic invariances in addition to those required by the task. Metamer recognizability dissociated from both traditional brain-based benchmarks and adversarial vulnerability, revealing a distinct failure mode of existing sensory models and providing a complementary benchmark for model assessment.
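The metamer idea can be sketched in a few lines: starting from a random input, iteratively adjust it so that its activations at a chosen model stage match those evoked by a reference stimulus. The toy one-layer ReLU "model" below is only meant to convey that optimization loop, not the networks, stages, or optimizer used in the paper.

```python
# Toy metamer generation: match one stage's activations to a reference.
import numpy as np

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((64, 256))       # toy stage: x -> relu(W @ x)
stage = lambda x: np.maximum(W @ x, 0.0)

reference = rng.standard_normal(256)           # the "natural" stimulus
target = stage(reference)

x = rng.standard_normal(256)                   # metamer, initialized randomly
lr = 0.05
for _ in range(2000):
    act = stage(x)
    # gradient of 0.5 * ||act - target||^2 with respect to x
    x -= lr * (W.T @ ((act - target) * (act > 0)))

print("activation mismatch:", round(float(np.linalg.norm(stage(x) - target)), 4),
      "| distance from reference input:", round(float(np.linalg.norm(x - reference)), 2))
```

Even when the activation mismatch becomes negligible, the optimized input typically remains far from the reference in input space, which is what makes metamers informative about a model's invariances.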
Collapse
Affiliation(s)
- Jenelle Feather
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Center for Computational Neuroscience, Flatiron Institute, Cambridge, MA, USA.
| | - Guillaume Leclerc
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Aleksander Mądry
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Josh H McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA.
| |
Collapse
|
30
|
Kosakowski HL, Norman-Haignere S, Mynick A, Takahashi A, Saxe R, Kanwisher N. Preliminary evidence for selective cortical responses to music in one-month-old infants. Dev Sci 2023; 26:e13387. [PMID: 36951215 DOI: 10.1111/desc.13387] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2022] [Revised: 02/17/2023] [Accepted: 02/21/2023] [Indexed: 03/24/2023]
Abstract
Prior studies have observed selective neural responses in the adult human auditory cortex to music and speech that cannot be explained by the differing lower-level acoustic properties of these stimuli. Does infant cortex exhibit similarly selective responses to music and speech shortly after birth? To answer this question, we attempted to collect functional magnetic resonance imaging (fMRI) data from 45 sleeping infants (2.0- to 11.9-weeks-old) while they listened to monophonic instrumental lullabies and infant-directed speech produced by a mother. To match acoustic variation between music and speech sounds we (1) recorded music from instruments that had a similar spectral range as female infant-directed speech, (2) used a novel excitation-matching algorithm to match the cochleagrams of music and speech stimuli, and (3) synthesized "model-matched" stimuli that were matched in spectrotemporal modulation statistics to (yet perceptually distinct from) music or speech. Of the 36 infants we collected usable data from, 19 had significant activations to sounds overall compared to scanner noise. From these infants, we observed a set of voxels in non-primary auditory cortex (NPAC) but not in Heschl's Gyrus that responded significantly more to music than to each of the other three stimulus types (but not significantly more strongly than to the background scanner noise). In contrast, our planned analyses did not reveal voxels in NPAC that responded more to speech than to model-matched speech, although other unplanned analyses did. These preliminary findings suggest that music selectivity arises within the first month of life. A video abstract of this article can be viewed at https://youtu.be/c8IGFvzxudk. RESEARCH HIGHLIGHTS: Responses to music, speech, and control sounds matched for the spectrotemporal modulation-statistics of each sound were measured from 2- to 11-week-old sleeping infants using fMRI. Auditory cortex was significantly activated by these stimuli in 19 out of 36 sleeping infants. Selective responses to music compared to the three other stimulus classes were found in non-primary auditory cortex but not in nearby Heschl's Gyrus. Selective responses to speech were not observed in planned analyses but were observed in unplanned, exploratory analyses.
Collapse
Affiliation(s)
- Heather L Kosakowski
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Center for Brains, Minds and Machines, Cambridge, Massachusetts, USA
| | | | - Anna Mynick
- Psychological and Brain Sciences, Dartmouth College, Hannover, New Hampshire, USA
| | - Atsushi Takahashi
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Rebecca Saxe
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Center for Brains, Minds and Machines, Cambridge, Massachusetts, USA
| | - Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Center for Brains, Minds and Machines, Cambridge, Massachusetts, USA
| |
Collapse
|
31
|
Damera SR, Malone PS, Stevens BW, Klein R, Eberhardt SP, Auer ET, Bernstein LE, Riesenhuber M. Metamodal Coupling of Vibrotactile and Auditory Speech Processing Systems through Matched Stimulus Representations. J Neurosci 2023; 43:4984-4996. [PMID: 37197979 PMCID: PMC10324991 DOI: 10.1523/jneurosci.1710-22.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2022] [Revised: 03/10/2023] [Accepted: 04/29/2023] [Indexed: 05/19/2023] Open
Abstract
It has been postulated that the brain is organized by "metamodal," sensory-independent cortical modules capable of performing tasks (e.g., word recognition) in both "standard" and novel sensory modalities. Still, this theory has primarily been tested in sensory-deprived individuals, with mixed evidence in neurotypical subjects, thereby limiting its support as a general principle of brain organization. Critically, current theories of metamodal processing do not specify requirements for successful metamodal processing at the level of neural representations. Specification at this level may be particularly important in neurotypical individuals, where novel sensory modalities must interface with existing representations for the standard sense. Here we hypothesized that effective metamodal engagement of a cortical area requires congruence between stimulus representations in the standard and novel sensory modalities in that region. To test this, we first used fMRI to identify bilateral auditory speech representations. We then trained 20 human participants (12 female) to recognize vibrotactile versions of auditory words using one of two auditory-to-vibrotactile algorithms. The vocoded algorithm attempted to match the encoding scheme of auditory speech while the token-based algorithm did not. Crucially, using fMRI, we found that only in the vocoded group did trained-vibrotactile stimuli recruit speech representations in the superior temporal gyrus and lead to increased coupling between them and somatosensory areas. Our results advance our understanding of brain organization by providing new insight into unlocking the metamodal potential of the brain, thereby benefitting the design of novel sensory substitution devices that aim to tap into existing processing streams in the brain. SIGNIFICANCE STATEMENT: It has been proposed that the brain is organized by "metamodal," sensory-independent modules specialized for performing certain tasks. This idea has inspired therapeutic applications, such as sensory substitution devices, for example, enabling blind individuals "to see" by transforming visual input into soundscapes. Yet, other studies have failed to demonstrate metamodal engagement. Here, we tested the hypothesis that metamodal engagement in neurotypical individuals requires matching the encoding schemes between stimuli from the novel and standard sensory modalities. We trained two groups of subjects to recognize words generated by one of two auditory-to-vibrotactile transformations. Critically, only vibrotactile stimuli that were matched to the neural encoding of auditory speech engaged auditory speech areas after training. This suggests that matching encoding schemes is critical to unlocking the brain's metamodal potential.
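To make the "vocoded" idea concrete, the sketch below band-filters a waveform, extracts each band's envelope, and re-imposes those envelopes on low-frequency carriers suitable for the skin. The band edges, carrier frequencies, and filter settings are generic assumptions for illustration; this is not the authors' auditory-to-vibrotactile algorithm.

```python
# Rough sketch of an envelope-based auditory-to-vibrotactile transform.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
speech = np.random.randn(t.size)                   # stand-in for a speech waveform

bands = [(100, 500), (500, 1500), (1500, 4000)]    # assumed analysis bands (Hz)
carriers = [80, 150, 250]                          # assumed vibrotactile carriers (Hz)

vibrotactile = np.zeros_like(speech)
for (lo, hi), fc in zip(bands, carriers):
    sos = butter(2, [lo, hi], btype="band", fs=fs, output="sos")
    env = np.abs(hilbert(sosfiltfilt(sos, speech)))        # band envelope
    vibrotactile += env * np.sin(2 * np.pi * fc * t)       # drive a tactile carrier

vibrotactile /= np.max(np.abs(vibrotactile))               # normalize for the actuator
print(vibrotactile.shape)
```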
Collapse
Affiliation(s)
- Srikanth R Damera
- Department of Neuroscience, Georgetown University Medical Center, Washington, DC 20007
| | - Patrick S Malone
- Department of Neuroscience, Georgetown University Medical Center, Washington, DC 20007
| | - Benson W Stevens
- Department of Neuroscience, Georgetown University Medical Center, Washington, DC 20007
| | - Richard Klein
- Department of Neuroscience, Georgetown University Medical Center, Washington, DC 20007
| | - Silvio P Eberhardt
- Department of Speech Language & Hearing Sciences, George Washington University, Washington, DC 20052
| | - Edward T Auer
- Department of Speech Language & Hearing Sciences, George Washington University, Washington, DC 20052
| | - Lynne E Bernstein
- Department of Speech Language & Hearing Sciences, George Washington University, Washington, DC 20052
| | | |
Collapse
|
32
|
Chen X, Affourtit J, Ryskin R, Regev TI, Norman-Haignere S, Jouravlev O, Malik-Moraleda S, Kean H, Varley R, Fedorenko E. The human language system, including its inferior frontal component in "Broca's area," does not support music perception. Cereb Cortex 2023; 33:7904-7929. [PMID: 37005063 PMCID: PMC10505454 DOI: 10.1093/cercor/bhad087] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2022] [Revised: 01/02/2023] [Accepted: 01/03/2023] [Indexed: 04/04/2023] Open
Abstract
Language and music are two human-unique capacities whose relationship remains debated. Some have argued for overlap in processing mechanisms, especially for structure processing. Such claims often concern the inferior frontal component of the language system located within "Broca's area." However, others have failed to find overlap. Using a robust individual-subject fMRI approach, we examined the responses of language brain regions to music stimuli, and probed the musical abilities of individuals with severe aphasia. Across 4 experiments, we obtained a clear answer: music perception does not engage the language system, and judgments about music structure are possible even in the presence of severe damage to the language network. In particular, the language regions' responses to music are generally low, often below the fixation baseline, and never exceed responses elicited by nonmusic auditory conditions, like animal sounds. Furthermore, the language regions are not sensitive to music structure: they show low responses to both intact and structure-scrambled music, and to melodies with vs. without structural violations. Finally, in line with past patient investigations, individuals with aphasia, who cannot judge sentence grammaticality, perform well on melody well-formedness judgments. Thus, the mechanisms that process structure in language do not appear to process music, including music syntax.
Collapse
Affiliation(s)
- Xuanyi Chen
- Department of Cognitive Sciences, Rice University, TX 77005, United States
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
| | - Josef Affourtit
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
| | - Rachel Ryskin
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Department of Cognitive & Information Sciences, University of California, Merced, Merced, CA 95343, United States
| | - Tamar I Regev
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
| | - Samuel Norman-Haignere
- Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY, United States
- Department of Neuroscience, University of Rochester Medical Center, Rochester, NY, United States
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, United States
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY, United States
| | - Olessia Jouravlev
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Department of Cognitive Science, Carleton University, Ottawa, ON, Canada
| | - Saima Malik-Moraleda
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- The Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA 02138, United States
| | - Hope Kean
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
| | - Rosemary Varley
- Psychology & Language Sciences, UCL, London, WC1N 1PF, United Kingdom
| | - Evelina Fedorenko
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- The Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA 02138, United States
| |
Collapse
|
33
|
Karlsson EM, Hugdahl K, Hirnstein M, Carey DP. Analysis of distributions reveals real differences on dichotic listening scores between left- and right-handers. Cereb Cortex Commun 2023; 4:tgad009. [PMID: 37342803 PMCID: PMC10262840 DOI: 10.1093/texcom/tgad009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2022] [Revised: 05/16/2023] [Accepted: 05/17/2023] [Indexed: 06/23/2023] Open
Abstract
About 95% of right-handers and 70% of left-handers have a left-hemispheric specialization for language. Dichotic listening is often used as an indirect measure of this language asymmetry. However, while it reliably produces a right-ear advantage (REA), corresponding to the left-hemispheric specialization of language, it paradoxically often fails to obtain statistical evidence of mean differences between left- and right-handers. We hypothesized that non-normality of the underlying distributions might be in part responsible for the similarities in means. Here, we compare the mean ear advantage scores, and also contrast the distributions at multiple quantiles, in two large independent samples (Ns = 1,358 and 1,042) of right-handers and left-handers. Right-handers had an increased mean REA, and a larger proportion had an REA than in the left-handers. We also found that more left-handers are represented in the left-eared end of the distribution. These data suggest that subtle shifts in the distributions of DL scores for right- and left-handers may be at least partially responsible for the unreliability of significantly reduced mean REA in left-handers.
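The two comparisons described, mean ear advantage and quantile-by-quantile differences between handedness groups, are straightforward to express directly. The sketch below does so on simulated laterality scores; the group sizes echo the abstract, but the distributions themselves are invented.

```python
# Simulated dichotic-listening ear-advantage scores for the two comparisons.
import numpy as np

rng = np.random.default_rng(3)
# Ear advantage = right-ear minus left-ear correct reports (laterality score)
right_handers = rng.normal(10, 15, 1358)
left_handers = np.concatenate([rng.normal(10, 15, 730), rng.normal(-5, 15, 312)])

print("mean REA:", right_handers.mean().round(2), left_handers.mean().round(2))
print("proportion with an REA:", (right_handers > 0).mean().round(2),
      (left_handers > 0).mean().round(2))

qs = [0.05, 0.25, 0.5, 0.75, 0.95]
print("quantile differences (RH - LH):",
      np.round(np.quantile(right_handers, qs) - np.quantile(left_handers, qs), 1))
```

Comparing the groups at several quantiles, rather than only at the mean, is what exposes the shift in the left-eared tail the abstract describes.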
Collapse
Affiliation(s)
- Emma M Karlsson
- Institute of Cognitive Neuroscience, School of Human and Behavioural Sciences, Bangor University, Bangor, United Kingdom
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
| | - Kenneth Hugdahl
- Department of Biological and Medical Psychology, University of Bergen, Bergen, Norway
| | - Marco Hirnstein
- Department of Biological and Medical Psychology, University of Bergen, Bergen, Norway
| | - David P Carey
- Corresponding author: David P. Carey, School of Human and Behavioural Sciences, Bangor University, Bangor LL57 2AS, UK.
| |
Collapse
|
34
|
Keshishian M, Akkol S, Herrero J, Bickel S, Mehta AD, Mesgarani N. Joint, distributed and hierarchically organized encoding of linguistic features in the human auditory cortex. Nat Hum Behav 2023; 7:740-753. [PMID: 36864134 PMCID: PMC10417567 DOI: 10.1038/s41562-023-01520-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2021] [Accepted: 01/05/2023] [Indexed: 03/04/2023]
Abstract
The precise role of the human auditory cortex in representing speech sounds and transforming them to meaning is not yet fully understood. Here we used intracranial recordings from the auditory cortex of neurosurgical patients as they listened to natural speech. We found an explicit, temporally ordered and anatomically distributed neural encoding of multiple linguistic features, including phonetic, prelexical phonotactics, word frequency, and lexical-phonological and lexical-semantic information. Grouping neural sites on the basis of their encoded linguistic features revealed a hierarchical pattern, with distinct representations of prelexical and postlexical features distributed across various auditory areas. While sites with longer response latencies and greater distance from the primary auditory cortex encoded higher-level linguistic features, the encoding of lower-level features was preserved and not discarded. Our study reveals a cumulative mapping of sound to meaning and provides empirical evidence for validating neurolinguistic and psycholinguistic models of spoken word recognition that preserve the acoustic variations in speech.
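Claims that a neural site encodes a higher-level feature over and above lower-level ones are typically cashed out as nested regression comparisons. The sketch below shows that logic on simulated data with in-sample R^2 and generic feature dimensions; it is not the study's encoding pipeline.

```python
# Nested-model comparison: does adding higher-level features improve the fit?
import numpy as np

rng = np.random.default_rng(6)
n_t = 2000
phonetic = rng.standard_normal((n_t, 20))        # lower-level features
lexical = rng.standard_normal((n_t, 5))          # higher-level features
site = (phonetic @ rng.standard_normal(20)
        + 2.0 * (lexical @ rng.standard_normal(5))
        + rng.standard_normal(n_t))              # simulated neural site response

def r2(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

base = r2(phonetic, site)
full = r2(np.hstack([phonetic, lexical]), site)
print(f"unique lexical contribution: delta R^2 = {full - base:.2f}")
```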
Collapse
Affiliation(s)
- Menoua Keshishian
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
| | - Serdar Akkol
- Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
| | - Jose Herrero
- Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Department of Neurosurgery, Hofstra-Northwell School of Medicine, Manhasset, NY, USA
| | - Stephan Bickel
- Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Department of Neurosurgery, Hofstra-Northwell School of Medicine, Manhasset, NY, USA
| | - Ashesh D Mehta
- Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Department of Neurosurgery, Hofstra-Northwell School of Medicine, Manhasset, NY, USA
| | - Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, NY, USA.
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA.
| |
Collapse
|
35
|
Leong ATL, Wong EC, Wang X, Wu EX. Hippocampus Modulates Vocalizations Responses at Early Auditory Centers. Neuroimage 2023; 270:119943. [PMID: 36828157 DOI: 10.1016/j.neuroimage.2023.119943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2023] [Accepted: 02/13/2023] [Indexed: 02/24/2023] Open
Abstract
Despite its prominence in learning and memory, hippocampal influence in early auditory processing centers remains unknown. Here, we examined how hippocampal activity modulates sound-evoked responses in the auditory midbrain and thalamus using optogenetics and functional MRI (fMRI) in rodents. Ventral hippocampus (vHP) excitatory neuron stimulation at 5 Hz evoked robust hippocampal activity that propagates to the primary auditory cortex. We then tested 5 Hz vHP stimulation paired with either natural vocalizations or artificial/noise acoustic stimuli. vHP stimulation enhanced auditory responses to vocalizations (with a negative or positive valence) in the inferior colliculus, medial geniculate body, and auditory cortex, but not to their temporally reversed counterparts (artificial sounds) or broadband noise. Meanwhile, pharmacological vHP inactivation diminished response selectivity to vocalizations. These results directly reveal the large-scale hippocampal participation in natural sound processing at early centers of the ascending auditory pathway. They expand our present understanding of hippocampus in global auditory networks.
Collapse
Affiliation(s)
- Alex T L Leong
- Laboratory of Biomedical Imaging and Signal Processing, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong SAR, China.
| | - Eddie C Wong
- Laboratory of Biomedical Imaging and Signal Processing, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
| | - Xunda Wang
- Laboratory of Biomedical Imaging and Signal Processing, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
| | - Ed X Wu
- Laboratory of Biomedical Imaging and Signal Processing, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR, China.
| |
Collapse
|
36
|
Setti F, Handjaras G, Bottari D, Leo A, Diano M, Bruno V, Tinti C, Cecchetti L, Garbarini F, Pietrini P, Ricciardi E. A modality-independent proto-organization of human multisensory areas. Nat Hum Behav 2023; 7:397-410. [PMID: 36646839 PMCID: PMC10038796 DOI: 10.1038/s41562-022-01507-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2022] [Accepted: 12/05/2022] [Indexed: 01/18/2023]
Abstract
The processing of multisensory information is based upon the capacity of brain regions, such as the superior temporal cortex, to combine information across modalities. However, it is still unclear whether the representation of coherent auditory and visual events requires any prior audiovisual experience to develop and function. Here we measured brain synchronization during the presentation of an audiovisual, audio-only or video-only version of the same narrative in distinct groups of sensory-deprived (congenitally blind and deaf) and typically developed individuals. Intersubject correlation analysis revealed that the superior temporal cortex was synchronized across auditory and visual conditions, even in sensory-deprived individuals who lack any audiovisual experience. This synchronization was primarily mediated by low-level perceptual features, and relied on a similar modality-independent topographical organization of slow temporal dynamics. The human superior temporal cortex is naturally endowed with a functional scaffolding to yield a common representation across multisensory events.
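Intersubject correlation analysis, the method named here, has a simple leave-one-out form: correlate each participant's regional time course with the average of everyone else's. The sketch below runs it on simulated time courses, not the study's data.

```python
# Leave-one-out intersubject correlation (ISC) on simulated time courses.
import numpy as np

rng = np.random.default_rng(5)
n_subjects, n_timepoints = 20, 300
shared = rng.standard_normal(n_timepoints)                 # stimulus-driven signal
data = 0.6 * shared + rng.standard_normal((n_subjects, n_timepoints))

isc = []
for s in range(n_subjects):
    others = np.delete(data, s, axis=0).mean(axis=0)       # average of the rest
    isc.append(np.corrcoef(data[s], others)[0, 1])
print(f"mean ISC = {np.mean(isc):.2f}")
```

Computing ISC across groups and conditions (e.g., the auditory-only time courses of blind participants against the visual-only time courses of deaf participants) is what supports the modality-independence claim.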
Collapse
Affiliation(s)
- Francesca Setti
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
| | | | - Davide Bottari
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
| | - Andrea Leo
- Department of Translational Research and Advanced Technologies in Medicine and Surgery, University of Pisa, Pisa, Italy
| | - Matteo Diano
- Department of Psychology, University of Turin, Turin, Italy
| | - Valentina Bruno
- Manibus Lab, Department of Psychology, University of Turin, Turin, Italy
| | - Carla Tinti
- Department of Psychology, University of Turin, Turin, Italy
| | - Luca Cecchetti
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
| | | | - Pietro Pietrini
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
| | | |
Collapse
|
37
|
Weise A, Grimm S, Maria Rimmele J, Schröger E. Auditory representations for long lasting sounds: Insights from event-related brain potentials and neural oscillations. BRAIN AND LANGUAGE 2023; 237:105221. [PMID: 36623340 DOI: 10.1016/j.bandl.2022.105221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/23/2021] [Revised: 12/26/2022] [Accepted: 12/27/2022] [Indexed: 06/17/2023]
Abstract
The basic features of short sounds, such as frequency and intensity including their temporal dynamics, are integrated in a unitary representation. Knowledge on how our brain processes long lasting sounds is scarce. We review research utilizing the Mismatch Negativity event-related potential and neural oscillatory activity for studying representations for long lasting simple versus complex sounds such as sinusoidal tones versus speech. There is evidence for a temporal constraint in the formation of auditory representations: Auditory edges like sound onsets within long lasting sounds open a temporal window of about 350 ms in which the sounds' dynamics are integrated into a representation, while information beyond that window contributes less to that representation. This integration window segments the auditory input into short chunks. We argue that the representations established in adjacent integration windows can be concatenated into an auditory representation of a long sound, thus, overcoming the temporal constraint.
Collapse
Affiliation(s)
- Annekathrin Weise
- Department of Psychology, Ludwig-Maximilians-University Munich, Germany; Wilhelm Wundt Institute for Psychology, Leipzig University, Germany.
| | - Sabine Grimm
- Wilhelm Wundt Institute for Psychology, Leipzig University, Germany.
| | - Johanna Maria Rimmele
- Department of Neuroscience, Max-Planck-Institute for Empirical Aesthetics, Germany; Center for Language, Music and Emotion, New York University, Max Planck Institute, Department of Psychology, 6 Washington Place, New York, NY 10003, United States.
| | - Erich Schröger
- Wilhelm Wundt Institute for Psychology, Leipzig University, Germany.
| |
Collapse
|
38
|
Zoefel B, Gilbert RA, Davis MH. Intelligibility improves perception of timing changes in speech. PLoS One 2023; 18:e0279024. [PMID: 36634109 PMCID: PMC9836318 DOI: 10.1371/journal.pone.0279024] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2022] [Accepted: 11/28/2022] [Indexed: 01/13/2023] Open
Abstract
Auditory rhythms are ubiquitous in music, speech, and other everyday sounds. Yet, it is unclear how perceived rhythms arise from the repeating structure of sounds. For speech, it is unclear whether rhythm is solely derived from acoustic properties (e.g., rapid amplitude changes), or if it is also influenced by the linguistic units (syllables, words, etc.) that listeners extract from intelligible speech. Here, we present three experiments in which participants were asked to detect an irregularity in rhythmically spoken speech sequences. In each experiment, we reduce the number of possible stimulus properties that differ between intelligible and unintelligible speech sounds and show that these acoustically-matched intelligibility conditions nonetheless lead to differences in rhythm perception. In Experiment 1, we replicate a previous study showing that rhythm perception is improved for intelligible (16-channel vocoded) as compared to unintelligible (1-channel vocoded) speech, despite near-identical broadband amplitude modulations. In Experiment 2, we use spectrally-rotated 16-channel speech to show the effect of intelligibility cannot be explained by differences in spectral complexity. In Experiment 3, we compare rhythm perception for sine-wave speech signals when they are heard as non-speech (for naïve listeners), and subsequent to training, when identical sounds are perceived as speech. In all cases, detection of rhythmic regularity is enhanced when participants perceive the stimulus as speech compared to when they do not. Together, these findings demonstrate that intelligibility enhances the perception of timing changes in speech, which is hence linked to processes that extract abstract linguistic units from sound.
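The 16-channel versus 1-channel contrast in Experiment 1 comes from noise vocoding, which preserves band envelopes while discarding spectral detail; with a single channel only the broadband envelope survives. The sketch below is a generic noise vocoder with assumed band edges and filter orders, not the study's stimulus-generation code.

```python
# Generic n-channel noise vocoder (illustrative parameters only).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels, f_lo=100, f_hi=7000):
    """Replace each band's fine structure with envelope-modulated noise."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(3, [lo, hi], btype="band", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))          # band envelope
        out += env * sosfiltfilt(sos, np.random.randn(x.size))  # band-limited noise
    return out / np.max(np.abs(out))

fs = 16000
speech = np.random.randn(fs)                   # stand-in for a recorded sentence
vocoded_16 = noise_vocode(speech, fs, 16)      # largely intelligible for real speech
vocoded_1 = noise_vocode(speech, fs, 1)        # unintelligible; broadband envelope kept
print(vocoded_16.shape, vocoded_1.shape)
```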
Collapse
Affiliation(s)
- Benedikt Zoefel
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Centre National de la Recherche Scientifique (CNRS), Centre de Recherche Cerveau et Cognition (CerCo), Toulouse, France
- Université de Toulouse III Paul Sabatier, Toulouse, France
| | - Rebecca A. Gilbert
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
| | - Matthew H. Davis
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
| |
Collapse
|
39
|
Adolfi F, Wareham T, van Rooij I. A Computational Complexity Perspective on Segmentation as a Cognitive Subcomputation. Top Cogn Sci 2022; 15:255-273. [PMID: 36453947 DOI: 10.1111/tops.12629] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2022] [Revised: 10/06/2022] [Accepted: 10/07/2022] [Indexed: 12/05/2022]
Abstract
Computational feasibility is a widespread concern that guides the framing and modeling of natural and artificial intelligence. The specification of cognitive system capacities is often shaped by unexamined intuitive assumptions about the search space and complexity of a subcomputation. However, a mistaken intuition might make such initial conceptualizations misleading for what empirical questions appear relevant later on. We undertake here computational-level modeling and complexity analyses of segmentation - a widely hypothesized subcomputation that plays a requisite role in explanations of capacities across domains, such as speech recognition, music cognition, active sensing, event memory, action parsing, and statistical learning - as a case study to show how crucial it is to formally assess these assumptions. We mathematically prove two sets of results regarding computational hardness and search space size that may run counter to intuition, and position their implications with respect to existing views on the subcapacity.
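A small worked example of why segmentation search spaces deserve formal scrutiny (generic combinatorics, not the paper's specific proofs): a sequence of n elements can be cut into contiguous chunks in 2^(n-1) ways, which quickly becomes prohibitive to enumerate.

```python
# Enumerate all segmentations of a short sequence and check the 2**(n-1) count.
from itertools import combinations

def segmentations(seq):
    """Yield every way to cut seq into contiguous chunks."""
    n = len(seq)
    for k in range(n):                                   # number of cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [seq[i:j] for i, j in zip(bounds, bounds[1:])]

seq = "abcde"
segs = list(segmentations(seq))
print(len(segs), "==", 2 ** (len(seq) - 1))              # 16 == 16
```

For n = 30 that is already over 500 million candidate segmentations, so any account that presupposes exhaustive evaluation needs an argument for how the space is pruned.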
Collapse
Affiliation(s)
- Federico Adolfi
- Ernst Strüngmann Institute for Neuroscience in Cooperation with Max‐Planck Society
- School of Psychological Science, University of Bristol
| | - Todd Wareham
- Department of Computer Science, Memorial University of Newfoundland
| | - Iris van Rooij
- Donders Institute for Brain, Cognition, and Behaviour, Radboud University
- School of Artificial Intelligence, Radboud University
- Department of Linguistics, Cognitive Science, and Semiotics & Interacting Minds Centre, Aarhus University
| |
Collapse
|
40
|
Daikoku T, Goswami U. Hierarchical amplitude modulation structures and rhythm patterns: Comparing Western musical genres, song, and nature sounds to Babytalk. PLoS One 2022; 17:e0275631. [PMID: 36240225 PMCID: PMC9565671 DOI: 10.1371/journal.pone.0275631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2021] [Accepted: 09/20/2022] [Indexed: 11/19/2022] Open
Abstract
Statistical learning of physical stimulus characteristics is important for the development of cognitive systems like language and music. Rhythm patterns are a core component of both systems, and rhythm is key to language acquisition by infants. Accordingly, the physical stimulus characteristics that yield speech rhythm in "Babytalk" may also describe the hierarchical rhythmic relationships that characterize human music and song. Computational modelling of the amplitude envelope of "Babytalk" (infant-directed speech, IDS) using a demodulation approach (Spectral-Amplitude Modulation Phase Hierarchy model, S-AMPH) can describe these characteristics. S-AMPH modelling of Babytalk has shown previously that bands of amplitude modulations (AMs) at different temporal rates and their phase relations help to create its structured inherent rhythms. Additionally, S-AMPH modelling of children's nursery rhymes shows that different rhythm patterns (trochaic, iambic, dactylic) depend on the phase relations between AM bands centred on ~2 Hz and ~5 Hz. The importance of these AM phase relations was confirmed via a second demodulation approach (PAD, Probabilistic Amplitude Demodulation). Here we apply both S-AMPH and PAD to demodulate the amplitude envelopes of Western musical genres and songs. Quasi-rhythmic and non-human sounds found in nature (birdsong, rain, wind) were utilized for control analyses. We expected that the physical stimulus characteristics in human music and song from an AM perspective would match those of IDS. Given prior speech-based analyses, we also expected that AM cycles derived from the modelling may identify musical units like crotchets, quavers and demi-quavers. Both models revealed an hierarchically-nested AM modulation structure for music and song, but not nature sounds. This AM modulation structure for music and song matched IDS. Both models also generated systematic AM cycles yielding musical units like crotchets and quavers. Both music and language are created by humans and shaped by culture. Acoustic rhythm in IDS and music appears to depend on many of the same physical characteristics, facilitating learning.
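The analyses rest on demodulating the amplitude envelope into nested AM bands and examining their phase relations. The sketch below conveys only that generic idea on a synthetic envelope, with arbitrary band edges around ~2 Hz and ~5 Hz; it is not an implementation of S-AMPH or PAD.

```python
# Filter a (synthetic) amplitude envelope into ~2 Hz and ~5 Hz AM bands
# and summarize their phase relation.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 100                                     # envelope sampling rate (Hz)
t = np.arange(0, 20, 1 / fs)
envelope = (1 + 0.5 * np.sin(2 * np.pi * 2 * t)
            + 0.3 * np.sin(2 * np.pi * 5 * t)
            + 0.1 * np.random.randn(t.size))  # synthetic wideband envelope

def am_band(env, lo, hi):
    b, a = butter(2, [lo, hi], btype="band", fs=fs)
    return filtfilt(b, a, env)

stress_am = am_band(envelope, 1, 3)          # ~2 Hz ("stress"-rate) band
syllable_am = am_band(envelope, 4, 7)        # ~5 Hz ("syllable"-rate) band

phase_lag = np.angle(hilbert(stress_am)) - np.angle(hilbert(syllable_am))
print("circular mean phase lag (rad):",
      np.round(np.angle(np.mean(np.exp(1j * phase_lag))), 2))
```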
Collapse
Affiliation(s)
- Tatsuya Daikoku
- Centre for Neuroscience in Education, University of Cambridge, Cambridge, United Kingdom
- International Research Center for Neurointelligence, The University of Tokyo, Bunkyo City, Tokyo, Japan
- Center for Brain, Mind and KANSEI Sciences Research, Hiroshima University, Hiroshima, Japan
| | - Usha Goswami
- Centre for Neuroscience in Education, University of Cambridge, Cambridge, United Kingdom
| |
Collapse
|
41
|
Lipkin B, Tuckute G, Affourtit J, Small H, Mineroff Z, Kean H, Jouravlev O, Rakocevic L, Pritchett B, Siegelman M, Hoeflin C, Pongos A, Blank IA, Struhl MK, Ivanova A, Shannon S, Sathe A, Hoffmann M, Nieto-Castañón A, Fedorenko E. Probabilistic atlas for the language network based on precision fMRI data from >800 individuals. Sci Data 2022; 9:529. [PMID: 36038572 PMCID: PMC9424256 DOI: 10.1038/s41597-022-01645-3] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2022] [Accepted: 08/09/2022] [Indexed: 11/13/2022] Open
Abstract
Two analytic traditions characterize fMRI language research. One relies on averaging activations across individuals. This approach has limitations: because of inter-individual variability in the locations of language areas, any given voxel/vertex in a common brain space is part of the language network in some individuals but in others, may belong to a distinct network. An alternative approach relies on identifying language areas in each individual using a functional 'localizer'. Because of its greater sensitivity, functional resolution, and interpretability, functional localization is gaining popularity, but it is not always feasible, and cannot be applied retroactively to past studies. To bridge these disjoint approaches, we created a probabilistic functional atlas using fMRI data for an extensively validated language localizer in 806 individuals. This atlas enables estimating the probability that any given location in a common space belongs to the language network, and thus can help interpret group-level activation peaks and lesion locations, or select voxels/electrodes for analysis. More meaningful comparisons of findings across studies should increase robustness and replicability in language research.
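Numerically, a probabilistic functional atlas is just the per-voxel proportion of individually localized networks that include that voxel. The sketch below computes such a map from toy binary subject masks; the subject count matches the abstract, but everything else is simulated.

```python
# Build a toy probabilistic overlap map from binary per-subject masks.
import numpy as np

rng = np.random.default_rng(4)
n_subjects, n_voxels = 806, 10000
# Toy subject masks: each network covers a small fraction of voxels, with
# systematically higher probability in an assumed "core" set of voxels.
p = np.full(n_voxels, 0.05)
p[:500] = 0.8                                     # assumed high-overlap region
masks = rng.random((n_subjects, n_voxels)) < p    # boolean (subject x voxel)

atlas = masks.mean(axis=0)                        # probability per voxel, in [0, 1]
print("voxels with >50% overlap:", int((atlas > 0.5).sum()))
```

Looking up a group-level activation peak or lesion site in such a map gives the probability that the location belongs to the language network, which is the intended use of the released atlas.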
Collapse
Affiliation(s)
- Benjamin Lipkin
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA.
| | - Greta Tuckute
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Josef Affourtit
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Hannah Small
- Department of Cognitive Science, Johns Hopkins University, Baltimore, MD, USA
| | - Zachary Mineroff
- Human-computer Interaction Institute, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Hope Kean
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Olessia Jouravlev
- Department of Cognitive Science, Carleton University, Ottawa, ON, Canada
| | - Lara Rakocevic
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Brianna Pritchett
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | | | - Caitlyn Hoeflin
- Harris School of Public Policy, University of Chicago, Chicago, IL, USA
| | - Alvincé Pongos
- Department of Bioengineering, University of California, Berkeley, CA, USA
| | - Idan A Blank
- Department of Psychology, University of California, Los Angeles, CA, USA
| | - Melissa Kline Struhl
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Anna Ivanova
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Steven Shannon
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Aalok Sathe
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Malte Hoffmann
- Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Cambridge, MA, USA
| | - Alfonso Nieto-Castañón
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, USA
| | - Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Department of Speech, Hearing, Bioscience, and Technology, Harvard University, Cambridge, MA, USA.
| |
Collapse
|
42
|
Weiller C, Reisert M, Glauche V, Musso M, Rijntjes M. The dual-loop model for combining external and internal worlds in our brain. Neuroimage 2022; 263:119583. [PMID: 36007823 DOI: 10.1016/j.neuroimage.2022.119583] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2022] [Revised: 08/03/2022] [Accepted: 08/21/2022] [Indexed: 11/19/2022] Open
Abstract
Intelligible communication with others as well as covert conscious thought requires us to combine a representation of the external world with inner abstract concepts. Interaction with the external world through sensory perception and motor execution is arranged as sequences in time and space, whereas abstract thought and invariant categories are independent of the moment. Using advanced MRI-based fibre tracking on high-resolution data from 183 participants in the Human Connectome Project, we identified two large supramodal systems comprising specific cortical regions and their connecting fibre tracts: a dorsal one for processing sequences in time and space, and a ventral one for concepts and categories. We found that two hub regions exist in the executive front and the perceptive back of the brain where these two cognitive processes converge, constituting a dual-loop model. The hubs are located in the onto- and phylogenetically youngest regions of the cortex. We propose that this hub feature serves as the neural substrate for the more abstract sense of syntax in humans, i.e. for the system populating sequences with content in all cognitive domains. The hubs bring together two separate systems (dorsal and ventral) at the front and the back of the brain and create a closed loop. The closed loop facilitates recursion and forethought, which we use in two ways: for communication with others about things that are not there, and for covert thought.
Collapse
Affiliation(s)
- Cornelius Weiller
- Department of Neurology and Clinical Neuroscience, Medical Center, Faculty of Medicine, University of Freiburg; Breisacher Street 64, Freiburg D-79104, Germany.
| | - Marco Reisert
- Department of Medical Physics, Medical Center, Faculty of Medicine, University of Freiburg; Breisacher Street 64, Freiburg D-79104, Germany; Department of Stereotactic and Functional Neurosurgery, Medical Center, Faculty of Medicine, University of Freiburg; Breisacher Street 64, Freiburg D-79104, Germany
| | - Volkmar Glauche
- Department of Neurology and Clinical Neuroscience, Medical Center, Faculty of Medicine, University of Freiburg; Breisacher Street 64, Freiburg D-79104, Germany
| | - Mariachristina Musso
- Department of Neurology and Clinical Neuroscience, Medical Center, Faculty of Medicine, University of Freiburg; Breisacher Street 64, Freiburg D-79104, Germany
| | - Michel Rijntjes
- Department of Neurology and Clinical Neuroscience, Medical Center, Faculty of Medicine, University of Freiburg; Breisacher Street 64, Freiburg D-79104, Germany
| |
Collapse
|
43
|
Anurova I, Vetchinnikova S, Dobrego A, Williams N, Mikusova N, Suni A, Mauranen A, Palva S. Event-related responses reflect chunk boundaries in natural speech. Neuroimage 2022; 255:119203. [PMID: 35413442 DOI: 10.1016/j.neuroimage.2022.119203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2021] [Revised: 03/22/2022] [Accepted: 04/08/2022] [Indexed: 10/18/2022] Open
Abstract
Chunking language has been proposed to be vital for comprehension enabling the extraction of meaning from a continuous stream of speech. However, neurocognitive mechanisms of chunking are poorly understood. The present study investigated neural correlates of chunk boundaries intuitively identified by listeners in natural speech drawn from linguistic corpora using magneto- and electroencephalography (MEEG). In a behavioral experiment, subjects marked chunk boundaries in the excerpts intuitively, which revealed highly consistent chunk boundary markings across the subjects. We next recorded brain activity to investigate whether chunk boundaries with high and medium agreement rates elicit distinct evoked responses compared to non-boundaries. Pauses placed at chunk boundaries elicited a closure positive shift with the sources over bilateral auditory cortices. In contrast, pauses placed within a chunk were perceived as interruptions and elicited a biphasic emitted potential with sources located in the bilateral primary and non-primary auditory areas with right-hemispheric dominance, and in the right inferior frontal cortex. Furthermore, pauses placed at stronger boundaries elicited earlier and more prominent activation over the left hemisphere suggesting that brain responses to chunk boundaries of natural speech can be modulated by the relative strength of different linguistic cues, such as syntactic structure and prosody.
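The basic contrast described above can be sketched as epoching the continuous recording around pause onsets and averaging separately for boundary and within-chunk pauses; the channel count, epoch window, and baseline choice below are illustrative assumptions, not the study's parameters.

```python
# Toy evoked-response contrast: boundary pauses vs. within-chunk pauses.
import numpy as np

def evoked(data, fs, onsets_s, tmin=-0.2, tmax=0.8):
    """Average epochs of `data` (channels x samples) around onset times (in seconds)."""
    pre, post = int(-tmin * fs), int(tmax * fs)
    epochs = []
    for onset in onsets_s:
        i = int(round(onset * fs))
        if i - pre >= 0 and i + post <= data.shape[1]:
            epoch = data[:, i - pre:i + post]
            epochs.append(epoch - epoch[:, :pre].mean(axis=1, keepdims=True))  # baseline-correct
    return np.mean(epochs, axis=0)  # channels x time evoked response

fs = 250
data = np.random.randn(64, fs * 600)        # 64 channels, 10 minutes of toy data
boundary_pauses = [12.3, 45.1, 80.7]        # pause onsets at chunk boundaries (s)
within_pauses = [20.4, 55.9, 91.2]          # pause onsets within chunks (s)
difference_wave = evoked(data, fs, boundary_pauses) - evoked(data, fs, within_pauses)
```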
Collapse
Affiliation(s)
- Irina Anurova
- Helsinki Institute of Life Sciences, Neuroscience Center, University of Helsinki, Finland; BioMag Laboratory, HUS Medical Imaging Center, Helsinki, Finland.
| | | | | | - Nitin Williams
- Helsinki Institute of Life Sciences, Neuroscience Center, University of Helsinki, Finland; Department of Languages, University of Helsinki, Finland
| | - Nina Mikusova
- Department of Languages, University of Helsinki, Finland
| | - Antti Suni
- Department of Languages, University of Helsinki, Finland
| | - Anna Mauranen
- Department of Languages, University of Helsinki, Finland
| | - Satu Palva
- Helsinki Institute of Life Sciences, Neuroscience Center, University of Helsinki, Finland; Centre for Cognitive Neuroscience, Institute of Neuroscience and Psychology, University of Glasgow, United Kingdom.
| |
Collapse
|
44
|
Rogalsky C, Basilakos A, Rorden C, Pillay S, LaCroix AN, Keator L, Mickelsen S, Anderson SW, Love T, Fridriksson J, Binder J, Hickok G. The Neuroanatomy of Speech Processing: A Large-scale Lesion Study. J Cogn Neurosci 2022; 34:1355-1375. [PMID: 35640102 PMCID: PMC9274306 DOI: 10.1162/jocn_a_01876] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
The neural basis of language has been studied for centuries, yet the networks critically involved in simply identifying or understanding a spoken word remain elusive. Several functional-anatomical models of critical neural substrates of receptive speech have been proposed, including (1) auditory-related regions in the left mid-posterior superior temporal lobe, (2) motor-related regions in the left frontal lobe (in normal and/or noisy conditions), (3) the left anterior superior temporal lobe, or (4) bilateral mid-posterior superior temporal areas. One difficulty in comparing these models is that they often focus on different aspects of the sound-to-meaning pathway and are supported by different types of stimuli and tasks. Two auditory tasks that are typically used in separate studies-syllable discrimination and word comprehension-often yield different conclusions. We assessed syllable discrimination (words and nonwords) and word comprehension (clear speech and with a noise masker) in 158 individuals with focal brain damage: left (n = 113) or right (n = 19) hemisphere stroke, left (n = 18) or right (n = 8) anterior temporal lobectomy, and 26 neurologically intact controls. Discrimination and comprehension tasks are doubly dissociable both behaviorally and neurologically. In support of a bilateral model, clear speech comprehension was near ceiling in 95% of left stroke cases and right temporal damage impaired syllable discrimination. Lesion-symptom mapping analyses for the syllable discrimination and noisy word comprehension tasks each implicated most of the left superior temporal gyrus. Comprehension but not discrimination tasks also implicated the left posterior middle temporal gyrus, whereas discrimination but not comprehension tasks also implicated more dorsal sensorimotor regions in posterior perisylvian cortex.
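The lesion-symptom mapping logic referenced above can be sketched as a mass-univariate comparison: at each voxel, compare the behavioural scores of patients whose lesions include that voxel with those whose lesions spare it. The thresholds, covariates, and multiple-comparison correction used in the study are omitted from this toy version.

```python
# Minimal voxel-based lesion-symptom mapping (VLSM) sketch.
import numpy as np
from scipy.stats import ttest_ind

def vlsm(lesion_masks, scores, min_patients=5):
    """lesion_masks: (n_patients, n_voxels) binary; scores: (n_patients,) behaviour."""
    lesion_masks = np.asarray(lesion_masks, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    tmap = np.full(lesion_masks.shape[1], np.nan)
    for v in range(lesion_masks.shape[1]):
        lesioned, spared = scores[lesion_masks[:, v]], scores[~lesion_masks[:, v]]
        if min(len(lesioned), len(spared)) >= min_patients:
            tmap[v] = ttest_ind(spared, lesioned, equal_var=False).statistic
    return tmap  # high t => lesions at this voxel are associated with worse performance

rng = np.random.default_rng(1)
masks = rng.random((158, 1000)) < 0.05   # toy lesion maps for 158 patients
scores = rng.normal(0.9, 0.1, 158)       # toy comprehension accuracy
tmap = vlsm(masks, scores)
```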
Collapse
|
45
|
Norman-Haignere SV, Feather J, Boebinger D, Brunner P, Ritaccio A, McDermott JH, Schalk G, Kanwisher N. A neural population selective for song in human auditory cortex. Curr Biol 2022; 32:1470-1484.e12. [PMID: 35196507 PMCID: PMC9092957 DOI: 10.1016/j.cub.2022.01.069] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2021] [Revised: 10/26/2021] [Accepted: 01/24/2022] [Indexed: 12/18/2022]
Abstract
How is music represented in the brain? While neuroimaging has revealed some spatial segregation between responses to music versus other sounds, little is known about the neural code for music itself. To address this question, we developed a method to infer canonical response components of human auditory cortex using intracranial responses to natural sounds, and further used the superior coverage of fMRI to map their spatial distribution. The inferred components replicated many prior findings, including distinct neural selectivity for speech and music, but also revealed a novel component that responded nearly exclusively to music with singing. Song selectivity was not explainable by standard acoustic features, was located near speech- and music-selective responses, and was also evident in individual electrodes. These results suggest that representations of music are fractionated into subpopulations selective for different types of music, one of which is specialized for the analysis of song.
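The component-inference step can be thought of as factorizing an electrode-by-sound response matrix into a small set of shared response components and per-electrode weights. The sketch below uses generic non-negative matrix factorization as a stand-in; it is not the decomposition method used in the study, and the matrix sizes are arbitrary.

```python
# Generic sketch: infer shared response components from an electrode x sound matrix.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
responses = np.abs(rng.normal(size=(272, 165)))      # toy electrodes x natural-sounds matrix
model = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
electrode_weights = model.fit_transform(responses)   # how strongly each electrode expresses each component
component_profiles = model.components_               # each component's response across the sound set
# A "song-selective" component would show high values for sung music and low values elsewhere.
```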
Collapse
Affiliation(s)
- Sam V Norman-Haignere
- Zuckerman Institute, Columbia University, New York, NY, USA; HHMI Fellow of the Life Sciences Research Foundation, Chevy Chase, MD, USA; Laboratoire des Sytèmes Perceptifs, Département d'Études Cognitives, ENS, PSL University, CNRS, Paris, France; Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY, USA; Department of Neuroscience, University of Rochester Medical Center, Rochester, NY, USA; Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
| | - Jenelle Feather
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Center for Brains, Minds and Machines, Cambridge, MA, USA
| | - Dana Boebinger
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Program in Speech and Hearing Biosciences and Technology, Harvard University, Cambridge, MA, USA
| | - Peter Brunner
- Department of Neurology, Albany Medical College, Albany, NY, USA; National Center for Adaptive Neurotechnologies, Albany, NY, USA; Department of Neurosurgery, Washington University School of Medicine, St. Louis, MO, USA
| | - Anthony Ritaccio
- Department of Neurology, Albany Medical College, Albany, NY, USA; Department of Neurology, Mayo Clinic, Jacksonville, FL, USA
| | - Josh H McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Center for Brains, Minds and Machines, Cambridge, MA, USA; Program in Speech and Hearing Biosciences and Technology, Harvard University, Cambridge, MA, USA
| | - Gerwin Schalk
- Department of Neurology, Albany Medical College, Albany, NY, USA
| | - Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Center for Brains, Minds and Machines, Cambridge, MA, USA
| |
Collapse
|
46
|
Staib M, Frühholz S. Distinct functional levels of human voice processing in the auditory cortex. Cereb Cortex 2022; 33:1170-1185. [PMID: 35348635 PMCID: PMC9930621 DOI: 10.1093/cercor/bhac128] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2021] [Revised: 02/03/2022] [Accepted: 03/07/2022] [Indexed: 11/12/2022] Open
Abstract
Voice signaling is integral to human communication, and a cortical voice area seemed to support the discrimination of voices from other auditory objects. This large cortical voice area in the auditory cortex (AC) was suggested to process voices selectively, but its functional differentiation remained elusive. We used neuroimaging while humans processed voices and nonvoice sounds, and artificial sounds that mimicked certain voice sound features. First and surprisingly, specific auditory cortical voice processing beyond basic acoustic sound analyses is only supported by a very small portion of the originally described voice area in higher-order AC located centrally in superior Te3. Second, besides this core voice processing area, large parts of the remaining voice area in low- and higher-order AC only accessorily process voices and might primarily pick up nonspecific psychoacoustic differences between voices and nonvoices. Third, a specific subfield of low-order AC seems to specifically decode acoustic sound features that are relevant but not exclusive for voice detection. Taken together, the previously defined voice area might have been overestimated since cortical support for human voice processing seems rather restricted. Cortical voice processing also seems to be functionally more diverse and embedded in broader functional principles of the human auditory system.
Collapse
Affiliation(s)
- Matthias Staib
- Cognitive and Affective Neuroscience Unit, University of Zurich, 8050 Zurich, Switzerland
| | - Sascha Frühholz
- Corresponding author: Department of Psychology, University of Zürich, Binzmuhlestrasse 14/18, 8050 Zürich, Switzerland.
| |
Collapse
|
47
|
Norman-Haignere SV, Long LK, Devinsky O, Doyle W, Irobunda I, Merricks EM, Feldstein NA, McKhann GM, Schevon CA, Flinker A, Mesgarani N. Multiscale temporal integration organizes hierarchical computation in human auditory cortex. Nat Hum Behav 2022; 6:455-469. [PMID: 35145280 PMCID: PMC8957490 DOI: 10.1038/s41562-021-01261-y] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2020] [Accepted: 11/18/2021] [Indexed: 01/11/2023]
Abstract
To derive meaning from sound, the brain must integrate information across many timescales. What computations underlie multiscale integration in human auditory cortex? Evidence suggests that auditory cortex analyses sound using both generic acoustic representations (for example, spectrotemporal modulation tuning) and category-specific computations, but the timescales over which these putatively distinct computations integrate remain unclear. To answer this question, we developed a general method to estimate sensory integration windows-the time window when stimuli alter the neural response-and applied our method to intracranial recordings from neurosurgical patients. We show that human auditory cortex integrates hierarchically across diverse timescales spanning from ~50 to 400 ms. Moreover, we find that neural populations with short and long integration windows exhibit distinct functional properties: short-integration electrodes (less than ~200 ms) show prominent spectrotemporal modulation selectivity, while long-integration electrodes (greater than ~200 ms) show prominent category selectivity. These findings reveal how multiscale integration organizes auditory computation in the human brain.
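The integration-window logic can be illustrated directly: if a site's response depends only on the most recent W ms of sound, then responses to identical segments embedded in different contexts should converge once segment duration exceeds W. The correlation threshold and toy stimulus design below are assumptions for illustration, not the authors' estimation procedure.

```python
# Sketch of estimating an integration window from context invariance.
import numpy as np

def cross_context_correlation(resp_a, resp_b, seg_durs):
    """resp_a/resp_b map segment duration (ms) -> (n_segments, n_timepoints) responses
    to identical segments presented in two different surrounding contexts."""
    return {d: np.corrcoef(resp_a[d].ravel(), resp_b[d].ravel())[0, 1] for d in seg_durs}

def integration_window(corrs, threshold=0.75):
    """Shortest segment duration at which responses become context-invariant."""
    for dur in sorted(corrs):
        if corrs[dur] >= threshold:
            return dur
    return None

rng = np.random.default_rng(0)
seg_durs = [62, 125, 250, 500]   # ms
true_window = 200                # toy "neural" integration window in ms
resp_a, resp_b = {}, {}
for d in seg_durs:
    shared = rng.normal(size=(20, 50))
    noise = 1.0 if d < true_window else 0.1   # short segments: responses dominated by context
    resp_a[d] = shared + noise * rng.normal(size=shared.shape)
    resp_b[d] = shared + noise * rng.normal(size=shared.shape)
corrs = cross_context_correlation(resp_a, resp_b, seg_durs)
print(integration_window(corrs))  # -> 250 in this toy example
```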
Collapse
Affiliation(s)
- Sam V Norman-Haignere
- Zuckerman Mind, Brain, Behavior Institute, Columbia University; HHMI Postdoctoral Fellow of the Life Sciences Research Foundation
| | - Laura K. Long
- Zuckerman Mind, Brain, Behavior Institute, Columbia University; Doctoral Program in Neurobiology and Behavior, Columbia University
| | - Orrin Devinsky
- Department of Neurology, NYU Langone Medical Center; Comprehensive Epilepsy Center, NYU Langone Medical Center
| | - Werner Doyle
- Comprehensive Epilepsy Center, NYU Langone Medical Center; Department of Neurosurgery, NYU Langone Medical Center
| | - Ifeoma Irobunda
- Department of Neurology, Columbia University Irving Medical Center
| | | | - Neil A. Feldstein
- Department of Neurological Surgery, Columbia University Irving Medical Center
| | - Guy M. McKhann
- Department of Neurological Surgery, Columbia University Irving Medical Center
| | | | - Adeen Flinker
- Department of Neurology, NYU Langone Medical Center; Comprehensive Epilepsy Center, NYU Langone Medical Center; Department of Biomedical Engineering, NYU Tandon School of Engineering
| | - Nima Mesgarani
- Zuckerman Mind, Brain, Behavior Institute, Columbia University; Doctoral Program in Neurobiology and Behavior, Columbia University; Department of Electrical Engineering, Columbia University
| |
Collapse
|
48
|
Tuckute G, Paunov A, Kean H, Small H, Mineroff Z, Blank I, Fedorenko E. Frontal language areas do not emerge in the absence of temporal language areas: A case study of an individual born without a left temporal lobe. Neuropsychologia 2022; 169:108184. [DOI: 10.1016/j.neuropsychologia.2022.108184] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2021] [Revised: 12/07/2021] [Accepted: 02/15/2022] [Indexed: 10/19/2022]
|
49
|
Bhaya-Grossman I, Chang EF. Speech Computations of the Human Superior Temporal Gyrus. Annu Rev Psychol 2022.
Abstract
Human speech perception results from neural computations that transform external acoustic speech signals into internal representations of words. The superior temporal gyrus (STG) contains the nonprimary auditory cortex and is a critical locus for phonological processing. Here, we describe how speech sound representation in the STG relies on fundamentally nonlinear and dynamical processes, such as categorization, normalization, contextual restoration, and the extraction of temporal structure. A spatial mosaic of local cortical sites on the STG exhibits complex auditory encoding for distinct acoustic-phonetic and prosodic features. We propose that as a population ensemble, these distributed patterns of neural activity give rise to abstract, higher-order phonemic and syllabic representations that support speech perception. This review presents a multi-scale, recurrent model of phonological processing in the STG, highlighting the critical interface between auditory and language systems.
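As a concrete example of the feature-encoding analyses this literature relies on, the sketch below fits a generic time-lagged ("temporal receptive field") ridge-regression model that predicts a recording site's activity from binary phonetic-feature time series; the feature set, lag range, and regularization are illustrative assumptions, not taken from the review.

```python
# Generic time-lagged encoding model (temporal receptive field) via ridge regression.
import numpy as np
from sklearn.linear_model import Ridge

def lagged_design(features, n_lags):
    """Stack time-lagged copies of a (time x features) matrix into a design matrix."""
    T, F = features.shape
    X = np.zeros((T, F * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * F:(lag + 1) * F] = features[:T - lag]
    return X

rng = np.random.default_rng(0)
features = rng.random((5000, 14)) < 0.05                 # toy binary phonetic-feature time series (10 ms bins)
X = lagged_design(features.astype(float), n_lags=40)     # lags spanning ~400 ms
y = X @ rng.normal(size=X.shape[1]) + rng.normal(size=X.shape[0])  # toy neural response
trf = Ridge(alpha=100.0).fit(X[:4000], y[:4000])
print(trf.score(X[4000:], y[4000:]))                     # held-out prediction accuracy (R^2)
```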
Collapse
Affiliation(s)
- Ilina Bhaya-Grossman
- Department of Neurological Surgery, University of California, San Francisco, California 94143, USA;
- Joint Graduate Program in Bioengineering, University of California, Berkeley and San Francisco, California 94720, USA
| | - Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, California 94143, USA;
| |
Collapse
|
50
|
Ruthig P, Schönwiesner M. Common principles in the lateralisation of auditory cortex structure and function for vocal communication in primates and rodents. Eur J Neurosci 2022; 55:827-845. [PMID: 34984748 DOI: 10.1111/ejn.15590] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2021] [Accepted: 12/24/2021] [Indexed: 11/27/2022]
Abstract
This review summarises recent findings on the lateralisation of communicative sound processing in the auditory cortex (AC) of humans, non-human primates, and rodents. Functional imaging in humans has demonstrated a left hemispheric preference for some acoustic features of speech, but it is unclear to which degree this is caused by bottom-up acoustic feature selectivity or top-down modulation from language areas. Although non-human primates show a less pronounced functional lateralisation in AC, the properties of AC fields and behavioral asymmetries are qualitatively similar. Rodent studies demonstrate microstructural circuits that might underlie bottom-up acoustic feature selectivity in both hemispheres. Functionally, the left AC in the mouse appears to be specifically tuned to communication calls, whereas the right AC may have a more 'generalist' role. Rodents also show anatomical AC lateralisation, such as differences in size and connectivity. Several of these functional and anatomical characteristics are also lateralized in human AC. Thus, complex vocal communication processing shares common features among rodents and primates. We argue that a synthesis of results from humans, non-human primates, and rodents is necessary to identify the neural circuitry of vocal communication processing. However, data from different species and methods are often difficult to compare. Recent advances may enable better integration of methods across species. Efforts to standardise data formats and analysis tools would benefit comparative research and enable synergies between psychological and biological research in the area of vocal communication processing.
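Functional lateralisation of the kind reviewed here is often summarised with a laterality index computed from matched left- and right-hemisphere response measures; the standard (L - R) / (L + R) convention is shown below purely as an illustration.

```python
# Standard laterality index: positive = left-dominant, negative = right-dominant.
def laterality_index(left, right):
    """Return LI in [-1, 1] from matched left/right response magnitudes."""
    return (left - right) / (left + right)

print(laterality_index(left=12.0, right=8.0))   # 0.2 -> modest left-hemisphere dominance
```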
Collapse
Affiliation(s)
- Philip Ruthig
- Faculty of Life Sciences, Leipzig University, Leipzig, Sachsen; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig
| | | |
Collapse
|