1
|
Individual connectivity-based parcellations reflect functional properties of human auditory cortex. bioRxiv 2024:2024.01.20.576475. [PMID: 38293021 PMCID: PMC10827228 DOI: 10.1101/2024.01.20.576475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Neuroimaging studies of the functional organization of human auditory cortex have focused on group-level analyses to identify tendencies that represent the typical brain. Here, we mapped auditory areas of the human superior temporal cortex (STC) in 30 participants by combining functional network analysis and 1-mm isotropic resolution 7T functional magnetic resonance imaging (fMRI). Two resting-state fMRI sessions, and one or two auditory and audiovisual speech localizer sessions, were collected on 3-4 separate days. We generated a set of functional network-based parcellations from these data. Solutions with 4, 6, and 11 networks were selected for closer examination based on local maxima of Dice and Silhouette values. The resulting parcellation of auditory cortices showed high intraindividual reproducibility both between resting state sessions (Dice coefficient: 69-78%) and between resting state and task sessions (Dice coefficient: 62-73%). This demonstrates that auditory areas in STC can be reliably segmented into functional subareas. The interindividual variability was significantly larger than intraindividual variability (Dice coefficient: 57%-68%, p<0.001), indicating that the parcellations also captured meaningful interindividual variability. The individual-specific parcellations yielded the highest alignment with task response topographies, suggesting that individual variability in parcellations reflects individual variability in auditory function. Connectional homogeneity within networks was also highest for the individual-specific parcellations. Furthermore, the similarity in the functional parcellations was not explainable by the similarity of macroanatomical properties of auditory cortex. Our findings suggest that individual-level parcellations capture meaningful idiosyncrasies in auditory cortex organization.
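The Dice coefficient quoted above measures the overlap between two label maps. A minimal sketch follows, assuming each parcellation is a flat list of per-vertex network labels and that labels are already matched across sessions; the function name and the toy data are illustrative and are not taken from the paper, whose exact procedure is not described in the abstract.

```python
def dice_coefficient(labels_a, labels_b, label):
    """Dice overlap of one network label between two parcellations.

    Dice = 2 * |A and B| / (|A| + |B|), where A and B are the sets of
    vertices assigned to `label` in each parcellation.
    """
    in_a = [x == label for x in labels_a]
    in_b = [x == label for x in labels_b]
    overlap = sum(a and b for a, b in zip(in_a, in_b))
    total = sum(in_a) + sum(in_b)
    # The coefficient is undefined when the label is absent from both maps.
    return 2.0 * overlap / total if total else float("nan")


# Hypothetical toy data: six vertices labelled in two sessions.
session1 = [1, 1, 2, 2, 3, 3]
session2 = [1, 2, 2, 2, 3, 1]
per_network = {k: dice_coefficient(session1, session2, k) for k in (1, 2, 3)}
print(per_network)
```

Averaging the per-network values gives a single reproducibility score for a pair of sessions; whether the study averaged this way is an assumption.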
|
2
|
Linguistic modulation of the neural encoding of phonemes. Cereb Cortex 2024; 34:bhae155. [PMID: 38687241 PMCID: PMC11059272 DOI: 10.1093/cercor/bhae155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 03/21/2024] [Accepted: 03/22/2024] [Indexed: 05/02/2024]
Abstract
Speech comprehension entails the neural mapping of the acoustic speech signal onto learned linguistic units. This acousto-linguistic transformation is bi-directional, whereby higher-level linguistic processes (e.g. semantics) modulate the acoustic analysis of individual linguistic units. Here, we investigated the cortical topography and linguistic modulation of the most fundamental linguistic unit, the phoneme. We presented natural speech and "phoneme quilts" (pseudo-randomly shuffled phonemes) in either a familiar (English) or unfamiliar (Korean) language to native English speakers while recording functional magnetic resonance imaging. This allowed us to dissociate the contribution of acoustic vs. linguistic processes toward phoneme analysis. We show that (i) the acoustic analysis of phonemes is modulated by linguistic analysis and (ii) for this modulation, both acoustic and phonetic information need to be incorporated. These results suggest that the linguistic modulation of cortical sensitivity to phoneme classes minimizes prediction error during natural speech perception, thereby aiding speech comprehension in challenging listening situations.
|
3
|
Evidence for Multiscale Multiplexed Representation of Visual Features in EEG. Neural Comput 2024; 36:412-436. [PMID: 38363657 DOI: 10.1162/neco_a_01649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 12/01/2023] [Indexed: 02/18/2024]
Abstract
Distinct neural processes such as sensory and memory processes are often encoded over distinct timescales of neural activations. Animal studies have shown that this multiscale coding strategy is also implemented for individual components of a single process, such as individual features of a multifeature stimulus in sensory coding. However, the generalizability of this encoding strategy to the human brain has remained unclear. We asked if individual features of visual stimuli were encoded over distinct timescales. We applied a multiscale time-resolved decoding method to electroencephalography (EEG) collected from human subjects presented with grating visual stimuli to estimate the timescale of individual stimulus features. We observed that the orientation and color of the stimuli were encoded in shorter timescales, whereas spatial frequency and the contrast of the same stimuli were encoded in longer timescales. The stimulus features appeared in temporally overlapping windows along the trial supporting a multiplexed coding strategy. These results provide evidence for a multiplexed, multiscale coding strategy in the human visual system.
|
4
|
Covert cortical processing: a diagnosis in search of a definition. Neurosci Conscious 2024; 2024:niad026. [PMID: 38327828 PMCID: PMC10849751 DOI: 10.1093/nc/niad026] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/02/2023] [Revised: 10/22/2023] [Accepted: 12/10/2023] [Indexed: 02/09/2024]
Abstract
Historically, clinical evaluation of unresponsive patients following brain injury has relied principally on serial behavioral examination to search for emerging signs of consciousness and track recovery. Advances in neuroimaging and electrophysiologic techniques now enable clinicians to peer into residual brain functions even in the absence of overt behavioral signs. These advances have expanded clinicians' ability to sub-stratify behaviorally unresponsive and seemingly unaware patients following brain injury by querying and classifying covert brain activity made evident through active or passive neuroimaging or electrophysiologic techniques, including functional MRI, electroencephalography (EEG), transcranial magnetic stimulation-EEG, and positron emission tomography. Clinical research has thus reciprocally influenced clinical practice, giving rise to new diagnostic categories including cognitive-motor dissociation (i.e. 'covert consciousness') and covert cortical processing (CCP). While covert consciousness has received extensive attention and study, CCP is relatively less understood. We describe that CCP is an emerging and clinically relevant state of consciousness marked by the presence of intact association cortex responses to environmental stimuli in the absence of behavioral evidence of stimulus processing. CCP is not a monotonic state but rather encapsulates a spectrum of possible association cortex responses from rudimentary to complex and to a range of possible stimuli. In constructing a roadmap for this evolving field, we emphasize that efforts to inform clinicians, philosophers, and researchers of this condition are crucial. Along with strategies to sensitize diagnostic criteria and disorders of consciousness nosology to these vital discoveries, democratizing access to the resources necessary for clinical identification of CCP is an emerging clinical and ethical imperative.
|
5
|
Many but not all deep neural network audio models capture brain responses and exhibit correspondence between model stages and brain regions. PLoS Biol 2023; 21:e3002366. [PMID: 38091351 PMCID: PMC10718467 DOI: 10.1371/journal.pbio.3002366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Accepted: 10/06/2023] [Indexed: 12/18/2023]
Abstract
Models that predict brain responses to stimuli provide one measure of understanding of a sensory system and have many potential applications in science and engineering. Deep artificial neural networks have emerged as the leading such predictive models of the visual system but are less explored in audition. Prior work provided examples of audio-trained neural networks that produced good predictions of auditory cortical fMRI responses and exhibited correspondence between model stages and brain regions, but left it unclear whether these results generalize to other neural network models and, thus, how to further improve models in this domain. We evaluated model-brain correspondence for publicly available audio neural network models along with in-house models trained on 4 different tasks. Most tested models outpredicted standard spectrotemporal filter-bank models of auditory cortex and exhibited systematic model-brain correspondence: Middle stages best predicted primary auditory cortex, while deep stages best predicted non-primary cortex. However, some state-of-the-art models produced substantially worse brain predictions. Models trained to recognize speech in background noise produced better brain predictions than models trained to recognize speech in quiet, potentially because hearing in noise imposes constraints on biological auditory representations. The training task influenced the prediction quality for specific cortical tuning properties, with best overall predictions resulting from models trained on multiple tasks. The results generally support the promise of deep neural networks as models of audition, though they also indicate that current models do not explain auditory cortical responses in their entirety.
|
6
|
Large language models transition from integrating across position-yoked, exponential windows to structure-yoked, power-law windows. Adv Neural Inf Process Syst 2023; 36:638-654. [PMID: 38434255 PMCID: PMC10907028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2024]
Abstract
Modern language models excel at integrating across long temporal scales needed to encode linguistic meaning and show non-trivial similarities to biological neural systems. Prior work suggests that human brain responses to language exhibit hierarchically organized "integration windows" that substantially constrain the overall influence of an input token (e.g., a word) on the neural response. However, little prior work has attempted to use integration windows to characterize computations in large language models (LLMs). We developed a simple word-swap procedure for estimating integration windows from black-box language models that does not depend on access to gradients or knowledge of the model architecture (e.g., attention weights). Using this method, we show that trained LLMs exhibit stereotyped integration windows that are well-fit by a convex combination of an exponential and a power-law function, with a partial transition from exponential to power-law dynamics across network layers. We then introduce a metric for quantifying the extent to which these integration windows vary with structural boundaries (e.g., sentence boundaries), and using this metric, we show that integration windows become increasingly yoked to structure at later network layers. None of these findings were observed in an untrained model, which as expected integrated uniformly across its input. These results suggest that LLMs learn to integrate information in natural language using a stereotyped pattern: integrating across position-yoked, exponential windows at early layers, followed by structure-yoked, power-law windows at later layers. The methods we describe in this paper provide a general-purpose toolkit for understanding temporal integration in language models, facilitating cross-disciplinary research at the intersection of biological and artificial intelligence.
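The "convex combination of an exponential and a power-law function" can be sketched as a single weighted sum. The parameterization below (the names `weight`, `tau`, `alpha`, and the `(1 + d)` offset that keeps the power law finite at zero distance) is an assumption chosen for illustration; the abstract does not give the paper's exact functional form.

```python
import math


def integration_window(distance, weight, tau, alpha):
    """Influence of a token `distance` positions back (hypothetical form).

    w(d) = weight * exp(-d / tau) + (1 - weight) * (1 + d) ** (-alpha)
    with 0 <= weight <= 1, so the two terms form a convex combination.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    exponential = math.exp(-distance / tau)
    power_law = (1.0 + distance) ** (-alpha)
    return weight * exponential + (1.0 - weight) * power_law


# weight near 1 gives a mostly exponential window (as reported for early
# layers); weight near 0 gives a mostly power-law window (later layers).
early = [integration_window(d, weight=0.9, tau=4.0, alpha=1.0) for d in range(64)]
late = [integration_window(d, weight=0.1, tau=4.0, alpha=1.0) for d in range(64)]
print(early[32], late[32])
```

With these illustrative parameters the power-law-dominated window retains more influence at long distances, which is the heavy-tail behaviour that distinguishes the two regimes.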
|
7
|
Are acoustics enough? Semantic effects on auditory salience in natural scenes. Front Psychol 2023; 14:1276237. [PMID: 38098516 PMCID: PMC10720592 DOI: 10.3389/fpsyg.2023.1276237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 11/10/2023] [Indexed: 12/17/2023]
Abstract
Auditory salience is a fundamental property of a sound that allows it to grab a listener's attention regardless of their attentional state or behavioral goals. While previous research has shed light on acoustic factors influencing auditory salience, the semantic dimensions of this phenomenon have remained relatively unexplored owing both to the complexity of measuring salience in audition and to the limited focus on complex natural scenes. In this study, we examine the relationship between acoustic, contextual, and semantic attributes and their impact on the auditory salience of natural audio scenes using a dichotic listening paradigm. The experiments present acoustic scenes in forward and backward directions; the latter diminishes semantic effects, providing a counterpoint to the effects observed in forward scenes. The behavioral data collected from a crowd-sourced platform reveal a striking convergence in temporal salience maps for certain sound events, while marked disparities emerge in others. Our main hypothesis posits that differences in the perceptual salience of events are predominantly driven by semantic and contextual cues, particularly evident in those cases displaying substantial disparities between forward and backward presentations. Conversely, events exhibiting a high degree of alignment can largely be attributed to low-level acoustic attributes. To evaluate this hypothesis, we employ analytical techniques that combine rich low-level mappings from acoustic profiles with high-level embeddings extracted from a deep neural network. This integrated approach captures both acoustic and semantic attributes of acoustic scenes along with their temporal trajectories. The results demonstrate that perceptual salience is a careful interplay between low-level and high-level attributes that shapes which moments stand out in a natural soundscape.
Furthermore, our findings underscore the important role of longer-term context as a critical component of auditory salience, enabling us to discern and adapt to temporal regularities within an acoustic scene. The experimental and model-based validation of semantic factors of salience paves the way for a complete understanding of auditory salience. Ultimately, the empirical and computational analyses have implications for developing large-scale models for auditory salience and audio analytics.
|
8
|
A sparse code for natural sound context in auditory cortex. Curr Res Neurobiol 2023; 6:100118. [PMID: 38152461 PMCID: PMC10749876 DOI: 10.1016/j.crneur.2023.100118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 10/27/2023] [Accepted: 11/14/2023] [Indexed: 12/29/2023]
Abstract
Accurate sound perception can require integrating information over hundreds of milliseconds or even seconds. Spectro-temporal models of sound coding by single neurons in auditory cortex indicate that the majority of sound-evoked activity can be attributed to stimuli within a few tens of milliseconds. It remains uncertain how the auditory system integrates information about sensory context on a longer timescale. Here we characterized long-lasting contextual effects in auditory cortex (AC) using a diverse set of natural sound stimuli. We measured context effects as the difference in a neuron's response to a single probe sound following two different context sounds. Many AC neurons showed context effects lasting longer than the temporal window of a traditional spectro-temporal receptive field. The duration and magnitude of context effects varied substantially across neurons and stimuli. This diversity of context effects formed a sparse code across the neural population that encoded a wider range of contexts than any constituent neuron. Encoding model analysis indicates that context effects can be explained by activity in the local neural population, suggesting that recurrent local circuits support a long-lasting representation of sensory context in auditory cortex.
|
9
|
Is song processing distinct and special in the auditory cortex? Nat Rev Neurosci 2023; 24:711-722. [PMID: 37783820 DOI: 10.1038/s41583-023-00743-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/30/2023] [Indexed: 10/04/2023]
Abstract
Is the singing voice processed distinctively in the human brain? In this Perspective, we discuss what might distinguish song processing from speech processing in light of recent work suggesting that some cortical neuronal populations respond selectively to song and we outline the implications for our understanding of auditory processing. We review the literature regarding the neural and physiological mechanisms of song production and perception and show that this provides evidence for key differences between song and speech processing. We conclude by discussing the significance of the notion that song processing is special in terms of how this might contribute to theories of the neurobiological origins of vocal communication and to our understanding of the neural circuitry underlying sound processing in the human cortex.
|
10
|
Mega-scale movie-fields in the mouse visuo-hippocampal network. eLife 2023; 12:RP85069. [PMID: 37910428 PMCID: PMC10619982 DOI: 10.7554/elife.85069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2023]
Abstract
Natural visual experience involves a continuous series of related images while the subject is immobile. How does the cortico-hippocampal circuit process a visual episode? The hippocampus is crucial for episodic memory, but most rodent single unit studies require spatial exploration or active engagement. Hence, we investigated neural responses to a silent movie (Allen Brain Observatory) in head-fixed mice without any task or locomotion demands, or rewards. Surprisingly, a third (33%, 3379/10263) of hippocampal (dentate gyrus, CA3, CA1 and subiculum) neurons showed movie-selectivity, with elevated firing in specific movie sub-segments, termed movie-fields, similar to the vast majority of thalamo-cortical (LGN, V1, AM-PM) neurons (97%, 6554/6785). Movie-tuning remained intact in immobile or spontaneously running mice. Visual neurons had >5 movie-fields per cell, but only ~2 in hippocampus. The movie-field durations in all brain regions spanned an unprecedented 1000-fold range: from 0.02s to 20s, termed mega-scale coding. Yet, the total duration of all the movie-fields of a cell was comparable across neurons and brain regions. The hippocampal responses thus showed greater continuous-sequence encoding than visual areas, as evidenced by fewer and broader movie-fields than in visual areas. Consistently, repeated presentation of the movie images in a fixed, but scrambled sequence virtually abolished hippocampal but not visual-cortical selectivity. The preference for continuous, compared to scrambled, sequences was eight-fold greater in hippocampal than visual areas, further supporting episodic-sequence encoding. Movies could thus provide a unified way to probe neural mechanisms of episodic information processing and memory, even in immobile subjects, across brain regions, and species.
|
11
|
Parvalbumin neurons enhance temporal coding and reduce cortical noise in complex auditory scenes. Commun Biol 2023; 6:751. [PMID: 37468561 PMCID: PMC10356822 DOI: 10.1038/s42003-023-05126-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2022] [Accepted: 07/10/2023] [Indexed: 07/21/2023]
Abstract
Cortical representations supporting many cognitive abilities emerge from underlying circuits comprised of several different cell types. However, cell type-specific contributions to rate and timing-based cortical coding are not well-understood. Here, we investigated the role of parvalbumin neurons in cortical complex scene analysis. Many complex scenes contain sensory stimuli which are highly dynamic in time and compete with stimuli at other spatial locations. Parvalbumin neurons play a fundamental role in balancing excitation and inhibition in cortex and sculpting cortical temporal dynamics; yet their specific role in encoding complex scenes via timing-based coding, and the robustness of temporal representations to spatial competition, has not been investigated. Here, we address these questions in auditory cortex of mice using a cocktail party-like paradigm, integrating electrophysiology, optogenetic manipulations, and a family of spike-distance metrics, to dissect parvalbumin neurons' contributions towards rate and timing-based coding. We find that suppressing parvalbumin neurons degrades cortical discrimination of dynamic sounds in a cocktail party-like setting via changes in rapid temporal modulations in rate and spike timing, and over a wide range of time-scales. Our findings suggest that parvalbumin neurons play a critical role in enhancing cortical temporal coding and reducing cortical noise, thereby improving representations of dynamic stimuli in complex scenes.
|
12
|
Hearing as adaptive cascaded envelope interpolation. Commun Biol 2023; 6:671. [PMID: 37355702 PMCID: PMC10290642 DOI: 10.1038/s42003-023-05040-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 06/12/2023] [Indexed: 06/26/2023]
Abstract
The human auditory system is designed to capture and encode sounds from our surroundings and conspecifics. However, the precise mechanisms by which it adaptively extracts the most important spectro-temporal information from sounds are still not fully understood. Previous auditory models have explained sound encoding at the cochlear level using static filter banks, but this vision is incompatible with the nonlinear and adaptive properties of the auditory system. Here we propose an approach that considers the cochlear processes as envelope interpolations inspired by cochlear physiology. It unifies linear and nonlinear adaptive behaviors into a single comprehensive framework that provides a data-driven understanding of auditory coding. It allows simulating a broad range of psychophysical phenomena from virtual pitches and combination tones to consonance and dissonance of harmonic sounds. It further predicts the properties of the cochlear filters such as frequency selectivity. Here we propose a possible link between the parameters of the model and the density of hair cells on the basilar membrane. Cascaded Envelope Interpolation may lead to improvements in sound processing for hearing aids by providing a non-linear, data-driven way of preprocessing acoustic signals consistent with peripheral processes.
|
13
|
Intrinsic Neural Timescales in the Temporal Lobe Support an Auditory Processing Hierarchy. J Neurosci 2023; 43:3696-3707. [PMID: 37045604 PMCID: PMC10198454 DOI: 10.1523/jneurosci.1941-22.2023] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/25/2022] [Revised: 02/21/2023] [Accepted: 03/02/2023] [Indexed: 04/14/2023]
Abstract
During rest, intrinsic neural dynamics manifest at multiple timescales, which progressively increase along visual and somatosensory hierarchies. Theoretically, intrinsic timescales are thought to facilitate processing of external stimuli at multiple stages. However, direct links between timescales at rest and sensory processing, as well as translation to the auditory system, are lacking. Here, we measured intracranial EEG in 11 human patients with epilepsy (4 women) while they listened to pure tones. We show that, in the auditory network, intrinsic neural timescales progressively increase, while the spectral exponent flattens, from temporal to entorhinal cortex, hippocampus, and amygdala. Within the neocortex, intrinsic timescales exhibit spatial gradients that follow the temporal lobe anatomy. Crucially, intrinsic timescales at baseline can explain the latency of auditory responses: as intrinsic timescales increase, so do the single-electrode response onset and peak latencies. Our results suggest that the human auditory network exhibits a repertoire of intrinsic neural dynamics, which manifest in cortical gradients with millimeter resolution and may provide a variety of temporal windows to support auditory processing. SIGNIFICANCE STATEMENT Endogenous neural dynamics are often characterized by their intrinsic timescales. These are thought to facilitate processing of external stimuli. However, a direct link between intrinsic timing at rest and sensory processing is missing. Here, with intracranial EEG, we show that intrinsic timescales progressively increase from temporal to entorhinal cortex, hippocampus, and amygdala. Intrinsic timescales at baseline can explain the variability in the timing of intracranial EEG responses to sounds: cortical electrodes with fast timescales also show fast- and short-lasting responses to auditory stimuli, which progressively increase in the hippocampus and amygdala.
Our results suggest that a hierarchy of neural dynamics in the temporal lobe manifests across cortical and limbic structures and can explain the temporal richness of auditory responses.
|
14
|
Intra- and inter-hemispheric network dynamics supporting object recognition and speech production. Neuroimage 2023; 270:119954. [PMID: 36828156 PMCID: PMC10112006 DOI: 10.1016/j.neuroimage.2023.119954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 02/14/2023] [Accepted: 02/17/2023] [Indexed: 02/25/2023]
Abstract
We built normative brain atlases that animate millisecond-scale intra- and inter-hemispheric white matter-level connectivity dynamics supporting object recognition and speech production. We quantified electrocorticographic modulations during three naming tasks using event-related high-gamma activity from 1,114 nonepileptogenic intracranial electrodes (i.e., non-lesional areas unaffected by epileptiform discharges). Using this electrocorticography data, we visualized functional connectivity modulations defined as significant naming-related high-gamma modulations occurring simultaneously at two sites connected by direct white matter streamlines on diffusion-weighted imaging tractography. Immediately after stimulus onset, intra- and inter-hemispheric functional connectivity enhancements were confined mainly across modality-specific perceptual regions. During response preparation, left intra-hemispheric connectivity enhancements propagated in a posterior-to-anterior direction, involving the left precentral and prefrontal areas. After overt response onset, inter- and intra-hemispheric connectivity enhancements mainly encompassed precentral, postcentral, and superior-temporal (STG) gyri. We found task-specific connectivity enhancements during response preparation as follows. Picture naming enhanced activity along the left arcuate fasciculus between the inferior-temporal and precentral/posterior inferior-frontal (pIFG) gyri. Nonspeech environmental sound naming augmented functional connectivity via the left inferior longitudinal and fronto-occipital fasciculi between the medial-occipital and STG/pIFG. Auditory descriptive naming task enhanced usage of the left frontal U-fibers, involving the middle-frontal gyrus. 
Taken together, the commonly observed network enhancements include inter-hemispheric connectivity optimizing perceptual processing exerted in each hemisphere, left intra-hemispheric connectivity supporting semantic and lexical processing, and inter-hemispheric connectivity for symmetric oral movements during overt speech. Our atlases improve the currently available models of object recognition and speech production by adding neural dynamics via direct intra- and inter-hemispheric white matter tracts.
|
15
|
Functional network properties of the auditory cortex. Hear Res 2023; 433:108768. [PMID: 37075536 DOI: 10.1016/j.heares.2023.108768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Revised: 03/27/2023] [Accepted: 04/11/2023] [Indexed: 04/21/2023]
Abstract
The auditory system transforms auditory stimuli from the external environment into perceptual auditory objects. Recent studies have focused on the contribution of the auditory cortex to this transformation. Other studies have yielded important insights into the contributions of neural activity in the auditory cortex to cognition and decision-making. However, despite this important work, the relationship between auditory-cortex activity and behavior/perception has not been fully elucidated. Two of the more important gaps in our understanding are (1) the specific and differential contributions of different fields of the auditory cortex to auditory perception and behavior and (2) the way networks of auditory neurons impact and facilitate auditory information processing. Here, we focus on recent work from non-human-primate models of hearing, review work related to these gaps, and put forth challenges to further our understanding of how single-unit activity and network activity in different cortical fields contribute to behavior and perception.
|
16
|
The importance of temporal-fine structure to perceive time-compressed speech with and without the restoration of the syllabic rhythm. Sci Rep 2023; 13:2874. [PMID: 36806145 PMCID: PMC9938863 DOI: 10.1038/s41598-023-29755-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Accepted: 02/09/2023] [Indexed: 02/20/2023]
Abstract
Intelligibility of time-compressed (TC) speech decreases with increasing speech rate. However, intelligibility can be restored by 'repackaging' the TC speech by inserting silences between the syllables so that the original 'rhythm' is restored. Although restoration of the speech rhythm affects solely the temporal envelope, it is unclear to what extent repackaging also affects the perception of the temporal-fine structure (TFS). Here we investigate to what extent TFS contributes to the perception of TC and repackaged TC speech in quiet. Intelligibility of TC sentences with a speech rate of 15.6 syllables per second (sps) and the repackaged sentences, by adding 100 ms of silence between the syllables of the TC speech (i.e., a speech rate of 6.1 sps), was assessed for three TFS conditions: the original TFS and the TFS conveyed by an 8- and 16-channel noise vocoder. An overall positive effect on intelligibility of both the repackaging process and of the amount of TFS available to the listener was observed. Furthermore, the benefit associated with repackaging TC speech depended on the amount of TFS available. The results show TFS contributes significantly to the perception of fast speech even when the overall rhythm/envelope of TC speech is restored.
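The two speech rates quoted above are consistent with simple arithmetic: a syllable at 15.6 sps lasts about 64 ms, and adding 100 ms of silence stretches each syllable slot to about 164 ms. A minimal check, assuming one silent gap per syllable (the function name is illustrative, not from the paper):

```python
def repackaged_rate(rate_sps, silence_s):
    """Syllable rate after inserting `silence_s` seconds after each syllable."""
    syllable_s = 1.0 / rate_sps          # duration of one compressed syllable
    return 1.0 / (syllable_s + silence_s)


rate = repackaged_rate(15.6, 0.100)
print(round(rate, 1))  # 6.1
```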
|
17
|
Deep neural networks effectively model neural adaptation to changing background noise and suggest nonlinear noise filtering methods in auditory cortex. Neuroimage 2023; 266:119819. [PMID: 36529203 PMCID: PMC10510744 DOI: 10.1016/j.neuroimage.2022.119819] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2022] [Revised: 11/28/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022] Open
Abstract
The human auditory system displays a robust capacity to adapt to sudden changes in background noise, allowing for continuous speech comprehension despite changes in background environments. However, despite comprehensive studies characterizing this ability, the computations that underlie this process are not well understood. The first step towards understanding a complex system is to propose a suitable model, but the classical and easily interpreted model for the auditory system, the spectro-temporal receptive field (STRF), cannot match the nonlinear neural dynamics involved in noise adaptation. Here, we utilize a deep neural network (DNN) to model neural adaptation to noise, illustrating its effectiveness at reproducing the complex dynamics at the levels of both individual electrodes and the cortical population. By closely inspecting the model's STRF-like computations over time, we find that the model alters both the gain and shape of its receptive field when adapting to a sudden noise change. We show that the DNN model's gain changes allow it to perform adaptive gain control, while the spectro-temporal change creates noise filtering by altering the inhibitory region of the model's receptive field. Further, we find that models of electrodes in nonprimary auditory cortex also exhibit noise filtering changes in their excitatory regions, suggesting differences in noise filtering mechanisms along the cortical hierarchy. These findings demonstrate the capability of deep neural networks to model complex neural adaptation and offer new hypotheses about the computations the auditory cortex performs to enable noise-robust speech perception in real-world, dynamic environments.
|
18
|
Temporal hierarchies in the predictive processing of melody - From pure tones to songs. Neurosci Biobehav Rev 2023; 145:105007. [PMID: 36535375 DOI: 10.1016/j.neubiorev.2022.105007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2022] [Revised: 11/30/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022]
Abstract
Listening to musical melodies is a complex task that engages perceptual and memory-related processes. The processes underlying melody cognition happen simultaneously on different timescales, ranging from milliseconds to minutes. Although attempts have been made, research on melody perception is yet to produce a unified framework of how melody processing is achieved in the brain. This may in part be due to the difficulty of integrating concepts such as perception, attention and memory, which pertain to different temporal scales. Recent theories on brain processing, which hold prediction as a fundamental principle, offer potential solutions to this problem and may provide a unifying framework for explaining the neural processes that enable melody perception on multiple temporal levels. In this article, we review empirical evidence for predictive coding on the levels of pitch formation, basic pitch-related auditory patterns, more complex regularity processing extracted from basic patterns and long-term expectations related to musical syntax. We also identify areas that would benefit from further inquiry and suggest future directions in research on musical melody perception.
|
19
|
Systematic errors in the perception of rhythm. Front Hum Neurosci 2022; 16:1009219. [DOI: 10.3389/fnhum.2022.1009219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Accepted: 10/14/2022] [Indexed: 11/11/2022] Open
Abstract
One hypothesis for why humans enjoy musical rhythms relates to their prediction of when each beat should occur. The ability to predict the timing of an event is important from an evolutionary perspective. Therefore, our brains have evolved internal mechanisms for processing the progression of time. However, due to inherent noise in neural signals, this prediction is not always accurate. Theoretical considerations of optimal estimates suggest the occurrence of certain systematic errors made by the brain when estimating the timing of beats in rhythms. Here, we tested psychophysically whether these systematic errors exist and if so, how they depend on stimulus parameters. Our experimental data revealed two main types of systematic errors. First, observers perceived the time of the last beat of a rhythmic pattern as happening earlier than actual when the inter-beat interval was short. Second, the perceived time of the last beat was later than the actual when the inter-beat interval was long. The magnitude of these systematic errors fell as the number of beats increased. However, with many beats, the errors due to long inter-beat intervals became more apparent. We propose a Bayesian model for these systematic errors. The model fits these data well, allowing us to offer possible explanations for how these errors occurred. For instance, neural processes possibly contributing to the errors include noisy and temporally asymmetric impulse responses, priors preferring certain time intervals, and better-early-than-late loss functions. We finish this article with brief discussions of both the implications of systematic errors for the appreciation of rhythm and the possible compensation by the brain’s motor system during a musical performance.
|
20
|
The scope and potential of music therapy in stroke rehabilitation. JOURNAL OF INTEGRATIVE MEDICINE 2022; 20:284-287. [PMID: 35534380 DOI: 10.1016/j.joim.2022.04.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/25/2022] [Accepted: 04/24/2022] [Indexed: 06/14/2023]
Abstract
There is a growing interest in the use of music therapy in neurological rehabilitation. Of all the major neurological illnesses, stroke rehabilitation has been observed to have some of the strongest potential for music therapy's beneficial effect. The current burden of stroke has raised the need to embrace novel, cost-effective, rehabilitation designs that will enhance the existing physical, occupational, and speech therapies. Music therapy addresses a broad spectrum of motor, speech, and cognitive deficits, as well as behavioral and emotional issues. Several music therapy designs have focused on gait, cognitive, and speech rehabilitation, but most of the existing randomized controlled trials based on these interventions have a high risk of bias and are statistically insignificant. More randomized controlled trials with a greater number of participants are required to strengthen the current data. Fostering an open and informed dialogue between patients, healthcare providers, and music therapists may help increase quality of life, dispel fallacies, and guide patients to specific musical interventions.
|
21
|
A neural population selective for song in human auditory cortex. Curr Biol 2022; 32:1470-1484.e12. [PMID: 35196507 PMCID: PMC9092957 DOI: 10.1016/j.cub.2022.01.069] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2021] [Revised: 10/26/2021] [Accepted: 01/24/2022] [Indexed: 12/18/2022]
Abstract
How is music represented in the brain? While neuroimaging has revealed some spatial segregation between responses to music versus other sounds, little is known about the neural code for music itself. To address this question, we developed a method to infer canonical response components of human auditory cortex using intracranial responses to natural sounds, and further used the superior coverage of fMRI to map their spatial distribution. The inferred components replicated many prior findings, including distinct neural selectivity for speech and music, but also revealed a novel component that responded nearly exclusively to music with singing. Song selectivity was not explainable by standard acoustic features, was located near speech- and music-selective responses, and was also evident in individual electrodes. These results suggest that representations of music are fractionated into subpopulations selective for different types of music, one of which is specialized for the analysis of song.
|
22
|
What auditory cortex is waiting for. Nat Hum Behav 2022; 6:324-325. [PMID: 35145279 DOI: 10.1038/s41562-021-01262-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
23
|
Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 2021; 34:24455-24467. [PMID: 38737583 PMCID: PMC11087060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 05/14/2024]
Abstract
Natural signals such as speech are hierarchically structured across many different timescales, spanning tens (e.g., phonemes) to hundreds (e.g., words) of milliseconds, each of which is highly variable and context-dependent. While deep neural networks (DNNs) excel at recognizing complex patterns from natural signals, relatively little is known about how DNNs flexibly integrate across multiple timescales. Here, we show how a recently developed method for studying temporal integration in biological neural systems - the temporal context invariance (TCI) paradigm - can be used to understand temporal integration in DNNs. The method is simple: we measure responses to a large number of stimulus segments presented in two different contexts and estimate the smallest segment duration needed to achieve a context invariant response. We applied our method to understand how the popular DeepSpeech2 model learns to integrate across time in speech. We find that nearly all of the model units, even in recurrent layers, have a compact integration window within which stimuli substantially alter the response and outside of which stimuli have little effect. We show that training causes these integration windows to shrink at early layers and expand at higher layers, creating a hierarchy of integration windows across the network. Moreover, by measuring integration windows for time-stretched/compressed speech, we reveal a transition point, midway through the trained network, where integration windows become yoked to the duration of stimulus structures (e.g., phonemes or words) rather than absolute time. Similar phenomena were observed in a purely recurrent and purely convolutional network although structure-yoked integration was more prominent in the recurrent network. 
These findings suggest that deep speech recognition systems use a common motif to encode the hierarchical structure of speech: integrating across short, time-yoked windows at early layers and long, structure-yoked windows at later layers. Our method provides a straightforward and general-purpose toolkit for understanding temporal integration in black-box machine learning models.
|