1
|
Concurrent Encoding of Sequence Predictability and Event-Evoked Prediction Error in Unfolding Auditory Patterns. J Neurosci 2024; 44:e1894232024. [PMID: 38350998 PMCID: PMC10993036 DOI: 10.1523/jneurosci.1894-23.2024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2023] [Revised: 02/02/2024] [Accepted: 02/06/2024] [Indexed: 03/26/2024] Open
Abstract
Human listeners possess an innate capacity to discern patterns within rapidly unfolding sensory input. Core questions, guiding ongoing research, focus on the mechanisms through which these representations are acquired and whether the brain prioritizes or suppresses predictable sensory signals. Previous work, using fast auditory sequences (tone-pips presented at a rate of 20 Hz), revealed sustained response effects that appear to track the dynamic predictability of the sequence. Here, we extend the investigation to slower sequences (4 Hz), permitting the isolation of responses to individual tones. Stimuli were 50 ms tone-pips, ordered into random (RND) and regular (REG; a repeating pattern of 10 frequencies) sequences; Two timing profiles were created: in "fast" sequences, tone-pips were presented in direct succession (20 Hz); in "slow" sequences, tone-pips were separated by a 200 ms silent gap (4 Hz). Naive participants (N = 22; both sexes) passively listened to these sequences, while brain responses were recorded using magnetoencephalography (MEG). Results unveiled a heightened magnitude of sustained brain responses in REG when compared to RND patterns. This manifested from three tones after the onset of the pattern repetition, even in the context of slower sequences characterized by extended pattern durations (2,500 ms). This observation underscores the remarkable implicit sensitivity of the auditory brain to acoustic regularities. Importantly, brain responses evoked by single tones exhibited the opposite pattern-stronger responses to tones in RND than REG sequences. The demonstration of simultaneous but opposing sustained and evoked response effects reveals concurrent processes that shape the representation of unfolding auditory patterns.
Collapse
|
2
|
Acute Aromatase Inhibition Impairs Neural and Behavioral Auditory Scene Analysis in Zebra Finches. eNeuro 2024; 11:ENEURO.0423-23.2024. [PMID: 38467426 PMCID: PMC10960633 DOI: 10.1523/eneuro.0423-23.2024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2023] [Revised: 12/31/2023] [Accepted: 01/04/2024] [Indexed: 03/13/2024] Open
Abstract
Auditory perception can be significantly disrupted by noise. To discriminate sounds from noise, auditory scene analysis (ASA) extracts the functionally relevant sounds from acoustic input. The zebra finch communicates in noisy environments. Neurons in their secondary auditory pallial cortex (caudomedial nidopallium, NCM) can encode song from background chorus, or scenes, and this capacity may aid behavioral ASA. Furthermore, song processing is modulated by the rapid synthesis of neuroestrogens when hearing conspecific song. To examine whether neuroestrogens support neural and behavioral ASA in both sexes, we retrodialyzed fadrozole (aromatase inhibitor, FAD) and recorded in vivo awake extracellular NCM responses to songs and scenes. We found that FAD affected neural encoding of songs by decreasing responsiveness and timing reliability in inhibitory (narrow-spiking), but not in excitatory (broad-spiking) neurons. Congruently, FAD decreased neural encoding of songs in scenes for both cell types, particularly in females. Behaviorally, we trained birds using operant conditioning and tested their ability to detect songs in scenes after administering FAD orally or injected bilaterally into NCM. Oral FAD increased response bias and decreased correct rejections in females, but not in males. FAD in NCM did not affect performance. Thus, FAD in the NCM impaired neuronal ASA but that did not lead to behavioral disruption suggesting the existence of resilience or compensatory responses. Moreover, impaired performance after systemic FAD suggests involvement of other aromatase-rich networks outside the auditory pathway in ASA. This work highlights how transient estrogen synthesis disruption can modulate higher-order processing in an animal model of vocal communication.
Collapse
|
3
|
Editorial: Understanding the role of head and body movement when navigating a complex auditory scene. Front Neurosci 2023; 17:1340393. [PMID: 38178835 PMCID: PMC10765534 DOI: 10.3389/fnins.2023.1340393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2023] [Accepted: 12/01/2023] [Indexed: 01/06/2024] Open
|
4
|
Editorial: Listening with two ears - new insights and perspectives in binaural research. Front Neurosci 2023; 17:1323330. [PMID: 38027486 PMCID: PMC10660270 DOI: 10.3389/fnins.2023.1323330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2023] [Accepted: 10/27/2023] [Indexed: 12/01/2023] Open
|
5
|
Neural dynamics underlying successful auditory short-term memory performance. Eur J Neurosci 2023; 58:3859-3878. [PMID: 37691137 PMCID: PMC10946728 DOI: 10.1111/ejn.16140] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Revised: 08/18/2023] [Accepted: 08/19/2023] [Indexed: 09/12/2023]
Abstract
Listeners often operate in complex acoustic environments, consisting of many concurrent sounds. Accurately encoding and maintaining such auditory objects in short-term memory is crucial for communication and scene analysis. Yet, the neural underpinnings of successful auditory short-term memory (ASTM) performance are currently not well understood. To elucidate this issue, we presented a novel, challenging auditory delayed match-to-sample task while recording MEG. Human participants listened to 'scenes' comprising three concurrent tone pip streams. The task was to indicate, after a delay, whether a probe stream was present in the just-heard scene. We present three key findings: First, behavioural performance revealed faster responses in correct versus incorrect trials as well as in 'probe present' versus 'probe absent' trials, consistent with ASTM search. Second, successful compared with unsuccessful ASTM performance was associated with a significant enhancement of event-related fields and oscillatory activity in the theta, alpha and beta frequency ranges. This extends previous findings of an overall increase of persistent activity during short-term memory performance. Third, using distributed source modelling, we found these effects to be confined mostly to sensory areas during encoding, presumably related to ASTM contents per se. Parietal and frontal sources then became relevant during the maintenance stage, indicating that effective STM operation also relies on ongoing inhibitory processes suppressing task-irrelevant information. In summary, our results deliver a detailed account of the neural patterns that differentiate successful from unsuccessful ASTM performance in the context of a complex, multi-object auditory scene.
Collapse
|
6
|
Neuroimaging evidence for the direct role of auditory scene analysis in object perception. Cereb Cortex 2023; 33:6257-6272. [PMID: 36562994 PMCID: PMC10183742 DOI: 10.1093/cercor/bhac501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2022] [Revised: 11/29/2022] [Accepted: 11/30/2022] [Indexed: 12/24/2022] Open
Abstract
Auditory Scene Analysis (ASA) refers to the grouping of acoustic signals into auditory objects. Previously, we have shown that perceived musicality of auditory sequences varies with high-level organizational features. Here, we explore the neural mechanisms mediating ASA and auditory object perception. Participants performed musicality judgments on randomly generated pure-tone sequences and manipulated versions of each sequence containing low-level changes (amplitude; timbre). Low-level manipulations affected auditory object perception as evidenced by changes in musicality ratings. fMRI was used to measure neural activation to sequences rated most and least musical, and the altered versions of each sequence. Next, we generated two partially overlapping networks: (i) a music processing network (music localizer) and (ii) an ASA network (base sequences vs. ASA manipulated sequences). Using Representational Similarity Analysis, we correlated the functional profiles of each ROI to a model generated from behavioral musicality ratings as well as models corresponding to low-level feature processing and music perception. Within overlapping regions, areas near primary auditory cortex correlated with low-level ASA models, whereas right IPS was correlated with musicality ratings. Shared neural mechanisms that correlate with behavior and underlie both ASA and music perception suggests that low-level features of auditory stimuli play a role in auditory object perception.
Collapse
|
7
|
Effect of Masker Head Orientation, Listener Age, and Extended High-Frequency Sensitivity on Speech Recognition in Spatially Separated Speech. Ear Hear 2022; 43:90-100. [PMID: 34260434 PMCID: PMC8712343 DOI: 10.1097/aud.0000000000001081] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
Abstract
OBJECTIVES Masked speech recognition is typically assessed as though the target and background talkers are all directly facing the listener. However, background speech in natural environments is often produced by talkers facing other directions, and talker head orientation affects the spectral content of speech, particularly at the extended high frequencies (EHFs; >8 kHz). This study investigated the effect of masker head orientation and listeners' EHF sensitivity on speech-in-speech recognition and spatial release from masking in children and adults. DESIGN Participants were 5- to 7-year-olds (n = 15) and adults (n = 34), all with normal hearing up to 8 kHz and a range of EHF hearing thresholds. Speech reception thresholds (SRTs) were measured for target sentences recorded from a microphone directly in front of the talker's mouth and presented from a loudspeaker directly in front of the listener, simulating a target directly in front of and facing the listener. The maskers were two streams of concatenated words recorded from a microphone located at either 0° or 60° azimuth, simulating masker talkers facing the listener or facing away from the listener, respectively. Maskers were presented in one of three spatial conditions: co-located with the target, symmetrically separated on either side of the target (+54° and -54° on the horizontal plane), or asymmetrically separated to the right of the target (both +54° on the horizontal plane). RESULTS Performance was poorer for the facing than for the nonfacing masker head orientation. This benefit of the nonfacing masker head orientation, or head orientation release from masking (HORM), was largest under the co-located condition, but it was also observed for the symmetric and asymmetric masker spatial separation conditions. SRTs were positively correlated with the mean 16-kHz threshold across ears in adults for the nonfacing conditions but not for the facing masker conditions. In adults with normal EHF thresholds, the HORM was comparable in magnitude to the benefit of a symmetric spatial separation of the target and maskers. Although children benefited from the nonfacing masker head orientation, their HORM was reduced compared to adults with normal EHF thresholds. Spatial release from masking was comparable across age groups for symmetric masker placement, but it was larger in adults than children for the asymmetric masker. CONCLUSIONS Masker head orientation affects speech-in-speech recognition in children and adults, particularly those with normal EHF thresholds. This is important because masker talkers do not all face the listener under most natural listening conditions, and assuming a midline orientation would tend to overestimate the effect of spatial separation. The benefits associated with EHF audibility for speech-in-speech recognition may warrant clinical evaluation of thresholds above 8 kHz.
Collapse
|
8
|
Predictability-Based Source Segregation and Sensory Deviance Detection in Auditory Aging. Front Hum Neurosci 2021; 15:734231. [PMID: 34776906 PMCID: PMC8586071 DOI: 10.3389/fnhum.2021.734231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2021] [Accepted: 10/08/2021] [Indexed: 11/30/2022] Open
Abstract
When multiple sound sources are present at the same time, auditory perception is often challenged with disentangling the resulting mixture and focusing attention on the target source. It has been repeatedly demonstrated that background (distractor) sound sources are easier to ignore when their spectrotemporal signature is predictable. Prior evidence suggests that this ability to exploit predictability for foreground-background segregation degrades with age. On a theoretical level, this has been related with an impairment in elderly adults’ capabilities to detect certain types of sensory deviance in unattended sound sequences. Yet the link between those two capacities, deviance detection and predictability-based sound source segregation, has not been empirically demonstrated. Here we report on a combined behavioral-EEG study investigating the ability of elderly listeners (60–75 years of age) to use predictability as a cue for sound source segregation, as well as their sensory deviance detection capacities. Listeners performed a detection task on a target stream that can only be solved when a concurrent distractor stream is successfully ignored. We contrast two conditions whose distractor streams differ in their predictability. The ability to benefit from predictability was operationalized as performance difference between the two conditions. Results show that elderly listeners can use predictability for sound source segregation at group level, yet with a high degree of inter-individual variation in this ability. In a further, passive-listening control condition, we measured correlates of deviance detection in the event-related brain potential (ERP) elicited by occasional deviations from the same spectrotemporal pattern as used for the predictable distractor sequence during the behavioral task. ERP results confirmed neural signatures of deviance detection in terms of mismatch negativity (MMN) at group level. Correlation analyses at single-subject level provide no evidence for the hypothesis that deviance detection ability (measured by MMN amplitude) is related to the ability to benefit from predictability for sound source segregation. These results are discussed in the frameworks of sensory deviance detection and predictive coding.
Collapse
|
9
|
Resetting of Auditory and Visual Segregation Occurs After Transient Stimuli of the Same Modality. Front Psychol 2021; 12:720131. [PMID: 34621219 PMCID: PMC8490814 DOI: 10.3389/fpsyg.2021.720131] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2021] [Accepted: 08/16/2021] [Indexed: 12/03/2022] Open
Abstract
In the presence of a continually changing sensory environment, maintaining stable but flexible awareness is paramount, and requires continual organization of information. Determining which stimulus features belong together, and which are separate is therefore one of the primary tasks of the sensory systems. Unknown is whether there is a global or sensory-specific mechanism that regulates the final perceptual outcome of this streaming process. To test the extent of modality independence in perceptual control, an auditory streaming experiment, and a visual moving-plaid experiment were performed. Both were designed to evoke alternating perception of an integrated or segregated percept. In both experiments, transient auditory and visual distractor stimuli were presented in separate blocks, such that the distractors did not overlap in frequency or space with the streaming or plaid stimuli, respectively, thus preventing peripheral interference. When a distractor was presented in the opposite modality as the bistable stimulus (visual distractors during auditory streaming or auditory distractors during visual streaming), the probability of percept switching was not significantly different than when no distractor was presented. Conversely, significant differences in switch probability were observed following within-modality distractors, but only when the pre-distractor percept was segregated. Due to the modality-specificity of the distractor-induced resetting, the results suggest that conscious perception is at least partially controlled by modality-specific processing. The fact that the distractors did not have peripheral overlap with the bistable stimuli indicates that the perceptual reset is due to interference at a locus in which stimuli of different frequencies and spatial locations are integrated.
Collapse
|
10
|
Modulating Cortical Instrument Representations During Auditory Stream Segregation and Integration With Polyphonic Music. Front Neurosci 2021; 15:635937. [PMID: 34630007 PMCID: PMC8498193 DOI: 10.3389/fnins.2021.635937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2020] [Accepted: 08/24/2021] [Indexed: 11/13/2022] Open
Abstract
Numerous neuroimaging studies demonstrated that the auditory cortex tracks ongoing speech and that, in multi-speaker environments, tracking of the attended speaker is enhanced compared to the other irrelevant speakers. In contrast to speech, multi-instrument music can be appreciated by attending not only on its individual entities (i.e., segregation) but also on multiple instruments simultaneously (i.e., integration). We investigated the neural correlates of these two modes of music listening using electroencephalography (EEG) and sound envelope tracking. To this end, we presented uniquely composed music pieces played by two instruments, a bassoon and a cello, in combination with a previously validated music auditory scene analysis behavioral paradigm (Disbergen et al., 2018). Similar to results obtained through selective listening tasks for speech, relevant instruments could be reconstructed better than irrelevant ones during the segregation task. A delay-specific analysis showed higher reconstruction for the relevant instrument during a middle-latency window for both the bassoon and cello and during a late window for the bassoon. During the integration task, we did not observe significant attentional modulation when reconstructing the overall music envelope. Subsequent analyses indicated that this null result might be due to the heterogeneous strategies listeners employ during the integration task. Overall, our results suggest that subsequent to a common processing stage, top-down modulations consistently enhance the relevant instrument's representation during an instrument segregation task, whereas such an enhancement is not observed during an instrument integration task. These findings extend previous results from speech tracking to the tracking of multi-instrument music and, furthermore, inform current theories on polyphonic music perception.
Collapse
|
11
|
Source identity shapes spatial preference in primary auditory cortex during active navigation. Curr Biol 2021; 31:3875-3883.e5. [PMID: 34192513 DOI: 10.1016/j.cub.2021.06.025] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2021] [Revised: 05/10/2021] [Accepted: 06/09/2021] [Indexed: 01/05/2023]
Abstract
Information about the position of sensory objects and identifying their concurrent behavioral relevance is vital to navigate the environment. In the auditory system, spatial information is computed in the brain based on the position of the sound source relative to the observer and thus assumed to be egocentric throughout the auditory pathway. This assumption is largely based on studies conducted in either anesthetized or head-fixed and passively listening animals, thus lacking self-motion and selective listening. Yet these factors are fundamental components of natural sensing1 that may crucially impact the nature of spatial coding and sensory object representation.2 How individual objects are neuronally represented during unrestricted self-motion and active sensing remains mostly unexplored. Here, we trained gerbils on a behavioral foraging paradigm that required localization and identification of sound sources during free navigation. Chronic tetrode recordings in primary auditory cortex during task performance revealed previously unreported sensory object representations. Strikingly, the egocentric angle preference of the majority of spatially sensitive neurons changed significantly depending on the task-specific identity (outcome association) of the sound source. Spatial tuning also exhibited large temporal complexity. Moreover, we encountered egocentrically untuned neurons whose response magnitude differed between source identities. Using a neural network decoder, we show that, together, these neuronal response ensembles provide spatiotemporally co-existent information about both the egocentric location and the identity of individual sensory objects during self-motion, revealing a novel cortical computation principle for naturalistic sensing.
Collapse
|
12
|
Sustained Pupil Responses Are Modulated by Predictability of Auditory Sequences. J Neurosci 2021; 41:6116-6127. [PMID: 34083259 PMCID: PMC8276747 DOI: 10.1523/jneurosci.2879-20.2021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2020] [Revised: 05/05/2021] [Accepted: 05/08/2021] [Indexed: 11/21/2022] Open
Abstract
The brain is highly sensitive to auditory regularities and exploits the predictable order of sounds in many situations, from parsing complex auditory scenes, to the acquisition of language. To understand the impact of stimulus predictability on perception, it is important to determine how the detection of predictable structure influences processing and attention. Here, we use pupillometry to gain insight into the effect of sensory regularity on arousal. Pupillometry is a commonly used measure of salience and processing effort, with more perceptually salient or perceptually demanding stimuli consistently associated with larger pupil diameters. In two experiments we tracked human listeners' pupil dynamics while they listened to sequences of 50-ms tone pips of different frequencies. The order of the tone pips was either random, contained deterministic (fully predictable) regularities (experiment 1, n = 18, 11 female) or had a probabilistic regularity structure (experiment 2, n = 20, 17 female). The sequences were rapid, preventing conscious tracking of sequence structure thus allowing us to focus on the automatic extraction of different types of regularities. We hypothesized that if regularity facilitates processing by reducing processing demands, a smaller pupil diameter would be seen in response to regular relative to random patterns. Conversely, if regularity is associated with heightened arousal and attention (i.e., engages processing resources) the opposite pattern would be expected. In both experiments we observed a smaller sustained (tonic) pupil diameter for regular compared with random sequences, consistent with the former hypothesis and confirming that predictability facilitates sequence processing.SIGNIFICANCE STATEMENT The brain is highly sensitive to auditory regularities. To appreciate the impact that the presence of predictability has on perception, we need to better understand how a predictable structure influences processing and attention. We recorded listeners' pupil responses to sequences of tones that followed either a predictable or unpredictable pattern, as the pupil can be used to implicitly tap into these different cognitive processes. We found that the pupil showed a smaller sustained diameter to predictable sequences, indicating that predictability eased processing rather than boosted attention. The findings suggest that the pupil response can be used to study the automatic extraction of regularities, and that the effects are most consistent with predictability helping the listener to efficiently process upcoming sounds.
Collapse
|
13
|
Dynamics of the Auditory Continuity Illusion. Front Comput Neurosci 2021; 15:676637. [PMID: 34168547 PMCID: PMC8217826 DOI: 10.3389/fncom.2021.676637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2021] [Accepted: 05/04/2021] [Indexed: 11/13/2022] Open
Abstract
Illusions give intriguing insights into perceptual and neural dynamics. In the auditory continuity illusion, two brief tones separated by a silent gap may be heard as one continuous tone if a noise burst with appropriate characteristics fills the gap. This illusion probes the conditions under which listeners link related sounds across time and maintain perceptual continuity in the face of sudden changes in sound mixtures. Conceptual explanations of this illusion have been proposed, but its neural basis is still being investigated. In this work we provide a dynamical systems framework, grounded in principles of neural dynamics, to explain the continuity illusion. We construct an idealized firing rate model of a neural population and analyze the conditions under which firing rate responses persist during the interruption between the two tones. First, we show that sustained inputs and hysteresis dynamics (a mismatch between tone levels needed to activate and inactivate the population) can produce continuous responses. Second, we show that transient inputs and bistable dynamics (coexistence of two stable firing rate levels) can also produce continuous responses. Finally, we combine these input types together to obtain neural dynamics consistent with two requirements for the continuity illusion as articulated in a well-known theory of auditory scene analysis: responses persist through the noise-filled gap if noise provides sufficient evidence that the tone continues and if there is no evidence of discontinuities between the tones and noise. By grounding these notions in a quantitative model that incorporates elements of neural circuits (recurrent excitation, and mutual inhibition, specifically), we identify plausible mechanisms for the continuity illusion. Our findings can help guide future studies of neural correlates of this illusion and inform development of more biophysically-based models of the auditory continuity illusion.
Collapse
|
14
|
Rapid Assessment of Non-Verbal Auditory Perception in Normal-Hearing Participants and Cochlear Implant Users. J Clin Med 2021; 10:jcm10102093. [PMID: 34068067 PMCID: PMC8152499 DOI: 10.3390/jcm10102093] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2021] [Revised: 04/26/2021] [Accepted: 05/06/2021] [Indexed: 01/17/2023] Open
Abstract
In the case of hearing loss, cochlear implants (CI) allow for the restoration of hearing. Despite the advantages of CIs for speech perception, CI users still complain about their poor perception of their auditory environment. Aiming to assess non-verbal auditory perception in CI users, we developed five listening tests. These tests measure pitch change detection, pitch direction identification, pitch short-term memory, auditory stream segregation, and emotional prosody recognition, along with perceived intensity ratings. In order to test the potential benefit of visual cues for pitch processing, the three pitch tests included half of the trials with visual indications to perform the task. We tested 10 normal-hearing (NH) participants with material being presented as original and vocoded sounds, and 10 post-lingually deaf CI users. With the vocoded sounds, the NH participants had reduced scores for the detection of small pitch differences, and reduced emotion recognition and streaming abilities compared to the original sounds. Similarly, the CI users had deficits for small differences in the pitch change detection task and emotion recognition, as well as a decreased streaming capacity. Overall, this assessment allows for the rapid detection of specific patterns of non-verbal auditory perception deficits. The current findings also open new perspectives about how to enhance pitch perception capacities using visual cues.
Collapse
|
15
|
Tracking Musical Voices in Bach's The Art of the Fugue: Timbral Heterogeneity Differentially Affects Younger Normal-Hearing Listeners and Older Hearing-Aid Users. Front Psychol 2021; 12:608684. [PMID: 33935864 PMCID: PMC8079728 DOI: 10.3389/fpsyg.2021.608684] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2020] [Accepted: 03/15/2021] [Indexed: 12/03/2022] Open
Abstract
Auditory scene analysis is an elementary aspect of music perception, yet only little research has scrutinized auditory scene analysis under realistic musical conditions with diverse samples of listeners. This study probed the ability of younger normal-hearing listeners and older hearing-aid users in tracking individual musical voices or lines in JS Bach's The Art of the Fugue. Five-second excerpts with homogeneous or heterogenous instrumentation of 2–4 musical voices were presented from spatially separated loudspeakers and preceded by a short cue for signaling the target voice. Listeners tracked the cued voice and detected whether an amplitude modulation was imposed on the cued voice or a distractor voice. Results indicated superior performance of young normal-hearing listeners compared to older hearing-aid users. Performance was generally better in conditions with fewer voices. For young normal-hearing listeners, there was interaction between the number of voices and the instrumentation: performance degraded less drastically with an increase in the number of voices for timbrally heterogeneous mixtures compared to homogeneous mixtures. Older hearing-aid users generally showed smaller effects of the number of voices and instrumentation, but no interaction between the two factors. Moreover, tracking performance of older hearing aid users did not differ when these participants did or did not wear hearing aids. These results shed light on the role of timbral differentiation in musical scene analysis and suggest reduced musical scene analysis abilities of older hearing-impaired listeners in a realistic musical scenario.
Collapse
|
16
|
Multi-Voiced Music Bypasses Attentional Limitations in the Brain. Front Neurosci 2021; 15:588914. [PMID: 33584187 PMCID: PMC7877539 DOI: 10.3389/fnins.2021.588914] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2020] [Accepted: 01/06/2021] [Indexed: 11/25/2022] Open
Abstract
Attentional limits make it difficult to comprehend concurrent speech streams. However, multiple musical streams are processed comparatively easily. Coherence may be a key difference between music and stimuli like speech, which does not rely on the integration of multiple streams for comprehension. The musical organization between melodies in a composition may provide a cognitive scaffold to overcome attentional limitations when perceiving multiple lines of music concurrently. We investigated how listeners attend to multi–voiced music, examining biological indices associated with processing structured versus unstructured music. We predicted that musical structure provides coherence across distinct musical lines, allowing listeners to attend to simultaneous melodies, and that a lack of organization causes simultaneous melodies to be heard as separate streams. Musician participants attended to melodies in a Coherent music condition featuring flute duets and a Jumbled condition where those duets were manipulated to eliminate coherence between the parts. Auditory–evoked cortical potentials were collected to a tone probe. Analysis focused on the N100 response which is primarily generated within the auditory cortex and is larger for attended versus ignored stimuli. Results suggest that participants did not attend to one line over the other when listening to Coherent music, instead perceptually integrating the streams. Yet, for the Jumbled music, effects indicate that participants attended to one line while ignoring the other, abandoning their integration. Our findings lend support for the theory that musical organization aids attention when perceiving multi–voiced music.
Collapse
|
17
|
Harmonic Cancellation-A Fundamental of Auditory Scene Analysis. Trends Hear 2021; 25:23312165211041422. [PMID: 34698574 PMCID: PMC8552394 DOI: 10.1177/23312165211041422] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2021] [Revised: 07/23/2021] [Accepted: 07/09/2021] [Indexed: 11/16/2022] Open
Abstract
This paper reviews the hypothesis of harmonic cancellation according to which an interfering sound is suppressed or canceled on the basis of its harmonicity (or periodicity in the time domain) for the purpose of Auditory Scene Analysis. It defines the concept, discusses theoretical arguments in its favor, and reviews experimental results that support it, or not. If correct, the hypothesis may draw on time-domain processing of temporally accurate neural representations within the brainstem, as required also by the classic equalization-cancellation model of binaural unmasking. The hypothesis predicts that a target sound corrupted by interference will be easier to hear if the interference is harmonic than inharmonic, all else being equal. This prediction is borne out in a number of behavioral studies, but not all. The paper reviews those results, with the aim to understand the inconsistencies and come up with a reliable conclusion for, or against, the hypothesis of harmonic cancellation within the auditory system.
Collapse
|
18
|
A Comparison of Environment Classification Among Premium Hearing Instruments. Trends Hear 2021; 25:2331216520980968. [PMID: 33749410 PMCID: PMC7989119 DOI: 10.1177/2331216520980968] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2020] [Revised: 11/20/2020] [Accepted: 11/24/2020] [Indexed: 11/21/2022] Open
Abstract
Hearing aids classify acoustic environments into multiple, generic classes for the purposes of guiding signal processing. Information about environmental classification is made available to the clinician for fitting, counseling, and troubleshooting purposes. The goal of this study was to better inform scientists and clinicians about the nature of that information by comparing the classification schemes among five premium hearing instruments in a wide range of acoustic scenes including those that vary in signal-to-noise ratio and overall level (dB SPL). Twenty-eight acoustic scenes representing various prototypical environments were presented to five premium devices mounted on an acoustic manikin. Classification measures were recorded from the brand-specific fitting software then recategorized to generic labels to conceal the device company, including (a) Speech in Quiet, (b) Speech in Noise, (c) Noise, and (d) Music. Twelve normal-hearing listeners also classified each scene. The results revealed a variety of similarities and differences among the five devices and the human subjects. Where some devices were highly dependent on input overall level, others were influenced markedly by signal-to-noise ratio. Differences between human and hearing aid classification were evident for several speech and music scenes. Environmental classification is the heart of the signal processing strategy for any given device, providing key input to subsequent decision-making. Comprehensive assessment of environmental classification is essential when considering the cost of signal processing errors, the potential impact for typical wearers, and the information that is available for use by clinicians. The magnitude of differences among devices is remarkable and to be noted.
Collapse
|
19
|
Difficulties with Speech-in-Noise Perception Related to Fundamental Grouping Processes in Auditory Cortex. Cereb Cortex 2020; 31:1582-1596. [PMID: 33136138 PMCID: PMC7869094 DOI: 10.1093/cercor/bhaa311] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2020] [Revised: 08/04/2020] [Accepted: 09/22/2020] [Indexed: 01/05/2023] Open
Abstract
In our everyday lives, we are often required to follow a conversation when background noise is present (“speech-in-noise” [SPIN] perception). SPIN perception varies widely—and people who are worse at SPIN perception are also worse at fundamental auditory grouping, as assessed by figure-ground tasks. Here, we examined the cortical processes that link difficulties with SPIN perception to difficulties with figure-ground perception using functional magnetic resonance imaging. We found strong evidence that the earliest stages of the auditory cortical hierarchy (left core and belt areas) are similarly disinhibited when SPIN and figure-ground tasks are more difficult (i.e., at target-to-masker ratios corresponding to 60% rather than 90% performance)—consistent with increased cortical gain at lower levels of the auditory hierarchy. Overall, our results reveal a common neural substrate for these basic (figure-ground) and naturally relevant (SPIN) tasks—which provides a common computational basis for the link between SPIN perception and fundamental auditory grouping.
Collapse
|
20
|
Abstract
The ability to sustain attention on a task-relevant sound source while avoiding
distraction from concurrent sounds is fundamental to listening in crowded
environments. We aimed to (a) devise an experimental paradigm with which this
aspect of listening can be isolated and (b) evaluate the applicability of
pupillometry as an objective measure of sustained attention in young and older
populations. We designed a paradigm that continuously measured behavioral
responses and pupillometry during 25-s trials. Stimuli contained a number of
concurrent, spectrally distinct tone streams. On each trial, participants
detected gaps in one of the streams while resisting distraction from the others.
Behavior demonstrated increasing difficulty with time-on-task and with
number/proximity of distractor streams. In young listeners
(N = 20; aged 18 to 35 years), pupil diameter (on the group and
individual level) was dynamically modulated by instantaneous task difficulty:
Periods where behavioral performance revealed a strain on sustained attention
were accompanied by increased pupil diameter. Only trials on which participants
performed successfully were included in the pupillometry analysis so that the
observed effects reflect task demands as opposed to failure to attend. In line
with existing reports, we observed global changes to pupil dynamics in the older
group (N = 19; aged 63 to 79 years) including decreased pupil
diameter, limited dilation range, and reduced temporal variability. However,
despite these changes, older listeners showed similar effects of attentive
tracking to those observed in the young listeners. Overall, our results
demonstrate that pupillometry can be a reliable and time-sensitive measure of
attentive tracking over long durations in both young and (with caveats) older
listeners.
Collapse
|
21
|
Can You Hear Out the Melody? Testing Musical Scene Perception in Young Normal-Hearing and Older Hearing-Impaired Listeners. Trends Hear 2020; 24:2331216520945826. [PMID: 32895034 PMCID: PMC7502688 DOI: 10.1177/2331216520945826] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023] Open
Abstract
It is well known that hearing loss compromises auditory scene analysis abilities,
as is usually manifested in difficulties of understanding speech in noise.
Remarkably little is known about auditory scene analysis of hearing-impaired
(HI) listeners when it comes to musical sounds. Specifically, it is unclear to
which extent HI listeners are able to hear out a melody or an instrument from a
musical mixture. Here, we tested a group of younger normal-hearing (yNH) and
older HI (oHI) listeners with moderate hearing loss in their ability to match
short melodies and instruments presented as part of mixtures. Four-tone
sequences were used in conjunction with a simple musical accompaniment that
acted as a masker (cello/piano dyads or spectrally matched noise). In each
trial, a signal-masker mixture was presented, followed by two different versions
of the signal alone. Listeners indicated which signal version was part of the
mixture. Signal versions differed either in terms of the sequential order of the
pitch sequence or in terms of timbre (flute vs. trumpet). Signal-to-masker
thresholds were measured by varying the signal presentation level in an adaptive
two-down/one-up procedure. We observed that thresholds of oHI listeners were
elevated by on average 10 dB compared with that of yNH listeners. In contrast to
yNH listeners, oHI listeners did not show evidence of listening in dips of the
masker. Musical training of participants was associated with a lowering of
thresholds. These results may indicate detrimental effects of hearing loss on
central aspects of musical scene perception.
Collapse
|
22
|
Impairments of auditory scene analysis in posterior cortical atrophy. Brain 2020; 143:2689-2695. [PMID: 32875326 PMCID: PMC7523698 DOI: 10.1093/brain/awaa221] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2020] [Revised: 05/27/2020] [Accepted: 05/27/2020] [Indexed: 01/08/2023] Open
Abstract
Although posterior cortical atrophy is often regarded as the canonical 'visual dementia', auditory symptoms may also be salient in this disorder. Patients often report particular difficulty hearing in busy environments; however, the core cognitive process-parsing of the auditory environment ('auditory scene analysis')-has been poorly characterized. In this cross-sectional study, we used customized perceptual tasks to assess two generic cognitive operations underpinning auditory scene analysis-sound source segregation and sound event grouping-in a cohort of 21 patients with posterior cortical atrophy, referenced to 15 healthy age-matched individuals and 21 patients with typical Alzheimer's disease. After adjusting for peripheral hearing function and performance on control tasks assessing perceptual and executive response demands, patients with posterior cortical atrophy performed significantly worse on both auditory scene analysis tasks relative to healthy controls and patients with typical Alzheimer's disease (all P < 0.05). Our findings provide further evidence of central auditory dysfunction in posterior cortical atrophy, with implications for our pathophysiological understanding of Alzheimer syndromes as well as clinical diagnosis and management.
Collapse
|
23
|
Prior Knowledge Guides Speech Segregation in Human Auditory Cortex. Cereb Cortex 2020; 29:1561-1571. [PMID: 29788144 DOI: 10.1093/cercor/bhy052] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2017] [Revised: 01/22/2018] [Accepted: 02/15/2018] [Indexed: 11/12/2022] Open
Abstract
Segregating concurrent sound streams is a computationally challenging task that requires integrating bottom-up acoustic cues (e.g. pitch) and top-down prior knowledge about sound streams. In a multi-talker environment, the brain can segregate different speakers in about 100 ms in auditory cortex. Here, we used magnetoencephalographic (MEG) recordings to investigate the temporal and spatial signature of how the brain utilizes prior knowledge to segregate 2 speech streams from the same speaker, which can hardly be separated based on bottom-up acoustic cues. In a primed condition, the participants know the target speech stream in advance while in an unprimed condition no such prior knowledge is available. Neural encoding of each speech stream is characterized by the MEG responses tracking the speech envelope. We demonstrate that an effect in bilateral superior temporal gyrus and superior temporal sulcus is much stronger in the primed condition than in the unprimed condition. Priming effects are observed at about 100 ms latency and last more than 600 ms. Interestingly, prior knowledge about the target stream facilitates speech segregation by mainly suppressing the neural tracking of the non-target speech stream. In sum, prior knowledge leads to reliable speech segregation in auditory cortex, even in the absence of reliable bottom-up speech segregation cue.
Collapse
|
24
|
Abstract
Memory, on multiple timescales, is critical to our ability to discover the structure of our surroundings, and efficiently interact with the environment. We combined behavioural manipulation and modelling to investigate the dynamics of memory formation for rarely reoccurring acoustic patterns. In a series of experiments, participants detected the emergence of regularly repeating patterns within rapid tone-pip sequences. Unbeknownst to them, a few patterns reoccurred every ~3 min. All sequences consisted of the same 20 frequencies and were distinguishable only by the order of tone-pips. Despite this, reoccurring patterns were associated with a rapidly growing detection-time advantage over novel patterns. This effect was implicit, robust to interference, and persisted for 7 weeks. The results implicate an interplay between short (a few seconds) and long-term (over many minutes) integration in memory formation and demonstrate the remarkable sensitivity of the human auditory system to sporadically reoccurring structure within the acoustic environment.
Collapse
|
25
|
Cognitive resources are distributed among the entire auditory landscape in auditory scene analysis. Psychophysiology 2019; 57:e13487. [PMID: 31578762 DOI: 10.1111/psyp.13487] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2018] [Revised: 08/21/2019] [Accepted: 09/04/2019] [Indexed: 01/30/2023]
Abstract
Although attention has been shown to enhance neural representations of selected inputs, the fate of unselected background sounds is still debated. The goal of the current study was to understand how processing resources are distributed among attended and unattended sounds during auditory scene analysis. We used a three-stream paradigm with four acoustic features uniquely defining each sound stream (frequency, envelope shape, spatial location, tone quality). We manipulated task load by having participants perform a difficult auditory task and an easy movie-viewing task with the same set of sounds in separate conditions. The mismatch negativity (MMN) component of event-related brain potentials (ERPs) was measured to evaluate sound processing in both conditions. We found no effect of task demands on unattended sound processing: MMNs were elicited by unattended deviants during both low- and high-load task conditions. A key factor of this result was the use of unique tone feature combinations to distinguish each of the three sound streams, strengthening the segregation of streams. In the auditory task, the P3b component demonstrates a two-stage process of target evaluation. Thus, these results, in conjunction with results of previous studies, suggest that stimulus-driven factors that strengthen stream segregation can free up processing capacity for higher-level analyses. The results illustrate the interactive nature of top-down and stimulus-driven processes in stream formation, supporting a distributive theory of attention that balances the strength of the bottom-up input with perceptual goals in analyzing the auditory scene.
Collapse
|
26
|
Robust neural tracking of linguistic units relates to distractor suppression. Eur J Neurosci 2019; 51:641-650. [PMID: 31430411 DOI: 10.1111/ejn.14552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2019] [Revised: 08/01/2019] [Accepted: 08/08/2019] [Indexed: 11/30/2022]
Abstract
In a complex auditory scene, speech comprehension involves several stages: for example segregating the target from the background, recognizing syllables and integrating syllables into linguistic units (e.g., words). Although speech segregation is robust as shown by invariant neural tracking to target speech envelope, whether neural tracking to linguistic units is also robust and how this robustness is achieved remain unknown. To investigate these questions, we concurrently recorded neural responses tracking a rhythmic speech stream at its syllabic and word rates, using electroencephalography. Human participants listened to that target speech under a speech or noise distractor at varying signal-to-noise ratios. Neural tracking at the word rate was not as robust as neural tracking at the syllabic rate. Robust neural tracking to target's words was only observed under the speech distractor but not under the noise distractor. Moreover, this robust word tracking correlated with a successful suppression of distractor tracking. Critically, both word tracking and distractor suppression correlated with behavioural comprehension accuracy. In sum, our results suggest that a robust neural tracking of higher-level linguistic units relates to not only the target tracking, but also the distractor suppression.
Collapse
|
27
|
Rapid Ocular Responses Are Modulated by Bottom-up-Driven Auditory Salience. J Neurosci 2019; 39:7703-7714. [PMID: 31391262 PMCID: PMC6764203 DOI: 10.1523/jneurosci.0776-19.2019] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2019] [Revised: 06/28/2019] [Accepted: 07/12/2019] [Indexed: 02/03/2023] Open
Abstract
Despite the prevalent use of alerting sounds in alarms and human-machine interface systems and the long-hypothesized role of the auditory system as the brain's "early warning system," we have only a rudimentary understanding of what determines auditory salience-the automatic attraction of attention by sound-and which brain mechanisms underlie this process. A major roadblock has been the lack of a robust, objective means of quantifying sound-driven attentional capture. Here we demonstrate that: (1) a reliable salience scale can be obtained from crowd-sourcing (N = 911), (2) acoustic roughness appears to be a driving feature behind this scaling, consistent with previous reports implicating roughness in the perceptual distinctiveness of sounds, and (3) crowd-sourced auditory salience correlates with objective autonomic measures. Specifically, we show that a salience ranking obtained from online raters correlated robustly with the superior colliculus-mediated ocular freezing response, microsaccadic inhibition (MSI), measured in naive, passively listening human participants (of either sex). More salient sounds evoked earlier and larger MSI, consistent with a faster orienting response. These results are consistent with the hypothesis that MSI reflects a general reorienting response that is evoked by potentially behaviorally important events regardless of their modality.SIGNIFICANCE STATEMENT Microsaccades are small, rapid, fixational eye movements that are measurable with sensitive eye-tracking equipment. We reveal a novel, robust link between microsaccade dynamics and the subjective salience of brief sounds (salience rankings obtained from a large number of participants in an online experiment): Within 300 ms of sound onset, the eyes of naive, passively listening participants demonstrate different microsaccade patterns as a function of the sound's crowd-sourced salience. These results position the superior colliculus (hypothesized to underlie microsaccade generation) as an important brain area to investigate in the context of a putative multimodal salience hub. They also demonstrate an objective means for quantifying auditory salience.
Collapse
|
28
|
Auditory Stream Segregation Can Be Modeled by Neural Competition in Cochlear Implant Listeners. Front Comput Neurosci 2019; 13:42. [PMID: 31333438 PMCID: PMC6616076 DOI: 10.3389/fncom.2019.00042] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2019] [Accepted: 06/17/2019] [Indexed: 11/13/2022] Open
Abstract
Auditory stream segregation is a perceptual process by which the human auditory system groups sounds from different sources into perceptually meaningful elements (e.g., a voice or a melody). The perceptual segregation of sounds is important, for example, for the understanding of speech in noisy scenarios, a particularly challenging task for listeners with a cochlear implant (CI). It has been suggested that some aspects of stream segregation may be explained by relatively basic neural mechanisms at a cortical level. During the past decades, a variety of models have been proposed to account for the data from stream segregation experiments in normal-hearing (NH) listeners. However, little attention has been given to corresponding findings in CI listeners. The present study investigated whether a neural model of sequential stream segregation, proposed to describe the behavioral effects observed in NH listeners, can account for behavioral data from CI listeners. The model operates on the stimulus features at the cortical level and includes a competition stage between the neuronal units encoding the different percepts. The competition arises from a combination of mutual inhibition, adaptation, and additive noise. The model was found to capture the main trends in the behavioral data from CI listeners, such as the larger probability of a segregated percept with increasing the feature difference between the sounds as well as the build-up effect. Importantly, this was achieved without any modification to the model's competition stage, suggesting that stream segregation could be mediated by a similar mechanism in both groups of listeners.
Collapse
|
29
|
Auditory Figure-Ground Segregation Is Impaired by High Visual Load. J Neurosci 2018; 39:1699-1708. [PMID: 30541915 PMCID: PMC6391559 DOI: 10.1523/jneurosci.2518-18.2018] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2018] [Revised: 11/19/2018] [Accepted: 11/19/2018] [Indexed: 11/21/2022] Open
Abstract
Figure-ground segregation is fundamental to listening in complex acoustic environments. An ongoing debate pertains to whether segregation requires attention or is "automatic" and preattentive. In this magnetoencephalography study, we tested a prediction derived from load theory of attention (e.g., Lavie, 1995) that segregation requires attention but can benefit from the automatic allocation of any "leftover" capacity under low load. Complex auditory scenes were modeled with stochastic figure-ground stimuli (Teki et al., 2013), which occasionally contained repeated frequency component "figures." Naive human participants (both sexes) passively listened to these signals while performing a visual attention task of either low or high load. While clear figure-related neural responses were observed under conditions of low load, high visual load substantially reduced the neural response to the figure in auditory cortex (planum temporale, Heschl's gyrus). We conclude that fundamental figure-ground segregation in hearing is not automatic but draws on resources that are shared across vision and audition.SIGNIFICANCE STATEMENT This work resolves a long-standing question of whether figure-ground segregation, a fundamental process of auditory scene analysis, requires attention or is underpinned by automatic, encapsulated computations. Task-irrelevant sounds were presented during performance of a visual search task. We revealed a clear magnetoencephalography neural signature of figure-ground segregation in conditions of low visual load, which was substantially reduced in conditions of high visual load. This demonstrates that, although attention does not need to be actively allocated to sound for auditory segregation to occur, segregation depends on shared computational resources across vision and hearing. The findings further highlight that visual load can impair the computational capacity of the auditory system, even when it does not simply dampen auditory responses as a whole.
Collapse
|
30
|
Context-dependent role of selective attention for change detection in multi-speaker scenes. Hum Brain Mapp 2018; 39:4623-4632. [PMID: 29999565 PMCID: PMC6866511 DOI: 10.1002/hbm.24310] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2017] [Revised: 06/25/2018] [Accepted: 06/26/2018] [Indexed: 11/12/2022] Open
Abstract
Disappearance of a voice or other sound source may often go unnoticed when the auditory scene is crowded. We explored the role of selective attention for this change deafness with magnetoencephalography in multi-speaker scenes. Each scene was presented two times in direct succession, and one target speaker was frequently omitted in Scene 2. When listeners were previously cued to the target speaker, activity in auditory cortex time locked to the target speaker's sound envelope was selectively enhanced in Scene 1, as was determined by a cross-correlation analysis. Moreover, the response was stronger for hit trials than for miss trials, confirming that selective attention played a role for subsequent change detection. If selective attention to the streams where the change occurred was generally required for successful change detection, neural enhancement of this stream would also be expected without cue in hit compared to miss trials. However, when listeners were not previously cued to the target, no enhanced activity for the target speaker was observed for hit trials, and there was no significant difference between hit and miss trials. These results, first, confirm a role for attention in change detection for situations where the target source is known. Second, they suggest that the omission of a speaker, or more generally an auditory stream, can alternatively be detected without selective attentional enhancement of the target stream. Several models and strategies could be envisaged for change detection in this case, including global comparison of the subsequent scenes.
Collapse
|
31
|
Electrophysiological Correlates of Speaker Segregation and Foreground-Background Selection in Ambiguous Listening Situations. Neuroscience 2018; 389:19-29. [PMID: 28735101 DOI: 10.1016/j.neuroscience.2017.07.021] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2017] [Revised: 07/10/2017] [Accepted: 07/10/2017] [Indexed: 11/15/2022]
Abstract
In everyday listening environments, a main task for our auditory system is to follow one out of multiple speakers talking simultaneously. The present study was designed to find electrophysiological indicators of two central processes involved - segregating the speech mixture into distinct speech sequences corresponding to the two speakers, and then attending to one of the speech sequences. We generated multistable speech stimuli that were set up to create ambiguity as to whether only one or two speakers are talking. Thereby we were able to investigate three perceptual alternatives (no segregation, segregated - speaker A in the foreground, segregated - speaker B in the foreground) without any confounding stimulus changes. Participants listened to a continuously repeating sequence of syllables, which were uttered alternately by two human speakers, and indicated whether they perceived the sequence as an inseparable mixture or as originating from two separate speakers. In the latter case, they distinguished which speaker was in their attentional foreground. Our data show a long-lasting event-related potential (ERP) modulation starting at 130ms after stimulus onset, which can be explained by the perceptual organization of the two speech sequences into attended foreground and ignored background streams. Our paradigm extends previous work with pure-tone sequences toward speech stimuli and adds the possibility to obtain neural correlates of the difficulty to segregate a speech mixture into distinct streams.
Collapse
|
32
|
Auditory Stream Segregation and Selective Attention for Cochlear Implant Listeners: Evidence From Behavioral Measures and Event-Related Potentials. Front Neurosci 2018; 12:581. [PMID: 30186105 PMCID: PMC6110823 DOI: 10.3389/fnins.2018.00581] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2018] [Accepted: 08/02/2018] [Indexed: 11/13/2022] Open
Abstract
The role of the spatial separation between the stimulating electrodes (electrode separation) in sequential stream segregation was explored in cochlear implant (CI) listeners using a deviant detection task. Twelve CI listeners were instructed to attend to a series of target sounds in the presence of interleaved distractor sounds. A deviant was randomly introduced in the target stream either at the beginning, middle or end of each trial. The listeners were asked to detect sequences that contained a deviant and to report its location within the trial. The perceptual segregation of the streams should, therefore, improve deviant detection performance. The electrode range for the distractor sounds was varied, resulting in different amounts of overlap between the target and the distractor streams. For the largest electrode separation condition, event-related potentials (ERPs) were recorded under active and passive listening conditions. The listeners were asked to perform the behavioral task for the active listening condition and encouraged to watch a muted movie for the passive listening condition. Deviant detection performance improved with increasing electrode separation between the streams, suggesting that larger electrode differences facilitate the segregation of the streams. Deviant detection performance was best for deviants happening late in the sequence, indicating that a segregated percept builds up over time. The analysis of the ERP waveforms revealed that auditory selective attention modulates the ERP responses in CI listeners. Specifically, the responses to the target stream were, overall, larger in the active relative to the passive listening condition. Conversely, the ERP responses to the distractor stream were not affected by selective attention. However, no significant correlation was observed between the behavioral performance and the amount of attentional modulation. Overall, the findings from the present study suggest that CI listeners can use electrode separation to perceptually group sequential sounds. Moreover, selective attention can be deployed on the resulting auditory objects, as reflected by the attentional modulation of the ERPs at the group level.
Collapse
|
33
|
Abstract
Why do humpback whales sing? This paper considers the hypothesis that humpback whales may use song for long range sonar. Given the vocal and social behavior of humpback whales, in several cases it is not apparent how they monitor the movements of distant whales or prey concentrations. Unless distant animals produce sounds, humpback whales are unlikely to be aware of their presence or actions. Some field observations are strongly suggestive of the use of song as sonar. Humpback whales sometimes stop singing and then rapidly approach distant whales in cases where sound production by those whales is not apparent, and singers sometimes alternately sing and swim while attempting to intercept another whale that is swimming evasively. In the evolutionary development of modern cetaceans, perceptual mechanisms have shifted from reliance on visual scanning to the active generation and monitoring of echoes. It is hypothesized that as the size and distance of relevant events increased, humpback whales developed adaptive specializations for long-distance echolocation. Differences between use of songs by humpback whales and use of sonar by other echolocating species are discussed, as are similarities between bat echolocation and singing by humpback whales. Singing humpback whales are known to emit sounds intense enough to generate echoes at long ranges, and to flexibly control the timing and qualities of produced sounds. The major problem for the hypothesis is the lack of recordings of echoes from other whales arriving at singers immediately before they initiate actions related to those whales. An earlier model of echoic processing by singing humpback whales is here revised to incorporate recent discoveries. According to the revised model, both direct echoes from targets and modulations in song-generated reverberation can provide singers with information that can help them make decisions about future actions related to mating, traveling, and foraging. The model identifies acoustic and structural features produced by singing humpback whales that may facilitate a singer's ability to interpret changes in echoic scenes and suggests that interactive signal coordination by singing whales may help them to avoid mutual interference. Specific, testable predictions of the model are presented.
Collapse
|
34
|
Sparse periodicity-based auditory features explain human performance in a spatial multitalker auditory scene analysis task. Eur J Neurosci 2018; 51:1353-1363. [PMID: 29855099 DOI: 10.1111/ejn.13981] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2017] [Revised: 05/15/2018] [Accepted: 05/21/2018] [Indexed: 11/28/2022]
Abstract
Human listeners robustly decode speech information from a talker of interest that is embedded in a mixture of spatially distributed interferers. A relevant question is which time-frequency segments of the speech are predominantly used by a listener to solve such a complex Auditory Scene Analysis task. A recent psychoacoustic study investigated the relevance of low signal-to-noise ratio (SNR) components of a target signal on speech intelligibility in a spatial multitalker situation. For this, a three-talker stimulus was manipulated in the spectro-temporal domain such that target speech time-frequency units below a variable SNR threshold (SNRcrit ) were discarded while keeping the interferers unchanged. The psychoacoustic data indicate that only target components at and above a local SNR of about 0 dB contribute to intelligibility. This study applies an auditory scene analysis "glimpsing" model to the same manipulated stimuli. Model data are found to be similar to the human data, supporting the notion of "glimpsing," that is, that salient speech-related information is predominantly used by the auditory system to decode speech embedded in a mixture of sounds, at least for the tested conditions of three overlapping speech signals. This implies that perceptually relevant auditory information is sparse and may be processed with low computational effort, which is relevant for neurophysiological research of scene analysis and novelty processing in the auditory system.
Collapse
|
35
|
Activity in Human Auditory Cortex Represents Spatial Separation Between Concurrent Sounds. J Neurosci 2018; 38:4977-4984. [PMID: 29712782 DOI: 10.1523/jneurosci.3323-17.2018] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2017] [Revised: 03/05/2018] [Accepted: 03/09/2018] [Indexed: 11/21/2022] Open
Abstract
The primary and posterior auditory cortex (AC) are known for their sensitivity to spatial information, but how this information is processed is not yet understood. AC that is sensitive to spatial manipulations is also modulated by the number of auditory streams present in a scene (Smith et al., 2010), suggesting that spatial and nonspatial cues are integrated for stream segregation. We reasoned that, if this is the case, then it is the distance between sounds rather than their absolute positions that is essential. To test this hypothesis, we measured human brain activity in response to spatially separated concurrent sounds with fMRI at 7 tesla in five men and five women. Stimuli were spatialized amplitude-modulated broadband noises recorded for each participant via in-ear microphones before scanning. Using a linear support vector machine classifier, we investigated whether sound location and/or location plus spatial separation between sounds could be decoded from the activity in Heschl's gyrus and the planum temporale. The classifier was successful only when comparing patterns associated with the conditions that had the largest difference in perceptual spatial separation. Our pattern of results suggests that the representation of spatial separation is not merely the combination of single locations, but rather is an independent feature of the auditory scene.SIGNIFICANCE STATEMENT Often, when we think of auditory spatial information, we think of where sounds are coming from-that is, the process of localization. However, this information can also be used in scene analysis, the process of grouping and segregating features of a soundwave into objects. Essentially, when sounds are further apart, they are more likely to be segregated into separate streams. Here, we provide evidence that activity in the human auditory cortex represents the spatial separation between sounds rather than their absolute locations, indicating that scene analysis and localization processes may be independent.
Collapse
|
36
|
Adaptive and Selective Time Averaging of Auditory Scenes. Curr Biol 2018; 28:1405-1418.e10. [PMID: 29681472 DOI: 10.1016/j.cub.2018.03.049] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2017] [Revised: 01/24/2018] [Accepted: 03/21/2018] [Indexed: 11/23/2022]
Abstract
To overcome variability, estimate scene characteristics, and compress sensory input, perceptual systems pool data into statistical summaries. Despite growing evidence for statistical representations in perception, the underlying mechanisms remain poorly understood. One example of such representations occurs in auditory scenes, where background texture appears to be represented with time-averaged sound statistics. We probed the averaging mechanism using "texture steps"-textures containing subtle shifts in stimulus statistics. Although generally imperceptible, steps occurring in the previous several seconds biased texture judgments, indicative of a multi-second averaging window. Listeners seemed unable to willfully extend or restrict this window but showed signatures of longer integration times for temporally variable textures. In all cases the measured timescales were substantially longer than previously reported integration times in the auditory system. Integration also showed signs of being restricted to sound elements attributed to a common source. The results suggest an integration process that depends on stimulus characteristics, integrating over longer extents when it benefits statistical estimation of variable signals and selectively integrating stimulus components likely to have a common cause in the world. Our methodology could be naturally extended to examine statistical representations of other types of sensory signals.
Collapse
|
37
|
Processing of interaural phase differences in components of harmonic and mistuned complexes in the inferior colliculus of the Mongolian gerbil. Eur J Neurosci 2018; 47:1242-1251. [PMID: 29603825 DOI: 10.1111/ejn.13922] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2017] [Revised: 02/19/2018] [Accepted: 03/22/2018] [Indexed: 11/30/2022]
Abstract
Harmonicity and spatial location provide eminent cues for the perceptual grouping of sounds. In general, harmonicity is a strong grouping cue. In contrast, spatial cues such as interaural phase or time difference provide for strong grouping of stimulus sequences but weak grouping for simultaneously presented sounds. By studying the neuronal basis underlying the interaction of these cues in processing simultaneous sounds using van Rossum spike train distance measures, we aim at explaining the interaction observed in psychophysical experiments. Responses to interaural phase differences imposed on single components of harmonic and mistuned complex tones as well as noise delay functions were recorded as multiunit responses from the inferior colliculus of Mongolian gerbils. Results revealed a better representation of interaural phase differences if imposed on a harmonic rather than a mistuned frequency component of a complex tone. The representation of interaural phase differences was better for long integration-time windows approximately reflecting firing rates rather than short integration-time windows reflecting the temporal pattern of the stimulus-driven response. We found only a weak impact of interaural phase differences if combined with mistuning of a component in a harmonic tone complex.
Collapse
|
38
|
Abstract
The cocktail party problem requires listeners to infer individual sound sources from mixtures of sound. The problem can be solved only by leveraging regularities in natural sound sources, but little is known about how such regularities are internalized. We explored whether listeners learn source "schemas"-the abstract structure shared by different occurrences of the same type of sound source-and use them to infer sources from mixtures. We measured the ability of listeners to segregate mixtures of time-varying sources. In each experiment a subset of trials contained schema-based sources generated from a common template by transformations (transposition and time dilation) that introduced acoustic variation but preserved abstract structure. Across several tasks and classes of sound sources, schema-based sources consistently aided source separation, in some cases producing rapid improvements in performance over the first few exposures to a schema. Learning persisted across blocks that did not contain the learned schema, and listeners were able to learn and use multiple schemas simultaneously. No learning was evident when schema were presented in the task-irrelevant (i.e., distractor) source. However, learning from task-relevant stimuli showed signs of being implicit, in that listeners were no more likely to report that sources recurred in experiments containing schema-based sources than in control experiments containing no schema-based sources. The results implicate a mechanism for rapidly internalizing abstract sound structure, facilitating accurate perceptual organization of sound sources that recur in the environment.
Collapse
|
39
|
Assessing Top-Down and Bottom-Up Contributions to Auditory Stream Segregation and Integration With Polyphonic Music. Front Neurosci 2018; 12:121. [PMID: 29563861 PMCID: PMC5845899 DOI: 10.3389/fnins.2018.00121] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2017] [Accepted: 02/15/2018] [Indexed: 11/24/2022] Open
Abstract
Polyphonic music listening well exemplifies processes typically involved in daily auditory scene analysis situations, relying on an interactive interplay between bottom-up and top-down processes. Most studies investigating scene analysis have used elementary auditory scenes, however real-world scene analysis is far more complex. In particular, music, contrary to most other natural auditory scenes, can be perceived by either integrating or, under attentive control, segregating sound streams, often carried by different instruments. One of the prominent bottom-up cues contributing to multi-instrument music perception is their timbre difference. In this work, we introduce and validate a novel paradigm designed to investigate, within naturalistic musical auditory scenes, attentive modulation as well as its interaction with bottom-up processes. Two psychophysical experiments are described, employing custom-composed two-voice polyphonic music pieces within a framework implementing a behavioral performance metric to validate listener instructions requiring either integration or segregation of scene elements. In Experiment 1, the listeners' locus of attention was switched between individual instruments or the aggregate (i.e., both instruments together), via a task requiring the detection of temporal modulations (i.e., triplets) incorporated within or across instruments. Subjects responded post-stimulus whether triplets were present in the to-be-attended instrument(s). Experiment 2 introduced the bottom-up manipulation by adding a three-level morphing of instrument timbre distance to the attentional framework. The task was designed to be used within neuroimaging paradigms; Experiment 2 was additionally validated behaviorally in the functional Magnetic Resonance Imaging (fMRI) environment. Experiment 1 subjects (N = 29, non-musicians) completed the task at high levels of accuracy, showing no group differences between any experimental conditions. Nineteen listeners also participated in Experiment 2, showing a main effect of instrument timbre distance, even though within attention-condition timbre-distance contrasts did not demonstrate any timbre effect. Correlation of overall scores with morph-distance effects, computed by subtracting the largest from the smallest timbre distance scores, showed an influence of general task difficulty on the timbre distance effect. Comparison of laboratory and fMRI data showed scanner noise had no adverse effect on task performance. These Experimental paradigms enable to study both bottom-up and top-down contributions to auditory stream segregation and integration within psychophysical and neuroimaging experiments.
Collapse
|
40
|
Predictive coding in auditory perception: challenges and unresolved questions. Eur J Neurosci 2018; 51:1151-1160. [PMID: 29250827 DOI: 10.1111/ejn.13802] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2017] [Revised: 09/03/2017] [Accepted: 11/20/2017] [Indexed: 11/30/2022]
Abstract
Predictive coding is arguably the currently dominant theoretical framework for the study of perception. It has been employed to explain important auditory perceptual phenomena, and it has inspired theoretical, experimental and computational modelling efforts aimed at describing how the auditory system parses the complex sound input into meaningful units (auditory scene analysis). These efforts have uncovered some vital questions, addressing which could help to further specify predictive coding and clarify some of its basic assumptions. The goal of the current review is to motivate these questions and show how unresolved issues in explaining some auditory phenomena lead to general questions of the theoretical framework. We focus on experimental and computational modelling issues related to sequential grouping in auditory scene analysis (auditory pattern detection and bistable perception), as we believe that this is the research topic where predictive coding has the highest potential for advancing our understanding. In addition to specific questions, our analysis led us to identify three more general questions that require further clarification: (1) What exactly is meant by prediction in predictive coding? (2) What governs which generative models make the predictions? and (3) What (if it exists) is the correlate of perceptual experience within the predictive coding framework?
Collapse
|
41
|
Attentional Resources Are Needed for Auditory Stream Segregation in Aging. Front Aging Neurosci 2017; 9:414. [PMID: 29311902 PMCID: PMC5743864 DOI: 10.3389/fnagi.2017.00414] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2017] [Accepted: 12/11/2017] [Indexed: 11/21/2022] Open
Abstract
The ability to select sound streams from background noise becomes challenging with age, even with normal peripheral auditory functioning. Reduced stream segregation ability has been reported in older compared to younger adults. However, the reason why there is a difference is still unknown. The current study investigated the hypothesis that automatic sound processing is impaired with aging, which then contributes to difficulty actively selecting subsets of sounds in noisy environments. We presented a simple intensity oddball sequence in various conditions with irrelevant background sounds while recording EEG. The ability to detect the oddball tones was dependent on the ability to automatically or actively segregate the sounds to frequency streams. Listeners were able to actively segregate sounds to perform the loudness detection task, but there was no indication of automatic segregation of background sounds while watching a movie. Thus, our results indicate impaired automatic processes in aging that may explain more effortful listening, and that tax attentional systems when selecting sound streams in noisy environments.
Collapse
|
42
|
Temporal Processing in Audition: Insights from Music. Neuroscience 2017; 389:4-18. [PMID: 29108832 PMCID: PMC6371985 DOI: 10.1016/j.neuroscience.2017.10.041] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2017] [Revised: 10/24/2017] [Accepted: 10/27/2017] [Indexed: 11/28/2022]
Abstract
What music psychology reveals about the natural bounds of human temporal processing. Psychoacoustics of beat perception. Neurophysiology of beat perception. Predictable timing in auditory perception. Neural mechanisms of timing.
Music is a curious example of a temporally patterned acoustic stimulus, and a compelling pan-cultural phenomenon. This review strives to bring some insights from decades of music psychology and sensorimotor synchronization (SMS) literature into the mainstream auditory domain, arguing that musical rhythm perception is shaped in important ways by temporal processing mechanisms in the brain. The feature that unites these disparate disciplines is an appreciation of the central importance of timing, sequencing, and anticipation. Perception of musical rhythms relies on an ability to form temporal predictions, a general feature of temporal processing that is equally relevant to auditory scene analysis, pattern detection, and speech perception. By bringing together findings from the music and auditory literature, we hope to inspire researchers to look beyond the conventions of their respective fields and consider the cross-disciplinary implications of studying auditory temporal sequence processing. We begin by highlighting music as an interesting sound stimulus that may provide clues to how temporal patterning in sound drives perception. Next, we review the SMS literature and discuss possible neural substrates for the perception of, and synchronization to, musical beat. We then move away from music to explore the perceptual effects of rhythmic timing in pattern detection, auditory scene analysis, and speech perception. Finally, we review the neurophysiology of general timing processes that may underlie aspects of the perception of rhythmic patterns. We conclude with a brief summary and outlook for future research.
Collapse
|
43
|
Abstract
Auditory perception is our main gateway to communication with others via speech and music, and it also plays an important role in alerting and orienting us to new events. This review provides an overview of selected topics pertaining to the perception and neural coding of sound, starting with the first stage of filtering in the cochlea and its profound impact on perception. The next topic, pitch, has been debated for millennia, but recent technical and theoretical developments continue to provide us with new insights. Cochlear filtering and pitch both play key roles in our ability to parse the auditory scene, enabling us to attend to one auditory object or stream while ignoring others. An improved understanding of the basic mechanisms of auditory perception will aid us in the quest to tackle the increasingly important problem of hearing loss in our aging population.
Collapse
|
44
|
Is there evidence for the added value and correct use of manual and automatically switching multimemory hearing devices? A scoping review. Int J Audiol 2017; 57:176-183. [PMID: 29017358 DOI: 10.1080/14992027.2017.1385864] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Abstract
OBJECTIVES To review literature on the use of manual and automatically switching multimemory devices by hearing aid and CI recipients, and to investigate if recipients appreciate and adequately use the ability to switch between programmes in various listening environments. DESIGN Literature was searched using PubMed, Embase and ISI/Web of Science. Additional studies were identified by screening reference and citation lists, and by contacting experts. STUDY SAMPLE The search yielded 1109 records that were screened on title and abstract. This resulted in the full-text assessment of 37 articles. RESULTS Sixteen articles reported on the use of multiple programmes for various listening environments, three articles reported on the use of an automatic switching mode. All studies reported on hearing aid recipients only, no study with CI recipients fulfilled the selection criteria. CONCLUSIONS Despite the high number of manual and automatically switching multimemory devices sold each year, there are remarkably few studies about the use of multiple programmes or automatic switching modes for various listening environments. No studies were found that examined the accuracy of the use of programmes for specific listening environments. An automatic switching device might be a solution if recipients are not able, or willing, to switch manually between programmes.
Collapse
|
45
|
The human auditory brainstem response to running speech reveals a subcortical mechanism for selective attention. eLife 2017; 6. [PMID: 28992445 PMCID: PMC5634786 DOI: 10.7554/elife.27203] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2017] [Accepted: 09/14/2017] [Indexed: 11/26/2022] Open
Abstract
Humans excel at selectively listening to a target speaker in background noise such as competing voices. While the encoding of speech in the auditory cortex is modulated by selective attention, it remains debated whether such modulation occurs already in subcortical auditory structures. Investigating the contribution of the human brainstem to attention has, in particular, been hindered by the tiny amplitude of the brainstem response. Its measurement normally requires a large number of repetitions of the same short sound stimuli, which may lead to a loss of attention and to neural adaptation. Here we develop a mathematical method to measure the auditory brainstem response to running speech, an acoustic stimulus that does not repeat and that has a high ecological validity. We employ this method to assess the brainstem's activity when a subject listens to one of two competing speakers, and show that the brainstem response is consistently modulated by attention.
Collapse
|
46
|
Recent advances in exploring the neural underpinnings of auditory scene perception. Ann N Y Acad Sci 2017; 1396:39-55. [PMID: 28199022 PMCID: PMC5446279 DOI: 10.1111/nyas.13317] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2016] [Revised: 12/21/2016] [Accepted: 01/08/2017] [Indexed: 11/29/2022]
Abstract
Studies of auditory scene analysis have traditionally relied on paradigms using artificial sounds-and conventional behavioral techniques-to elucidate how we perceptually segregate auditory objects or streams from each other. In the past few decades, however, there has been growing interest in uncovering the neural underpinnings of auditory segregation using human and animal neuroscience techniques, as well as computational modeling. This largely reflects the growth in the fields of cognitive neuroscience and computational neuroscience and has led to new theories of how the auditory system segregates sounds in complex arrays. The current review focuses on neural and computational studies of auditory scene perception published in the last few years. Following the progress that has been made in these studies, we describe (1) theoretical advances in our understanding of the most well-studied aspects of auditory scene perception, namely segregation of sequential patterns of sounds and concurrently presented sounds; (2) the diversification of topics and paradigms that have been investigated; and (3) how new neuroscience techniques (including invasive neurophysiology in awake humans, genotyping, and brain stimulation) have been used in this field.
Collapse
|
47
|
Echo-acoustic flow shapes object representation in spatially complex acoustic scenes. J Neurophysiol 2017; 117:2113-2124. [PMID: 28275060 DOI: 10.1152/jn.00860.2016] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2016] [Revised: 03/06/2017] [Accepted: 03/06/2017] [Indexed: 11/22/2022] Open
Abstract
Echolocating bats use echoes of their sonar emissions to determine the position and distance of objects or prey. Target distance is represented as a map of echo delay in the auditory cortex (AC) of bats. During a bat's flight through a natural complex environment, echo streams are reflected from multiple objects along its flight path. Separating such complex streams of echoes or other sounds is a challenge for the auditory system of bats as well as other animals. We investigated the representation of multiple echo streams in the AC of anesthetized bats (Phyllostomus discolor) and tested the hypothesis that neurons can lock on echoes from specific objects in a complex echo-acoustic pattern while the representation of surrounding objects is suppressed. We combined naturalistic pulse/echo sequences simulating a bat's flight through a virtual acoustic space with extracellular recordings. Neurons could selectively lock on echoes from one object in complex echo streams originating from two different objects along a virtual flight path. The objects were processed sequentially in the order in which they were approached. Object selection depended on sequential changes of echo delay and amplitude, but not on absolute values. Furthermore, the detailed representation of the object echo delays in the cortical target range map was not fixed but could be dynamically adapted depending on the temporal pattern of sonar emission during target approach within a simulated flight sequence.NEW & NOTEWORTHY Complex signal analysis is a challenging task in sensory processing for all animals, particularly for bats because they use echolocation for navigation in darkness. Recent studies proposed that the bat's perceptional system might organize complex echo-acoustic information into auditory streams, allowing it to track specific auditory objects during flight. We show that in the auditory cortex of bats, neurons can selectively respond to echo streams from specific objects.
Collapse
|
48
|
Hearing Scenes: A Neuromagnetic Signature of Auditory Source and Reverberant Space Separation. eNeuro 2017; 4:eN-NWR-0007-17. [PMID: 28451630 PMCID: PMC5394928 DOI: 10.1523/eneuro.0007-17.2017] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2016] [Revised: 02/03/2017] [Accepted: 02/06/2017] [Indexed: 11/21/2022] Open
Abstract
Perceiving the geometry of surrounding space is a multisensory process, crucial to contextualizing object perception and guiding navigation behavior. Humans can make judgments about surrounding spaces from reverberation cues, caused by sounds reflecting off multiple interior surfaces. However, it remains unclear how the brain represents reverberant spaces separately from sound sources. Here, we report separable neural signatures of auditory space and source perception during magnetoencephalography (MEG) recording as subjects listened to brief sounds convolved with monaural room impulse responses (RIRs). The decoding signature of sound sources began at 57 ms after stimulus onset and peaked at 130 ms, while space decoding started at 138 ms and peaked at 386 ms. Importantly, these neuromagnetic responses were readily dissociable in form and time: while sound source decoding exhibited an early and transient response, the neural signature of space was sustained and independent of the original source that produced it. The reverberant space response was robust to variations in sound source, and vice versa, indicating a generalized response not tied to specific source-space combinations. These results provide the first neuromagnetic evidence for robust, dissociable auditory source and reverberant space representations in the human brain and reveal the temporal dynamics of how auditory scene analysis extracts percepts from complex naturalistic auditory signals.
Collapse
|
49
|
Single Neurons in the Avian Auditory Cortex Encode Individual Identity and Propagation Distance in Naturally Degraded Communication Calls. J Neurosci 2017; 37:3491-3510. [PMID: 28235893 PMCID: PMC5373131 DOI: 10.1523/jneurosci.2220-16.2017] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2016] [Revised: 01/08/2017] [Accepted: 01/13/2017] [Indexed: 11/21/2022] Open
Abstract
One of the most complex tasks performed by sensory systems is "scene analysis": the interpretation of complex signals as behaviorally relevant objects. The study of this problem, universal to species and sensory modalities, is particularly challenging in audition, where sounds from various sources and localizations, degraded by propagation through the environment, sum to form a single acoustical signal. Here we investigated in a songbird model, the zebra finch, the neural substrate for ranging and identifying a single source. We relied on ecologically and behaviorally relevant stimuli, contact calls, to investigate the neural discrimination of individual vocal signature as well as sound source distance when calls have been degraded through propagation in a natural environment. Performing electrophysiological recordings in anesthetized birds, we found neurons in the auditory forebrain that discriminate individual vocal signatures despite long-range degradation, as well as neurons discriminating propagation distance, with varying degrees of multiplexing between both information types. Moreover, the neural discrimination performance of individual identity was not affected by propagation-induced degradation beyond what was induced by the decreased intensity. For the first time, neurons with distance-invariant identity discrimination properties as well as distance-discriminant neurons are revealed in the avian auditory cortex. Because these neurons were recorded in animals that had prior experience neither with the vocalizers of the stimuli nor with long-range propagation of calls, we suggest that this neural population is part of a general-purpose system for vocalizer discrimination and ranging.SIGNIFICANCE STATEMENT Understanding how the brain makes sense of the multitude of stimuli that it continually receives in natural conditions is a challenge for scientists. Here we provide a new understanding of how the auditory system extracts behaviorally relevant information, the vocalizer identity and its distance to the listener, from acoustic signals that have been degraded by long-range propagation in natural conditions. We show, for the first time, that single neurons, in the auditory cortex of zebra finches, are capable of discriminating the individual identity and sound source distance in conspecific communication calls. The discrimination of identity in propagated calls relies on a neural coding that is robust to intensity changes, signals' quality, and decreases in the signal-to-noise ratio.
Collapse
|
50
|
Frogs Exploit Statistical Regularities in Noisy Acoustic Scenes to Solve Cocktail-Party-like Problems. Curr Biol 2017; 27:743-750. [PMID: 28238657 DOI: 10.1016/j.cub.2017.01.031] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2016] [Revised: 01/13/2017] [Accepted: 01/18/2017] [Indexed: 11/30/2022]
Abstract
Noise is a ubiquitous source of errors in all forms of communication [1]. Noise-induced errors in speech communication, for example, make it difficult for humans to converse in noisy social settings, a challenge aptly named the "cocktail party problem" [2]. Many nonhuman animals also communicate acoustically in noisy social groups and thus face biologically analogous problems [3]. However, we know little about how the perceptual systems of receivers are evolutionarily adapted to avoid the costs of noise-induced errors in communication. In this study of Cope's gray treefrog (Hyla chrysoscelis; Hylidae), we investigated whether receivers exploit a potential statistical regularity present in noisy acoustic scenes to reduce errors in signal recognition and discrimination. We developed an anatomical/physiological model of the peripheral auditory system to show that temporal correlation in amplitude fluctuations across the frequency spectrum ("comodulation") [4-6] is a feature of the noise generated by large breeding choruses of sexually advertising males. In four psychophysical experiments, we investigated whether females exploit comodulation in background noise to mitigate noise-induced errors in evolutionarily critical mate-choice decisions. Subjects experienced fewer errors in recognizing conspecific calls and in selecting the calls of high-quality mates in the presence of simulated chorus noise that was comodulated. These data show unequivocally, and for the first time, that exploiting statistical regularities present in noisy acoustic scenes is an important biological strategy for solving cocktail-party-like problems in nonhuman animal communication.
Collapse
|