1
Joshi N, Ng WY, Thakkar K, Duque D, Yin P, Fritz J, Elhilali M, Shamma S. Temporal coherence shapes cortical responses to speech mixtures in a ferret cocktail party. Commun Biol 2024; 7:1392. PMID: 39455846; PMCID: PMC11511904; DOI: 10.1038/s42003-024-07096-3.
Abstract
Perceptual segregation of complex sounds such as speech and music simultaneously emanating from multiple sources is a remarkable ability common to humans and other animals alike. Unlike animal physiological experiments with simplified sounds or human investigations with spatially broad imaging techniques, this study combines insights from animal single-unit recordings with segregation of speech-like sound mixtures. Ferrets are trained to attend to a female voice and detect a target word, both in the presence and absence of a concurrent, equally salient male voice. Recordings are made in primary and secondary auditory cortical fields, and in frontal cortex. During task performance, the representation of the female words becomes enhanced relative to the male's in all regions, but especially in higher cortical regions. Analysis of the temporal and spectral response characteristics during task performance reveals how speech segregation gradually emerges in the auditory cortex. A computational model evaluated on the same voice mixtures replicates and extends these results to different attentional targets (attention to the female or male voice). These findings underscore the role of the principle of temporal coherence, whereby attention to a target voice binds together all neural responses coherently modulated with the target, ultimately forming and extracting a common auditory stream.
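The temporal coherence principle invoked here has a compact computational core: channels whose slow envelopes are correlated with the envelope of an attended target are bound into one stream. The sketch below is a minimal, hypothetical illustration of that idea rather than the authors' model; the cochlear front end is reduced to a three-band filter bank, and all parameter choices (band edges, envelope cutoff, modulation rates) are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def band_envelopes(x, fs, edges, env_cutoff=16.0):
    """Filter x into bands and return each band's slow amplitude envelope."""
    lp = butter(4, env_cutoff, btype="lowpass", fs=fs, output="sos")
    envs = []
    for lo, hi in edges:
        bp = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(bp, x)))
        envs.append(sosfiltfilt(lp, env))           # keep only slow modulations
    return np.array(envs)

def coherence_with_target(envs, target_idx):
    """Pearson correlation of every channel's envelope with the attended channel."""
    z = (envs - envs.mean(axis=1, keepdims=True)) / envs.std(axis=1, keepdims=True)
    return z @ z[target_idx] / z.shape[1]           # high r = same stream as target

fs = 16000
t = np.arange(2 * fs) / fs
# Two "voices": 500- and 600-Hz carriers share a 4-Hz modulator ("target voice");
# a 1200-Hz carrier has its own 7-Hz modulator ("distractor voice").
m_target = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))
m_distr = 0.5 * (1 + np.sin(2 * np.pi * 7 * t))
x = (m_target * np.sin(2 * np.pi * 500 * t)
     + m_target * np.sin(2 * np.pi * 600 * t)
     + m_distr * np.sin(2 * np.pi * 1200 * t))
edges = [(450, 550), (550, 660), (1100, 1300)]
r = coherence_with_target(band_envelopes(x, fs, edges), target_idx=0)
print(np.round(r, 2))   # channels 0 and 1 cohere (~1); channel 2 does not (~0)
```

Attending to a different channel simply changes `target_idx`, which is the sense in which the same front end supports different attentional targets.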
Affiliation(s)
- Neha Joshi
- Electrical and Computer Engineering Department, University of Maryland, College Park, MD, USA
- Wing Yiu Ng
- Electrical and Computer Engineering Department, University of Maryland, College Park, MD, USA
- Karan Thakkar
- Electrical and Computer Engineering Department, The Johns Hopkins University, Baltimore, MD, USA
- Daniel Duque
- Institute of Neuroscience of Castilla y León, University of Salamanca, Salamanca, Spain
- Pingbo Yin
- Institute for Systems Research, University of Maryland, College Park, MD, USA
- Mounya Elhilali
- Electrical and Computer Engineering Department, The Johns Hopkins University, Baltimore, MD, USA
- Shihab Shamma
- Electrical and Computer Engineering Department, University of Maryland, College Park, MD, USA
- Institute for Systems Research, University of Maryland, College Park, MD, USA
- Département d'Études Cognitives, École Normale Supérieure-PSL, Paris, France
2
Joshi N, Ng Y, Thakkar K, Duque D, Yin P, Fritz J, Elhilali M, Shamma S. Temporal Coherence Shapes Cortical Responses to Speech Mixtures in a Ferret Cocktail Party. bioRxiv [Preprint] 2024:2024.05.21.595171. PMID: 38915590; PMCID: PMC11195067; DOI: 10.1101/2024.05.21.595171.
Abstract
Segregation of complex sounds such as speech, music and animal vocalizations as they simultaneously emanate from multiple sources (referred to as the "cocktail party problem") is a remarkable ability that is common in humans and animals alike. The neural underpinnings of this process have been studied extensively, behaviorally and physiologically, in non-human animals, primarily with simplified sounds (tones and noise sequences). In humans, segregation experiments utilizing more complex speech mixtures are common, but physiological experiments have relied on EEG/MEG/ECoG recordings that sample activity from thousands of neurons, often obscuring the detailed processes that give rise to the observed segregation. The present study combines the insights of animal single-unit physiology with segregation of speech-like mixtures. Ferrets were trained to attend to a female voice and detect a target word, both in the presence and absence of a concurrent, equally salient male voice. Single-neuron recordings were obtained from primary and secondary ferret auditory cortical fields, as well as from frontal cortex. During task performance, representations of the female words became enhanced relative to those of the (distractor) male in all cortical regions, especially in the higher auditory cortical field. Analysis of the temporal and spectral response characteristics during task performance reveals how speech segregation gradually emerges in the auditory cortex. A computational model evaluated on the same voice mixtures replicates and extends these results to different attentional targets (attention to the female or male voice). These findings are consistent with the temporal coherence theory, whereby attention to a target voice anchors neural activity in cortical networks, binding together channels that are coherently modulated in time with the target and ultimately forming a common auditory stream.
Affiliation(s)
- Neha Joshi
- Electrical and Computer Engineering Department, University of Maryland, College Park, MD
- Yu Ng
- Electrical and Computer Engineering Department, University of Maryland, College Park, MD
- Karan Thakkar
- Electrical and Computer Engineering Department, The Johns Hopkins University, MD
- Daniel Duque
- Institute of Neuroscience of Castilla y León, University of Salamanca
- Pingbo Yin
- Institute for Systems Research, University of Maryland, College Park, MD
- Mounya Elhilali
- Electrical and Computer Engineering Department, The Johns Hopkins University, MD
- Shihab Shamma
- Electrical and Computer Engineering Department, University of Maryland, College Park, MD
- Institute for Systems Research, University of Maryland, College Park, MD
- Département d'Études Cognitives, École Normale Supérieure, PSL, Paris
3
Noyce AL, Varghese L, Mathias SR, Shinn-Cunningham BG. Perceptual organization and task demands jointly shape auditory working memory capacity. JASA Express Lett 2024; 4:034402. PMID: 38526127; PMCID: PMC10966505; DOI: 10.1121/10.0025392.
Abstract
Listeners performed two different tasks in which they remembered short sequences comprising either complex tones (generally heard as one melody) or everyday sounds (generally heard as separate objects). In one, listeners judged whether a probe item had been present in the preceding sequence. In the other, they judged whether a second sequence of the same items was identical in order to the preceding sequence. Performance on the first task was higher for everyday sounds; performance on the second was higher for complex tones. Perceptual organization strongly shapes listeners' memory for sounds, with implications for real-world communication.
Affiliation(s)
- Abigail L Noyce
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Leonard Varghese
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts 02215, USA
- Samuel R Mathias
- Department of Psychiatry, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA
4
Berthomieu G, Koehl V, Paquier M. Loudness constancy for noise and speech: How instructions and source information affect loudness of distant sounds. Atten Percept Psychophys 2023; 85:2774-2796. PMID: 37466907; DOI: 10.3758/s13414-023-02719-z.
Abstract
The physical properties of a sound evolve as it travels away from its source. For example, the sound pressure level at the listener's ears varies with the distance and azimuth of the source. However, several studies have reported that loudness remains constant when the distance between the source and the listener is varied. This loudness constancy has been reported to occur when the listener focuses attention on the sound as emitted by the source (the distal stimulus). Alternatively, the listener can focus on the sound as it reaches the ears (the proximal stimulus). The instructions given to the listener when assessing loudness can drive focus toward the proximal or distal stimulus. However, focusing on the distal stimulus requires sufficient information about the sound source, which can be provided by the environment or by the stimulus itself. The present study gathers three experiments designed to assess loudness while driving listeners' focus toward the proximal or distal stimuli. Listeners were provided with information about the source that differed in quality and quantity depending on the environment (visible or hidden sources, free field or reverberant rooms) and on the stimulus itself (noise or speech). The results show that listeners reported constant loudness only when asked to focus on the distal stimulus, provided enough information about the source was available. These results highlight that loudness depends on how the listener focuses on the stimuli, and they emphasize the importance of the instructions given in loudness studies.
Affiliation(s)
- Vincent Koehl
- Univ Brest, Lab-STICC, CNRS, UMR 6285, F-29200, Brest, France
- Mathieu Paquier
- Univ Brest, Lab-STICC, CNRS, UMR 6285, F-29200, Brest, France
5
Brown JA, Bidelman GM. Attention, Musicality, and Familiarity Shape Cortical Speech Tracking at the Musical Cocktail Party. bioRxiv [Preprint] 2023:2023.10.28.562773. PMID: 37961204; PMCID: PMC10634879; DOI: 10.1101/2023.10.28.562773.
Abstract
The "cocktail party problem" challenges our ability to understand speech in noisy environments, which often include background music. Here, we explored the role of background music in speech-in-noise listening. Participants listened to an audiobook in familiar and unfamiliar music while tracking keywords in either speech or song lyrics. We used EEG to measure neural tracking of the audiobook. When speech was masked by music, the modeled peak latency at 50 ms (P1TRF) was prolonged compared to the unmasked condition. Additionally, P1TRF amplitude was larger with unfamiliar background music, suggesting improved speech tracking. We observed prolonged latencies at 100 ms (N1TRF) when speech was not the attended stimulus, though only in less musical listeners. Our results suggest that early neural representations of speech are enhanced by both attention and concurrent unfamiliar music, indicating that familiar music is more distracting. One's ability to perceptually filter "musical noise" at the cocktail party depends on objective musical abilities.
Affiliation(s)
- Jane A. Brown
- School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA
- Institute for Intelligent Systems, University of Memphis, Memphis, TN 38152, USA
- Gavin M. Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA
- Program in Neuroscience, Indiana University, Bloomington, IN, USA
- Cognitive Science Program, Indiana University, Bloomington, IN, USA
6
Gustafson SJ, Nelson L, Silcox JW. Effect of Auditory Distractors on Speech Recognition and Listening Effort. Ear Hear 2023; 44:1121-1132. PMID: 36935395; PMCID: PMC10440215; DOI: 10.1097/aud.0000000000001356.
Abstract
OBJECTIVES Everyday listening environments are filled with competing noise and distractors. Although significant research has examined the effect of competing noise on speech recognition and listening effort, little is understood about the effect of distraction. The framework for understanding effortful listening recognizes the importance of attention-related processes in speech recognition and listening effort; however, it underspecifies the role that they play, particularly with respect to distraction. The load theory of attention predicts that resources will be automatically allocated to processing a distractor, but only if the perceptual load of the listening task is low enough. If perceptual load is high (i.e., listening in noise), then resources that would otherwise be allocated to processing a distractor are used to overcome the increased perceptual load and are unavailable for distractor processing. Although there is ample evidence for this theory in the visual domain, there has been little research investigating how the load theory of attention may apply to speech processing. In this study, we sought to measure the effect of distractors on speech recognition and listening effort and to evaluate whether the load theory of attention can be used to understand a listener's resource allocation in the presence of distractors. DESIGN Fifteen adult listeners participated in a monosyllabic word repetition task. Test stimuli were presented in quiet or in competing speech (+5 dB signal-to-noise ratio) and in distractor or no-distractor conditions. In conditions with distractors, auditory distractors were presented before the target words on 24% of the trials in quiet and in noise. Percent correct was recorded as the measure of speech recognition, and verbal response time (VRT) was recorded as the measure of listening effort. RESULTS A significant interaction was present for speech recognition, showing reduced speech recognition when distractors were presented in the quiet condition but no effect of distractors when noise was present. VRTs were significantly longer when distractors were present, regardless of listening condition. CONCLUSIONS Consistent with the load theory of attention, distractors significantly reduced speech recognition in the low-perceptual-load condition (i.e., listening in quiet) but did not impact speech recognition scores in the high-perceptual-load condition (i.e., listening in noise). The increases in VRTs in the presence of distractors under both low and high perceptual load suggest that the load theory of attention may not apply to listening effort. However, the large effect of distractors on VRT in both conditions is consistent with previous work demonstrating that distraction-related shifts of attention can delay processing of the target task. These findings also fit within the framework for understanding effortful listening, which proposes that involuntary attentional shifts deplete cognitive resources, leaving fewer resources readily available to process the signal of interest and thereby increasing listening effort (i.e., elongating VRT).
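The +5 dB signal-to-noise ratio in the design corresponds to a fixed ratio of target to masker RMS levels, so mixing at a prescribed SNR reduces to one gain computation. A minimal sketch of that standard calculation (the function and the stand-in signals are illustrative, not from the study):

```python
import numpy as np

def mix_at_snr(target, masker, snr_db):
    """Scale the masker so the target/masker RMS ratio equals snr_db, then mix."""
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    # Solve 20*log10(rms(target) / (g * rms(masker))) = snr_db for the gain g
    g = rms(target) / (rms(masker) * 10 ** (snr_db / 20))
    return target + g * masker

fs = 16000
t = np.arange(fs) / fs
word = np.sin(2 * np.pi * 440 * t)     # stand-in for a monosyllabic target word
babble = np.random.randn(fs)           # stand-in for competing speech
mixture = mix_at_snr(word, babble, snr_db=5.0)
```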
Affiliation(s)
- Samantha J Gustafson
- Department of Communication Sciences and Disorders, University of Utah, Salt Lake City, Utah
- These authors contributed equally to this work
- Loren Nelson
- Department of Communication Sciences and Disorders, University of Utah, Salt Lake City, Utah
- These authors contributed equally to this work
- Jack W Silcox
- Department of Psychology, University of Utah, Salt Lake City, Utah
7
Roberts B, Haywood NR. Asymmetric effects of sudden changes in timbre on auditory stream segregation. J Acoust Soc Am 2023; 154:363-378. PMID: 37462404; DOI: 10.1121/10.0020172.
Abstract
Two experiments explored the effects of abrupt transitions in timbral properties [amplitude modulation (AM), pure tones vs narrow-band noises, and attack/decay envelope] on streaming. Listeners continuously reported the number of streams heard during 18-s-long alternating low- and high-frequency (LHL-) triplet sequences (frequency separation: 2-6 semitones) that underwent a coherent transition at 6 s or remained unchanged. In experiment 1, triplets comprised either unmodulated pure tones or narrowly spaced tone pairs (dyads) whose beating created 100%-depth AM (30- or 50-Hz modulation). In experiment 2, triplets comprised narrow-band noises, dyads, or pure tones with quasi-trapezoidal envelopes (10/80/10 ms), fast attacks and slow decays (10/90 ms), or vice versa (90/10 ms). Abrupt transitions led to direction-dependent changes in stream segregation. Transitions from modulated to unmodulated (or more slowly modulated) tones, from noise bands to pure tones, or from slow- to fast-attack tones typically caused a substantial loss of segregation (resetting), whereas transitions in the opposite direction mostly caused less or no resetting. Furthermore, for the smallest frequency separation, transitions in the latter direction usually led to increased segregation (overshoot). Overall, the results are reminiscent of the perceptual asymmetries found in auditory search for targets with or without a salient additional feature (or greater activation of that feature).
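The dyads exploit a trigonometric identity: two equal-amplitude tones at f ± Δf sum to a carrier at f whose amplitude is fully modulated at 2Δf, since sin(2π(f−Δf)t) + sin(2π(f+Δf)t) = 2 cos(2πΔf t) sin(2πf t). A small sketch of how such stimuli could be generated (carrier frequency, durations, and the semitone separation below are illustrative assumptions, not the paper's exact settings):

```python
import numpy as np

def tone(f, dur, fs, ramp=0.01):
    """Pure tone with raised-cosine onset/offset ramps."""
    t = np.arange(int(dur * fs)) / fs
    y = np.sin(2 * np.pi * f * t)
    n = int(ramp * fs)
    win = 0.5 * (1 - np.cos(np.pi * np.arange(n) / n))
    y[:n] *= win
    y[-n:] *= win[::-1]
    return y

def dyad(f_center, mod_rate, dur, fs):
    """Tones at f_center +/- mod_rate/2 beat at mod_rate: 100%-depth AM."""
    t = np.arange(int(dur * fs)) / fs
    half = mod_rate / 2.0
    pair = (np.sin(2 * np.pi * (f_center - half) * t)
            + np.sin(2 * np.pi * (f_center + half) * t))
    return pair / 2.0

fs, dur = 48000, 0.1
low = dyad(1000.0, 50.0, dur, fs)              # "A": 1-kHz carrier, 50-Hz beating
high = tone(1000.0 * 2 ** (6 / 12), dur, fs)   # "B": pure tone 6 semitones above
```

Because the dyad occupies nearly the same narrow frequency region as the pure tone, it changes timbre (via its strong modulation) without introducing a peripheral-channeling cue.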
Affiliation(s)
- Brian Roberts
- School of Psychology, Aston University, Birmingham B4 7ET, United Kingdom
- Nicholas R Haywood
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, United Kingdom
8
Higgins NC, Scurry AN, Jiang F, Little DF, Alain C, Elhilali M, Snyder JS. Adaptation in the sensory cortex drives bistable switching during auditory stream segregation. Neurosci Conscious 2023; 2023:niac019. PMID: 36751309; PMCID: PMC9899071; DOI: 10.1093/nc/niac019.
Abstract
Current theories of perception emphasize the role of neural adaptation, inhibitory competition, and noise as key components that lead to switches in perception. Supporting evidence comes from neurophysiological findings of specific neural signatures in modality-specific and supramodal brain areas that appear to be critical to switches in perception. We used functional magnetic resonance imaging to study brain activity around the time of switches in perception while participants listened to a bistable auditory stream segregation stimulus, which can be heard as one integrated stream of tones or two segregated streams of tones. The auditory thalamus showed more activity around the time of a switch from segregated to integrated compared to time periods of stable perception of integrated; in contrast, the rostral anterior cingulate cortex and the inferior parietal lobule showed more activity around the time of a switch from integrated to segregated compared to time periods of stable perception of segregated streams, consistent with prior findings of asymmetries in brain activity depending on the switch direction. In sound-responsive areas in the auditory cortex, neural activity increased in strength preceding switches in perception and declined in strength over time following switches in perception. Such dynamics in the auditory cortex are consistent with the role of adaptation proposed by computational models of visual and auditory bistable switching, whereby the strength of neural activity decreases following a switch in perception, which eventually destabilizes the current percept enough to lead to a switch to an alternative percept.
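The adaptation dynamics described here are commonly formalized as two percept-selective populations that inhibit each other while slowly adapting: the dominant population weakens over time until the suppressed one takes over. The following is a generic sketch of that model class (a textbook-style toy with assumed parameters, not the model fitted in this study):

```python
import numpy as np

def simulate_bistable(T=60.0, dt=1e-3, tau=0.01, tau_a=2.0,
                      beta=1.1, phi=0.4, drive=1.0, noise=0.05, seed=0):
    """Two mutually inhibiting populations (r) with slow adaptation (a)."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    r = np.zeros((n, 2))
    a = np.zeros(2)
    relu = lambda x: np.maximum(x, 0.0)          # threshold-linear rate function
    for i in range(1, n):
        r1, r2 = r[i - 1]
        inp = np.array([drive - beta * r2 - phi * a[0],
                        drive - beta * r1 - phi * a[1]])
        inp += noise * rng.standard_normal(2)
        r[i] = r[i - 1] + dt / tau * (-r[i - 1] + relu(inp))
        a += dt / tau_a * (-a + r[i])            # adaptation tracks each rate slowly
    return r

rates = simulate_bistable()
dominant = rates.argmax(axis=1)                  # which percept currently wins
n_switches = np.count_nonzero(np.diff(dominant))
print(f"{n_switches} perceptual switches in 60 s of simulated time")
```

The model reproduces the qualitative signature reported in the abstract: the winning population's activity declines under adaptation after each switch, eventually destabilizing the current percept.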
Affiliation(s)
- Nathan C Higgins
- Department of Communication Sciences and Disorders, University of South Florida, 4202 E. Fowler Avenue, PCD1017, Tampa, FL 33620, USA
- Alexandra N Scurry
- Department of Psychology, University of Nevada, 1664 N. Virginia Street Mail Stop 0296, Reno, NV 89557, USA
- Fang Jiang
- Department of Psychology, University of Nevada, 1664 N. Virginia Street Mail Stop 0296, Reno, NV 89557, USA
- David F Little
- Department of Electrical and Computer Engineering, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
- Claude Alain
- Rotman Research Institute, Baycrest Health Sciences, 3560 Bathurst Street, Toronto, ON M6A 2E1, Canada
- Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
- Joel S Snyder
- Department of Psychology, University of Nevada, 4505 Maryland Parkway Mail Stop 5030, Las Vegas, NV 89154, USA
9
Wang H, Chen R, Yan Y, McGettigan C, Rosen S, Adank P. Perceptual Learning of Noise-Vocoded Speech Under Divided Attention. Trends Hear 2023; 27:23312165231192297. PMID: 37547940; PMCID: PMC10408355; DOI: 10.1177/23312165231192297.
Abstract
Speech perception performance for degraded speech can improve with practice or exposure. Such perceptual learning is thought to rely on attention, and theoretical accounts like the predictive coding framework suggest a key role for attention in supporting learning. However, it is unclear whether speech perceptual learning requires undivided attention. We evaluated the role of divided attention in speech perceptual learning in two online experiments (N = 336). Experiment 1 tested the reliance of perceptual learning on undivided attention. Participants completed a speech recognition task in which they repeated forty noise-vocoded sentences in a between-group design. Participants performed the speech task alone or concurrently with a domain-general visual task (dual task) at one of three difficulty levels. We observed perceptual learning under divided attention for all four groups, moderated by dual-task difficulty. Listeners in the easy and intermediate visual conditions improved as much as the single-task group. Those who completed the most challenging visual task showed faster learning and achieved similar final performance compared to the single-task group. Experiment 2 tested whether learning relies on domain-specific or domain-general processes. Participants completed a single speech task or performed it together with a dual task designed to recruit domain-specific (lexical or phonological) or domain-general (visual) processes. All secondary-task conditions produced patterns and amounts of learning comparable to the single speech task. Our results demonstrate that the impact of divided attention on perceptual learning is not strictly dependent on domain-general or domain-specific processes, and that speech perceptual learning persists under divided attention.
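Noise-vocoding, the degradation used in these experiments, band-filters speech, extracts each band's slow envelope, and reimposes the envelopes on noise carriers limited to the same bands. A minimal sketch of a standard implementation (band count, spacing, and cutoffs are common choices assumed here, not the paper's exact settings):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_bands=4, f_lo=100.0, f_hi=7000.0):
    """Replace each band's fine structure with envelope-modulated noise."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)   # log-spaced band edges
    noise = np.random.randn(len(speech))
    env_lp = butter(4, 30, btype="lowpass", fs=fs, output="sos")  # <30-Hz envelopes
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = sosfiltfilt(env_lp, np.abs(hilbert(sosfiltfilt(band, speech))))
        carrier = sosfiltfilt(band, noise)          # noise limited to the same band
        out += np.clip(env, 0.0, None) * carrier
    return out / (np.max(np.abs(out)) + 1e-9)

fs = 16000
t = np.arange(2 * fs) / fs
speech = np.sin(2 * np.pi * 150 * t) * (1 + np.sin(2 * np.pi * 3 * t))  # toy input
vocoded = noise_vocode(speech, fs)
```

Fewer bands yield more degraded, harder-to-learn speech, which is what makes the stimulus useful for studying perceptual learning.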
Affiliation(s)
- Han Wang
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Rongru Chen
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Yu Yan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Carolyn McGettigan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Stuart Rosen
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Patti Adank
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
10
Thomassen S, Hartung K, Einhäuser W, Bendixen A. Low-high-low or high-low-high? Pattern effects on sequential auditory scene analysis. J Acoust Soc Am 2022; 152:2758. PMID: 36456271; DOI: 10.1121/10.0015054.
Abstract
Sequential auditory scene analysis (ASA) is often studied using sequences of two alternating tones, such as ABAB or ABA_, with "_" denoting a silent gap, and "A" and "B" sine tones differing in frequency (nominally low and high). Many studies implicitly assume that the specific arrangement (ABAB vs ABA_, as well as low-high-low vs high-low-high within ABA_) plays a negligible role, such that decisions about the tone pattern can be governed by other considerations. To explicitly test this assumption, a systematic comparison of different tone patterns for two-tone sequences was performed in three different experiments. Participants were asked to report whether they perceived the sequences as originating from a single sound source (integrated) or from two interleaved sources (segregated). Results indicate that core findings of sequential ASA, such as an effect of frequency separation on the proportion of integrated and segregated percepts, are similar across the different patterns during prolonged listening. However, at sequence onset, the integrated percept was more likely to be reported by the participants in ABA_low-high-low than in ABA_high-low-high sequences. This asymmetry is important for models of sequential ASA, since the formation of percepts at onset is an integral part of understanding how auditory interpretations build up.
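Such two-tone sequences are fully specified by a base frequency, a semitone separation (f_B = f_A · 2^(s/12)), and the pattern (ABAB vs ABA_ with a silent gap). A small illustrative generator (durations and base frequency are assumptions, not the study's parameters):

```python
import numpy as np

def tone(f, dur, fs, ramp=0.01):
    t = np.arange(int(dur * fs)) / fs
    y = np.sin(2 * np.pi * f * t)
    n = int(ramp * fs)                        # raised-cosine on/off ramps
    win = 0.5 * (1 - np.cos(np.pi * np.arange(n) / n))
    y[:n] *= win
    y[-n:] *= win[::-1]
    return y

def two_tone_sequence(f_a=440.0, semitones=6, pattern="ABA_", n_reps=10,
                      tone_dur=0.1, fs=44100):
    """Concatenate A/B tones ('_' = silent gap) for the given repeating pattern."""
    f_b = f_a * 2 ** (semitones / 12)         # B is 'semitones' above (or below) A
    gap = np.zeros(int(tone_dur * fs))
    unit = {"A": tone(f_a, tone_dur, fs), "B": tone(f_b, tone_dur, fs), "_": gap}
    return np.concatenate([unit[c] for c in pattern * n_reps])

seq_lhl = two_tone_sequence(pattern="ABA_")                    # low-high-low
seq_hlh = two_tone_sequence(f_a=440.0 * 2 ** (6 / 12),
                            semitones=-6, pattern="ABA_")      # high-low-high
```

Swapping the sign of the semitone separation while shifting the base frequency, as in the last line, produces the low-high-low vs high-low-high arrangements whose onset percepts the study compares.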
Affiliation(s)
- Sabine Thomassen
- Cognitive Systems Lab, Faculty of Natural Sciences, Chemnitz University of Technology, 09107 Chemnitz, Germany
- Kevin Hartung
- Cognitive Systems Lab, Faculty of Natural Sciences, Chemnitz University of Technology, 09107 Chemnitz, Germany
- Wolfgang Einhäuser
- Physics of Cognition Group, Faculty of Natural Sciences, Chemnitz University of Technology, 09107 Chemnitz, Germany
- Alexandra Bendixen
- Cognitive Systems Lab, Faculty of Natural Sciences, Chemnitz University of Technology, 09107 Chemnitz, Germany
11
Szalárdy O, Tóth B, Farkas D, Orosz G, Winkler I. Do we parse the background into separate streams in the cocktail party? Front Hum Neurosci 2022; 16:952557. DOI: 10.3389/fnhum.2022.952557.
Abstract
In the cocktail party situation, people with normal hearing usually follow a single speaker among multiple concurrent ones. However, there is no agreement in the literature as to whether the background is segregated into multiple streams/speakers. The current study varied the number of concurrent speech streams and investigated target detection and memory for the contents of a target stream, as well as the processing of distractors. A male-voiced target stream was presented either alone (single-speech), together with one male-voiced distractor (one-distractor), or with a male- and a female-voiced distractor (two-distractor). Behavioral measures of target detection and content tracking performance, as well as target- and distractor-detection-related event-related brain potentials (ERPs), were assessed. We found that the N2 amplitude decreased whereas the P3 amplitude increased from the single-speech to the concurrent-speech-stream conditions. Importantly, the behavioral effect of distractors differed between the conditions with one vs. two distractor speech streams. In addition, for the same (male) speaker, the non-zero voltages in the N2 time window for distractor numerals and in the P3 time window for syntactic violations appearing in the non-target speech stream differed significantly between the one- and two-distractor conditions. These results support the notion that the two background speech streams are segregated, as they show that distractors and syntactic violations appearing in the non-target streams are processed even when two non-target speech streams are delivered together with the target stream.
12
Sauvé SA, Marozeau J, Rich Zendel B. The effects of aging and musicianship on the use of auditory streaming cues. PLoS One 2022; 17:e0274631. PMID: 36137151; PMCID: PMC9498935; DOI: 10.1371/journal.pone.0274631.
Abstract
Auditory stream segregation, or separating sounds into their respective sources and tracking them over time, is a fundamental auditory ability. Previous research has separately explored the impacts of aging and musicianship on the ability to separate and follow auditory streams. The current study evaluated the simultaneous effects of age and musicianship on auditory streaming induced by three physical features: intensity, spectral envelope and temporal envelope. In the first study, older and younger musicians and non-musicians with normal hearing identified deviants in a four-note melody interleaved with distractors that were more or less similar to the melody in terms of intensity, spectral envelope and temporal envelope. In the second study, older and younger musicians and non-musicians participated in a dissimilarity rating paradigm with pairs of melodies that differed along the same three features. Results suggested that auditory streaming skills are maintained in older adults but that older adults rely on intensity more than younger adults while musicianship is associated with increased sensitivity to spectral and temporal envelope, acoustic features that are typically less effective for stream segregation, particularly in older adults.
Affiliation(s)
- Sarah A. Sauvé
- Division of Community Health and Humanities, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, Newfoundland and Labrador, Canada
- Jeremy Marozeau
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Benjamin Rich Zendel
- Division of Community Health and Humanities, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, Newfoundland and Labrador, Canada
13
Matz AF, Nie Y, Wheeler HJ. Auditory stream segregation of amplitude-modulated narrowband noise in cochlear implant users and individuals with normal hearing. Front Psychol 2022; 13:927854. PMID: 36118488; PMCID: PMC9479457; DOI: 10.3389/fpsyg.2022.927854.
Abstract
Voluntary stream segregation was investigated in cochlear implant (CI) users and normal-hearing (NH) listeners using a segregation-promoting objective approach that evaluated the role of spectral and amplitude-modulation (AM) rate separations in stream segregation and its build-up. Sequences of 9 or 3 pairs of A and B narrowband noise (NBN) bursts were presented, which differed in the center frequency of the noise band, the AM rate, or both. In some sequences (delayed sequences), the last B burst was delayed by 35 ms from its otherwise-steady temporal position. In the other sequences (no-delay sequences), the last B bursts were temporally advanced by 0 to 10 ms. A single-interval yes/no procedure was used to measure participants' sensitivity (d′) in identifying delayed vs. no-delay sequences; a higher d′ value indicated a greater ability to segregate the A and B subsequences. For NH listeners, performance improved with each spectral separation. For CI users, however, performance was only significantly better in the condition with the largest spectral separation. Additionally, performance was significantly poorer for the largest AM-rate separation than for the condition with no AM-rate separation in both groups. The significant effect of sequence duration in both groups indicated that listeners improved as the duration of the stimulus sequences lengthened, supporting the build-up effect. The results of this study suggest that CI users are less able than NH listeners to segregate NBN bursts into different auditory streams when the bursts are moderately separated in the spectral domain. Contrary to our hypothesis, our results indicate that AM-rate separation may interfere with the segregation of NBN streams. Additionally, our results add evidence to the literature that CI users build up stream segregation at a rate comparable to NH listeners when the inter-stream spectral separations are adequately large.
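Sensitivity in a single-interval yes/no task is computed from the hit rate H and false-alarm rate F as d′ = z(H) − z(F), where z is the inverse of the standard normal CDF. A minimal sketch (the correction applied to avoid infinite z-scores is one common convention, assumed here; the counts are made up):

```python
from scipy.stats import norm

def dprime(n_hits, n_signal, n_fas, n_noise):
    """Yes/no sensitivity: d' = z(hit rate) - z(false-alarm rate)."""
    # Add 0.5/1.0 to counts so rates of exactly 0 or 1 stay finite (log-linear rule)
    h = (n_hits + 0.5) / (n_signal + 1.0)
    f = (n_fas + 0.5) / (n_noise + 1.0)
    return norm.ppf(h) - norm.ppf(f)

# Example: 42 "delayed" responses on 50 delayed sequences,
# 12 "delayed" responses on 50 no-delay sequences
print(round(dprime(42, 50, 12, 50), 2))   # ~1.66
```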
Affiliation(s)
- Alexandria F. Matz
- Department of Otolaryngology, Eastern Virginia Medical School, Norfolk, VA, United States
- Yingjiu Nie
- Department of Communication Sciences and Disorders, James Madison University, Harrisonburg, VA, United States
- Correspondence: Yingjiu Nie
- Harley J. Wheeler
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Twin Cities, Minneapolis, MN, United States
14
Interaction of bottom-up and top-down neural mechanisms in spatial multi-talker speech perception. Curr Biol 2022; 32:3971-3986.e4. PMID: 35973430; DOI: 10.1016/j.cub.2022.07.047.
Abstract
How the human auditory cortex represents spatially separated simultaneous talkers, and how talkers' locations and voices modulate the neural representations of attended and unattended speech, are unclear. Here, we measured neural responses from electrodes implanted in neurosurgical patients as they performed single-talker and multi-talker speech perception tasks. We found that spatial separation between talkers caused a preferential encoding of the contralateral speech in Heschl's gyrus (HG), planum temporale (PT), and superior temporal gyrus (STG). Location and spectrotemporal features were encoded in different aspects of the neural response. Specifically, the talker's location changed the mean response level, whereas the talker's spectrotemporal features altered the variation of the response around its baseline. These components were differentially modulated by the attended talker's voice or location, which improved the population decoding of attended speech features. Attentional modulation due to the talker's voice appeared only in auditory areas with longer latencies, but attentional modulation due to location was present throughout. Our results show that spatial multi-talker speech perception relies upon a separable pre-attentive neural representation, which can be further tuned by top-down attention to the location and voice of the talker.
15
Abstract
Hearing in noise is a core problem in audition, and a challenge for hearing-impaired listeners, yet the underlying mechanisms are poorly understood. We explored whether harmonic frequency relations, a signature property of many communication sounds, aid hearing in noise for normal hearing listeners. We measured detection thresholds in noise for tones and speech synthesized to have harmonic or inharmonic spectra. Harmonic signals were consistently easier to detect than otherwise identical inharmonic signals. Harmonicity also improved discrimination of sounds in noise. The largest benefits were observed for two-note up-down "pitch" discrimination and melodic contour discrimination, both of which could be performed equally well with harmonic and inharmonic tones in quiet, but which showed large harmonic advantages in noise. The results show that harmonicity facilitates hearing in noise, plausibly by providing a noise-robust pitch cue that aids detection and discrimination.
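A common way to construct such stimuli is to sum partials at exact integer multiples of a fundamental for harmonic tones and to jitter each partial's frequency for inharmonic ones. A small illustrative sketch (partial count, level normalization, and jitter range are assumptions, not the study's exact parameters):

```python
import numpy as np

def complex_tone(f0, n_partials, dur, fs, jitter=0.0, seed=None):
    """Sum of partials at k*f0; jitter != 0 makes the spectrum inharmonic."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur * fs)) / fs
    out = np.zeros_like(t)
    for k in range(1, n_partials + 1):
        # Shift each partial by up to +/- jitter*f0 from its harmonic position
        f = k * f0 + rng.uniform(-jitter, jitter) * f0
        out += np.sin(2 * np.pi * f * t) / np.sqrt(n_partials)
    return out

fs = 32000
harmonic = complex_tone(200.0, 10, 0.5, fs)                  # partials at exact k*f0
inharmonic = complex_tone(200.0, 10, 0.5, fs, jitter=0.5, seed=0)
```

The two stimuli have matched long-term spectra and envelopes on average, so detection differences in noise can be attributed to harmonicity itself.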
16
Cai H, Dent ML. Dimensionally Specific Attention Capture in Birds Performing Auditory Streaming Task. J Assoc Res Otolaryngol 2022; 23:241-252. PMID: 34988866; DOI: 10.1007/s10162-021-00825-z.
Abstract
Previous studies in budgerigars (Melopsittacus undulatus) have indicated that they experience attention capture in a qualitatively similar way to humans. Here, we apply a similar objective auditory streaming paradigm, using modified budgerigar vocalizations instead of ABAB-… patterned pure tones in the sound sequences. The birds were trained to respond to deviants in the target stream while ignoring distractors in the background stream. The background distractor could vary among five different categories and two different sequential positions, while the target deviants could appear randomly at five different sequential positions and vary between two different categories. We found that unpredictable background distractors degraded the birds' sensitivity to the target deviants. Compared to conditions in which the background distractor appeared right before the target deviant, the attention capture effect decayed when the background distractor appeared earlier. In contrast to results from the same paradigm using pure tones, the results here are evidence for a faster recovery from attention capture using modified vocalization segments. The temporally modulated background distractor captured the birds' attention more, and impaired their performance more, than other categories of background distractor, because the temporally modulated target deviant led the birds to focus their attention on the temporal-modulation dimension. However, unlike humans, birds have a lower tolerance for suppressing distractors drawn from the same feature dimensions as the targets, as evidenced by higher false alarm rates for the temporally modulated distractor than for distractors from different feature dimensions.
Affiliation(s)
- Huaizhen Cai
- Department of Psychology, University at Buffalo, The State University of New York, Buffalo, NY, USA
- Micheal L Dent
- Department of Psychology, University at Buffalo, The State University of New York, Buffalo, NY, USA.
17
Higgins NC, Monjaras AG, Yerkes BD, Little DF, Nave-Blodgett JE, Elhilali M, Snyder JS. Resetting of Auditory and Visual Segregation Occurs After Transient Stimuli of the Same Modality. Front Psychol 2021; 12:720131. PMID: 34621219; PMCID: PMC8490814; DOI: 10.3389/fpsyg.2021.720131.
Abstract
In the presence of a continually changing sensory environment, maintaining stable but flexible awareness is paramount and requires continual organization of information. Determining which stimulus features belong together and which are separate is therefore one of the primary tasks of the sensory systems. It is unknown whether a global or a sensory-specific mechanism regulates the final perceptual outcome of this streaming process. To test the extent of modality independence in perceptual control, an auditory streaming experiment and a visual moving-plaid experiment were performed. Both were designed to evoke alternating perception of an integrated or segregated percept. In both experiments, transient auditory and visual distractor stimuli were presented in separate blocks, such that the distractors did not overlap in frequency or space with the streaming or plaid stimuli, respectively, thus preventing peripheral interference. When a distractor was presented in the opposite modality to the bistable stimulus (visual distractors during auditory streaming or auditory distractors during visual streaming), the probability of percept switching was not significantly different from when no distractor was presented. Conversely, significant differences in switch probability were observed following within-modality distractors, but only when the pre-distractor percept was segregated. Given the modality specificity of the distractor-induced resetting, the results suggest that conscious perception is at least partially controlled by modality-specific processing. The fact that the distractors had no peripheral overlap with the bistable stimuli indicates that the perceptual reset arises from interference at a locus where stimuli of different frequencies and spatial locations are integrated.
Affiliation(s)
- Nathan C Higgins
- Department of Psychology, University of Nevada Las Vegas, Las Vegas, NV, United States
- Ambar G Monjaras
- Department of Psychology, University of Nevada Las Vegas, Las Vegas, NV, United States
- Breanne D Yerkes
- Department of Psychology, University of Nevada Las Vegas, Las Vegas, NV, United States
- David F Little
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, United States
- Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, United States
- Joel S Snyder
- Department of Psychology, University of Nevada Las Vegas, Las Vegas, NV, United States
18
Hausfeld L, Disbergen NR, Valente G, Zatorre RJ, Formisano E. Modulating Cortical Instrument Representations During Auditory Stream Segregation and Integration With Polyphonic Music. Front Neurosci 2021; 15:635937. PMID: 34630007; PMCID: PMC8498193; DOI: 10.3389/fnins.2021.635937.
Abstract
Numerous neuroimaging studies have demonstrated that the auditory cortex tracks ongoing speech and that, in multi-speaker environments, tracking of the attended speaker is enhanced compared to that of irrelevant speakers. In contrast to speech, multi-instrument music can be appreciated by attending not only to its individual instruments (i.e., segregation) but also to multiple instruments simultaneously (i.e., integration). We investigated the neural correlates of these two modes of music listening using electroencephalography (EEG) and sound-envelope tracking. To this end, we presented uniquely composed music pieces played by two instruments, a bassoon and a cello, in combination with a previously validated music auditory scene analysis behavioral paradigm (Disbergen et al., 2018). Similar to results obtained with selective listening tasks for speech, relevant instruments could be reconstructed better than irrelevant ones during the segregation task. A delay-specific analysis showed higher reconstruction accuracy for the relevant instrument during a middle-latency window for both the bassoon and the cello, and during a late window for the bassoon. During the integration task, we did not observe significant attentional modulation when reconstructing the overall music envelope. Subsequent analyses indicated that this null result might be due to the heterogeneous strategies listeners employ during the integration task. Overall, our results suggest that, following a common processing stage, top-down modulations consistently enhance the relevant instrument's representation during an instrument segregation task, whereas no such enhancement is observed during an instrument integration task. These findings extend previous results from speech tracking to the tracking of multi-instrument music and, furthermore, inform current theories of polyphonic music perception.
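Envelope reconstruction of this kind is typically done with a backward (decoding) model: a linear filter maps multichannel EEG at several time lags back onto a sound envelope, and reconstruction accuracy is the correlation between reconstructed and actual envelopes. A generic ridge-regression sketch of that approach (not the authors' exact pipeline; lag range, regularization, and the toy data are assumptions):

```python
import numpy as np

def lagged(eeg, max_lag):
    """Stack time-lagged copies of each channel (lags 0..max_lag samples)."""
    X = np.concatenate([np.roll(eeg, lag, axis=0) for lag in range(max_lag + 1)],
                       axis=1)
    X[:max_lag] = 0.0                    # zero out samples wrapped around by roll
    return X

def fit_decoder(eeg, envelope, max_lag, lam=1e2):
    """Ridge regression from lagged EEG back to the stimulus envelope."""
    X = lagged(eeg, max_lag)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)

def reconstruction_r(eeg, envelope, w, max_lag):
    return np.corrcoef(lagged(eeg, max_lag) @ w, envelope)[0, 1]

# Toy data: 16-channel "EEG" weakly driven by a smoothed random "envelope"
rng = np.random.default_rng(0)
env = np.convolve(rng.standard_normal(8000), np.ones(50) / 50, mode="same")
eeg = env[:, None] + rng.standard_normal((8000, 16))
w = fit_decoder(eeg[:6000], env[:6000], max_lag=20)
print(round(reconstruction_r(eeg[6000:], env[6000:], w, max_lag=20), 2))  # ~0.5
```

In a selective-attention analysis, decoders are fit per instrument (or speaker), and higher held-out reconstruction accuracy for the attended source is taken as evidence of enhanced tracking.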
Affiliation(s)
- Lars Hausfeld
- Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
- Maastricht Brain Imaging Centre (MBIC), Maastricht University, Maastricht, Netherlands
- Niels R Disbergen
- Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
- Maastricht Brain Imaging Centre (MBIC), Maastricht University, Maastricht, Netherlands
- Giancarlo Valente
- Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
- Maastricht Brain Imaging Centre (MBIC), Maastricht University, Maastricht, Netherlands
- Robert J Zatorre
- Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
- International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, QC, Canada
- Elia Formisano
- Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
- Maastricht Brain Imaging Centre (MBIC), Maastricht University, Maastricht, Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, Netherlands
- Brightlands Institute for Smart Society (BISS), Maastricht University, Maastricht, Netherlands
19
Rajasingam SL, Summers RJ, Roberts B. The dynamics of auditory stream segregation: Effects of sudden changes in frequency, level, or modulation. J Acoust Soc Am 2021; 149:3769. PMID: 34241493; DOI: 10.1121/10.0005049.
Abstract
Three experiments explored the effects of abrupt changes in stimulus properties on streaming dynamics. Listeners monitored 20-s-long low- and high-frequency (LHL-) tone sequences and reported the number of streams heard throughout. Experiments 1 and 2 used pure tones and examined the effects of changing triplet base frequency and level, respectively. Abrupt changes in base frequency (±3-12 semitones) caused significant magnitude-related falls in segregation (resetting), regardless of transition direction, but an asymmetry occurred for changes in level (±12 dB). Rising-level transitions usually decreased segregation significantly, whereas falling-level transitions had little or no effect. Experiment 3 used pure tones (unmodulated) and narrowly spaced (±25 Hz) tone pairs (dyads); the two evoke similar excitation patterns, but dyads are strongly modulated with a distinctive timbre. Dyad-only sequences induced a strongly segregated percept, limiting scope for further build-up. Alternation between groups of pure tones and dyads produced large, asymmetric changes in streaming. Dyad-to-pure transitions caused substantial resetting, but pure-to-dyad transitions sometimes elicited even greater segregation than for the corresponding interval in dyad-only sequences (overshoot). The results indicate that abrupt changes in timbre can strongly affect the likelihood of stream segregation without introducing significant peripheral-channeling cues. These asymmetric effects of transition direction are reminiscent of subtractive adaptation in vision.
Affiliation(s)
- Saima L Rajasingam
- Department of Vision and Hearing Sciences, Anglia Ruskin University, Cambridge CB1 1PT, United Kingdom
- Robert J Summers
- School of Psychology, Aston University, Birmingham B4 7ET, United Kingdom
- Brian Roberts
- School of Psychology, Aston University, Birmingham B4 7ET, United Kingdom
20
Bröhl F, Kayser C. Delta/theta band EEG differentially tracks low and high frequency speech-derived envelopes. Neuroimage 2021; 233:117958. PMID: 33744458; PMCID: PMC8204264; DOI: 10.1016/j.neuroimage.2021.117958.
Abstract
The representation of speech in the brain is often examined by measuring the alignment of rhythmic brain activity to the speech envelope. To quantify this alignment conveniently (termed 'speech tracking'), many studies use the broadband speech envelope, which combines acoustic fluctuations across the entire spectral range. Using EEG recordings, we show that relying on this broadband envelope can give a distorted picture of speech encoding. We systematically investigated the encoding of spectrally limited speech-derived envelopes, presented via individual and multiple noise carriers, in the human brain. Tracking in the 1 to 6 Hz EEG bands differentially reflected low (0.2-0.83 kHz) and high (2.66-8 kHz) frequency speech-derived envelopes. This effect was independent of the specific carrier frequency but sensitive to attentional manipulations, and may reflect a context-dependent emphasis of information from distinct spectral ranges of the speech envelope in low-frequency brain activity. As low- and high-frequency speech envelopes relate to distinct phonemic features, our results suggest that functionally distinct processes contribute to speech tracking in the same EEG bands and are easily confounded when only the broadband speech envelope is considered.
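The paper's core contrast, broadband versus spectrally limited envelopes, is easy to state in code: the broadband envelope is extracted from the full-band signal, whereas band-limited envelopes are extracted after restricting the signal to a frequency range. A minimal sketch (band edges follow the ranges quoted in the abstract; filter choices and the stand-in signal are assumptions):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def envelope(x, fs, band=None, env_cutoff=8.0):
    """Slow amplitude envelope of x, optionally band-limiting x first."""
    if band is not None:
        bp = butter(4, band, btype="bandpass", fs=fs, output="sos")
        x = sosfiltfilt(bp, x)
    lp = butter(4, env_cutoff, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(lp, np.abs(hilbert(x)))

fs = 16000
x = np.random.randn(5 * fs)                 # stand-in for a speech recording
broadband = envelope(x, fs)
low = envelope(x, fs, band=(200, 830))      # 0.2-0.83 kHz, as in the abstract
high = envelope(x, fs, band=(2660, 7990))   # 2.66-8 kHz (upper edge < Nyquist)
# The broadband envelope conflates the two spectrally limited envelopes:
print(np.corrcoef([broadband, low, high]).round(2))
```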
Affiliation(s)
- Felix Bröhl
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany.
- Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany
21
Roswandowitz C, Swanborough H, Frühholz S. Categorizing human vocal signals depends on an integrated auditory-frontal cortical network. Hum Brain Mapp 2021; 42:1503-1517. PMID: 33615612; PMCID: PMC7927295; DOI: 10.1002/hbm.25309.
Abstract
Voice signals are relevant for auditory communication and are suggested to be processed in dedicated auditory cortex (AC) regions. While recent reports have highlighted an additional role of the inferior frontal cortex (IFC), a detailed description of the integrated functioning of the AC-IFC network, and of its task relevance for voice processing, is missing. Using neuroimaging, we tested sound categorization in human participants who focused either on the higher-order vocal-sound dimension (voice task) or on the feature-based intensity dimension (loudness task) while listening to the same sound material. We found differential involvement of the AC and IFC depending on the task performed and on whether the voice dimension was task relevant. First, when comparing neural vocal-sound processing in our task-based design with previously reported passive-listening designs, we observed highly similar cortical activations in the AC and IFC. Second, during task-based vocal-sound processing we observed voice-sensitive responses in the AC and IFC, whereas intensity processing was restricted to distinct AC regions. Third, the IFC flexibly adapted to the vocal sounds' task relevance, being active only when the voice dimension was task relevant. Fourth and finally, connectivity modeling revealed that vocal signals, independent of their task relevance, provided significant input to the bilateral AC. However, only when attention was on the voice dimension did we find significant modulations of auditory-frontal connections. Our findings suggest that an integrated auditory-frontal network is essential for behaviorally relevant vocal-sound processing. The IFC seems to be an important hub of the extended voice network when representing higher-order vocal objects and guiding goal-directed behavior.
Affiliation(s)
- Claudia Roswandowitz
- Department of Psychology, University of Zurich, Zurich, Switzerland
- Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
- Huw Swanborough
- Department of Psychology, University of Zurich, Zurich, Switzerland
- Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
- Sascha Frühholz
- Department of Psychology, University of Zurich, Zurich, Switzerland
- Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
- Center for Integrative Human Physiology (ZIHP), University of Zurich, Zurich, Switzerland
22
Hausfeld L, Shiell M, Formisano E, Riecke L. Cortical processing of distracting speech in noisy auditory scenes depends on perceptual demand. Neuroimage 2020; 228:117670. PMID: 33359352; DOI: 10.1016/j.neuroimage.2020.117670.
Abstract
Selective attention is essential for the processing of multi-speaker auditory scenes because they require the perceptual segregation of the relevant speech (the "target") from irrelevant speech (the "distractors"). For simple sounds, it has been suggested that the processing of multiple distractor sounds depends on bottom-up factors affecting task performance. However, it remains unclear whether this dependency applies to naturalistic multi-speaker auditory scenes. In this study, we tested the hypothesis that increased perceptual demand (the processing requirement posed by the scene to separate the target speech) reduces the cortical processing of distractor speech, thus decreasing its perceptual segregation. Human participants were presented with auditory scenes including three speakers and asked to attend selectively to one speaker while their EEG was acquired. The perceptual demand of this selective listening task was varied by introducing an auditory cue (interaural time differences, ITDs) for segregating the target from the distractor speakers, while acoustic differences between the distractors were matched in ITD and loudness. We obtained a quantitative measure of the cortical segregation of distractor speakers by assessing the difference in how accurately speech-envelope-following EEG responses could be predicted by models of averaged distractor speech versus models of individual distractor speech. In agreement with our hypothesis, the results show that interaural segregation cues led to improved behavioral word-recognition performance and stronger cortical segregation of the distractor speakers. The neural effect was strongest in the δ-band and at early delays (0-200 ms). Our results indicate that during low perceptual demand, the human cortex represents individual distractor speech signals as more segregated. This suggests that, in addition to purely acoustical properties, the cortical processing of distractor speakers depends on factors like perceptual demand.
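The segregation measure rests on forward models: EEG is predicted from time-lagged copies of a stimulus envelope, and predictive accuracy is compared between a model built from the averaged distractor envelope and models built from the individual distractor envelopes. A simplified, generic sketch of that comparison (lags, regularization, and the toy signals are assumptions, not the authors' pipeline):

```python
import numpy as np

def lagged(env, max_lag):
    """Design matrix of time-lagged envelope samples (lags 0..max_lag)."""
    X = np.stack([np.roll(env, lag) for lag in range(max_lag + 1)], axis=1)
    X[:max_lag] = 0.0
    return X

def forward_model_r(env, eeg_ch, max_lag=40, lam=1.0):
    """Fit an envelope -> EEG ridge model on the first half; return test correlation."""
    X = lagged(env, max_lag)
    n = len(env) // 2
    w = np.linalg.solve(X[:n].T @ X[:n] + lam * np.eye(max_lag + 1),
                        X[:n].T @ eeg_ch[:n])
    return np.corrcoef(X[n:] @ w, eeg_ch[n:])[0, 1]

rng = np.random.default_rng(1)
smooth = lambda x: np.convolve(x, np.ones(40) / 40, mode="same")
d1 = smooth(rng.standard_normal(8000))        # distractor envelope 1
d2 = smooth(rng.standard_normal(8000))        # distractor envelope 2
eeg = 3.0 * d1 + 0.5 * d2 + rng.standard_normal(8000)   # unequal tracking + noise
r_individual = max(forward_model_r(d1, eeg), forward_model_r(d2, eeg))
r_averaged = forward_model_r((d1 + d2) / 2, eeg)
# Individual-distractor models outpredicting the averaged model (~0.4 vs ~0.35
# with this toy SNR) is the signature of segregated distractor representations.
print(round(r_individual, 2), round(r_averaged, 2))
```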
Affiliation(s)
- Lars Hausfeld
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200MD Maastricht, The Netherlands; Maastricht Brain Imaging Centre, 6200MD Maastricht, The Netherlands
- Martha Shiell
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200MD Maastricht, The Netherlands; Maastricht Brain Imaging Centre, 6200MD Maastricht, The Netherlands
- Elia Formisano
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200MD Maastricht, The Netherlands; Maastricht Brain Imaging Centre, 6200MD Maastricht, The Netherlands; Maastricht Centre for Systems Biology, 6200MD Maastricht, The Netherlands
- Lars Riecke
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200MD Maastricht, The Netherlands; Maastricht Brain Imaging Centre, 6200MD Maastricht, The Netherlands
| |
23
Côté V, Lalancette È, Knoth IS, Côté L, Agbogba K, Vannasing P, Major P, Barlaam F, Michaud J, Lippé S. Distinct patterns of repetition suppression in Fragile X syndrome, down syndrome, tuberous sclerosis complex and mutations in SYNGAP1. Brain Res 2020; 1751:147205. [PMID: 33189692 DOI: 10.1016/j.brainres.2020.147205] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2020] [Revised: 10/31/2020] [Accepted: 11/08/2020] [Indexed: 12/29/2022]
Abstract
Sensory processing is the gateway to information processing and more complex processes such as learning. Alterations in sensory processing are a common phenotype of many genetic syndromes associated with intellectual disability (ID). It is currently unknown whether sensory processing alterations converge or diverge on brain responses between syndromes. Here, we compare for the first time four genetic conditions with ID using the same basic sensory learning paradigm. One hundred and five participants, aged between 3 and 30 years old, comprising four clinical ID groups and one control group, were recruited: Fragile X syndrome (FXS; n = 14), tuberous sclerosis complex (TSC; n = 9), Down syndrome (DS; n = 19), SYNGAP1 mutations (n = 8) and neurotypical controls (NT; n = 55). All groups included female and male participants. Brain responses were recorded using electroencephalography (EEG) during an audio-visual task that involved three repetitions of the pronunciation of the phoneme /a/. Event-related potentials (ERPs) were used to: 1) compare peak-to-peak amplitudes between groups, 2) evaluate the presence of repetition suppression within each group and 3) compare the relative repetition suppression between groups. Our results revealed larger overall amplitudes in FXS. A repetition suppression (RS) pattern was found in the NT group, FXS and DS, suggesting spared repetition suppression in a multimodal task in these two ID syndromes. Interestingly, FXS presented a stronger RS on one peak-to-peak value in comparison with the NT group. The results of our study reveal the distinctiveness of ERP and RS brain responses in ID syndromes. Further studies should be conducted to understand the molecular mechanisms involved in these patterns of responses.
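As a concrete illustration of the two ERP measures named here, peak-to-peak amplitude and relative repetition suppression, the following sketch computes both on simulated evoked responses. The epoch length, search window, and RS formula are assumptions rather than the authors' exact analysis.

```python
# Minimal sketch (simulated data, not the authors' pipeline) of the two ERP
# measures used above: peak-to-peak amplitude per repetition, and relative
# repetition suppression (RS) across the three /a/ presentations.
import numpy as np

rng = np.random.default_rng(1)
fs = 500                                   # Hz, simulated EEG sampling rate
t = np.arange(0, 0.5, 1 / fs)              # 500-ms epoch per repetition

def erp(amplitude):
    """Toy evoked response: a damped oscillation plus measurement noise."""
    wave = amplitude * np.sin(2 * np.pi * 4 * t) * np.exp(-t / 0.15)
    return wave + rng.normal(0, 0.05, t.size)

# Simulate response attenuation over repetitions 1..3 (an RS pattern).
epochs = [erp(a) for a in (1.0, 0.8, 0.6)]

def peak_to_peak(x, lo=0.05, hi=0.30):
    """Peak-to-peak amplitude within a 50-300 ms search window (assumed)."""
    win = x[int(lo * fs):int(hi * fs)]
    return win.max() - win.min()

p2p = np.array([peak_to_peak(e) for e in epochs])
relative_rs = (p2p[0] - p2p[2]) / p2p[0]   # one common RS definition (assumption)
print("peak-to-peak per repetition:", np.round(p2p, 3))
print(f"relative repetition suppression: {relative_rs:.2f}")
```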
Affiliation(s)
- Valérie Côté
- Psychology Department, Université de Montréal, Pavillon Marie-Victorin, 90, Avenue Vincent d'Indy, Montréal, QC H2V 2S9, Canada; NED Laboratory, Office 5.2.43, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada; Research Center UHC Sainte-Justine, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada
- Ève Lalancette
- Psychology Department, Université de Montréal, Pavillon Marie-Victorin, 90, Avenue Vincent d'Indy, Montréal, QC H2V 2S9, Canada; NED Laboratory, Office 5.2.43, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada; Research Center UHC Sainte-Justine, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada
- Inga S Knoth
- NED Laboratory, Office 5.2.43, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada; Research Center UHC Sainte-Justine, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada
- Lucie Côté
- Neurology Program, CHU Sainte-Justine, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada
- Kristian Agbogba
- NED Laboratory, Office 5.2.43, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada; Research Center UHC Sainte-Justine, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada
- Phetsamone Vannasing
- Research Center UHC Sainte-Justine, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada
- Philippe Major
- Neurology Program, CHU Sainte-Justine, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada; Research Center UHC Sainte-Justine, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada
- Fanny Barlaam
- NED Laboratory, Office 5.2.43, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada; Research Center UHC Sainte-Justine, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada
- Jacques Michaud
- Research Center UHC Sainte-Justine, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada
- Sarah Lippé
- Psychology Department, Université de Montréal, Pavillon Marie-Victorin, 90, Avenue Vincent d'Indy, Montréal, QC H2V 2S9, Canada; NED Laboratory, Office 5.2.43, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada; Research Center UHC Sainte-Justine, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada
24
Fogerty D, Sevich VA, Healy EW. Spectro-temporal glimpsing of speech in noise: Regularity and coherence of masking patterns reduces uncertainty and increases intelligibility. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 148:1552. [PMID: 33003879 PMCID: PMC7500957 DOI: 10.1121/10.0001971] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/06/2019] [Revised: 08/27/2020] [Accepted: 08/28/2020] [Indexed: 06/11/2023]
Abstract
Adverse listening conditions involve glimpses of spectro-temporal speech information. This study investigated whether the acoustic organization of the spectro-temporal masking pattern affects speech glimpsing in "checkerboard" noise. The regularity and coherence of the masking pattern were varied. Regularity was reduced by randomizing the spectral or temporal gating of the masking noise. Coherence involved the spectral alignment of frequency bands across time or the temporal alignment of gated onsets/offsets across frequency bands. Experiment 1 investigated the effect of spectral or temporal coherence. Experiment 2 investigated independent and combined factors of regularity and coherence. Performance was best in spectro-temporally modulated noise having larger glimpses. Generally, performance also improved as the regularity and coherence of masker fluctuations increased, with regularity having a stronger effect than coherence. An acoustic glimpsing model suggested that the effect of regularity (but not coherence) could be partially attributed to the availability of glimpses retained after energetic masking. Performance tended to be better with maskers that were spectrally coherent than with maskers that were temporally coherent. Overall, performance was best when the spectro-temporal masking pattern imposed even spectral sampling and minimal temporal uncertainty, indicating that listeners use reliable masking patterns to aid in spectro-temporal speech glimpsing.
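The masker described here is constructive enough to sketch directly: noise gated on and off in alternating frequency-band by time-slot cells, with an option to randomize the gating and thereby reduce regularity. Band count, slot duration, and frequency range below are illustrative assumptions, not the study's parameters.

```python
# Illustrative sketch (parameters are assumptions) of a spectro-temporal
# "checkerboard" masker: noise gated in alternating band x time-slot cells,
# plus a randomized variant with reduced regularity.
import numpy as np

rng = np.random.default_rng(2)
fs, dur = 16000, 1.0
n = int(fs * dur)
noise = rng.normal(0, 1, n)

n_bands, slot = 8, 0.1                        # 8 log-spaced bands, 100-ms slots
edges = np.geomspace(100, 8000, n_bands + 1)
n_slots = int(dur / slot)

def band_filter(x, f_lo, f_hi):
    """Crude FFT brick-wall band-pass filter (adequate for a demo)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1 / fs)
    X[(f < f_lo) | (f >= f_hi)] = 0
    return np.fft.irfft(X, len(x))

def checkerboard(regular=True):
    out = np.zeros(n)
    for b in range(n_bands):
        band = band_filter(noise, edges[b], edges[b + 1])
        for s in range(n_slots):
            # Regular: strict alternation; random: each cell gated at chance.
            on = (b + s) % 2 == 0 if regular else rng.random() < 0.5
            if on:
                i0, i1 = int(s * slot * fs), int((s + 1) * slot * fs)
                out[i0:i1] += band[i0:i1]
    return out

masker_regular = checkerboard(regular=True)    # coherent, predictable glimpses
masker_random = checkerboard(regular=False)    # reduced regularity/coherence
```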
Affiliation(s)
- Daniel Fogerty
- Department of Communication Sciences and Disorders, University of South Carolina, 1705 College Street, Columbia, South Carolina 29208, USA
- Victoria A Sevich
- Department of Speech and Hearing Science, The Ohio State University, 1070 Carmack Road, Columbus, Ohio 43210, USA
- Eric W Healy
- Department of Speech and Hearing Science, The Ohio State University, 1070 Carmack Road, Columbus, Ohio 43210, USA
26
Gustafson SJ, Grose J, Buss E. Perceptual organization and stability of auditory streaming for pure tones and /ba/ stimuli. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 148:EL159. [PMID: 32873027 PMCID: PMC7438158 DOI: 10.1121/10.0001744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/26/2020] [Revised: 07/23/2020] [Accepted: 07/27/2020] [Indexed: 06/11/2023]
Abstract
The dynamics of auditory stream segregation were evaluated using repeating triplets composed of pure tones or the syllable /ba/. Stimuli differed in frequency (tones) or fundamental frequency (speech) by 4, 6, 8, or 10 semitones, and the standard frequency was either 250 Hz (tones and speech) or 400 Hz (tones). Twenty normal-hearing adults participated. For both tones and speech, a two-stream percept became more likely as frequency separation increased. Perceptual organization for speech tended to be more integrated and less stable compared to tones. Results suggest that prior data patterns observed with tones in this paradigm may generalize to speech stimuli.
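The tone frequencies here follow directly from the standard frequency and the semitone separation, f = f0 * 2**(s/12); a quick check of the separations used above (a minimal sketch, printing illustrative values):

```python
# Frequencies at the study's semitone separations above the 250-Hz standard.
f0 = 250.0
for s in (4, 6, 8, 10):
    print(f"{s:2d} semitones above {f0:.0f} Hz -> {f0 * 2 ** (s / 12):.1f} Hz")
# e.g. 4 st -> 315.0 Hz, 10 st -> 445.4 Hz; the same formula applies to the
# 400-Hz tone standard and to the speech fundamental frequency.
```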
Affiliation(s)
- Samantha J Gustafson
- Department of Communication Sciences and Disorders, University of Utah, 390 South 1530 East, Salt Lake City, Utah 84112, USA
- John Grose
- Department of Otolaryngology-Head and Neck Surgery, University of North Carolina, 170 Manning Drive, Chapel Hill, North Carolina 27599
- Emily Buss
- Department of Otolaryngology-Head and Neck Surgery, University of North Carolina, 170 Manning Drive, Chapel Hill, North Carolina 27599
27
Gurariy G, Randall R, Greenberg AS. Manipulation of low-level features modulates grouping strength of auditory objects. PSYCHOLOGICAL RESEARCH 2020; 85:2256-2270. [PMID: 32691138 DOI: 10.1007/s00426-020-01391-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2019] [Accepted: 07/10/2020] [Indexed: 11/29/2022]
Abstract
A central challenge of auditory processing involves the segregation, analysis, and integration of acoustic information into auditory perceptual objects for processing by higher order cognitive operations. This study explores the influence of low-level features on auditory object perception. Participants provided perceived musicality ratings in response to randomly generated pure tone sequences. Previous work has shown that music perception relies on the integration of discrete sounds into a holistic structure. Hence, high (versus low) ratings were viewed as indicative of strong (versus weak) object formation. Additionally, participants rated sequences in which random subsets of tones were manipulated along one of three low-level dimensions (timbre, amplitude, or fade-in) at one of three strengths (low, medium, or high). Our primary findings demonstrate how low-level acoustic features modulate the perception of auditory objects, as measured by changes in musicality ratings for manipulated sequences. Secondarily, we used principal component analysis to categorize participants into subgroups based on differential sensitivities to low-level auditory dimensions, thereby highlighting the importance of individual differences in auditory perception. Finally, we report asymmetries regarding the effects of low-level dimensions; specifically, the perceptual significance of timbre. Together, these data contribute to our understanding of how low-level auditory features modulate auditory object perception.
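The subgrouping analysis mentioned here can be sketched compactly: summarize each participant by per-dimension sensitivity scores, then let PCA expose subgroups with differential sensitivities. The data and group structure below are simulated assumptions, not the study's ratings.

```python
# Hedged sketch (simulated data) of PCA-based participant subgrouping.
import numpy as np

rng = np.random.default_rng(3)
# Rows: participants; columns: drop in musicality rating caused by
# manipulating timbre, amplitude, or fade-in (arbitrary simulated units).
sensitivity = np.vstack([
    rng.normal([2.0, 0.5, 0.5], 0.3, (15, 3)),   # timbre-dominated listeners
    rng.normal([0.5, 1.5, 1.5], 0.3, (15, 3)),   # envelope-dominated listeners
])

X = sensitivity - sensitivity.mean(axis=0)        # center before PCA
U, S, Vt = np.linalg.svd(X, full_matrices=False)  # PCA via SVD
scores = X @ Vt.T                                 # participant scores per PC
explained = S ** 2 / np.sum(S ** 2)

print("variance explained:", np.round(explained, 2))
# A simple split on the first PC recovers the two sensitivity subgroups.
group = scores[:, 0] > 0
print("subgroup sizes:", group.sum(), (~group).sum())
```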
Affiliation(s)
- Gennadiy Gurariy
- Department of Biomedical Engineering, Medical College of Wisconsin & Marquette University, Milwaukee, USA
- Richard Randall
- School of Music and Neuroscience Institute, Carnegie Mellon University, Pittsburgh, USA
- Adam S Greenberg
- Department of Biomedical Engineering, Medical College of Wisconsin & Marquette University, Milwaukee, USA
28
May KR, Tomlinson BJ, Ma X, Roberts P, Walker BN. Spotlights and Soundscapes. ACM TRANSACTIONS ON ACCESSIBLE COMPUTING 2020. [DOI: 10.1145/3378576] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
Abstract
For persons with visual impairment, forming cognitive maps of unfamiliar interior spaces can be challenging. Various technical developments have converged to make it feasible, without specialized equipment, to represent a variety of useful landmark objects via spatial audio, rather than solely dispensing route information. Although such systems could be key to facilitating cognitive map formation, high-density auditory environments must be crafted carefully to avoid overloading the listener. This article recounts a set of research exercises with potential users, in which the optimization of such systems was explored. In Experiment 1, a virtual reality environment was used to rapidly prototype and adjust the auditory environment in response to participant comments. In Experiment 2, three variants of the system were evaluated in terms of their effectiveness in a real-world building. This methodology revealed a variety of optimization approaches and recommendations for designing dense mixed-reality auditory environments aimed at supporting cognitive map formation by visually impaired persons.
Affiliation(s)
- Xiaomeng Ma
- Georgia Institute of Technology, Atlanta, Georgia
29
Cai H, Dent ML. Attention capture in birds performing an auditory streaming task. PLoS One 2020; 15:e0235420. [PMID: 32589692 PMCID: PMC7319309 DOI: 10.1371/journal.pone.0235420] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2020] [Accepted: 06/15/2020] [Indexed: 11/19/2022] Open
Abstract
Numerous animal models have been used to investigate the neural mechanisms of auditory processing in complex acoustic environments, but it is unclear whether an animal’s auditory attention is functionally similar to a human’s in processing competing auditory scenes. Here we investigated the effects of attention capture in birds performing an objective auditory streaming paradigm. The classical ABAB… patterned pure tone sequences were modified and used for the task. We trained the birds to selectively attend to a target stream and only respond to the deviant appearing in the target stream, even though their attention may be captured by a deviant in the background stream. When no deviant appeared in the background stream, the birds experienced the buildup of the streaming process in a qualitatively similar way as they did in a subjective paradigm. Although the birds were trained to selectively attend to the target stream, they failed to avoid the involuntary attention switch caused by the background deviant, especially when the background deviant was sequentially unpredictable. Their global performance deteriorated more with increasingly salient background deviants, where the buildup process was reset by the background distractor. Moreover, sequential predictability of the background deviant facilitated the recovery of the buildup process after attention capture. This is the first study that addresses the perceptual consequences of the joint effects of top-down and bottom-up attention in behaving animals.
Affiliation(s)
- Huaizhen Cai
- Department of Psychology, University at Buffalo, The State University of New York, Buffalo, New York, United States of America
- Micheal L. Dent
- Department of Psychology, University at Buffalo, The State University of New York, Buffalo, New York, United States of America
30
Schutz M, Gillard J. On the generalization of tones: A detailed exploration of non-speech auditory perception stimuli. Sci Rep 2020; 10:9520. [PMID: 32533008 PMCID: PMC7293323 DOI: 10.1038/s41598-020-63132-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2019] [Accepted: 03/13/2020] [Indexed: 11/09/2022] Open
Abstract
The dynamic changes in natural sounds’ temporal structures convey important event-relevant information. However, prominent researchers have previously expressed concern that non-speech auditory perception research disproportionately uses simplistic stimuli lacking the temporal variation found in natural sounds. A growing body of work now demonstrates that some conclusions and models derived from experiments using simplistic tones fail to generalize, raising important questions about the types of stimuli used to assess the auditory system. To explore the issue empirically, we conducted a novel, large-scale survey of non-speech auditory perception research from four prominent journals. A detailed analysis of 1017 experiments from 443 articles reveals that 89% of stimuli employ amplitude envelopes lacking the dynamic variations characteristic of non-speech sounds heard outside the laboratory. Given differences in task outcomes and even the underlying perceptual strategies evoked by dynamic vs. invariant amplitude envelopes, this raises important questions of broad relevance to psychologists and neuroscientists alike. This lack of exploration of a property increasingly recognized as playing a crucial role in perception suggests future research using stimuli with time-varying amplitude envelopes holds significant potential for furthering our understanding of the auditory system’s basic processing capabilities.
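The envelope distinction this survey turns on is easy to make concrete: a gated, time-invariant envelope versus an exponentially decaying, percussive one. The sketch below uses illustrative values (440 Hz carrier, 10-ms ramps, 100-ms decay) that are assumptions, not parameters from the surveyed studies.

```python
# Small sketch contrasting the two envelope classes discussed above: a flat
# gated envelope (the norm in the surveyed experiments) versus a time-varying
# exponentially decaying envelope typical of natural impact sounds.
import numpy as np

fs, dur, f = 44100, 0.5, 440.0
t = np.arange(int(fs * dur)) / fs
carrier = np.sin(2 * np.pi * f * t)

flat = np.ones_like(t)                       # abrupt on/off "invariant" envelope
ramp = int(0.01 * fs)                        # 10-ms onset/offset ramps
flat[:ramp] = np.linspace(0, 1, ramp)
flat[-ramp:] = np.linspace(1, 0, ramp)

decaying = np.exp(-t / 0.1)                  # 100-ms decay, "percussive" shape

tone_flat = carrier * flat                   # the stimulus 89% of studies used
tone_percussive = carrier * decaying         # the dynamic alternative
```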
Affiliation(s)
- Michael Schutz
- School of the Arts, McMaster University, Hamilton, Canada; Department of Psychology, Neuroscience & Behaviour, McMaster University, Hamilton, Canada
- Jessica Gillard
- Department of Psychology, Neuroscience & Behaviour, McMaster University, Hamilton, Canada
31
Kim SG, Poeppel D, Overath T. Modulation change detection in human auditory cortex: Evidence for asymmetric, non-linear edge detection. Eur J Neurosci 2020; 52:2889-2904. [PMID: 32080939 DOI: 10.1111/ejn.14707] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2019] [Revised: 01/18/2020] [Accepted: 02/10/2020] [Indexed: 11/28/2022]
Abstract
Changes in modulation rate are important cues for parsing acoustic signals, such as speech. We parametrically controlled modulation rate via the correlation coefficient (r) of amplitude spectra across fixed frequency channels between adjacent time frames: broadband modulation spectra are biased toward slow modulation rates with increasing r, and vice versa. By concatenating segments with different r, acoustic changes of various directions (e.g., changes from low to high correlation coefficients, that is, random-to-correlated or vice versa) and sizes (e.g., changes from low to high or from medium to high correlation coefficients) can be obtained. Participants listened to sound blocks and detected changes in correlation while MEG was recorded. Evoked responses to changes in correlation demonstrated (a) an asymmetric representation of change direction: random-to-correlated changes produced a prominent evoked field around 180 ms, while correlated-to-random changes evoked an earlier response with peaks at around 70 and 120 ms, whose topographies resemble those of the canonical P50m and N100m responses, respectively, and (b) a highly non-linear representation of correlation structure, whereby even small changes involving segments with a high correlation coefficient were much more salient than relatively large changes that did not involve segments with high correlation coefficients. Induced responses revealed phase tracking in the delta and theta frequency bands for the high correlation stimuli. The results confirm a high sensitivity for low modulation rates in human auditory cortex, both in terms of their representation and their segregation from other modulation rates.
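The stimulus construction described here can be sketched as an AR(1) process on frame-wise channel amplitudes, with r setting the frame-to-frame correlation of the spectra; channel and frame counts below are assumptions, not the study's values.

```python
# Hedged sketch: amplitude spectra over fixed frequency channels evolve
# frame-to-frame with a chosen correlation coefficient r, so high r biases
# the broadband modulation spectrum toward slow rates.
import numpy as np

rng = np.random.default_rng(4)
n_channels, n_frames = 30, 200

def spectrogram_with_correlation(r):
    """Each frame's channel amplitudes correlate with the previous frame's."""
    S = np.zeros((n_channels, n_frames))
    S[:, 0] = rng.normal(size=n_channels)
    for t in range(1, n_frames):
        S[:, t] = r * S[:, t - 1] + np.sqrt(1 - r ** 2) * rng.normal(size=n_channels)
    return S

slow = spectrogram_with_correlation(r=0.95)   # correlated -> slow modulations
fast = spectrogram_with_correlation(r=0.05)   # near-random -> fast modulations

def adjacent_r(S):
    """Empirical check: mean correlation of adjacent frames tracks target r."""
    return np.mean([np.corrcoef(S[:, t], S[:, t + 1])[0, 1]
                    for t in range(S.shape[1] - 1)])

print(f"measured r: slow={adjacent_r(slow):.2f}, fast={adjacent_r(fast):.2f}")
# Concatenating segments built with different r yields the correlation
# changes (e.g., random-to-correlated) whose evoked responses were measured.
```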
Affiliation(s)
- Seung-Goo Kim
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
- David Poeppel
- Department of Psychology, New York University, New York, NY, USA; Center for Neural Science, New York University, New York, NY, USA; Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany
- Tobias Overath
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA; Duke Institute for Brain Sciences, Duke University, Durham, NC, USA; Center for Cognitive Neuroscience, Duke University, Durham, NC, USA
32
Neural correlates of perceptual switching while listening to bistable auditory streaming stimuli. Neuroimage 2020; 204:116220. [DOI: 10.1016/j.neuroimage.2019.116220] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2019] [Revised: 08/19/2019] [Accepted: 09/19/2019] [Indexed: 11/15/2022] Open
33
Illusory sound texture reveals multi-second statistical completion in auditory scene analysis. Nat Commun 2019; 10:5096. [PMID: 31704913 PMCID: PMC6841952 DOI: 10.1038/s41467-019-12893-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2019] [Accepted: 10/03/2019] [Indexed: 12/27/2022] Open
Abstract
Sound sources in the world are experienced as stable even when intermittently obscured, implying perceptual completion mechanisms that “fill in” missing sensory information. We demonstrate a filling-in phenomenon in which the brain extrapolates the statistics of background sounds (textures) over periods of several seconds when they are interrupted by another sound, producing vivid percepts of illusory texture. The effect differs from previously described completion effects in that 1) the extrapolated sound must be defined statistically given the stochastic nature of texture, and 2) the effect lasts much longer, enabling introspection and facilitating assessment of the underlying representation. Illusory texture biases subsequent texture statistic estimates indistinguishably from actual texture, suggesting that it is represented similarly to actual texture. The illusion appears to represent an inference about whether the background is likely to continue during concurrent sounds, providing a stable statistical representation of the ongoing environment despite unstable sensory evidence. Auditory textures are sounds defined by a particular statistical distribution, e.g. as is produced by rain, or a swarm of insects. Here, the authors describe a striking perceptual illusion in which sound textures are heard to continue, even though they have in fact been replaced by white noise.
34
Domingo Y, Holmes E, Macpherson E, Johnsrude IS. Using spatial release from masking to estimate the magnitude of the familiar-voice intelligibility benefit. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:3487. [PMID: 31795686 DOI: 10.1121/1.5133628] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/18/2019] [Accepted: 10/23/2019] [Indexed: 06/10/2023]
Abstract
The ability to segregate simultaneous speech streams is crucial for successful communication. Recent studies have demonstrated that participants can report 10%-20% more words spoken by naturally familiar (e.g., friends or spouses) than unfamiliar talkers in two-voice mixtures. This benefit is commensurate with one of the largest benefits to speech intelligibility currently known: that gained by spatially separating two talkers. However, because of differences in the methods of these previous studies, the relative benefits of spatial separation and voice familiarity are unclear. Here, the familiar-voice benefit and spatial release from masking are directly compared, and it is examined whether and how these two cues interact. Talkers were recorded while speaking sentences from a published closed-set "matrix" task, and then listeners were presented with three different sentences played simultaneously. Each target sentence was played at 0° azimuth, and two masker sentences were symmetrically separated about the target. On average, participants reported 10%-30% more words correctly when the target sentence was spoken in a familiar than unfamiliar voice (collapsed over spatial separation conditions); it was found that participants gain a similar benefit from a familiar target as when an unfamiliar voice is separated from two symmetrical maskers by approximately 15° azimuth.
Affiliation(s)
- Ysabel Domingo
- Brain and Mind Institute, University of Western Ontario, London, Ontario, Canada
- Emma Holmes
- Brain and Mind Institute, University of Western Ontario, London, Ontario, Canada
- Ewan Macpherson
- School of Communication Sciences and Disorders, University of Western Ontario, London, Ontario, Canada
- Ingrid S Johnsrude
- Brain and Mind Institute, University of Western Ontario, London, Ontario, Canada
35
Choi JY, Perrachione TK. Time and information in perceptual adaptation to speech. Cognition 2019; 192:103982. [PMID: 31229740 PMCID: PMC6732236 DOI: 10.1016/j.cognition.2019.05.019] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2018] [Revised: 05/11/2019] [Accepted: 05/25/2019] [Indexed: 11/18/2022]
Abstract
Perceptual adaptation to a talker enables listeners to efficiently resolve the many-to-many mapping between variable speech acoustics and abstract linguistic representations. However, models of speech perception have not delved into the variety or the quantity of information necessary for successful adaptation, nor how adaptation unfolds over time. In three experiments using speeded classification of spoken words, we explored how the quantity (duration), quality (phonetic detail), and temporal continuity of talker-specific context contribute to facilitating perceptual adaptation to speech. In single- and mixed-talker conditions, listeners identified phonetically-confusable target words in isolation or preceded by carrier phrases of varying lengths and phonetic content, spoken by the same talker as the target word. Word identification was always slower in mixed-talker conditions than single-talker ones. However, interference from talker variability decreased as the duration of preceding speech increased but was not affected by the amount of preceding talker-specific phonetic information. Furthermore, efficiency gains from adaptation depended on temporal continuity between preceding speech and the target word. These results suggest that perceptual adaptation to speech may be understood via models of auditory streaming, where perceptual continuity of an auditory object (e.g., a talker) facilitates allocation of attentional resources, resulting in more efficient perceptual processing.
Affiliation(s)
- Ja Young Choi
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, United States; Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, United States
- Tyler K Perrachione
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, United States
36
Symonds RM, Zhou JW, Cole SL, Brace KM, Sussman ES. Cognitive resources are distributed among the entire auditory landscape in auditory scene analysis. Psychophysiology 2019; 57:e13487. [PMID: 31578762 DOI: 10.1111/psyp.13487] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2018] [Revised: 08/21/2019] [Accepted: 09/04/2019] [Indexed: 01/30/2023]
Abstract
Although attention has been shown to enhance neural representations of selected inputs, the fate of unselected background sounds is still debated. The goal of the current study was to understand how processing resources are distributed among attended and unattended sounds during auditory scene analysis. We used a three-stream paradigm with four acoustic features uniquely defining each sound stream (frequency, envelope shape, spatial location, tone quality). We manipulated task load by having participants perform a difficult auditory task and an easy movie-viewing task with the same set of sounds in separate conditions. The mismatch negativity (MMN) component of event-related brain potentials (ERPs) was measured to evaluate sound processing in both conditions. We found no effect of task demands on unattended sound processing: MMNs were elicited by unattended deviants during both low- and high-load task conditions. A key factor of this result was the use of unique tone feature combinations to distinguish each of the three sound streams, strengthening the segregation of streams. In the auditory task, the P3b component demonstrates a two-stage process of target evaluation. Thus, these results, in conjunction with results of previous studies, suggest that stimulus-driven factors that strengthen stream segregation can free up processing capacity for higher-level analyses. The results illustrate the interactive nature of top-down and stimulus-driven processes in stream formation, supporting a distributive theory of attention that balances the strength of the bottom-up input with perceptual goals in analyzing the auditory scene.
Affiliation(s)
- Renee M Symonds
- Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York, USA
- Juin W Zhou
- Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York, USA; Department of Biomedical Engineering, Stony Brook University, Stony Brook, New York, USA
- Sally L Cole
- Department of Counseling and Clinical Psychology, Teachers College, Columbia University, New York, New York, USA
- Kelin M Brace
- Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York, USA
- Elyse S Sussman
- Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York, USA
37
Best V, Swaminathan J, Kopčo N, Roverud E, Shinn-Cunningham B. A "Buildup" of Speech Intelligibility in Listeners With Normal Hearing and Hearing Loss. Trends Hear 2019; 22:2331216518807519. [PMID: 30353783 PMCID: PMC6201174 DOI: 10.1177/2331216518807519] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
The perception of simple auditory mixtures is known to evolve over time. For instance, a common example of this is the "buildup" of stream segregation that is observed for sequences of tones alternating in pitch. Yet very little is known about how the perception of more complicated auditory scenes, such as multitalker mixtures, changes over time. Previous data are consistent with the idea that the ability to segregate a target talker from competing sounds improves rapidly when stable cues are available, which leads to improvements in speech intelligibility. This study examined the time course of this buildup in listeners with normal and impaired hearing. Five simultaneous sequences of digits, varying in length from three to six digits, were presented from five locations in the horizontal plane. A synchronized visual cue at one location indicated which sequence was the target on each trial. We observed a buildup in digit identification performance, driven primarily by reductions in confusions between the target and the maskers, that occurred over the course of three to four digits. Performance tended to be poorer in listeners with hearing loss; however, there was only weak evidence that the buildup was diminished or slowed in this group.
Affiliation(s)
- Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, MA, USA
- Norbert Kopčo
- Faculty of Science, Institute of Computer Science, P. J. Safarik University, Kosice, Slovakia
- Elin Roverud
- Department of Speech, Language and Hearing Sciences, Boston University, MA, USA
38
Auditory Figure-Ground Segregation Is Impaired by High Visual Load. J Neurosci 2018; 39:1699-1708. [PMID: 30541915 PMCID: PMC6391559 DOI: 10.1523/jneurosci.2518-18.2018] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2018] [Revised: 11/19/2018] [Accepted: 11/19/2018] [Indexed: 11/21/2022] Open
Abstract
Figure-ground segregation is fundamental to listening in complex acoustic environments. An ongoing debate pertains to whether segregation requires attention or is "automatic" and preattentive. In this magnetoencephalography study, we tested a prediction derived from load theory of attention (e.g., Lavie, 1995) that segregation requires attention but can benefit from the automatic allocation of any "leftover" capacity under low load. Complex auditory scenes were modeled with stochastic figure-ground stimuli (Teki et al., 2013), which occasionally contained repeated frequency component "figures." Naive human participants (both sexes) passively listened to these signals while performing a visual attention task of either low or high load. While clear figure-related neural responses were observed under conditions of low load, high visual load substantially reduced the neural response to the figure in auditory cortex (planum temporale, Heschl's gyrus). We conclude that fundamental figure-ground segregation in hearing is not automatic but draws on resources that are shared across vision and audition. SIGNIFICANCE STATEMENT This work resolves a long-standing question of whether figure-ground segregation, a fundamental process of auditory scene analysis, requires attention or is underpinned by automatic, encapsulated computations. Task-irrelevant sounds were presented during performance of a visual search task. We revealed a clear magnetoencephalography neural signature of figure-ground segregation in conditions of low visual load, which was substantially reduced in conditions of high visual load. This demonstrates that, although attention does not need to be actively allocated to sound for auditory segregation to occur, segregation depends on shared computational resources across vision and hearing. The findings further highlight that visual load can impair the computational capacity of the auditory system, even when it does not simply dampen auditory responses as a whole.
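The stochastic figure-ground stimuli referenced here (Teki et al., 2013) lend themselves to a compact generative sketch: random multi-tone chords in which a "figure" is created by repeating a fixed subset of frequency components across consecutive chords. Chord duration, frequency pool, and coherence level below are illustrative assumptions.

```python
# Minimal sketch (parameters are assumptions) of stochastic figure-ground
# stimuli in the style of Teki et al. (2013).
import numpy as np

rng = np.random.default_rng(5)
fs, chord_dur = 16000, 0.05                  # 50-ms chords
pool = np.geomspace(180, 7000, 120)          # pool of candidate frequencies
n_chords, per_chord, coherence = 40, 10, 4   # 4 repeated components = figure

t = np.arange(int(fs * chord_dur)) / fs
figure_freqs = rng.choice(pool, coherence, replace=False)

def chord(freqs):
    """One chord: equal-amplitude sum of pure tones at the given frequencies."""
    return sum(np.sin(2 * np.pi * f * t) for f in freqs) / len(freqs)

signal = []
for i in range(n_chords):
    figure_on = 10 <= i < 30                 # figure present mid-stimulus
    n_bg = per_chord - coherence if figure_on else per_chord
    freqs = rng.choice(pool, n_bg, replace=False)   # fresh random background
    if figure_on:
        freqs = np.concatenate([freqs, figure_freqs])  # repeated components
    signal.append(chord(freqs))
stimulus = np.concatenate(signal)            # total component count held fixed
```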
39
Cortical tracking of multiple streams outside the focus of attention in naturalistic auditory scenes. Neuroimage 2018; 181:617-626. [DOI: 10.1016/j.neuroimage.2018.07.052] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2018] [Revised: 07/19/2018] [Accepted: 07/22/2018] [Indexed: 11/30/2022] Open
40
Thomassen S, Bendixen A. Assessing the background decomposition of a complex auditory scene with event-related brain potentials. Hear Res 2018; 370:120-129. [PMID: 30368055 DOI: 10.1016/j.heares.2018.09.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/07/2018] [Revised: 09/17/2018] [Accepted: 09/30/2018] [Indexed: 11/26/2022]
Abstract
A listener who focuses on a sound source of interest must continuously integrate the sounds emitted by the attended source and ignore the sounds emitted by the remaining sources in the auditory scene. Little is known about how the ignored sound sources in the background are mentally represented after the source of interest has formed the perceptual foreground. This is due to a key methodological challenge: the background representation is by definition not overtly reportable. Here we developed a paradigm based on event-related brain potentials (ERPs) to assess the mental representation of background sounds. Participants listened to sequences of three repeatedly presented tones arranged in an ascending order (low, middle, high frequency). They were instructed to detect intensity deviants in one of the tones, creating the perceptual foreground. The remaining two background tones contained timing and location deviants. Those deviants were set up such that mismatch negativity (MMN) components would be elicited in distinct ways if the background was decomposed into two separate sound streams (background segregation) or if it was not further decomposed (background integration). Results provide MMN-based evidence for background segregation and integration in parallel. This suggests that mental representations of background integration and segregation can be concurrently available, and that collecting empirical evidence for only one of these background organization alternatives might lead to erroneous conclusions.
Affiliation(s)
- Sabine Thomassen
- Institute of Physics, School of Natural Sciences, Chemnitz University of Technology, Reichenhainer Str. 70, D-09126 Chemnitz, Germany; Auditory Psychophysiology Lab, Department of Psychology, Carl von Ossietzky University of Oldenburg, Ammerländer Heerstr. 114-118, D-26129 Oldenburg, Germany
- Alexandra Bendixen
- Institute of Physics, School of Natural Sciences, Chemnitz University of Technology, Reichenhainer Str. 70, D-09126 Chemnitz, Germany; Institute of Psychology, University of Leipzig, Neumarkt 9-19, D-04109 Leipzig, Germany
41
Kreitewolf J, Mathias SR, Trapeau R, Obleser J, Schönwiesner M. Perceptual grouping in the cocktail party: Contributions of voice-feature continuity. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 144:2178. [PMID: 30404485 DOI: 10.1121/1.5058684] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/26/2018] [Accepted: 09/18/2018] [Indexed: 06/08/2023]
Abstract
Cocktail parties pose a difficult yet solvable problem for the auditory system. Previous work has shown that the cocktail-party problem is considerably easier when all sounds in the target stream are spoken by the same talker (the voice-continuity benefit). The present study investigated the contributions of two of the most salient voice features-glottal-pulse rate (GPR) and vocal-tract length (VTL)-to the voice-continuity benefit. Twenty young, normal-hearing listeners participated in two experiments. On each trial, listeners heard concurrent sequences of spoken digits from three different spatial locations and reported the digits coming from a target location. Critically, across conditions, GPR and VTL either remained constant or varied across target digits. Additionally, across experiments, the target location either remained constant (Experiment 1) or varied (Experiment 2) within a trial. In Experiment 1, listeners benefited from continuity in either voice feature, but VTL continuity was more helpful than GPR continuity. In Experiment 2, spatial discontinuity greatly hindered listeners' abilities to exploit continuity in GPR and VTL. The present results suggest that selective attention benefits from continuity in target voice features and that VTL and GPR play different roles for perceptual grouping and stream segregation in the cocktail party.
Affiliation(s)
- Jens Kreitewolf
- International Laboratory for Brain, Music and Sound Research (BRAMS), Department of Psychology, Université de Montréal, Pavillon 1420 Boulevard Mont-Royal, Outremont, Quebec, H2V 4P3, Canada
- Samuel R Mathias
- Neurocognition, Neurocomputation and Neurogenetics (n3) Division, Yale University School of Medicine, 40 Temple Street, New Haven, Connecticut 06511, USA
- Régis Trapeau
- International Laboratory for Brain, Music and Sound Research (BRAMS), Department of Psychology, Université de Montréal, Pavillon 1420 Boulevard Mont-Royal, Outremont, Quebec, H2V 4P3, Canada
- Jonas Obleser
- Department of Psychology, University of Lübeck, Maria-Goeppert-Straße 9a, D-23562 Lübeck, Germany
- Marc Schönwiesner
- International Laboratory for Brain, Music and Sound Research (BRAMS), Department of Psychology, Université de Montréal, Pavillon 1420 Boulevard Mont-Royal, Outremont, Quebec, H2V 4P3, Canada
42
Holt LL, Tierney AT, Guerra G, Laffere A, Dick F. Dimension-selective attention as a possible driver of dynamic, context-dependent re-weighting in speech processing. Hear Res 2018; 366:50-64. [PMID: 30131109 PMCID: PMC6107307 DOI: 10.1016/j.heares.2018.06.014] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/18/2018] [Revised: 06/10/2018] [Accepted: 06/19/2018] [Indexed: 12/24/2022]
Abstract
The contribution of acoustic dimensions to an auditory percept is dynamically adjusted and reweighted based on prior experience about how informative these dimensions are across the long-term and short-term environment. This is especially evident in speech perception, where listeners differentially weight information across multiple acoustic dimensions, and use this information selectively to update expectations about future sounds. The dynamic and selective adjustment of how acoustic input dimensions contribute to perception has made it tempting to conceive of this as a form of non-spatial auditory selective attention. Here, we review several human speech perception phenomena that might be consistent with auditory selective attention although, as of yet, the literature does not definitively support a mechanistic tie. We relate these human perceptual phenomena to illustrative nonhuman animal neurobiological findings that offer informative guideposts in how to test mechanistic connections. We next present a novel empirical approach that can serve as a methodological bridge from human research to animal neurobiological studies. Finally, we describe four preliminary results that demonstrate its utility in advancing understanding of human non-spatial dimension-based auditory selective attention.
Affiliation(s)
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, 15213, USA; Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Adam T Tierney
- Department of Psychological Sciences, Birkbeck College, University of London, London, WC1E 7HX, UK; Centre for Brain and Cognitive Development, Birkbeck College, London, WC1E 7HX, UK
- Giada Guerra
- Department of Psychological Sciences, Birkbeck College, University of London, London, WC1E 7HX, UK; Centre for Brain and Cognitive Development, Birkbeck College, London, WC1E 7HX, UK
- Aeron Laffere
- Department of Psychological Sciences, Birkbeck College, University of London, London, WC1E 7HX, UK
- Frederic Dick
- Department of Psychological Sciences, Birkbeck College, University of London, London, WC1E 7HX, UK; Centre for Brain and Cognitive Development, Birkbeck College, London, WC1E 7HX, UK; Department of Experimental Psychology, University College London, London, WC1H 0AP, UK
43
Cai H, Screven LA, Dent ML. Behavioral measurements of auditory streaming and build-up by budgerigars ( Melopsittacus undulatus). THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 144:1508. [PMID: 30424658 DOI: 10.1121/1.5054297] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/14/2018] [Accepted: 08/27/2018] [Indexed: 06/09/2023]
Abstract
The perception of the build-up of auditory streaming has been widely investigated in humans, while it is unknown whether animals experience a similar perception when hearing high (H) and low (L) tonal pattern sequences. The paradigm previously used in European starlings (Sturnus vulgaris) was adopted in two experiments to address the build-up of auditory streaming in budgerigars (Melopsittacus undulatus). In experiment 1, different numbers of repetitions of low-high-low triplets were used in five conditions to study the build-up process. In experiment 2, 5 and 15 repetitions of high-low-high triplets were used to investigate the effects of repetition rate, frequency separation, and frequency range of the two tones on the birds' streaming perception. Similar to humans, budgerigars subjectively experienced the build-up process in auditory streaming; faster repetition rates and larger frequency separations enhanced the streaming perception, and these results were consistent across the two frequency ranges. Response latency analysis indicated that the budgerigars needed a longer amount of time to respond to stimuli that elicited a salient streaming perception. These results indicate, for the first time using a behavioral paradigm, that budgerigars experience a build-up of auditory streaming in a manner similar to humans.
Affiliation(s)
- Huaizhen Cai
- Department of Psychology, University at Buffalo, The State University of New York, Buffalo, New York 14260, USA
- Laurel A Screven
- Department of Psychology, University at Buffalo, The State University of New York, Buffalo, New York 14260, USA
- Micheal L Dent
- Department of Psychology, University at Buffalo, The State University of New York, Buffalo, New York 14260, USA
44
Kamourieh S, Braga RM, Leech R, Mehta A, Wise RJS. Speech Registration in Symptomatic Memory Impairment. Front Aging Neurosci 2018; 10:201. [PMID: 30038566 PMCID: PMC6046456 DOI: 10.3389/fnagi.2018.00201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2017] [Accepted: 06/13/2018] [Indexed: 11/20/2022] Open
Abstract
Background: An inability to recall recent conversations often indicates impaired episodic memory retrieval. It may also reflect a failure of attentive registration of spoken sentences which leads to unsuccessful memory encoding. The hypothesis was that patients complaining of impaired memory would demonstrate impaired function of “multiple demand” (MD) brain regions, whose activation profile generalizes across cognitive domains, during speech registration in naturalistic listening conditions. Methods: Using functional MRI, brain activity was measured in 22 normal participants and 31 patients complaining of memory impairment, 21 of whom had possible or probable Alzheimer’s disease (AD). Participants heard a target speaker, either speaking alone or in the presence of distracting background speech, followed by a question to determine if the target speech had been registered. Results: Patients performed poorly at registering verbal information, which correlated with their scores on a screening test of cognitive impairment. Speech registration was associated with widely distributed activity in both auditory cortex and in MD cortex. Additional regions were most active when the target speech had to be separated from background speech. Activity in midline and lateral frontal MD cortex was reduced in the patients. A central cholinesterase inhibitor, given to half the patients to increase brain acetylcholine levels, was not observed to alter brain activity or improve task performance at a second fMRI scan performed 6–11 weeks later. However, individual performances spontaneously fluctuated between the two scanning sessions, and these performance differences correlated with activity within a right hemisphere fronto-temporal system previously associated with sustained auditory attention. Conclusions: Midline and lateralized frontal regions that are engaged in task-dependent attention to, and registration of, verbal information are potential targets for transcranial brain stimulation to improve speech registration in neurodegenerative conditions.
Affiliation(s)
- Salwa Kamourieh
- Computational, Cognitive, and Clinical Neuroimaging Laboratory, Division of Brain Sciences, Imperial College London, Hammersmith Hospital, London, United Kingdom
- Rodrigo M Braga
- Computational, Cognitive, and Clinical Neuroimaging Laboratory, Division of Brain Sciences, Imperial College London, Hammersmith Hospital, London, United Kingdom; Center for Brain Science, Harvard University, Cambridge, MA, United States
- Robert Leech
- Computational, Cognitive, and Clinical Neuroimaging Laboratory, Division of Brain Sciences, Imperial College London, Hammersmith Hospital, London, United Kingdom
- Amrish Mehta
- Department of Neuroradiology, Charing Cross Hospital, Imperial College Healthcare NHS Trust, Faculty of Medicine, Imperial College London, London, United Kingdom
- Richard J S Wise
- Computational, Cognitive, and Clinical Neuroimaging Laboratory, Division of Brain Sciences, Imperial College London, Hammersmith Hospital, London, United Kingdom
45
Syntactic processing in music and language: Effects of interrupting auditory streams with alternating timbres. Int J Psychophysiol 2018; 129:31-40. [DOI: 10.1016/j.ijpsycho.2018.05.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2017] [Revised: 05/05/2018] [Accepted: 05/07/2018] [Indexed: 02/08/2023]
46
Hausfeld L, Riecke L, Formisano E. Acoustic and higher-level representations of naturalistic auditory scenes in human auditory and frontal cortex. Neuroimage 2018. [DOI: 10.1016/j.neuroimage.2018.02.065] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/17/2022] Open
47
Choi I. Interactive Sonification Exploring Emergent Behavior Applying Models for Biological Information and Listening. Front Neurosci 2018; 12:197. [PMID: 29755311 PMCID: PMC5934483 DOI: 10.3389/fnins.2018.00197] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2017] [Accepted: 03/12/2018] [Indexed: 11/29/2022] Open
Abstract
Sonification is an open-ended design task to construct sound informing a listener of data. Understanding application context is critical for shaping design requirements for data translation into sound. Sonification requires methodology to maintain reproducibility when data sources exhibit non-linear properties of self-organization and emergent behavior. This research formalizes interactive sonification in an extensible model to support reproducibility when data exhibits emergent behavior. In the absence of sonification theory, extensibility demonstrates relevant methods across case studies. The interactive sonification framework foregrounds three factors: reproducible system implementation for generating sonification; interactive mechanisms enhancing a listener's multisensory observations; and reproducible data from models that characterize emergent behavior. Supramodal attention research suggests interactive exploration with auditory feedback can generate context for recognizing irregular patterns and transient dynamics. The sonification framework provides circular causality as a signal pathway for modeling a listener interacting with emergent behavior. The extensible sonification model adopts a data acquisition pathway to formalize functional symmetry across three subsystems: Experimental Data Source, Sound Generation, and Guided Exploration. To differentiate time criticality and dimensionality of emerging dynamics, tuning functions are applied between subsystems to maintain scale and symmetry of concurrent processes and temporal dynamics. Tuning functions accommodate sonification design strategies that yield order parameter values to render emerging patterns discoverable as well as rehearsable, to reproduce desired instances for clinical listeners. Case studies are implemented with two computational models, Chua's circuit and Swarm Chemistry social agent simulation, generating data in real-time that exhibits emergent behavior. Heuristic Listening is introduced as an informal model of a listener's clinical attention to data sonification through multisensory interaction in a context of structured inquiry. Three methods are introduced to assess the proposed sonification framework: Listening Scenario classification, data flow Attunement, and Sonification Design Patterns to classify sound control. Case study implementations are assessed against these methods comparing levels of abstraction between experimental data and sound generation. Outcomes demonstrate the framework performance as a reference model for representing experimental implementations, also for identifying common sonification structures having different experimental implementations, identifying common functions implemented in different subsystems, and comparing impact of affordances across multiple implementations of listening scenarios.
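Of the two data sources named in this abstract, Chua's circuit is the more self-contained to reproduce. The sketch below integrates its standard dimensionless equations with textbook double-scroll parameters (an assumption; the article's own implementation details are not given here) to produce the kind of emergent time series such a sonification could be driven by.

```python
# Hedged sketch: RK4 integration of Chua's circuit in dimensionless form,
# x' = alpha*(y - x - f(x)), y' = x - y + z, z' = -beta*y, with the usual
# piecewise-linear diode characteristic f(x). Parameters are textbook
# double-scroll values, not taken from the article.
import numpy as np

alpha, beta = 15.6, 28.0
m0, m1 = -1.143, -0.714

def chua_deriv(s):
    x, y, z = s
    fx = m1 * x + 0.5 * (m0 - m1) * (abs(x + 1) - abs(x - 1))  # diode nonlinearity
    return np.array([alpha * (y - x - fx), x - y + z, -beta * y])

def rk4_step(s, dt):
    k1 = chua_deriv(s)
    k2 = chua_deriv(s + dt / 2 * k1)
    k3 = chua_deriv(s + dt / 2 * k2)
    k4 = chua_deriv(s + dt * k3)
    return s + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

dt, n = 0.005, 20000
traj = np.empty((n, 3))
traj[0] = (0.7, 0.0, 0.0)
for i in range(1, n):
    traj[i] = rk4_step(traj[i - 1], dt)
# traj[:, 0] (the capacitor voltage) is the emergent, double-scroll signal
# that a sonification framework would map onto sound-synthesis parameters.
```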
Affiliation(s)
- Insook Choi
- Studio for International Media & Technology, MediaCityUK, School of Arts & Media, University of Salford, Manchester, United Kingdom
Collapse
48. Dai L, Best V, Shinn-Cunningham BG. Sensorineural hearing loss degrades behavioral and physiological measures of human spatial selective auditory attention. Proc Natl Acad Sci U S A 2018; 115:E3286-E3295. [PMID: 29555752 PMCID: PMC5889663 DOI: 10.1073/pnas.1721226115]
Abstract
Listeners with sensorineural hearing loss often have trouble understanding speech amid other voices. While poor spatial hearing is often implicated, direct evidence is weak; moreover, studies suggest that reduced audibility and degraded spectrotemporal coding may explain such problems. We hypothesized that poor spatial acuity leads to difficulty deploying selective attention, which normally filters out distracting sounds. In listeners with normal hearing, selective attention causes changes in the neural responses evoked by competing sounds, which can be used to quantify the effectiveness of attentional control. Here, we used behavior and electroencephalography to explore whether control of selective auditory attention is degraded in hearing-impaired (HI) listeners. Normal-hearing (NH) and HI listeners identified a simple melody presented simultaneously with two competing melodies, each simulated from different lateral angles. We quantified performance and attentional modulation of cortical responses evoked by these competing streams. Compared with NH listeners, HI listeners had poorer sensitivity to spatial cues, performed more poorly on the selective attention task, and showed less robust attentional modulation of cortical responses. Moreover, across NH and HI individuals, these measures were correlated. While both groups showed cortical suppression of distracting streams, this modulation was weaker in HI listeners, especially when attending to a target at midline, surrounded by competing streams. These findings suggest that hearing loss interferes with the ability to filter out sound sources based on location, contributing to communication difficulties in social situations. These findings also have implications for technologies aiming to use neural signals to guide hearing aid processing.
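As a rough illustration of how attentional control of cortical responses can be quantified, the following Python sketch computes a normalized modulation index from per-trial evoked-response amplitudes. The function name, the simulated amplitudes, and the group contrast are hypothetical assumptions, not the study's actual pipeline.

```python
# Minimal sketch: normalized attentional modulation of an evoked response,
# contrasting the same stream heard as target vs. as distractor.
import numpy as np

def modulation_index(attended, ignored):
    """Return a value in [-1, 1]; 0 means no attentional modulation.

    attended, ignored: arrays of per-trial peak amplitudes (e.g., N1)
    for physically identical streams under the two attention conditions.
    """
    a, i = np.mean(attended), np.mean(ignored)
    return (a - i) / (a + i)

# Hypothetical numbers: a normal-hearing (NH) listener shows stronger
# target enhancement than a hearing-impaired (HI) listener.
rng = np.random.default_rng(0)
nh = modulation_index(rng.normal(2.0, 0.3, 40), rng.normal(1.2, 0.3, 40))
hi = modulation_index(rng.normal(1.6, 0.3, 40), rng.normal(1.4, 0.3, 40))
print(f"NH index: {nh:.2f}, HI index: {hi:.2f}")  # NH > HI, as in the study
```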
49. Disbergen NR, Valente G, Formisano E, Zatorre RJ. Assessing Top-Down and Bottom-Up Contributions to Auditory Stream Segregation and Integration With Polyphonic Music. Front Neurosci 2018; 12:121. [PMID: 29563861 PMCID: PMC5845899 DOI: 10.3389/fnins.2018.00121]
Abstract
Polyphonic music listening exemplifies many of the processes involved in everyday auditory scene analysis, relying on an interactive interplay between bottom-up and top-down processes. Most studies investigating scene analysis have used elementary auditory scenes; real-world scene analysis, however, is far more complex. In particular, music, unlike most other natural auditory scenes, can be perceived either by integrating or, under attentive control, by segregating sound streams, which are often carried by different instruments. One prominent bottom-up cue contributing to multi-instrument music perception is the timbre difference between instruments. In this work, we introduce and validate a novel paradigm designed to investigate, within naturalistic musical auditory scenes, attentive modulation as well as its interaction with bottom-up processes. Two psychophysical experiments are described, employing custom-composed two-voice polyphonic music pieces within a framework that implements a behavioral performance metric to validate listener instructions requiring either integration or segregation of scene elements. In Experiment 1, the listeners' locus of attention was switched between individual instruments or the aggregate (i.e., both instruments together) via a task requiring the detection of temporal modulations (i.e., triplets) incorporated within or across instruments. Subjects reported after each stimulus whether triplets were present in the to-be-attended instrument(s). Experiment 2 added a bottom-up manipulation: a three-level morphing of instrument timbre distance within the attentional framework. The task was designed for use within neuroimaging paradigms; Experiment 2 was additionally validated behaviorally in the functional Magnetic Resonance Imaging (fMRI) environment. Experiment 1 subjects (N = 29, non-musicians) completed the task at high levels of accuracy, with no differences between any experimental conditions. Nineteen listeners also participated in Experiment 2, which showed a main effect of instrument timbre distance, even though timbre-distance contrasts within attention conditions did not demonstrate a timbre effect. Correlating overall scores with morph-distance effects, computed by subtracting the largest-timbre-distance score from the smallest, revealed an influence of general task difficulty on the timbre-distance effect. Comparison of laboratory and fMRI data showed that scanner noise had no adverse effect on task performance. These experimental paradigms enable the study of both bottom-up and top-down contributions to auditory stream segregation and integration within psychophysical and neuroimaging experiments.
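The morph-distance analysis can be expressed compactly. The sketch below uses entirely hypothetical data (not the authors' code) and assumes the effect is the smallest-distance accuracy minus the largest-distance accuracy, as the abstract's wording suggests; it then correlates that effect with overall accuracy.

```python
# Minimal sketch: per-listener morph-distance effect vs. overall accuracy.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n_subjects = 19                      # as in Experiment 2
# Rows: subjects; columns: small / medium / large timbre distance.
# Means are invented; larger timbre distance assumed easier.
accuracy = np.clip(rng.normal([0.72, 0.78, 0.84], 0.08, (n_subjects, 3)), 0, 1)

overall = accuracy.mean(axis=1)
morph_effect = accuracy[:, 0] - accuracy[:, 2]  # smallest minus largest distance

r, p = pearsonr(overall, morph_effect)
print(f"r = {r:.2f}, p = {p:.3f}")
```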
Affiliation(s)
- Niels R. Disbergen: Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands; Maastricht Brain Imaging Center (MBIC), Maastricht, Netherlands
- Giancarlo Valente: Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands; Maastricht Brain Imaging Center (MBIC), Maastricht, Netherlands
- Elia Formisano: Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands; Maastricht Brain Imaging Center (MBIC), Maastricht, Netherlands
- Robert J. Zatorre: Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Montreal, QC, Canada; International Laboratory for Brain Music and Sound Research (BRAMS), Montreal, QC, Canada
50. de Boer J, Krumbholz K. Auditory Attention Causes Gain Enhancement and Frequency Sharpening at Successive Stages of Cortical Processing: Evidence from Human Electroencephalography. J Cogn Neurosci 2018; 30:785-798. [PMID: 29488851 DOI: 10.1162/jocn_a_01245]
Abstract
Previous findings have suggested that auditory attention causes not only enhancement of neural processing gain but also sharpening of neural frequency tuning in human auditory cortex. The current study aimed to reexamine these findings. Specifically, we investigated whether attentional gain enhancement and frequency sharpening emerge at the same or at different processing levels, and whether they represent independent or cooperative effects. To that end, we examined the pattern of attentional modulation effects on early, sensory-driven cortical auditory-evoked potentials occurring at different latencies. Attention was manipulated using a dichotic listening task and was thus not selectively directed to specific frequency values. Possible attention-related changes in frequency tuning selectivity were measured with an adaptation paradigm. Our results show marked disparities in attention effects between the earlier N1 deflection and the subsequent P2 deflection: the N1 showed a strong gain-enhancement effect but no sharpening, whereas the P2 showed clear evidence of sharpening but no independent gain effect. These results suggest that gain enhancement and frequency sharpening represent successive stages of a cooperative attentional modulation mechanism that increases the representational bandwidth of attended versus unattended sounds.
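One way to picture the reported dissociation is a Gaussian tuning-curve model in which attention either scales response gain (as found for the N1) or narrows tuning bandwidth (as found for the P2). The sketch below is an illustrative toy model; all parameter values are assumptions, not the paper's analysis.

```python
# Minimal sketch: gain enhancement vs. frequency sharpening in a
# Gaussian model of cortical frequency tuning.
import numpy as np

def tuning_curve(freqs, best_freq, gain, bandwidth):
    """Gaussian tuning: response amplitude as a function of tone frequency (Hz)."""
    return gain * np.exp(-0.5 * ((freqs - best_freq) / bandwidth) ** 2)

freqs = np.linspace(500, 2000, 200)                 # probe frequencies, Hz
unattended  = tuning_curve(freqs, 1000, 1.0, 300)   # baseline response
n1_attended = tuning_curve(freqs, 1000, 1.5, 300)   # gain up, bandwidth unchanged
p2_attended = tuning_curve(freqs, 1000, 1.0, 180)   # bandwidth down, gain unchanged
```

In an adaptation paradigm, the N1-like change predicts uniformly larger responses across probe frequencies, whereas the P2-like change predicts a response reduction that is confined to frequencies far from the attended one.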