1. Predictors of Susceptibility to Noise and Speech Masking Among School-Age Children With Hearing Loss or Typical Hearing. Ear Hear 2024; 45:81-93. PMID: 37415268; PMCID: PMC10771540; DOI: 10.1097/aud.0000000000001403.
Abstract
OBJECTIVES: The purpose of this study was to evaluate effects of masker type and hearing group on the relationship between school-age children's speech recognition and age, vocabulary, working memory, and selective attention. This study also explored effects of masker type and hearing group on the time course of maturation of masked speech recognition.
DESIGN: Participants included 31 children with normal hearing (CNH) and 41 children with mild to severe bilateral sensorineural hearing loss (CHL), between 6.7 and 13 years of age. Children with hearing aids used their personal hearing aids throughout testing. Audiometric thresholds and standardized measures of vocabulary, working memory, and selective attention were obtained from each child, along with masked sentence recognition thresholds in a steady-state, speech-spectrum noise (SSN) and in a two-talker speech masker (TTS). Aided audibility through children's hearing aids was calculated based on the Speech Intelligibility Index (SII) for all children wearing hearing aids. Linear mixed-effects models were used to examine the contribution of group, age, vocabulary, working memory, and attention to individual differences in speech recognition thresholds in each masker. Additional models were constructed to examine the role of aided audibility in masked speech recognition in CHL. Finally, to explore the time course of maturation of masked speech perception, linear mixed-effects models were used to examine interactions between age, masker type, and hearing group as predictors of masked speech recognition.
RESULTS: Children's thresholds were higher in TTS than in SSN. There was no interaction of hearing group and masker type; CHL had higher thresholds than CNH in both maskers. In both hearing groups and masker types, children with better vocabularies had lower thresholds. An interaction of hearing group and attention was observed only in TTS: among CNH, attention predicted thresholds in TTS, whereas among CHL, vocabulary and aided audibility predicted thresholds in TTS. In both maskers, thresholds decreased as a function of age at a similar rate in CNH and CHL.
CONCLUSIONS: The factors contributing to individual differences in speech recognition differed as a function of masker type. In TTS, these factors further differed as a function of hearing group: attention predicted variance for CNH, whereas vocabulary and aided audibility predicted variance for CHL. CHL required a more favorable signal-to-noise ratio (SNR) to recognize speech in TTS than in SSN (mean = +1 dB in TTS, -3 dB in SSN). We posit that failures in auditory stream segregation limit the extent to which CHL can recognize speech in a speech masker. Larger samples or longitudinal data are needed to characterize the time course of maturation of masked speech perception in CHL.
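As a rough, hedged illustration of the modeling approach mentioned in this abstract (not the authors' actual analysis code), the sketch below fits a linear mixed-effects model of masked speech recognition thresholds with statsmodels. The data file and column names (srt, age, vocab, wm, attention, group, masker, subject) are hypothetical.

```python
# Hypothetical sketch of a linear mixed-effects analysis of masked speech
# recognition thresholds, loosely following the predictors named above.
import pandas as pd
import statsmodels.formula.api as smf

# Assumed layout: one row per child x masker condition, with columns
# srt (dB SNR), age, vocab, wm, attention, group ("CNH"/"CHL"),
# masker ("SSN"/"TTS"), and subject (child ID).
df = pd.read_csv("masked_srt_data.csv")  # hypothetical file

model = smf.mixedlm(
    "srt ~ age + vocab + wm + attention + group * masker",
    data=df,
    groups=df["subject"],  # random intercept per child
)
result = model.fit()
print(result.summary())
```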
2. Neural correlation of speech envelope tracking for background noise in normal hearing. Front Neurosci 2023; 17:1268591. PMID: 37916182; PMCID: PMC10616241; DOI: 10.3389/fnins.2023.1268591.
Abstract
Everyday speech communication often occurs in environments with background noise, and the impact of noise on speech recognition can vary depending on factors such as noise type, noise intensity, and the listener's hearing ability. However, the extent to which the neural mechanisms of speech understanding are influenced by different types and levels of noise remains unknown. This study aims to investigate whether individuals exhibit distinct neural responses and attention strategies depending on noise conditions. We recorded electroencephalography (EEG) data from 20 participants with normal hearing (13 males) and evaluated both neural tracking of speech envelopes and behavioral performance in speech understanding in the presence of varying types of background noise. Participants engaged in an EEG experiment consisting of two separate sessions. The first session involved listening to a 12-min story presented binaurally without any background noise. In the second session, speech understanding scores were measured using matrix sentences presented under speech-shaped noise (SSN) and story-noise background conditions at noise levels corresponding to the sentence recognition score (SRS). We observed differences in neural envelope correlation depending on noise type but not on noise level. Interestingly, the impact of noise type on the variation in envelope tracking was more pronounced among participants with higher speech perception scores, while those with lower scores exhibited similar envelope correlations regardless of the noise condition. The findings suggest that even individuals with normal hearing may adopt different strategies to understand speech in challenging listening environments, depending on the type of noise.
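Neural envelope tracking of the kind reported above is commonly quantified by correlating the speech amplitude envelope with the EEG at a short neural lag. The sketch below is a generic illustration under assumed sampling rates and placeholder signals, not this study's analysis pipeline.

```python
# Illustrative envelope-tracking computation: correlate the speech envelope
# with an EEG channel at a fixed lag. Generic sketch with placeholder data.
import numpy as np
from scipy.signal import hilbert, resample

fs_audio, fs_eeg = 44100, 128              # assumed sampling rates
speech = np.random.randn(fs_audio * 60)    # placeholder 60-s speech waveform
eeg = np.random.randn(fs_eeg * 60)         # placeholder single EEG channel

# Broadband amplitude envelope of the speech signal.
envelope = np.abs(hilbert(speech))
# Downsample the envelope to the EEG rate so the two series are aligned.
envelope = resample(envelope, eeg.size)

def lagged_correlation(env, eeg_sig, lag_samples):
    """Pearson correlation between the envelope and EEG shifted by a lag."""
    if lag_samples > 0:
        env, eeg_sig = env[:-lag_samples], eeg_sig[lag_samples:]
    return np.corrcoef(env, eeg_sig)[0, 1]

# e.g., correlation at a ~100 ms neural lag
print(lagged_correlation(envelope, eeg, lag_samples=int(0.1 * fs_eeg)))
```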
3. Differential Effects of Binaural Pitch Fusion Range on the Benefits of Voice Gender Differences in a "Cocktail Party" Environment for Bimodal and Bilateral Cochlear Implant Users. Ear Hear 2023; 44:318-329. PMID: 36395512; PMCID: PMC9957805; DOI: 10.1097/aud.0000000000001283.
Abstract
OBJECTIVES: Some cochlear implant (CI) users are fitted with a CI in each ear ("bilateral"), while others have a CI in one ear and a hearing aid (HA) in the other ("bimodal"). Presently, evaluation of the benefits of bilateral or bimodal CI fitting does not take into account the integration of frequency information across the ears. This study tests the hypothesis that CI listeners, especially bimodal CI users, with a more precise integration of frequency information across ears ("sharp binaural pitch fusion") will derive greater benefit from voice gender differences in a multi-talker listening environment.
DESIGN: Twelve bimodal CI users and twelve bilateral CI users participated. First, binaural pitch fusion ranges were measured using the simultaneous, dichotic presentation of reference and comparison stimuli (electric pulse trains for CI ears and acoustic tones for HA ears) in opposite ears, with reference stimuli fixed and comparison stimuli varied in frequency/electrode to find the range perceived as a single sound. Direct electrical stimulation was used in implanted ears through the research interface, which allowed selective stimulation of one electrode at a time, and acoustic stimulation was delivered to the non-implanted ears through headphones. Second, speech-on-speech masking performance was measured to estimate masking release by voice gender difference between target and maskers (VGRM). The VGRM was calculated as the difference in speech recognition thresholds of target sounds in the presence of same-gender or different-gender maskers.
RESULTS: Voice gender differences between target and masker talkers improved speech recognition performance for the bimodal CI group, but not the bilateral CI group. The bimodal CI users who benefited the most from voice gender differences were those who had the narrowest range of acoustic frequencies that fused into a single sound with stimulation from a single electrode from the CI in the opposite ear. There was no similar voice gender difference benefit of narrow binaural fusion range for the bilateral CI users.
CONCLUSIONS: The findings suggest that broad binaural fusion reduces the acoustic information available for differentiating individual talkers in bimodal CI users, but not in bilateral CI users. In addition, for bimodal CI users with narrow binaural fusion who benefit from voice gender differences, bilateral implantation could lead to a loss of that benefit and impair their ability to selectively attend to one talker in the presence of multiple competing talkers. The results suggest that binaural pitch fusion, along with an assessment of residual hearing and other factors, could be important for assessing bimodal and bilateral CI users.
4. The P300 Auditory Event-Related Potential May Predict Segregation of Competing Speech by Bimodal Cochlear Implant Listeners. Front Neurosci 2022; 16:888596. PMID: 35757527; PMCID: PMC9226716; DOI: 10.3389/fnins.2022.888596.
Abstract
Compared to normal-hearing (NH) listeners, cochlear implant (CI) listeners have greater difficulty segregating competing speech. Neurophysiological studies have largely investigated the neural foundations of CI listeners' speech recognition in quiet, mainly using the P300 component of event-related potentials (ERPs). P300 is closely related to cognitive processes involving auditory discrimination, selective attention, and working memory. In contrast to speech perception in quiet, little is known about the neurophysiological foundations of the segregation of competing speech by CI listeners. In this study, ERPs were measured for a 1 vs. 2 kHz contrast in 11 Mandarin-speaking bimodal CI listeners and 11 NH listeners. Speech reception thresholds (SRTs) for a male target talker were measured in steady noise or with a male or female masker. Results showed that P300 amplitudes were significantly larger and latencies significantly shorter for the NH than for the CI group. Similarly, SRTs were significantly better for the NH than for the CI group. Across all participants, P300 amplitude was significantly correlated with SRTs in steady noise (r = -0.65, p = 0.001) and with the competing male (r = -0.62, p = 0.002) and female maskers (r = -0.60, p = 0.003). Within the CI group, there was a significant correlation between P300 amplitude and SRTs with the male masker (r = -0.78, p = 0.005), which produced the most informational masking. The results suggest that P300 amplitude may be a clinically useful neural correlate of central auditory processing capabilities (e.g., susceptibility to informational masking) in bimodal CI patients.
5.
Abstract
Identification of speech from a "target" talker was measured in a speech-on-speech masking task with two simultaneous "masker" talkers. The overall level of each talker was either fixed or randomized throughout each stimulus presentation to investigate the effectiveness of level as a cue for segregating competing talkers and attending to the target. Experimental manipulations included varying the level difference between talkers and imposing three types of target level uncertainty: (1) fixed target level across trials, (2) random target level across trials, or (3) random target levels on a word-by-word basis within a trial. When the target level was predictable, performance was better than in corresponding conditions in which the target level was uncertain. Masker confusions were consistent with a high degree of informational masking (IM). Furthermore, evidence was found for "tuning" in level and a level "release" from IM. These findings suggest that conforming to listener expectations about relative level, in addition to cues signaling talker identity, facilitates segregating, and maintaining focus of attention on, a specific talker in multiple-talker communication situations.
6. Hemodynamic Responses Link Individual Differences in Informational Masking to the Vicinity of Superior Temporal Gyrus. Front Neurosci 2021; 15:675326. PMID: 34366772; PMCID: PMC8339305; DOI: 10.3389/fnins.2021.675326.
Abstract
Suppressing unwanted background sound is crucial for aural communication. A particularly disruptive type of background sound, informational masking (IM), often interferes in social settings. However, IM mechanisms are incompletely understood. At present, IM is identified operationally: when a target should be audible, based on suprathreshold target/masker energy ratios, yet cannot be heard because target-like background sound interferes. Here we confirm that speech identification thresholds differ dramatically between low- vs. high-IM background sound. However, speech detection thresholds are comparable across the two conditions. Moreover, functional near-infrared spectroscopy recordings show that task-evoked blood oxygenation changes near the superior temporal gyrus (STG) covary with behavioral speech detection performance for high-IM but not low-IM background sound, suggesting that the STG is part of an IM-dependent network. Furthermore, listeners who are more vulnerable to IM show increased hemodynamic recruitment near STG, an effect that cannot be explained by differences in task difficulty across low- vs. high-IM conditions. In contrast, task-evoked responses near another auditory region of cortex, the caudal inferior frontal sulcus (cIFS), do not predict behavioral sensitivity, suggesting that the cIFS belongs to an IM-independent network. The results are consistent with the idea that cortical gating shapes individual vulnerability to IM.
7. EEG power spectral dynamics associated with listening in adverse conditions. Psychophysiology 2021; 58:e13877. PMID: 34161612; DOI: 10.1111/psyp.13877.
Abstract
Adverse listening conditions increase the demand on cognitive resources needed for speech comprehension. In an exploratory study, we aimed to identify independent power spectral features in the EEG useful for studying the cognitive processes involved in this effortful listening. Listeners performed the coordinate response measure task with a single-talker masker at a 0-dB signal-to-noise ratio. Sounds were left unfiltered or degraded with low-pass filtering. Independent component analysis (ICA) was used to identify independent components (ICs) in the EEG data, the power spectral dynamics of which were then analyzed. Frontal midline theta, left frontal, right frontal, left mu, right mu, left temporal, parietal, left occipital, central occipital, and right occipital clusters of ICs were identified. All IC clusters showed some significant listening-related changes in their power spectrum. This included sustained theta enhancements, gamma enhancements, alpha enhancements, alpha suppression, beta enhancements, and mu rhythm suppression. Several of these effects were absent or negligible using traditional channel analyses. Comparison of filtered to unfiltered speech revealed a stronger alpha suppression in the parietal and central occipital clusters of ICs for the filtered speech condition. This not only replicates recent findings showing greater alpha suppression as listening difficulty increases but also suggests that such alpha-band effects can stem from multiple cortical sources. We lay out the advantages of the ICA approach over the restrictive analyses that have been used as of late in the study of listening effort. We also make suggestions for moving into hypothesis-driven studies regarding the power spectral features that were revealed.
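As a hedged illustration of the ICA-based approach described in this abstract, the sketch below decomposes continuous EEG into independent components with MNE-Python and computes per-component power spectra. The file name, filter band, and component count are assumptions, not the study's settings.

```python
# Illustrative ICA decomposition of continuous EEG and per-component power
# spectra; a generic sketch, not the study's preprocessing or settings.
import mne
from mne.time_frequency import psd_array_welch

raw = mne.io.read_raw_fif("listening_task_raw.fif", preload=True)  # hypothetical file
raw.filter(1.0, 40.0)  # band-pass before ICA (assumed band)

ica = mne.preprocessing.ICA(n_components=20, random_state=0)
ica.fit(raw)

# Time courses of the independent components (ICs).
sources = ica.get_sources(raw).get_data()

# Welch power spectrum of each IC, e.g., to inspect theta/alpha/beta dynamics.
psds, freqs = psd_array_welch(sources, sfreq=raw.info["sfreq"], fmin=1.0, fmax=40.0)
print(psds.shape, freqs.shape)
```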
8. Selective attention modulates neural envelope tracking of informationally masked speech in healthy older adults. Hum Brain Mapp 2021; 42:3042-3057. PMID: 33783913; PMCID: PMC8193518; DOI: 10.1002/hbm.25415.
Abstract
Speech understanding in noisy situations is compromised in old age. This study investigated the energetic and informational masking components of multi-talker babble noise and their influence on neural tracking of the speech envelope in a sample of healthy older adults. Twenty-three older adults (age range 65–80 years) listened to an audiobook embedded in noise while their electroencephalogram (EEG) was recorded. Energetic masking was manipulated by varying the signal-to-noise ratio (SNR) between target speech and background talkers, and informational masking was manipulated by varying the number of background talkers. Neural envelope tracking was measured by calculating temporal response functions (TRFs) between the speech envelope and the EEG. The number of background talkers, but not the SNR, modulated the amplitude of an earlier (around 50 ms time lag) and a later (around 300 ms time lag) peak in the TRFs. Selective attention, but not working memory or peripheral hearing, additionally modulated the amplitude of the later TRF peak. Finally, the amplitude of the later TRF peak was positively related to accuracy in the comprehension task. The results suggest that stronger envelope tracking is beneficial for speech-in-noise understanding and that selective attention is an important ability supporting speech-in-noise understanding in multi-talker scenes.
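Temporal response functions such as those described above are typically estimated by regularized regression of the EEG on time-lagged copies of the speech envelope. The following is a generic ridge-regression sketch with placeholder data, not the authors' implementation.

```python
# Illustrative temporal response function (TRF) estimation via ridge
# regression of EEG on time-lagged copies of the speech envelope.
import numpy as np

def estimate_trf(envelope, eeg, fs, tmin=-0.1, tmax=0.4, ridge=1e3):
    """Return lag times (s) and TRF weights for lags from tmin to tmax."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    n = envelope.size
    # Design matrix: one column per lag of the (zero-padded) envelope.
    X = np.zeros((n, lags.size))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = envelope[:n - lag]
        else:
            X[:lag, j] = envelope[-lag:]
    # Ridge-regularized least squares: w = (X'X + aI)^-1 X'y
    w = np.linalg.solve(X.T @ X + ridge * np.eye(lags.size), X.T @ eeg)
    return lags / fs, w

fs = 128
envelope = np.random.randn(fs * 60)   # placeholder stimulus envelope
eeg = np.random.randn(fs * 60)        # placeholder single EEG channel
lag_times, trf = estimate_trf(envelope, eeg, fs)
```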
9. A New Speech-in-Noise Test for Measuring Informational Masking in Speech Perception Among Elderly Listeners. Cureus 2020; 12:e7356. PMID: 32328368; PMCID: PMC7174866; DOI: 10.7759/cureus.7356.
Abstract
Introduction: Elderly listeners have reported concerns about speech perception in noisy environments, which partly occurs because of their increased informational masking (IM). This study aimed to develop a Persian coordinate response measure (CRM) corpus and a novel speech-in-noise test for measuring IM.
Materials and methods: A cross-sectional validation study was conducted in two parts. Part one determined the validity and reliability of the Persian CRM corpus. Part two consisted of measuring IM at five signal-to-noise ratios (SNRs: -6, -3, 0, +3, and +6 dB) in two conditions: one with target and masker speakers of the same sex and one with target and masker speakers of different sexes. In each condition, the IM measurements were performed with a 45° separation angle between target and maskers and with the speakers co-located. A group of young listeners aged 20 to 40 years and a group of elderly listeners aged 60 to 75 years were recruited (50 participants in part one and 47 in part two). The study was conducted from July 2018 to March 2019 at the Iran University of Medical Sciences audiology clinic. The content validity ratio, content validity index, impact score, Spearman's test, and Mann-Whitney's test were used for statistical analysis.
Results: The Persian CRM corpus showed acceptable validity and reliability in each group (p < 0.001). The results suggested that at both azimuth locations and at SNRs of 0, -3, and -6 dB, the amount of IM in the elderly group was significantly higher (p < 0.003) than in the young group when the target and masker speakers were of the opposite sex. However, when both target and masker speakers were of the same sex, a significant difference was observed at an SNR of 0 dB with angular separation and at SNRs of +3 and 0 dB in the co-located condition (p < 0.001).
Conclusion: A validated Persian CRM corpus has been collected for use in IM measurement studies. Overall, the IM of elderly listeners was higher than that of younger listeners in low-cue situations such as lower SNRs. Therefore, a novel speech-in-noise test for measuring IM was validated for use in speech perception studies in the elderly population.
10. Energetic and Informational Components of Speech-on-Speech Masking in Binaural Speech Intelligibility and Perceived Listening Effort. Trends Hear 2019; 23:2331216519854597. PMID: 31172880; PMCID: PMC6557024; DOI: 10.1177/2331216519854597.
Abstract
Speech perception in complex sound fields can greatly benefit from different unmasking cues to segregate the target from interfering voices. This study investigated the role of three unmasking cues (spatial separation, gender differences, and masker time reversal) on speech intelligibility and perceived listening effort in normal-hearing listeners. Speech intelligibility and categorically scaled listening effort were measured for a female target talker masked by two competing talkers with no unmasking cues or one to three unmasking cues. In addition to natural stimuli, all measurements were also conducted with glimpsed speech—which was created by removing the time–frequency tiles of the speech mixture in which the maskers dominated the mixture—to estimate the relative amounts of informational and energetic masking as well as the effort associated with source segregation. The results showed that all unmasking cues as well as glimpsing improved intelligibility and reduced listening effort and that providing more than one cue was beneficial in overcoming informational masking. The reduction in listening effort due to glimpsing corresponded to increases in signal-to-noise ratio of 8 to 18 dB, indicating that a significant amount of listening effort was devoted to segregating the target from the maskers. Furthermore, the benefit in listening effort for all unmasking cues extended well into the range of positive signal-to-noise ratios at which speech intelligibility was at ceiling, suggesting that listening effort is a useful tool for evaluating speech-on-speech masking conditions at typical conversational levels.
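The "glimpsed speech" manipulation described above, retaining only time-frequency tiles in which the target dominates the maskers, can be illustrated with a simple STFT-based mask. The parameters and placeholder signals below are assumptions, not the study's exact procedure.

```python
# Illustrative construction of "glimpsed" speech: zero out time-frequency
# tiles of the target+masker mixture in which the masker dominates.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
target = np.random.randn(fs * 3)   # placeholder target speech
masker = np.random.randn(fs * 3)   # placeholder two-talker masker
mixture = target + masker

f, t, T = stft(target, fs=fs, nperseg=512)
_, _, M = stft(masker, fs=fs, nperseg=512)
_, _, X = stft(mixture, fs=fs, nperseg=512)

# Keep a tile only where the target's local level exceeds the masker's
# (criterion of 0 dB assumed here).
criterion_db = 0.0
level_diff = 20 * np.log10(np.abs(T) + 1e-12) - 20 * np.log10(np.abs(M) + 1e-12)
mask = level_diff > criterion_db
_, glimpsed = istft(X * mask, fs=fs, nperseg=512)
```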
11. The Effects of Musical Training on Speech Detection in the Presence of Informational and Energetic Masking. Trends Hear 2017; 21:2331216517739427. PMID: 29161982; PMCID: PMC5703091; DOI: 10.1177/2331216517739427.
Abstract
Recent research has suggested that musicians have an advantage in some speech-in-noise paradigms, but not all. Whether musicians outperform nonmusicians on a given speech-in-noise task may well depend on the type of noise involved. To date, few groups have specifically studied the role that informational masking plays in the observation of a musician advantage. The current study investigated the effect of musicianship on listeners’ ability to overcome informational versus energetic masking of speech. Monosyllabic words were presented in four conditions that created similar energetic masking but either high or low informational masking. Two of these conditions used noise-vocoded target and masking stimuli to determine whether the absence of natural fine structure and spectral variations influenced any musician advantage. Forty young normal-hearing listeners (20 musicians and 20 nonmusicians) completed the study. There was a significant overall effect of participant group collapsing across the four conditions; however, planned comparisons showed musicians’ thresholds were only significantly better in the high informational masking natural speech condition, where the musician advantage was approximately 3 dB. These results add to the mounting evidence that informational masking plays a role in the presence and amount of musician benefit.
12. Effectiveness of Two-Talker Maskers That Differ in Talker Congruity and Perceptual Similarity to the Target Speech. Trends Hear 2017; 21:2331216517709385. PMID: 29169315; PMCID: PMC5476326; DOI: 10.1177/2331216517709385.
Abstract
Previous work has shown that masked-sentence recognition is particularly poor when the masker is composed of two competing talkers, a finding that is attributed to informational masking. Informational masking tends to be largest when the target and masker talkers are perceptually similar. Reductions in masking have been observed for a wide range of target and masker differences, including language: Performance is better when the target and masker talkers speak in different languages, compared with the same language. The present study evaluated normal-hearing adults’ sentence recognition in a two-talker masker as a function of the perceptual similarity between the target and each of the two masker streams. The target was English, and the maskers were composed of English, time-reversed English, or Dutch. These three masker types are known to vary in the informational masking they exert. The two talkers within the two-talker maskers were either congruent (e.g., both English) or incongruent (e.g., one English, one Dutch). As predicted, mean performance was worse for the congruent English masker than the congruent time-reversed English or congruent Dutch maskers. Incongruent two-talker maskers, with just one English masker stream, were only modestly less effective than the congruent English masker. This result indicates that two-talker masker effectiveness was determined predominantly by the one masker stream that was most perceptually similar to the target. Speech recognition in a single-talker masker differed only marginally between the English, Dutch, and time-reversed English masker types, suggesting that perceptual similarity may be more critical in a two-talker than a one-talker masker.
13. Age-Related Differences in the Effects of Masker Cuing on Releasing Chinese Speech From Informational Masking. Front Psychol 2018; 9:1922. PMID: 30356784; PMCID: PMC6189421; DOI: 10.3389/fpsyg.2018.01922.
Abstract
The aims of the present study were to examine whether familiarity with a masker improves word recognition in speech masking situations and whether there are age-related differences in the effects of masker cuing. Thirty-two older listeners (range = 59–74 years; mean age = 66.41 years) with high-frequency hearing loss and 32 younger normal-hearing listeners (range = 21–28 years; mean age = 23.73 years) participated in this study, all of whom spoke Chinese as their first language. Two experiments were conducted, with 16 younger and 16 older listeners in each experiment. The masking speech, which differed in content from the target speech and was syntactically correct but semantically meaningless, was a continuous recording of meaningless Chinese sentences spoken by two talkers. The masker level was adjusted to produce signal-to-masker ratios of -12, -8, -4, and 0 dB for the younger participants and -8, -4, 0, and 4 dB for the older participants. Under masker-priming conditions, a priming sentence, spoken by the masker talkers, was presented in quiet three times before a target sentence was presented together with a masker sentence 4 s later. In Experiment 1, which used same-sentence masker-priming (priming identical to the masker sentence), the masker-priming improved identification of the target sentence for both age groups compared to when no priming was provided. However, the amount of masking release was smaller in the older adults than in the younger adults. In Experiment 2, two kinds of primes were considered: same-sentence masker-priming and different-sentence masker-priming (different from the masker sentence in content for each keyword). The results of Experiment 2 showed that both kinds of primes improved identification of the targets for both age groups. However, the release from speech masking in both priming conditions was smaller in the older adults than in the younger adults, and the release from speech masking in both age groups was greater with same-sentence masker-priming than with different-sentence masker-priming. These results suggest that both the voice and content cues of a masker can be used to release target speech from maskers in noisy listening conditions. Furthermore, there was an age-related decline in masker-priming-induced release from speech masking.
14. The effect of language, spatial factors, masker type and memory span on speech-in-noise thresholds in sequential bilingual children. Scand J Psychol 2018; 59:567-577. PMID: 30137681; DOI: 10.1111/sjop.12466.
Abstract
This study considers whether bilingual children listening in a second language are among those on whom higher processing and cognitive demands are placed when noise is present. Forty-four Swedish sequential bilingual 15-year-olds were given memory span and vocabulary assessments in their first and second languages (Swedish and English). First- and second-language speech reception thresholds (SRTs) at 50% intelligibility for numbers and colors presented in noise were obtained using an adaptive procedure. The target sentences were presented in simulated, virtual classroom acoustics, masked by either 16-talker multi-talker babble noise (MTBN) or speech-shaped noise (SSN), positioned either directly in front of the listener (collocated with the target speech) or spatially separated from the target speech by 90° to either side. Main effects of the Spatial and Noise factors indicated that intelligibility was 3.8 dB lower in collocated conditions and 2.9 dB lower in MTBN conditions. SRTs were unexpectedly higher, by 0.9 dB, in second-language conditions. Memory span significantly predicted 17% of the variance in the second-language SRTs and 9% of the variance in first-language SRTs, indicating the possibility that the SRT task places higher cognitive demands when listening to second-language speech than when the target is in the listener's first language.
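The SRTs described above were obtained with an adaptive procedure; a generic 1-up/1-down staircase that converges on the 50%-intelligibility point is sketched below. The step size, trial count, and simulated listener are illustrative assumptions, not the study's protocol.

```python
# Illustrative 1-up/1-down adaptive staircase converging on the ~50%-correct
# SRT; the listener is simulated with a logistic psychometric function.
import random

def run_staircase(start_snr=0.0, step=2.0, n_trials=30, true_srt=-6.0):
    snr = start_snr
    track = []
    for _ in range(n_trials):
        # Placeholder listener model: more likely correct above the true SRT.
        p_correct = 1.0 / (1.0 + 10 ** (-(snr - true_srt) / 4.0))
        correct = random.random() < p_correct
        track.append(snr)
        snr += -step if correct else step  # harder after correct, easier after error
    return sum(track[-10:]) / 10.0  # average the final trials as a crude SRT estimate

print(f"Estimated SRT: {run_staircase():.1f} dB SNR")
```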
15. Cortical Gray Matter Loss, Augmented Vulnerability to Speech-on-Speech Masking, and Delusion in People With Schizophrenia. Front Psychiatry 2018; 9:287. PMID: 30022955; PMCID: PMC6040158; DOI: 10.3389/fpsyt.2018.00287.
Abstract
People with schizophrenia exhibit impairments in target-speech recognition (TSR) against multiple-talker-induced informational speech masking. To date, the underlying neural mechanisms and their relationships with psychotic symptoms remain largely unknown. This study aimed to investigate whether the schizophrenia-associated TSR impairment contributes to certain psychotic symptoms by sharing underlying alterations in cortical gray-matter volume (GMV) with those symptoms. Participants with schizophrenia (N = 34) and their matched healthy controls (N = 29) were tested for TSR against a two-talker-speech masker. Psychotic symptoms of participants with schizophrenia were evaluated using the Positive and Negative Syndrome Scale. Regional GMV across various cortical regions was assessed using voxel-based morphometry. The results of partial-correlation and mediation analyses showed that in participants with schizophrenia, TSR was negatively correlated with delusion severity, but positively correlated with GMV in the bilateral superior/middle temporal cortex, bilateral insula, left medial orbital frontal gyrus, left Rolandic operculum, left mid-cingulate cortex, left posterior fusiform, and left cerebellum. Moreover, the association between GMV and delusion was mediated by TSR performance. Thus, in people with schizophrenia, both delusions and the augmented vulnerability of TSR to informational masking are associated with each other and share an underlying reduction in cortical GMV, suggesting that the origin of delusion in schizophrenia may be related to disorganized or limited informational processing (e.g., the incapability of adequately filtering information from multiple sources at the perceptual level). The TSR impairment may therefore be a potential marker for predicting delusion severity.
16. Spatial Release From Informational Masking: Evidence From Functional Near Infrared Spectroscopy. Trends Hear 2018; 22:2331216518817464. PMID: 30558491; PMCID: PMC6299332; DOI: 10.1177/2331216518817464.
Abstract
Informational masking (IM) can greatly reduce speech intelligibility, but the neural mechanisms underlying IM are not understood. Binaural differences between target and masker can improve speech perception. In general, improvement in masked speech intelligibility due to provision of spatial cues is called spatial release from masking. Here, we focused on an aspect of spatial release from masking, specifically, the role of spatial attention. We hypothesized that in a situation with IM background sound (a) attention to speech recruits lateral frontal cortex (LFCx) and (b) LFCx activity varies with direction of spatial attention. Using functional near infrared spectroscopy, we assessed LFCx activity bilaterally in normal-hearing listeners. In Experiment 1, two talkers were simultaneously presented. Listeners either attended to the target talker (speech task) or they listened passively to an unintelligible, scrambled version of the acoustic mixture (control task). Target and masker differed in pitch and interaural time difference (ITD). Relative to the passive control, LFCx activity increased during attentive listening. Experiment 2 measured how LFCx activity varied with ITD, by testing listeners on the speech task in Experiment 1, except that talkers either were spatially separated by ITD or colocated. Results show that directing of auditory attention activates LFCx bilaterally. Moreover, right LFCx is recruited more strongly in the spatially separated as compared with colocated configurations. Findings hint that LFCx function contributes to spatial release from masking in situations with IM.
17. Recent advances in exploring the neural underpinnings of auditory scene perception. Ann N Y Acad Sci 2017; 1396:39-55. PMID: 28199022; PMCID: PMC5446279; DOI: 10.1111/nyas.13317.
Abstract
Studies of auditory scene analysis have traditionally relied on paradigms using artificial sounds, and conventional behavioral techniques, to elucidate how we perceptually segregate auditory objects or streams from each other. In the past few decades, however, there has been growing interest in uncovering the neural underpinnings of auditory segregation using human and animal neuroscience techniques, as well as computational modeling. This largely reflects the growth in the fields of cognitive neuroscience and computational neuroscience and has led to new theories of how the auditory system segregates sounds in complex arrays. The current review focuses on neural and computational studies of auditory scene perception published in the last few years. Following the progress that has been made in these studies, we describe (1) theoretical advances in our understanding of the most well-studied aspects of auditory scene perception, namely segregation of sequential patterns of sounds and concurrently presented sounds; (2) the diversification of topics and paradigms that have been investigated; and (3) how new neuroscience techniques (including invasive neurophysiology in awake humans, genotyping, and brain stimulation) have been used in this field.
18.
Abstract
BACKGROUND: Under 'cocktail party' listening conditions, healthy listeners and listeners with schizophrenia can use temporally pre-presented auditory speech-priming (ASP) stimuli to improve target-speech recognition, even though listeners with schizophrenia are more vulnerable to informational speech masking.
METHOD: Using functional magnetic resonance imaging, this study searched for the brain substrates underlying the unmasking effect of ASP in 16 healthy controls and 22 patients with schizophrenia, and the brain substrates underlying schizophrenia-related speech-recognition deficits under speech-masking conditions.
RESULTS: In both controls and patients, introducing the ASP condition (against the auditory non-speech-priming condition) not only activated the left superior temporal gyrus (STG) and left posterior middle temporal gyrus (pMTG), but also enhanced functional connectivity of the left STG/pMTG with the left caudate. It also enhanced functional connectivity of the left STG/pMTG with the left pars triangularis of the inferior frontal gyrus (TriIFG) in controls and with the left Rolandic operculum in patients. The strength of functional connectivity between the left STG and left TriIFG was correlated with target-speech recognition under the speech-masking condition in both controls and patients, but was reduced in patients.
CONCLUSIONS: The left STG/pMTG and their ASP-related functional connectivity with both the left caudate and some frontal regions (the left TriIFG in healthy listeners and the left Rolandic operculum in listeners with schizophrenia) are involved in the unmasking effect of ASP, possibly by facilitating the following processes: masker-signal inhibition, target-speech encoding, and speech production. The schizophrenia-related reduction of functional connectivity between the left STG and left TriIFG augments the vulnerability of speech recognition to speech masking.
19. Activation and Functional Connectivity of the Left Inferior Temporal Gyrus during Visual Speech Priming in Healthy Listeners and Listeners with Schizophrenia. Front Neurosci 2017; 11:107. PMID: 28360829; PMCID: PMC5350153; DOI: 10.3389/fnins.2017.00107.
Abstract
Under a "cocktail-party" listening condition with multiple people talking, people with schizophrenia benefit less than healthy people from the use of visual-speech (lipreading) priming (VSP) cues to improve speech recognition. The neural mechanisms underlying the unmasking effect of VSP remain unknown. This study investigated the brain substrates underlying the unmasking effect of VSP in healthy listeners and the schizophrenia-induced changes in these brain substrates. Using functional magnetic resonance imaging, brain activation and functional connectivity for the contrasts of the VSP listening condition vs. the visual non-speech priming (VNSP) condition were examined in 16 healthy listeners (27.4 ± 8.6 years old, 9 females and 7 males) and 22 listeners with schizophrenia (29.0 ± 8.1 years old, 8 females and 14 males). The results showed that in healthy listeners, but not listeners with schizophrenia, the VSP-induced activation (against the VNSP condition) of the left posterior inferior temporal gyrus (pITG) was significantly correlated with the VSP-induced improvement in target-speech recognition against speech masking. Compared to healthy listeners, listeners with schizophrenia showed significantly lower VSP-induced activation of the left pITG and reduced functional connectivity of the left pITG with the bilateral Rolandic operculum, bilateral STG, and left insula. Thus, the left pITG and its functional connectivity may be the brain substrates related to the unmasking effect of VSP, presumably through enhancing both the processing of target visual-speech signals and the inhibition of masking-speech signals. In people with schizophrenia, the reduced unmasking effect of VSP on speech recognition may be associated with a schizophrenia-related reduction of VSP-induced activation and functional connectivity of the left pITG.
20. Selective Attention Enhances Beta-Band Cortical Oscillation to Speech under "Cocktail-Party" Listening Conditions. Front Hum Neurosci 2017; 11:34. PMID: 28239344; PMCID: PMC5300994; DOI: 10.3389/fnhum.2017.00034.
Abstract
Human listeners are able to selectively attend to target speech in a noisy environment with multiple-people talking. Using recordings of scalp electroencephalogram (EEG), this study investigated how selective attention facilitates the cortical representation of target speech under a simulated “cocktail-party” listening condition with speech-on-speech masking. The result shows that the cortical representation of target-speech signals under the multiple-people talking condition was specifically improved by selective attention relative to the non-selective-attention listening condition, and the beta-band activity was most strongly modulated by selective attention. Moreover, measured with the Granger Causality value, selective attention to the single target speech in the mixed-speech complex enhanced the following four causal connectivities for the beta-band oscillation: the ones (1) from site FT7 to the right motor area, (2) from the left frontal area to the right motor area, (3) from the central frontal area to the right motor area, and (4) from the central frontal area to the right frontal area. However, the selective-attention-induced change in beta-band causal connectivity from the central frontal area to the right motor area, but not other beta-band causal connectivities, was significantly correlated with the selective-attention-induced change in the cortical beta-band representation of target speech. These findings suggest that under the “cocktail-party” listening condition, the beta-band oscillation in EEGs to target speech is specifically facilitated by selective attention to the target speech that is embedded in the mixed-speech complex. The selective attention-induced unmasking of target speech may be associated with the improved beta-band functional connectivity from the central frontal area to the right motor area, suggesting a top-down attentional modulation of the speech-motor process.
21. Neural Correlates of Auditory Perceptual Awareness and Release from Informational Masking Recorded Directly from Human Cortex: A Case Study. Front Neurosci 2016; 10:472. PMID: 27812318; PMCID: PMC5071374; DOI: 10.3389/fnins.2016.00472.
Abstract
In complex acoustic environments, even salient supra-threshold sounds sometimes go unperceived, a phenomenon known as informational masking. The neural basis of informational masking (and its release) has not been well-characterized, particularly outside auditory cortex. We combined electrocorticography in a neurosurgical patient undergoing invasive epilepsy monitoring with trial-by-trial perceptual reports of isochronous target-tone streams embedded in random multi-tone maskers. Awareness of such masker-embedded target streams was associated with a focal negativity between 100 and 200 ms and high-gamma activity (HGA) between 50 and 250 ms (both in auditory cortex on the posterolateral superior temporal gyrus) as well as a broad P3b-like potential (between ~300 and 600 ms) with generators in ventrolateral frontal and lateral temporal cortex. Unperceived target tones elicited drastically reduced versions of such responses, if at all. While it remains unclear whether these responses reflect conscious perception, itself, as opposed to pre- or post-perceptual processing, the results suggest that conscious perception of target sounds in complex listening environments may engage diverse neural mechanisms in distributed brain areas.
22. Informational Masking in Normal-Hearing and Hearing-Impaired Listeners Measured in a Nonspeech Pattern Identification Task. Trends Hear 2016; 20:2331216516638516. PMID: 27059627; PMCID: PMC4871212; DOI: 10.1177/2331216516638516.
Abstract
Individuals with sensorineural hearing loss (SNHL) often experience more difficulty with listening in multisource environments than do normal-hearing (NH) listeners. While the peripheral effects of sensorineural hearing loss certainly contribute to this difficulty, differences in central processing of auditory information may also contribute. To explore this issue, it is important to account for peripheral differences between NH and these hearing-impaired (HI) listeners so that central effects in multisource listening can be examined. In the present study, NH and HI listeners performed a tonal pattern identification task at two distant center frequencies (CFs), 850 and 3500 Hz. In an attempt to control for differences in the peripheral representations of the stimuli, the patterns were presented at the same sensation level (15 dB SL), and the frequency deviation of the tones comprising the patterns was adjusted to obtain equal quiet pattern identification performance across all listeners at both CFs. Tonal sequences were then presented at both CFs simultaneously (informational masking conditions), and listeners were asked either to selectively attend to a source (CF) or to divide attention between CFs and identify the pattern at a CF designated after each trial. There were large differences between groups in the frequency deviations necessary to perform the pattern identification task. After compensating for these differences, there were small differences between NH and HI listeners in the informational masking conditions. HI listeners showed slightly greater performance asymmetry between the low and high CFs than did NH listeners, possibly due to central differences in frequency weighting between groups.
23. Autonomic Nervous System Responses During Perception of Masked Speech may Reflect Constructs other than Subjective Listening Effort. Front Psychol 2016; 7:263. PMID: 26973564; PMCID: PMC4772584; DOI: 10.3389/fpsyg.2016.00263.
Abstract
Typically, understanding speech seems effortless and automatic. However, a variety of factors may, independently or interactively, make listening more effortful. Physiological measures may help to distinguish between the application of different cognitive mechanisms whose operation is perceived as effortful. In the present study, physiological and behavioral measures associated with task demand were collected, along with behavioral measures of performance, while participants listened to and repeated sentences. The goal was to measure psychophysiological reactivity associated with three degraded listening conditions, each of which differed in the source of the difficulty (distortion, energetic masking, and informational masking) and was therefore expected to engage different cognitive mechanisms. These conditions were chosen to be matched for overall performance (keywords correct) and were compared to listening to unmasked speech produced by a natural voice. The three degraded conditions were: (1) unmasked speech produced by a computer speech synthesizer, (2) speech produced by a natural voice and masked by speech-shaped noise, and (3) speech produced by a natural voice and masked by two-talker babble. Both masked conditions were presented at a -8 dB signal-to-noise ratio (SNR), a level shown in previous research to result in comparable levels of performance for these stimuli and maskers. Performance was measured in terms of the proportion of key words identified correctly, and task demand or effort was quantified subjectively by self-report. Measures of psychophysiological reactivity included electrodermal (skin conductance) response frequency and amplitude, blood pulse amplitude, and pulse rate. The results suggest that the two masked conditions evoked stronger psychophysiological reactivity than did the two unmasked conditions, even when behavioral measures of listening performance and listeners' subjective perception of task demand were comparable across the three degraded conditions.
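Presenting maskers at a fixed SNR, as in the -8 dB conditions above, amounts to scaling the masker relative to the target's RMS level before mixing. The sketch below shows this calculation with placeholder signals; it is not the study's stimulus-generation code.

```python
# Illustrative mixing of target speech and a masker at a requested SNR (dB).
import numpy as np

def mix_at_snr(target, masker, snr_db):
    """Scale the masker so the target/masker RMS ratio equals snr_db, then mix."""
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    gain = rms(target) / (rms(masker) * 10 ** (snr_db / 20))
    return target + gain * masker

fs = 16000
target = np.random.randn(fs)   # placeholder sentence
babble = np.random.randn(fs)   # placeholder two-talker babble
mixture = mix_at_snr(target, babble, snr_db=-8.0)
```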
24.
Abstract
Whereas the energetic and informational masking effects of unintelligible babble on auditory speech recognition are well established, the present study is the first to investigate its effects on visual speech recognition. Young and older adults performed two lipreading tasks while simultaneously experiencing either quiet, speech-shaped noise, or 6-talker background babble. Both words at the end of uninformative carrier sentences and key words in everyday sentences were harder to lipread in the presence of babble than in the presence of speech-shaped noise or quiet. Contrary to the inhibitory deficit hypothesis of cognitive aging, babble had equivalent effects on young and older adults. In a follow-up experiment, neither the babble nor the speech-shaped noise stimuli interfered with performance of a face-processing task, indicating that babble selectively interferes with visual speech recognition and not with visual perception tasks per se. The present results demonstrate that babble can produce cross-modal informational masking and suggest a breakdown in audiovisual scene analysis, either because of obligatory monitoring of even uninformative speech sounds or because of obligatory efforts to integrate speech sounds even with uncorrelated mouth movements.
25.
Abstract
The reported studies aimed to investigate, using an adapted semantic priming paradigm, whether informational masking in a multi-talker background relies on semantic interference between the background and the target. In 3 experiments, participants were required to perform a lexical decision task on a target item embedded in backgrounds composed of 1–4 voices. These voices were Semantically Consistent (SC) voices (i.e., pronouncing words sharing semantic features with the target) or Semantically Inconsistent (SI) voices (i.e., pronouncing words semantically unrelated to each other and to the target). In the first experiment, backgrounds consisted of 1 or 2 SC voices. One and 2 SI voices were added in Experiments 2 and 3, respectively. The results showed a semantic priming effect only in the conditions where the number of SC voices was greater than the number of SI voices, suggesting that semantic priming depended on prime intelligibility and strategic processes. However, even when backgrounds were composed of 3 or 4 voices, reducing intelligibility, participants were able to recognize words from these backgrounds, although no semantic priming effect on the targets was observed. Overall, this finding suggests that informational masking can occur at a semantic level if intelligibility is sufficient. Based on the Effortfulness Hypothesis, we also suggest that when there is increased difficulty in extracting target signals (caused by a relatively high number of voices in the background), more cognitive resources are allocated to formal processes (i.e., acoustic and phonological), leaving fewer resources available for deeper semantic processing of background words and therefore preventing semantic priming from occurring.
26. Did you hear that? The role of stimulus similarity and uncertainty in auditory change deafness. Front Psychol 2014; 5:1125. PMID: 25324821; PMCID: PMC4183091; DOI: 10.3389/fpsyg.2014.01125.
Abstract
Change deafness, the auditory analog to change blindness, occurs when salient and behaviorally relevant changes to sound sources are missed. Missing significant changes in the environment can have serious consequences; however, this effect has remained little more than a lab phenomenon and a party trick. Only recently have researchers begun to explore the nature of these profound errors in change perception. Despite a wealth of examples of the change blindness phenomenon, work on change deafness remains fairly limited. The purpose of the current paper is to review the state of the literature on change deafness and propose an explanation of change deafness that relies on factors related to stimulus information rather than attentional or memory limits. To achieve this, work across several auditory research domains, including environmental sound classification, informational masking, and change deafness, is synthesized to present a unified perspective on the perception of change errors in complex, dynamic sound environments. We hope to extend previous research by describing how it may be possible to predict specific patterns of change perception errors based on varying degrees of similarity in stimulus features and uncertainty about which stimuli and features are important for a given perceptual decision.
27. The influence of non-native language proficiency on speech perception performance. Front Psychol 2014; 5:651. PMID: 25071630; PMCID: PMC4078910; DOI: 10.3389/fpsyg.2014.00651.
Abstract
The present study examined to what extent proficiency in a non-native language influences speech perception in noise. We explored how English proficiency affected native (Swedish) and non-native (English) speech perception in four speech reception threshold (SRT) conditions, comprising two energetic maskers (stationary and fluctuating noise) and two informational maskers (Swedish two-talker babble and English two-talker babble). Twenty-three normal-hearing native Swedish listeners, aged between 28 and 64 years, participated. The participants also completed standardized tests of English proficiency, non-verbal reasoning, and working memory capacity. Our focus on proficiency, together with the assessment of external as well as internal, listener-related factors, allowed us to examine which variables explained intra- and interindividual differences in native and non-native speech perception performance. The main result was that for the non-native target, the level of English proficiency was a decisive factor for speech intelligibility in noise: high English proficiency improved performance in all four conditions when the target language was English. The informational maskers interfered more with perception than the energetic maskers, particularly for the non-native target. The study also confirmed that SRTs were better when the target language was native rather than non-native.
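Speech reception thresholds of this kind are commonly tracked with an adaptive procedure that lowers the SNR after a correct response and raises it after an incorrect one, converging on roughly 50% intelligibility. The sketch below simulates such a 1-up/1-down track with a hypothetical listener model; it is a generic illustration, not necessarily the procedure or parameters used in the study.

```python
# Generic 1-up/1-down adaptive SNR track converging on ~50% sentence intelligibility.
# The listener model (logistic psychometric function) and its parameters are hypothetical.
import numpy as np

rng = np.random.default_rng(1)

def listener_correct(snr_db, srt_true=-4.0, slope_db=1.5):
    """Simulated probability of a correct sentence repetition at this SNR (logistic)."""
    p = 1.0 / (1.0 + np.exp(-(snr_db - srt_true) / slope_db))
    return rng.random() < p

snr = 0.0          # starting SNR in dB
step = 2.0         # dB step size
track = [snr]
for _ in range(30):                       # 30 sentences per track
    snr += -step if listener_correct(snr) else step
    track.append(snr)

# Estimate the SRT as the mean SNR over the later part of the track
srt_estimate = np.mean(track[-20:])
print(f"Estimated SRT: {srt_estimate:.1f} dB SNR")
```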
Collapse
|
28
|
Comparison of informational vs. energetic masking effects on speechreading performance. Front Psychol 2014; 5:639. [PMID: 25009520 PMCID: PMC4068195 DOI: 10.3389/fpsyg.2014.00639] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2014] [Accepted: 06/04/2014] [Indexed: 11/13/2022] Open
Abstract
The effects of two types of auditory distracters (steady-state noise vs. four-talker babble) on visual-only speechreading accuracy were tested against a baseline (silence) in 23 participants with above-average speechreading ability. Their task was to speechread high-frequency Swedish words. They were asked to rate their own performance and effort, and to report how distracting each type of auditory distracter was. Only the four-talker babble impeded speechreading accuracy. This suggests competition for phonological processing: the four-talker babble demands phonological processing, which is also required for the speechreading task. Better accuracy was associated with lower self-rated effort in silence; no other correlations were found.
Collapse
|
29
|
Increase in speech recognition due to linguistic mismatch between target and masker speech: monolingual and simultaneous bilingual performance. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2014; 57:1089-97. [PMID: 24167230 PMCID: PMC4043956 DOI: 10.1044/2013_jslhr-h-12-0378] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/20/2023]
Abstract
PURPOSE To examine whether improved speech recognition in linguistically mismatched target-masker experiments is due to linguistic unfamiliarity of the masker speech or to linguistic dissimilarity between the target and masker speech. METHOD Monolingual English speakers (n = 20) and English-Greek simultaneous bilinguals (n = 20) listened to English sentences in the presence of competing English and Greek speech. Data were analyzed using mixed-effects regression models to determine differences in English recognition performance between the 2 groups and 2 masker conditions. RESULTS English sentence recognition improved for both monolinguals and simultaneous English-Greek bilinguals when the masker changed from competing English to competing Greek speech. CONCLUSION The improvement in speech recognition observed in linguistically mismatched target-masker experiments cannot be explained simply by the masker language being unknown or unfamiliar to the listeners. Listeners can improve their speech recognition in linguistically mismatched target-masker experiments even when they are able to obtain meaningful linguistic information from the masker speech.
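Analyses of this kind typically enter listener group, masker language, and their interaction as fixed effects, with a random intercept per listener. A minimal sketch with statsmodels, using hypothetical column names and simulated scores rather than the study's data, might look like this:

```python
# Sketch of a mixed-effects analysis of keyword recognition scores.
# Data frame and column names (subject, group, masker, score) are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
subjects = np.repeat(np.arange(40), 2)                      # 40 listeners x 2 maskers
group = np.where(subjects < 20, "monolingual", "bilingual")
masker = np.tile(["english", "greek"], 40)
# Hypothetical scores: recognition improves in the mismatched (Greek) masker
score = (60 + 10 * (masker == "greek") + rng.normal(0, 5, 80)
         + rng.normal(0, 3, 40)[subjects])                  # subject-level variability

data = pd.DataFrame({"subject": subjects, "group": group,
                     "masker": masker, "score": score})

# Fixed effects: group, masker, and their interaction; random intercept per subject
model = smf.mixedlm("score ~ group * masker", data, groups=data["subject"])
print(model.fit().summary())
```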
Collapse
|
30
|
Voice-associated static face image releases speech from informational masking. Psych J 2014; 3:113-20. [PMID: 26271763 DOI: 10.1002/pchj.45] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2013] [Accepted: 11/07/2013] [Indexed: 11/08/2022]
Abstract
In noisy, multi-talker environments such as a cocktail party, listeners can use various perceptual and/or cognitive cues to improve recognition of target speech against masking, particularly informational masking. Previous studies have shown that temporally pre-presented voice cues (voice primes) improve recognition of target speech against speech masking but not noise masking. This study investigated whether static face-image primes that have become associated with the target voice (i.e., facial images linked through associative learning with the voice reciting the target speech) can be used by listeners to unmask speech. In 32 normal-hearing younger adults, temporally pre-presenting a voice-priming sentence spoken by the same voice that recited the target sentence significantly improved recognition of target speech masked by irrelevant two-talker speech. When a person's face image became associated with the voice reciting the target speech through learning, temporally pre-presenting the target-voice-associated face image also significantly improved recognition of target speech against speech masking, particularly for the last two keywords in the target sentence. Moreover, speech-recognition performance under the voice-priming condition was significantly correlated with that under the face-priming condition. The results suggest that learned facial information about talker identity plays an important role in identifying the target talker's voice and in facilitating selective attention to the target-speech stream against the masking-speech stream.
Collapse
|
31
|
Informational masking and spatial hearing in listeners with and without unilateral hearing loss. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2012; 55:511-531. [PMID: 22215037 PMCID: PMC3320681 DOI: 10.1044/1092-4388(2011/10-0205)] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
PURPOSE This study assessed selective listening for speech in individuals with and without unilateral hearing loss (UHL) and the potential relationship between spatial release from informational masking and localization ability in listeners with UHL. METHOD Twelve adults with UHL and 12 normal-hearing controls completed a series of monaural and binaural speech tasks that were designed to measure informational masking. They also completed a horizontal localization task. RESULTS Monaural performance by participants with UHL was comparable to that of normal-hearing participants. Unlike the normal-hearing participants, the participants with UHL did not exhibit a true spatial release from informational masking. Rather, their performance could be predicted by head shadow effects. Performance among participants with UHL in the localization task was quite variable, with some showing near-normal abilities and others demonstrating no localization ability. CONCLUSION Individuals with UHL did not show deficits in all listening situations but were at a significant disadvantage when listening to speech in environments where normal-hearing listeners benefit from spatial separation between target and masker. This inability to capitalize on spatial cues for selective listening does not appear to be related to localization ability.
Collapse
|
32
|
Effects of the rate of formant-frequency variation on the grouping of formants in speech perception. J Assoc Res Otolaryngol 2012; 13:269-280. [PMID: 22160754 PMCID: PMC3298615 DOI: 10.1007/s10162-011-0307-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2011] [Accepted: 11/18/2011] [Indexed: 11/29/2022] Open
Abstract
How speech is separated perceptually from other speech remains poorly understood. Recent research suggests that the ability of an extraneous formant to impair intelligibility depends on the modulation of its frequency contour, but not of its amplitude contour. This study further examined the effect of formant-frequency variation on intelligibility by manipulating the rate of formant-frequency change. Target sentences were synthetic three-formant (F1 + F2 + F3) analogues of natural utterances. Perceptual organization was probed by presenting stimuli dichotically (F1 + F2C + F3C; F2 + F3), where F2C + F3C constitute a competitor for F2 and F3 that listeners must reject to optimize recognition. Competitors were derived using formant-frequency contours extracted from extended passages spoken by the same talker and processed to alter the rate of formant-frequency variation, such that rate scale factors relative to the target sentences were 0, 0.25, 0.5, 1, 2, and 4 (0 = constant frequencies). Competitor amplitude contours were either constant or time-reversed and rate-adjusted in parallel with the frequency contour. Adding a competitor typically reduced intelligibility; this reduction increased with competitor rate until the rate was at least twice that of the target sentences. The similarity of the results for the two amplitude conditions confirmed that formant amplitude contours do not influence across-formant grouping. The findings indicate that competitor efficacy is not tuned to the rate of the target sentences; most probably, it depends primarily on the overall rate of frequency variation in the competitor formants. This suggests that, when segregating the speech of concurrent talkers, differences in speech rate may not be a significant cue for across-frequency grouping of formants.
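One way to picture a rate scale factor is as reading a longer donor formant-frequency contour at a different speed and cropping it to the competitor's duration, with a factor of 0 collapsing to a constant frequency. The sketch below is only a schematic illustration of that idea, with a synthetic donor contour; it is not the synthesis procedure used in the study.

```python
# Schematic illustration of rate-scaling a formant-frequency contour.
# The donor contour and parameters are placeholders, not the study's stimuli.
import numpy as np

def rate_scaled_contour(donor_hz, n_target, rate):
    """Return an n_target-frame contour whose frequency variation unfolds `rate`
    times as fast as the donor's; rate 0 gives a constant (mean) frequency."""
    if rate == 0:
        return np.full(n_target, donor_hz.mean())
    # Read the donor contour at `rate` times normal speed, cropped to n_target frames
    idx = np.clip(np.arange(n_target) * rate, 0, len(donor_hz) - 1)
    return np.interp(idx, np.arange(len(donor_hz)), donor_hz)

# Donor: a slowly meandering F2-like contour standing in for an extended passage
t = np.linspace(0, 10, 10_000)                      # 10 s at a 1 kHz frame rate
donor = 1500 + 300 * np.sin(2 * np.pi * 0.7 * t)    # Hz

for rate in (0, 0.25, 0.5, 1, 2, 4):
    contour = rate_scaled_contour(donor, n_target=2_000, rate=rate)  # 2 s competitor
    print(rate, f"frequency range = {contour.max() - contour.min():.0f} Hz")
```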
Collapse
|
33
|
Abstract
OBJECTIVE To investigate the contributions of energetic and informational masking to neural encoding and perception in noise, using oddball discrimination and sentence recognition tasks. DESIGN P3 auditory evoked potential, behavioral discrimination, and sentence recognition data were recorded in response to speech and tonal signals presented to nine normal-hearing adults. Stimuli were presented at a signal-to-noise ratio of -3 dB in four background conditions: quiet, continuous noise, intermittent noise, and four-talker babble. RESULTS Responses to tonal signals did not differ significantly across the three maskers. However, responses to speech signals in the four-talker babble showed longer P3 latencies, smaller P3 amplitudes, poorer discrimination accuracy, and longer reaction times than in any of the other conditions. The results also demonstrated significant correlations between the physiological and behavioral data: as P3 latency increased, reaction times increased and sentence recognition scores decreased. CONCLUSION The data confirm a differential effect of masker type on the P3 and behavioral responses and provide evidence that an informational masker interferes with speech understanding at the level of the cortex. The results also validate the P3 as a useful measure for demonstrating physiological correlates of informational masking.
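Presenting stimuli at -3 dB SNR amounts to scaling the masker so that its RMS level sits 3 dB above the signal's. A small sketch of that level arithmetic, with placeholder waveforms rather than the study's stimuli:

```python
# Sketch: scaling a masker so the target is presented at -3 dB SNR.
# The waveforms are placeholders; only the level arithmetic is the point.
import numpy as np

rng = np.random.default_rng(3)
fs = 16_000
target = rng.normal(0, 0.1, fs)          # 1 s stand-in for the speech/tonal signal
masker = rng.normal(0, 0.1, fs)          # 1 s stand-in for noise or babble

def rms(x):
    return np.sqrt(np.mean(x ** 2))

snr_db = -3.0
# Gain that places the masker RMS 3 dB above the target RMS
gain = rms(target) / (rms(masker) * 10 ** (snr_db / 20))
mixture = target + gain * masker

achieved = 20 * np.log10(rms(target) / rms(gain * masker))
print(f"Achieved SNR: {achieved:.2f} dB")
```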
Collapse
|
34
|
Abstract
Auditory perception and cognition entail both low-level and high-level processes, which are likely to interact with each other to create our rich conscious experience of soundscapes. Recent research that we review has revealed numerous influences of high-level factors, such as attention, intention, and prior experience, on conscious auditory perception. Studies have also shown that auditory scene analysis tasks can exhibit multistability in a manner very similar to ambiguous visual stimuli, presenting a unique opportunity to study the neural correlates of auditory awareness and the extent to which mechanisms of perception are shared across sensory modalities. Research has also produced a growing number of techniques through which auditory perception can be manipulated and even completely suppressed. Such findings have important consequences for our understanding of the mechanisms of perception and should allow scientists to distinguish precisely among the effects of different higher-level factors.
Collapse
|
35
|
Similarity and familiarity: Second language sentence recognition in first- and second-language multi-talker babble. SPEECH COMMUNICATION 2010; 52:943-953. [PMID: 21179561 PMCID: PMC3003927 DOI: 10.1016/j.specom.2010.05.002] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]
Abstract
The intelligibility of speech in noisy environments depends not only on the functionality of listeners' peripheral auditory systems, but also on cognitive factors such as their language learning experience. Previous studies have shown, for example, that normal-hearing listeners attending to a non-native language have more difficulty identifying speech targets in noisy conditions than do native listeners. Furthermore, native listeners have more difficulty understanding speech targets in the presence of speech noise in their native language versus a foreign language. The present study addresses the role of listener language experience with both the target and noise languages by examining second-language sentence recognition in first- and second-language noise. Native English speakers and non-native English speakers whose native language is Mandarin were tested on English sentence recognition in English and Mandarin 2-talker babble. Results show that both listener groups experienced greater difficulty in English versus Mandarin babble, but that native Mandarin listeners experienced a smaller release from masking in Mandarin babble relative to English babble. These results indicate that both the similarity between the target and noise and the language experience of the listeners contribute to the amount of interference listeners experience when listening to speech in the presence of speech noise.
Collapse
|
36
|
Psychophysical spectro-temporal receptive fields in an auditory task. Hear Res 2009; 251:1-9. [PMID: 19249339 PMCID: PMC2692227 DOI: 10.1016/j.heares.2009.02.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/24/2008] [Revised: 02/12/2009] [Accepted: 02/13/2009] [Indexed: 11/19/2022]
Abstract
Psychophysical relative weighting functions, which provide information about the importance of different regions of a stimulus in forming decisions, are traditionally estimated using trial-based procedures, in which a single stimulus is presented and a single response is recorded. Everyday listening is much more "free-running" in that we often must detect randomly occurring signals in the presence of a continuous background. Psychophysical relative weighting functions have not previously been measured with free-running paradigms. Here, we combine a free-running paradigm with the reverse correlation technique used to estimate physiological spectro-temporal receptive fields (STRFs) to generate psychophysical relative weighting functions that are analogous to physiological STRFs. The psychophysical task required the detection of a fixed target signal (a sequence of spectro-temporally coherent tone pips with a known frequency) in the presence of a continuously presented informational masker (spectro-temporally random tone pips). A comparison of psychophysical relative weighting functions estimated with the current free-running paradigm and with trial-based paradigms suggests that, in informational-masking tasks, subjects' decision strategies are similar in both paradigms. For more cognitively challenging tasks, there may be differences between decision strategies in free-running and trial-based paradigms.
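A common way to realize this kind of reverse correlation is to average the stimulus spectrogram over a window preceding each response and subtract the overall stimulus mean, yielding a frequency-by-lag weighting map. The sketch below illustrates that averaging step with synthetic stand-ins for the spectrogram and response times; it is an assumed, simplified illustration, not the authors' analysis code.

```python
# Minimal sketch of response-triggered averaging for a psychophysical "STRF".
# The spectrogram and response frames are synthetic stand-ins for real data.
import numpy as np

rng = np.random.default_rng(4)
n_freq, n_frames = 20, 5_000                   # frequency channels x time frames
spectrogram = rng.random((n_freq, n_frames))   # random tone-pip energy pattern

response_frames = rng.integers(100, n_frames, size=60)  # frames where the subject responded
win = 50                                                 # 50 frames preceding each response

# Average the stimulus preceding each response, then subtract the overall mean
windows = np.stack([spectrogram[:, r - win:r] for r in response_frames])
strf = windows.mean(axis=0) - spectrogram.mean(axis=1, keepdims=True)

print(strf.shape)   # (n_freq, win): relative weight of each frequency x pre-response lag
```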
Collapse
|