1. Divided listening in the free field becomes asymmetric when acoustic cues are limited. Hear Res 2022;416:108444. DOI: 10.1016/j.heares.2022.108444.

2. Venezia JH, Leek MR, Lindeman MP. Suprathreshold differences in competing speech perception in older listeners with normal and impaired hearing. J Speech Lang Hear Res 2020;63:2141-2161. PMID: 32603618. DOI: 10.1044/2020_jslhr-19-00324.
Abstract
Purpose: Age-related declines in auditory temporal processing and cognition make older listeners vulnerable to interference from competing speech. This vulnerability may be increased in older listeners with sensorineural hearing loss due to the additional effects of spectral distortion and accelerated cognitive decline. The goal of this study was to uncover differences between older hearing-impaired (OHI) listeners and older normal-hearing (ONH) listeners in the perceptual encoding of competing speech signals.
Method: Age-matched groups of 10 OHI and 10 ONH listeners performed the coordinate response measure task with a synthetic female target talker and a male competing talker at a target-to-masker ratio of +3 dB. Individualized gain was provided to OHI listeners. Each listener completed 50 baseline trials and 800 "bubbles" trials in which randomly selected segments of the speech modulation power spectrum (MPS) were retained on each trial while the remainder was filtered out. Average performance was fixed at 50% correct by adapting the number of segments retained. Multinomial regression was used to estimate weights showing the regions of the MPS associated with performance (a "classification image," or CImg).
Results: The CImg weights differed significantly between the groups in two MPS regions: a region encoding the shared phonetic content of the two talkers and a region encoding the competing (male) talker's voice. The OHI listeners demonstrated poorer encoding of the phonetic content and increased vulnerability to interference from the competing talker. Individual differences in CImg weights explained over 75% of the variance in baseline performance in the OHI listeners, whereas differences in high-frequency pure-tone thresholds explained only 10%.
Conclusion: Suprathreshold deficits in the encoding of low- to mid-frequency (~5-10 Hz) temporal modulations, which may reflect poorer "dip listening," and in auditory grouping at a perceptual and/or cognitive level are responsible for the relatively poor performance of OHI versus ONH listeners on a different-gender competing speech task.
Supplemental material: https://doi.org/10.23641/asha.12568472
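The classification-image logic described in this abstract can be sketched in a few lines: on each trial a random subset of MPS segments is retained, and a regression model relates the retained segments to trial outcomes. This is a minimal sketch only, assuming binary logistic regression in place of the authors' multinomial model; the 6 x 8 segment grid and the simulated listener are purely illustrative.

```python
# Sketch of a "bubbles"-style classification image (CImg) analysis.
# Assumptions: binary logistic regression stands in for the multinomial
# model; grid size and the simulated "true" weights are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials, grid = 800, (6, 8)              # spectral x temporal modulation bins
n_seg = grid[0] * grid[1]

# Each trial retains a random subset of MPS segments (1 = retained).
masks = (rng.random((n_trials, n_seg)) < 0.3).astype(float)

# Simulate outcomes: correct responses depend on a few "important" segments.
true_w = np.zeros(n_seg)
true_w[[10, 11, 18]] = 2.0
p_correct = 1 / (1 + np.exp(-(masks @ true_w - 1.0)))
correct = rng.random(n_trials) < p_correct

# The fitted coefficients, reshaped onto the MPS grid, form the CImg.
cimg = LogisticRegression(C=1.0).fit(masks, correct).coef_.reshape(grid)
print(cimg.round(2))
```

Large positive coefficients mark MPS regions whose presence predicts correct responses, which is the sense in which the CImg weights localize phonetic and talker information.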
Affiliation(s)
- Jonathan H Venezia: VA Loma Linda Healthcare System, CA; Department of Otolaryngology-Head and Neck Surgery, School of Medicine, Loma Linda University, CA
- Marjorie R Leek: VA Loma Linda Healthcare System, CA; Department of Otolaryngology-Head and Neck Surgery, School of Medicine, Loma Linda University, CA

3. Speech perception with spectrally non-overlapping maskers as measure of spectral resolution in cochlear implant users. J Assoc Res Otolaryngol 2018;20:151-167. PMID: 30456730. DOI: 10.1007/s10162-018-00702-2.
Abstract
Poor spectral resolution contributes to the difficulties experienced by cochlear implant (CI) users when listening to speech in noise. However, correlations between measures of spectral resolution and speech perception in noise have not always been found to be robust. It may be that the relationship between spectral resolution and speech perception in noise becomes clearer in conditions where the speech and noise are not spectrally matched, so that improved spectral resolution can assist in separating the speech from the masker. To test this prediction, speech intelligibility was measured with noise or tone maskers that were presented either in the same spectral channels as the speech or in interleaved spectral channels. Spectral resolution was estimated via a spectral ripple discrimination task. Results from vocoder simulations in normal-hearing listeners showed increasing differences in speech intelligibility between spectrally overlapped and interleaved maskers as well as improved spectral ripple discrimination with increasing spectral resolution. However, no clear differences were observed in CI users between performance with spectrally interleaved and overlapped maskers, or between tone and noise maskers. The results suggest that spectral resolution in current CIs is too poor to take advantage of the spectral separation produced by spectrally interleaved speech and maskers. Overall, the spectrally interleaved and tonal maskers produce a much larger difference in performance between normal-hearing listeners and CI users than do traditional speech-in-noise measures, and thus provide a more sensitive test of speech perception abilities for current and future implantable devices.
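The spectrally interleaved masker condition described above lends itself to a short sketch: the spectrum is divided into contiguous analysis bands, and target and masker are assigned alternating bands so they occupy non-overlapping spectral channels. The 12-band log-spaced division and filter settings below are illustrative assumptions, not the study's exact parameters.

```python
# Minimal sketch of "spectrally interleaved" mixing: the target keeps the
# even-indexed analysis bands and the masker the odd ones. Band edges and
# the 12-band division are illustrative, not the paper's parameters.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 16000
edges = np.geomspace(100, 7000, 13)           # 12 log-spaced bands

def bandpass(x, lo, hi):
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def interleave(target, masker):
    out = np.zeros_like(target)
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        src = target if i % 2 == 0 else masker  # alternate band ownership
        out += bandpass(src, lo, hi)
    return out

# Example with noise stand-ins for the speech and masker waveforms:
rng = np.random.default_rng(1)
mix = interleave(rng.standard_normal(fs), rng.standard_normal(fs))
```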

4. Marrufo-Pérez MI, Eustaquio-Martín A, Lopez-Poveda EA. Adaptation to noise in human speech recognition unrelated to the medial olivocochlear reflex. J Neurosci 2018;38:4138-4145. PMID: 29593051. PMCID: PMC6596031. DOI: 10.1523/jneurosci.0024-18.2018.
Abstract
Sensory systems constantly adapt their responses to the current environment. In hearing, adaptation may facilitate communication in noisy settings, a benefit frequently (but controversially) attributed to the medial olivocochlear reflex (MOCR) enhancing the neural representation of speech. Here, we show that human listeners (N = 14; five male) recognize more words presented monaurally in ipsilateral, contralateral, and bilateral noise when they are given some time to adapt to the noise. This finding challenges models and theories that claim that speech intelligibility in noise is invariant over time. In addition, we show that this adaptation to noise also occurs for words processed to maintain the slow amplitude modulations in speech (the envelope) while disregarding the faster fluctuations (the temporal fine structure). This demonstrates that noise adaptation reflects an enhancement of amplitude-modulation speech cues and is unaffected by temporal fine structure cues. Last, we show that cochlear implant users (N = 7; four male) show normal monaural adaptation to ipsilateral noise. Because the electrical stimulation delivered by cochlear implants is independent of the MOCR, this demonstrates that noise adaptation does not require the MOCR. We argue that noise adaptation probably reflects adaptation of the dynamic range of auditory neurons to the noise level statistics.
Significance statement: People find it easier to understand speech in noisy environments when they are given some time to adapt to the noise. This benefit is frequently, but controversially, attributed to the medial olivocochlear efferent reflex enhancing the representation of speech cues in the auditory nerve. Here, we show that the adaptation to noise reflects an enhancement of the slow fluctuations in amplitude over time that are present in speech. In addition, we show that adaptation to noise for cochlear implant users is not statistically different from that for listeners with normal hearing. Because the electrical stimulation delivered by cochlear implants is independent of the medial olivocochlear efferent reflex, this demonstrates that adaptation to noise does not require this reflex.
Affiliation(s)
- Miriam I Marrufo-Pérez: Instituto de Neurociencias de Castilla y León; Instituto de Investigación Biomédica de Salamanca
- Enrique A Lopez-Poveda: Instituto de Neurociencias de Castilla y León; Instituto de Investigación Biomédica de Salamanca; Departamento de Cirugía, Facultad de Medicina, Universidad de Salamanca, 37007 Salamanca, Spain

5. Humes LE, Kidd GR. Speech recognition for multiple bands: implications for the Speech Intelligibility Index. J Acoust Soc Am 2016;140:2019. PMID: 27914446. PMCID: PMC6909976. DOI: 10.1121/1.4962539.
Abstract
The Speech Intelligibility Index (SII) assumes additivity of the importance of acoustically independent bands of speech. To further evaluate this assumption, open-set speech recognition was measured for words and sentences, in quiet and in noise, when the speech stimuli were presented to the listener in selected frequency bands. The filter passbands were constructed from various combinations of 20 bands having equivalent (0.05) importance in the SII framework. This permitted the construction of a variety of equal-SII band patterns that were then evaluated by nine different groups of young adults with normal hearing. For monosyllabic words, a similar dependence on band pattern was observed for SII values of 0.4, 0.5, and 0.6 in both quiet and noise conditions. Specifically, band patterns concentrated toward the lower and upper frequency range tended to yield significantly lower scores than those more evenly sampling a broader frequency range. For all stimuli and test conditions, equal SII values did not yield equal performance. Because the spectral distortions of speech evaluated here may not commonly occur in everyday listening conditions, this finding does not necessarily represent a serious deficit for the application of the SII. These findings, however, challenge the band-independence assumption of the theory underlying the SII.
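The additivity assumption under test can be made concrete with a small worked example: with 20 bands each carrying 0.05 importance, any ten retained bands predict SII = 0.5 no matter where they lie in frequency. The two patterns below are illustrative, not the study's exact conditions.

```python
# Worked example of SII additivity: the predicted index is simply the sum
# of the importances of the retained bands, independent of their spectral
# placement. Both illustrative patterns below predict SII = 0.5, yet the
# paper found that listeners score differently on such equal-SII patterns.
importance = [0.05] * 20

edge_pattern = [1] * 5 + [0] * 10 + [1] * 5   # bands packed at the edges
spread_pattern = [1, 0] * 10                  # bands spread across the range

for name, pattern in [("edge", edge_pattern), ("spread", spread_pattern)]:
    sii = sum(i * on for i, on in zip(importance, pattern))
    print(name, sii)                          # both print 0.5
```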
Affiliation(s)
- Larry E Humes: Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana 47405-7002, USA
- Gary R Kidd: Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana 47405-7002, USA

6. Apoux F, Youngdahl CL, Yoho SE, Healy EW. Dual-carrier processing to convey temporal fine structure cues: implications for cochlear implants. J Acoust Soc Am 2015;138:1469-1480. PMID: 26428784. PMCID: PMC4575322. DOI: 10.1121/1.4928136.
Abstract
Speech intelligibility in noise can be degraded by using vocoder processing to alter the temporal fine structure (TFS). Here it is argued that this degradation is not attributable to the loss of speech information potentially present in the TFS. Instead it is proposed that the degradation results from the loss of sound-source segregation information when two or more carriers (i.e., TFS) are substituted with only one as a consequence of vocoder processing. To demonstrate this segregation role, vocoder processing involving two carriers, one for the target and one for the background, was implemented. Because this approach does not preserve the speech TFS, it may be assumed that any improvement in intelligibility can only be a consequence of the preserved carrier duality and associated segregation cues. Three experiments were conducted using this "dual-carrier" approach. All experiments showed substantial sentence intelligibility in noise improvements compared to traditional single-carrier conditions. In several conditions, the improvement was so substantial that intelligibility approximated that for unprocessed speech in noise. A foreseeable and potentially promising implication for the dual-carrier approach involves implementation into cochlear implant speech processors, where it may provide the TFS cues necessary to segregate speech from noise.
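The dual-carrier idea admits a compact sketch: within each band, the target envelope and the masker envelope drive two separate carriers, rather than a single carrier driven by the mixture envelope. This is a minimal single-band sketch under stated assumptions; the two-tone carrier choice and the 6% frequency offset between carriers are illustrative, not the authors' implementation.

```python
# Single-band sketch of dual-carrier processing: two carriers per band,
# one driven by the target envelope, one by the masker envelope. The
# carrier type and frequency offset are assumptions for illustration.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 16000

def envelope(x, cutoff=50.0):
    env = np.abs(hilbert(x))                       # Hilbert envelope
    sos = butter(2, cutoff, btype="low", fs=fs, output="sos")
    return np.maximum(sosfiltfilt(sos, env), 0.0)  # smoothed, non-negative

def dual_carrier_band(target_band, masker_band, fc):
    t = np.arange(len(target_band)) / fs
    c1 = np.sin(2 * np.pi * fc * t)                # carrier for the target
    c2 = np.sin(2 * np.pi * (fc * 1.06) * t)       # offset carrier, masker
    return envelope(target_band) * c1 + envelope(masker_band) * c2

rng = np.random.default_rng(2)
out = dual_carrier_band(rng.standard_normal(fs), rng.standard_normal(fs), 1000.0)
```

Keeping the two envelopes on distinct carriers is what preserves the carrier duality, and hence the segregation cue, that the abstract argues single-carrier vocoders destroy.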
Affiliation(s)
- Frédéric Apoux, Carla L Youngdahl, Sarah E Yoho, Eric W Healy: Speech Psychoacoustics Laboratory, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA

7. Healy EW, Youngdahl CL, Apoux F. Evidence for independent time-unit processing of speech using noise promoting or suppressing masking release (L). J Acoust Soc Am 2014;135:581-584. PMID: 25234867. PMCID: PMC3985896. DOI: 10.1121/1.4861363.
Abstract
The relative independence of time-unit processing during speech reception was examined. It was found that temporally interpolated noise, even at very high levels, had little effect on sentence recognition in masking-release conditions similar to those of Kwon et al. [J. Acoust. Soc. Am. 131, 3111-3119 (2012)]. The current data confirm the earlier conclusions of Kwon et al. regarding masking release based on the relative timing of speech and noise. These data also indicate substantial independence in the time domain, which has implications for current theories of speech perception in noise.
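The temporal manipulation at issue can be sketched simply: speech is gated on and off, and noise is interpolated into the silent gaps rather than superimposed on the speech. The 10-Hz gate and 50% duty cycle below are illustrative values only.

```python
# Sketch of temporally interpolated noise: noise fills the gaps of gated
# speech instead of overlapping it. Gate rate and duty cycle are assumed.
import numpy as np

fs, rate = 16000, 10                      # 10 Hz on/off gating
rng = np.random.default_rng(3)
speech = rng.standard_normal(fs)          # stand-in for a sentence
noise = rng.standard_normal(fs)

t = np.arange(fs) / fs
gate = (np.floor(t * rate * 2) % 2 == 0)  # alternating on/off segments
stimulus = np.where(gate, speech, noise)  # noise only where speech is off
```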
Affiliation(s)
- Eric W Healy, Carla L Youngdahl, Frédéric Apoux: Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210

8. Healy EW, Yoho SE, Wang Y, Wang D. An algorithm to improve speech recognition in noise for hearing-impaired listeners. J Acoust Soc Am 2013;134:3029-3038. PMID: 24116438. PMCID: PMC3799726. DOI: 10.1121/1.4820893.
Abstract
Despite considerable effort, monaural (single-microphone) algorithms capable of increasing the intelligibility of speech in noise have remained elusive. Successful development of such an algorithm is especially important for hearing-impaired (HI) listeners, given their particular difficulty in noisy backgrounds. In the current study, an algorithm based on binary masking was developed to separate speech from noise. Unlike the ideal binary mask, which requires prior knowledge of the premixed signals, the masks used to segregate speech from noise in the current study were estimated by training the algorithm on speech not used during testing. Sentences were mixed with speech-shaped noise and with babble at various signal-to-noise ratios (SNRs). Testing using normal-hearing and HI listeners indicated that intelligibility increased following processing in all conditions. These increases were larger for HI listeners, for the modulated background, and for the least-favorable SNRs. They were also often substantial, allowing several HI listeners to improve intelligibility from scores near zero to values above 70%.
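For orientation, the ideal binary mask that the trained algorithm approximates can be sketched directly, since the IBM itself assumes access to the premixed signals: a time-frequency unit is kept when its local SNR exceeds a criterion and zeroed otherwise. The STFT settings and the 0-dB criterion below are illustrative choices.

```python
# Sketch of the ideal binary mask (IBM): keep a time-frequency unit when
# its local SNR clears a local criterion (LC), zero it otherwise. STFT
# parameters and LC = 0 dB are illustrative assumptions.
import numpy as np
from scipy.signal import stft, istft

fs, lc_db = 16000, 0.0
rng = np.random.default_rng(4)
speech = rng.standard_normal(2 * fs)      # stand-ins for premixed signals
noise = rng.standard_normal(2 * fs)

f, t, S = stft(speech, fs=fs, nperseg=512)
_, _, N = stft(noise, fs=fs, nperseg=512)
_, _, M = stft(speech + noise, fs=fs, nperseg=512)

local_snr = 20 * np.log10(np.abs(S) / (np.abs(N) + 1e-12))
ibm = (local_snr > lc_db).astype(float)   # 1 = speech-dominated unit

_, masked = istft(ibm * M, fs=fs, nperseg=512)  # resynthesized mixture
```

The algorithm in the study estimates such a mask from the noisy mixture alone, using training on speech not seen at test, which is what makes it usable without prior knowledge of the premixed signals.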
Affiliation(s)
- Eric W Healy: Department of Speech and Hearing Science, and Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, Ohio 43210

9. Apoux F, Yoho SE, Youngdahl CL, Healy EW. Role and relative contribution of temporal envelope and fine structure cues in sentence recognition by normal-hearing listeners. J Acoust Soc Am 2013;134:2205-2212. PMID: 23967950. PMCID: PMC3765279. DOI: 10.1121/1.4816413.
Abstract
The present study investigated the role and relative contribution of envelope and temporal fine structure (TFS) to sentence recognition in noise. Target and masker stimuli were added at five different signal-to-noise ratios (SNRs) and filtered into 30 contiguous frequency bands. The envelope and TFS were extracted from each band by Hilbert decomposition. The final stimuli consisted of the envelope of the target/masker sound mixture at x dB SNR and the TFS of the same sound mixture at y dB SNR. A first experiment showed a very limited contribution of TFS cues, indicating that sentence recognition in noise relies almost exclusively on temporal envelope cues. A second experiment showed that replacing the carrier of a sound mixture with noise (vocoder processing) cannot be considered equivalent to disrupting the TFS of the target signal by adding a background noise. Accordingly, a re-evaluation of the vocoder approach as a model to further understand the role of TFS cues in noisy situations may be necessary. Overall, these data are consistent with the view that speech information is primarily extracted from the envelope while TFS cues are primarily used to detect glimpses of the target.
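The Hilbert decomposition used in this study can be sketched in a few lines: the analytic signal of each band yields a slowly varying envelope and a rapidly varying TFS, and the envelope of one sound mixture can be recombined with the TFS of another. The single-band version and the +6/-6 dB mixing levels below are illustrative.

```python
# Sketch of Hilbert envelope/TFS decomposition and cross-mixing: combine
# the envelope of a mixture at x dB SNR with the TFS of a mixture at
# y dB SNR. Single band shown; the study used 30 contiguous bands.
import numpy as np
from scipy.signal import hilbert

def env_tfs(x):
    analytic = hilbert(x)
    env = np.abs(analytic)               # temporal envelope
    tfs = np.cos(np.angle(analytic))     # temporal fine structure
    return env, tfs

def mix(target, masker, snr_db):
    gain = 10 ** (-snr_db / 20)          # scale masker to the desired SNR
    return target + gain * masker * np.std(target) / np.std(masker)

rng = np.random.default_rng(5)
target, masker = rng.standard_normal(16000), rng.standard_normal(16000)

env_x, _ = env_tfs(mix(target, masker, snr_db=6))   # envelope from x dB mix
_, tfs_y = env_tfs(mix(target, masker, snr_db=-6))  # TFS from y dB mix
stimulus = env_x * tfs_y                 # envelope at x dB, TFS at y dB
```

Varying x and y independently is what lets the study attribute intelligibility to envelope versus TFS cues.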
Affiliation(s)
- Frédéric Apoux: Speech Psychoacoustics Laboratory, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA

10. Best V, Thompson ER, Mason CR, Kidd G. Spatial release from masking as a function of the spectral overlap of competing talkers. J Acoust Soc Am 2013;133:3677-3680. PMID: 23742322. PMCID: PMC3689785. DOI: 10.1121/1.4803517.
Abstract
This study tested the hypothesis that the reduced spatial release from speech-on-speech masking typically observed in listeners with sensorineural hearing loss results from increased energetic masking. Target sentences were presented simultaneously with a speech masker, and the spectral overlap between the pair (and hence the energetic masking) was systematically varied. The results are consistent with increased energetic masking in listeners with hearing loss that limits performance when listening in speech mixtures. However, listeners with hearing loss did not exhibit reduced spatial release from masking when stimuli were filtered into narrow bands.
Affiliation(s)
- Virginia Best: Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, Boston, Massachusetts 02215, USA

11. Healy EW, Yoho SE, Apoux F. Band importance for sentences and words reexamined. J Acoust Soc Am 2013;133:463-473. PMID: 23297918. PMCID: PMC3548885. DOI: 10.1121/1.4770246.
Abstract
Band-importance functions were created using the "compound" technique [Apoux and Healy, J. Acoust. Soc. Am. 132, 1078-1087 (2012)] that accounts for the multitude of synergistic and redundant interactions that take place among speech bands. Functions were created for standard recordings of the speech perception in noise (SPIN) sentences and the Central Institute for the Deaf (CID) W-22 words using 21 critical-band divisions and steep filtering to eliminate the influence of filter slopes. On a given trial, a band of interest was presented along with four other bands having spectral locations determined randomly on each trial. In corresponding trials, the band of interest was absent and only the four other bands were present. The importance of the band of interest was determined by the difference between paired band-present and band-absent trials. Because the locations of the other bands changed randomly from trial to trial, various interactions occurred between the band of interest and other speech bands which provided a general estimate of band importance. Obtained band-importance functions differed substantially from those currently available for identical speech recordings. In addition to differences in the overall shape of the functions, especially for the W-22 words, a complex microstructure was observed in which the importance of adjacent frequency bands often varied considerably. This microstructure may result in better predictive power of the current functions.
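The paired band-present/band-absent scoring at the heart of the compound technique can be sketched as follows, with a simulated listener standing in for real intelligibility scores; the trial counts and the Gaussian-shaped scoring function are illustrative assumptions.

```python
# Sketch of "compound" band-importance scoring: a band's importance is the
# mean score difference between paired trials where it is present versus
# absent, with four randomly placed context bands shared by both members
# of the pair. The simulated listener is a stand-in for real data.
import numpy as np

rng = np.random.default_rng(6)
n_bands, n_pairs = 21, 500

def simulated_score(bands_on):
    # Stand-in listener: mid-frequency bands contribute the most.
    value = np.exp(-0.5 * ((np.arange(n_bands) - 10) / 4) ** 2)
    return value[bands_on].sum()

importance = np.zeros(n_bands)
counts = np.zeros(n_bands)
for _ in range(n_pairs):
    boi = rng.integers(n_bands)                   # band of interest
    others = rng.choice(np.delete(np.arange(n_bands), boi), 4, replace=False)
    present = simulated_score(np.append(others, boi))
    absent = simulated_score(others)
    importance[boi] += present - absent
    counts[boi] += 1

importance /= np.maximum(counts, 1)   # mean paired difference per band
```

Because the four context bands move randomly from pair to pair, the averaged difference reflects a band's importance across many spectral contexts rather than in one fixed configuration.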
Affiliation(s)
- Eric W Healy: Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA

12. Apoux F, Healy EW. Use of a compound approach to derive auditory-filter-wide frequency-importance functions for vowels and consonants. J Acoust Soc Am 2012;132:1078-1087. PMID: 22894227. PMCID: PMC3427368. DOI: 10.1121/1.4730905.
Abstract
Speech recognition in noise presumably relies on the number and spectral location of available auditory-filter outputs containing a relatively undistorted view of local target signal properties. The purpose of the present study was to estimate the relative weight of each of the 30 auditory-filter-wide bands between 80 and 7563 Hz. Because previous approaches were not compatible with this goal, a new technique was developed. Similar to the "hole" approach, the weight of a given band was assessed by comparing intelligibility in two conditions differing in only one aspect: the presence or absence of the band of interest. In contrast to the hole approach, however, random gaps were also created in the spectrum. These gaps were introduced to render the auditory system more sensitive to the removal of a single band, and their location was randomized to provide a general view of the weight of each band, i.e., irrespective of the location of information elsewhere in the spectrum. Frequency-weighting functions derived using this technique confirmed the main contribution of the 400-2500 Hz frequency region. However, they revealed a complex microstructure, contrasting with the "bell curve" shape typically reported.
Affiliation(s)
- Frédéric Apoux: Speech Psychoacoustics Laboratory, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA

13. Apoux F, Healy EW. Relative contribution of target and masker temporal fine structure to the unmasking of consonants in noise. J Acoust Soc Am 2011;130:4044-4052. PMID: 22225058. PMCID: PMC3253603. DOI: 10.1121/1.3652888.
Abstract
The present study assessed the relative contribution of the "target" and "masker" temporal fine structure (TFS) when identifying consonants. Accordingly, the TFS of the target and that of the masker were manipulated simultaneously or independently. A 30-band vocoder was used to replace the original TFS of the stimuli with tones. Four masker types were used: a speech-shaped noise, a speech-shaped noise modulated by a speech envelope, a sentence, and a sentence played backward. When the TFS of the target and that of the masker were disrupted simultaneously, consonant recognition dropped significantly compared to the unprocessed condition for all masker types except the speech-shaped noise. Disruption of only the target TFS led to a significant drop in performance with all masker types. In contrast, disruption of only the masker TFS had no effect on recognition. Overall, the present data are consistent with previous work showing that TFS information plays a significant role in speech recognition in noise, especially when the noise fluctuates over time. However, the present study indicates that listeners rely primarily on TFS information in the target and that the nature of the masker TFS has a very limited influence on the outcome of the unmasking process.
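The tone vocoder described above can be sketched compactly: each analysis band is reduced to its Hilbert envelope, which then modulates a pure tone at the band's center frequency, discarding the original TFS. An 8-band version over the 80-7563 Hz range is shown for brevity; the study used 30 bands, and the filter choices are illustrative.

```python
# Sketch of a tone vocoder: per band, the Hilbert envelope modulates a
# tone at the band's (geometric) center frequency, replacing the TFS.
# 8 bands shown for brevity; the study used 30.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 16000
edges = np.geomspace(80, 7563, 9)     # 8 bands spanning the study's range

def tone_vocode(x):
    t = np.arange(len(x)) / fs
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))                         # keep envelope
        carrier = np.sin(2 * np.pi * np.sqrt(lo * hi) * t)  # tone carrier
        out += sosfiltfilt(sos, env * carrier)  # refilter to the band
    return out

rng = np.random.default_rng(7)
vocoded = tone_vocode(rng.standard_normal(fs))
```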
Affiliation(s)
- Frédéric Apoux: Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA