1. Kates JM, Lavandier M, Muralimanohar RK, Lundberg EMH, Arehart KH. Binaural speech intelligibility for combinations of noise, reverberation, and hearing-aid signal processing. PLoS One 2025; 20:e0317266. PMID: 39813264; PMCID: PMC11734965; DOI: 10.1371/journal.pone.0317266.
Abstract
Binaural speech intelligibility in rooms is a complex process that is affected by many factors including room acoustics, hearing loss, and hearing aid (HA) signal processing. Intelligibility is evaluated in this paper for a simulated room combined with a simulated hearing aid. The test conditions comprise three spatial configurations of the speech and noise sources, simulated anechoic and concert hall acoustics, three amounts of multitalker babble interference, the hearing status of the listeners, and three degrees of simulated HA processing provided to compensate for the noise and/or hearing loss. The impact of these factors and their interactions is considered for normal-hearing (NH) and hearing-impaired (HI) listeners for sentence stimuli. Both listener groups showed a significant reduction in intelligibility as the signal-to-noise ratio (SNR) decreased, and showed a reduction in intelligibility in reverberation when compared to anechoic listening. There was no significant improvement in intelligibility for the NH group for the noise suppression algorithm used here, and no significant improvement in intelligibility for the HI group for more advanced HA processing algorithms as opposed to linear amplification in either of the two acoustic spaces or at any of the three SNRs.
Affiliation(s)
- James M. Kates
- Department of Speech, Language, and Hearing Sciences, University of Colorado, Boulder, Colorado, United States of America
- Mathieu Lavandier
- ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, Vaulx-en-Velin, France
- Ramesh Kumar Muralimanohar
- Department of Communication Sciences and Disorders, University of Northern Colorado, Greeley, Colorado, United States of America
- Emily M. H. Lundberg
- Department of Speech, Language, and Hearing Sciences, University of Colorado, Boulder, Colorado, United States of America
- Kathryn H. Arehart
- Department of Speech, Language, and Hearing Sciences, University of Colorado, Boulder, Colorado, United States of America
2. Cueille R, Lavandier M. Binaural Speech Intelligibility in Noise and Reverberation: Prediction of Group Performance for Normal-hearing and Hearing-impaired Listeners. Trends Hear 2025; 29:23312165251344947. PMID: 40432370; PMCID: PMC12120292; DOI: 10.1177/23312165251344947.
Abstract
A binaural model is proposed to predict speech intelligibility in rooms for normal-hearing (NH) and hearing-impaired (HI) listener groups, combining the advantages of two existing models. The leclere2015 model takes binaural room impulse responses (BRIRs) as inputs and accounts for the temporal smearing of speech by reverberation, but only works with stationary noises and NH listeners. The vicente2020 model takes the speech and noise signals at the ears as well as the listener audiogram as inputs and accounts for modulations in the noise and for hearing loss, but cannot predict the temporal smearing of speech by reverberation. The new model takes the audiogram, BRIRs, and ear signals as inputs to account for the temporal smearing of speech, masker modulations, and hearing loss. It gave accurate predictions for speech reception thresholds measured in seven experiments. The proposed model can make predictions that neither of the two original models can, i.e., when the target speech is influenced by reverberation while the noise is modulated and/or the listeners have hearing loss. In terms of model parameters, four methods were compared for separating early from late reverberation, and two methods were compared for accounting for hearing loss.
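One of the model parameters compared above is the method used to separate early from late reverberation. As an illustration only (the fixed 50 ms boundary, the peak-picking of the direct sound, and the rectangular split are assumptions for this sketch, not the methods compared in the paper), a simple fixed-boundary split of a room impulse response might look like:

```python
import numpy as np

def split_rir(rir, fs, boundary_ms=50.0):
    """Split a room impulse response into early and late parts at a fixed
    boundary after the direct sound (a simple illustrative method)."""
    direct = int(np.argmax(np.abs(rir)))               # direct sound = largest peak
    cut = min(direct + int(boundary_ms * 1e-3 * fs), rir.size)
    early = np.zeros_like(rir)
    late = np.zeros_like(rir)
    early[:cut] = rir[:cut]
    late[cut:] = rir[cut:]
    return early, late

# Toy impulse response: direct sound at 5 ms plus a decaying diffuse tail.
fs = 16000
rir = np.zeros(fs // 4)
rir[80] = 1.0                                          # direct sound
t = np.arange(rir.size)
rir += 0.05 * np.exp(-t / 2000.0) * (t > 80)

early, late = split_rir(rir, fs)
# The two parts partition the RIR exactly.
print(np.allclose(early + late, rir))                  # True
```

The early part is typically treated as useful target energy and the late part as a masker; more refined methods (several of which the paper compares) adapt the boundary or use smooth transition windows instead of a hard cut.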
Affiliation(s)
- Raphael Cueille
- ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, Vaulx-en-Velin, France
- CRNL, UMR CNRS 5292, Univ. Lyon 1, Lyon, France
3. Thoidis I, Goehring T. Using deep learning to improve the intelligibility of a target speaker in noisy multi-talker environments for people with normal hearing and hearing loss. J Acoust Soc Am 2024; 156:706-724. PMID: 39082692; DOI: 10.1121/10.0028007.
Abstract
Understanding speech in noisy environments is a challenging task, especially in communication situations with several competing speakers. Despite their ongoing improvement, assistive listening devices and speech processing approaches still do not perform well enough in noisy multi-talker environments, as they may fail to restore the intelligibility of a speaker of interest among competing sound sources. In this study, a quasi-causal deep learning algorithm was developed that can extract the voice of a target speaker, as indicated by a short enrollment utterance, from a mixture of multiple concurrent speakers in background noise. Objective evaluation with computational metrics demonstrated that the speaker-informed algorithm successfully extracts the target speaker from noisy multi-talker mixtures. This was achieved using a single algorithm that generalized to unseen speakers, different numbers of speakers and relative speaker levels, and different speech corpora. Double-blind sentence recognition tests on mixtures of one, two, and three speakers in restaurant noise were conducted with listeners with normal hearing and listeners with hearing loss. Results indicated significant intelligibility improvements with the speaker-informed algorithm of 17% and 31% for people without and with hearing loss, respectively. In conclusion, it was demonstrated that deep learning-based speaker extraction can enhance speech intelligibility in noisy multi-talker environments where uninformed speech enhancement methods fail.
Affiliation(s)
- Iordanis Thoidis
- School of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
- Tobias Goehring
- Cambridge Hearing Group, MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, United Kingdom
4. Rennies J, Warzybok A, Kollmeier B, Brand T. Spatio-temporal Integration of Speech Reflections in Hearing-Impaired Listeners. Trends Hear 2022; 26:23312165221143901. PMID: 36537084; PMCID: PMC9772954; DOI: 10.1177/23312165221143901.
Abstract
Speech recognition in rooms requires the temporal integration of reflections which arrive with a certain delay after the direct sound. It is commonly assumed that there is a certain temporal window of about 50-100 ms, during which reflections can be integrated with the direct sound, while later reflections are detrimental to speech intelligibility. This concept was challenged in a recent study by employing binaural room impulse responses (RIRs) with systematically varied interaural phase differences (IPDs) and amplitude of the direct sound and a variable number of reflections delayed by up to 200 ms. When amplitude or IPD favored late RIR components, normal-hearing (NH) listeners appeared to be capable of focusing on these components rather than on the precedent direct sound, which contrasted with the common concept of considering early RIR components as useful and late components as detrimental. The present study investigated speech intelligibility in the same conditions in hearing-impaired (HI) listeners. The data indicate that HI listeners were generally less able to "ignore" the direct sound than NH listeners, when the most useful information was confined to late RIR components. Some HI listeners showed a remarkable inability to integrate across multiple reflections and to optimally "shift" their temporal integration window, which was quite dissimilar to NH listeners. This effect was most pronounced in conditions requiring spatial and temporal integration and could provide new challenges for individual prediction models of binaural speech intelligibility.
Affiliation(s)
- Jan Rennies
- Fraunhofer Institute for Digital Media Technology IDMT, Project Group Hearing, Speech and Audio Technology, Oldenburg, Germany; Cluster of Excellence Hearing4all, Oldenburg, Germany
- Anna Warzybok
- Medical Physics Group, Department of Medical Physics and Acoustics, Oldenburg, Germany; Cluster of Excellence Hearing4all, Oldenburg, Germany
- Birger Kollmeier
- Fraunhofer Institute for Digital Media Technology IDMT, Project Group Hearing, Speech and Audio Technology, Oldenburg, Germany; Medical Physics Group, Department of Medical Physics and Acoustics, Oldenburg, Germany; Cluster of Excellence Hearing4all, Oldenburg, Germany
- Thomas Brand
- Medical Physics Group, Department of Medical Physics and Acoustics, Oldenburg, Germany; Cluster of Excellence Hearing4all, Oldenburg, Germany
5. Prud'homme L, Lavandier M, Best V. Investigating the role of harmonic cancellation in speech-on-speech masking. Hear Res 2022; 426:108562. PMID: 35768309; PMCID: PMC9722527; DOI: 10.1016/j.heares.2022.108562.
Abstract
This study investigated the role of harmonic cancellation in the intelligibility of speech in "cocktail party" situations. While there is evidence that harmonic cancellation plays a role in the segregation of simple harmonic sounds based on fundamental frequency (F0), its utility for mixtures of speech containing non-stationary F0s and unvoiced segments is unclear. Here we focused on the energetic masking of speech targets caused by competing speech maskers. Speech reception thresholds were measured using seven maskers: speech-shaped noise, monotonized and intonated harmonic complexes, monotonized speech, noise-vocoded speech, reversed speech and natural speech. These maskers enabled an estimate of how the masking potential of speech is influenced by harmonic structure, amplitude modulation and variations in F0 over time. Measured speech reception thresholds were compared to the predictions of two computational models, with and without a harmonic cancellation component. Overall, the results suggest a minor role of harmonic cancellation in reducing energetic masking in speech mixtures.
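Harmonic cancellation is commonly modeled as a comb filter that subtracts a copy of the signal delayed by one fundamental period, which nulls energy at F0 and all its harmonics. A minimal sketch of that idea (this is the textbook mechanism, not the specific computational models tested in the paper):

```python
import numpy as np

def cancel_harmonics(x, fs, f0):
    """Subtract a one-fundamental-period-delayed copy of the signal:
    y[n] = x[n] - x[n - T], with T = fs / f0 samples. Energy at f0 and
    its harmonics repeats exactly after one period, so it is cancelled."""
    period = int(round(fs / f0))      # one fundamental period, in samples
    y = np.empty_like(x)
    y[:period] = x[:period]           # warm-up region: no delayed copy yet
    y[period:] = x[period:] - x[:-period]
    return y

# A monotone harmonic complex at f0 is nulled after the warm-up region.
fs, f0 = 16000, 100.0
n = np.arange(fs)
masker = sum(np.sin(2 * np.pi * k * f0 * n / fs) for k in range(1, 6))
residual = cancel_harmonics(masker, fs, f0)
print(np.max(np.abs(residual[160:])) < 1e-9)   # True: harmonics cancelled
```

With a non-stationary F0 or unvoiced masker segments, the delayed copy no longer matches the signal and cancellation degrades, which is exactly the concern the study addresses for real speech maskers.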
Affiliation(s)
- Luna Prud'homme
- Univ Lyon, ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, 69518 Vaulx-en-Velin, France
- Mathieu Lavandier
- Univ Lyon, ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, 69518 Vaulx-en-Velin, France
- Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, USA
6. Wasiuk PA, Buss E, Oleson JJ, Calandruccio L. Predicting speech-in-speech recognition: Short-term audibility, talker sex, and listener factors. J Acoust Soc Am 2022; 152:3010. PMID: 36456289; DOI: 10.1121/10.0015228.
Abstract
Speech-in-speech recognition can be challenging, and listeners vary considerably in their ability to accomplish this complex auditory-cognitive task. Variability in performance can be related to intrinsic listener factors as well as stimulus factors associated with energetic and informational masking. The current experiments characterized the effects of short-term audibility of the target, differences in target and masker talker sex, and intrinsic listener variables on sentence recognition in two-talker speech and speech-shaped noise. Participants were young adults with normal hearing. Each condition included the adaptive measurement of speech reception thresholds, followed by testing at a fixed signal-to-noise ratio (SNR). Short-term audibility for each keyword was quantified using a computational glimpsing model for target+masker mixtures. Scores on a psychophysical task of auditory stream segregation predicted speech recognition, with stronger effects for speech-in-speech than speech-in-noise. Both speech-in-speech and speech-in-noise recognition depended on the proportion of audible glimpses available in the target+masker mixture, even across stimuli presented at the same global SNR. Short-term audibility requirements varied systematically across stimuli, providing an estimate of the greater informational masking for speech-in-speech than speech-in-noise recognition and quantifying informational masking for matched and mismatched talker sex.
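At its core, the glimpsing idea reduces to counting time-frequency cells in which the target power exceeds the masker power by a local SNR criterion. A bare-bones sketch under assumed inputs (precomputed power spectrograms and a 3 dB criterion; the study's glimpsing model is more elaborate):

```python
import numpy as np

def glimpse_proportion(target_pow, masker_pow, criterion_db=3.0):
    """Fraction of time-frequency cells where the local SNR exceeds a
    criterion ("glimpses"). Inputs are power spectrograms (freq x time)."""
    eps = 1e-12                                  # guard against log of zero
    local_snr_db = 10.0 * np.log10((target_pow + eps) / (masker_pow + eps))
    return float(np.mean(local_snr_db > criterion_db))

# Target 20 dB above the masker everywhere -> every cell is a glimpse.
print(glimpse_proportion(np.ones((32, 100)), np.full((32, 100), 0.01)))  # 1.0

# Equal powers -> local SNR is 0 dB, below the 3 dB criterion -> no glimpses.
print(glimpse_proportion(np.ones((32, 100)), np.ones((32, 100))))        # 0.0
```

Note that two mixtures at the same global SNR can yield very different glimpse proportions depending on how the masker energy is distributed across time and frequency, which is the effect the study exploits to quantify informational masking.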
Affiliation(s)
- Peter A Wasiuk
- Department of Psychological Sciences, Case Western Reserve University, 11635 Euclid Avenue, Cleveland, Ohio 44106, USA
- Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill, 170 Manning Drive, Chapel Hill, North Carolina 27599, USA
- Jacob J Oleson
- Department of Biostatistics, University of Iowa, 145 North Riverside Drive, Iowa City, Iowa 52242, USA
- Lauren Calandruccio
- Department of Psychological Sciences, Case Western Reserve University, 11635 Euclid Avenue, Cleveland, Ohio 44106, USA