1. González-Toledo D, Cuevas-Rodríguez M, Vicente T, Picinali L, Molina-Tanco L, Reyes-Lecuona A. Spatial release from masking in the median plane with non-native speakers using individual and mannequin head related transfer functions. J Acoust Soc Am 2024; 155:284-293. [PMID: 38227426] [DOI: 10.1121/10.0024239]
Abstract
Spatial release from masking (SRM) in speech-on-speech tasks has been widely studied in the horizontal plane, where interaural cues play a fundamental role. Several studies have also observed SRM for sources located in the median plane, where (monaural) spectral cues are more important. However, a relatively unexplored research question concerns the impact of head-related transfer function (HRTF) personalisation on SRM, for example, whether using individually measured HRTFs results in better performance than using mannequin HRTFs. This study compares SRM in the median plane in a speech-on-speech virtual task rendered using both individual and mannequin HRTFs. SRM is obtained using English sentences with non-native English speakers. Our participants show lower SRM performance than that reported by others for native English participants. Furthermore, SRM is significantly larger when the source is spatialised using the individual HRTF, and this effect is more marked for those with lower English proficiency. Further analyses using a spectral distortion metric and an estimate of the better-ear effect show that the observed SRM can only partially be explained by HRTF-specific factors, and that familiarity with individual spatial cues is likely the most significant element driving these results.
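As a concrete illustration of the SRM metric discussed in this abstract: SRM is conventionally computed as the difference between the speech reception threshold (SRT) with co-located target and masker and the SRT with spatially separated sources. A minimal sketch (illustrative only, not the authors' code; the threshold values are invented):

```python
def spatial_release_from_masking(srt_colocated_db, srt_separated_db):
    """SRM in dB: positive values mean spatial separation helped.

    SRTs are the signal-to-noise ratios (dB) at which a criterion level
    of speech intelligibility (e.g. 50% of words correct) is reached.
    """
    return srt_colocated_db - srt_separated_db

# Hypothetical example values, not taken from the study:
srm = spatial_release_from_masking(srt_colocated_db=-2.0, srt_separated_db=-8.5)
print(srm)  # 6.5 dB of spatial release
```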
Affiliation(s)
- Daniel González-Toledo
- Telecommunication Research Institute (TELMA), Universidad de Málaga, ETSI Telecomunicación, 29010 Málaga, Spain
- María Cuevas-Rodríguez
- Telecommunication Research Institute (TELMA), Universidad de Málaga, ETSI Telecomunicación, 29010 Málaga, Spain
- Thibault Vicente
- Audio Experience Design, Dyson School of Design Engineering, Imperial College London, London SW7 2DB, United Kingdom
- Lorenzo Picinali
- Audio Experience Design, Dyson School of Design Engineering, Imperial College London, London SW7 2DB, United Kingdom
- Luis Molina-Tanco
- Telecommunication Research Institute (TELMA), Universidad de Málaga, ETSI Telecomunicación, 29010 Málaga, Spain
- Arcadio Reyes-Lecuona
- Telecommunication Research Institute (TELMA), Universidad de Málaga, ETSI Telecomunicación, 29010 Málaga, Spain
2. Wang J, Xie S, Stenfelt S, Zhou H, Wang X, Sang J. Spatial Release From Masking With Bilateral Bone Conduction Stimulation at Mastoid for Normal Hearing Subjects. Trends Hear 2024; 28:23312165241234202. [PMID: 38549451] [PMCID: PMC10981249] [DOI: 10.1177/23312165241234202]
Abstract
This study investigates spatial release from masking (SRM) with bilateral bone conduction (BC) stimulation at the mastoid. Nine adults with normal hearing were tested to determine SRM based on speech recognition thresholds (SRTs) in simulated spatial configurations ranging from 0 to 180 degrees. These configurations were based on nonindividualized head-related transfer functions. The participants received sound stimulation through either air conduction (AC) via headphones or BC. The results indicated that both the angular separation between the target and the masker and the modality of sound stimulation significantly influenced speech recognition performance. As the angular separation between the target and the masker increased up to 150°, both BC and AC SRTs decreased, indicating improved performance. However, performance slightly deteriorated when the angular separation exceeded 150°. For spatial separations less than 75°, BC stimulation provided greater spatial benefits than AC, although this difference was not statistically significant. For separations greater than 75°, AC stimulation offered significantly more spatial benefits than BC. When speech and noise originated from the same side of the head, the "better ear effect" did not significantly contribute to SRM. However, when speech and noise were located on opposite sides of the head, this effect became dominant in SRM.
Affiliation(s)
- Jie Wang
- School of Electronics and Communication Engineering, Guangzhou University, Guangzhou, China
- Sijia Xie
- School of Electronics and Communication Engineering, Guangzhou University, Guangzhou, China
- Stefan Stenfelt
- Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden
- Huali Zhou
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen, China
- Xiaoya Wang
- Otolaryngology Department, Guangzhou Women and Children's Medical Center, Guangzhou, China
- Jinqiu Sang
- Shanghai Institute of AI for Education, East China Normal University, Shanghai, China
3. Kuntz M, Bischof NF, Seeber BU. Sound field synthesis for psychoacoustic research: In situ evaluation of auralized sound pressure level. J Acoust Soc Am 2023; 154:1882-1895. [PMID: 37756576] [DOI: 10.1121/10.0021066]
Abstract
The use of virtual acoustic environments has become a key element in psychoacoustic and audiologic research, as loudspeaker-based reproduction offers many advantages over headphones. However, sound field synthesis methods have mostly been evaluated numerically or perceptually in the center, yielding little insight into the achievable accuracy of the reproduced sound field over a wider reproduction area with loudspeakers in a physical, laboratory-standard reproduction setup. Deviations from the ideal free-field and point-source concepts, such as non-ideal frequency response, non-omnidirectional directivity, acoustic reflections, and diffraction on the necessary hardware, impact the generated sound field. We evaluate reproduction accuracy in a 61-loudspeaker setup, the Simulated Open Field Environment, installed in an anechoic chamber. A first measurement following the ISO 8253-2:2009 standard for free-field audiology shows that the required accuracy is reached with critical-band-wide noise. A second measurement characterizes the sound pressure reproduced with the higher-order Ambisonics basic decoder, with and without max rE weighting, vector base amplitude panning, and nearest loudspeaker mapping on a 187 cm × 187 cm reproduction area. We show that the sweet-spot size observed in measured sound fields follows the rule kr≤N/2 rather than kr≤N but is still large enough to avoid compromising psychoacoustic experiments.
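The sweet-spot rule quoted in this abstract can be turned into a quick estimate of the usable reproduction radius: with wavenumber k = 2πf/c, the measured condition kr ≤ N/2 gives r ≤ Nc/(4πf). A short sketch (illustrative; the order and frequency below are arbitrary example values, not those of the paper):

```python
import math

def sweet_spot_radius_m(order, freq_hz, c=343.0, rule="half"):
    """Radius (m) inside which an order-N Ambisonics field is accurate.

    rule="half" applies the measured kr <= N/2 bound; rule="full"
    applies the classical kr <= N bound, for comparison.
    """
    k = 2 * math.pi * freq_hz / c  # wavenumber (rad/m)
    bound = order / 2 if rule == "half" else order
    return bound / k

# For a hypothetical 5th-order system at 1 kHz:
r_half = sweet_spot_radius_m(5, 1000.0)               # ~0.14 m
r_full = sweet_spot_radius_m(5, 1000.0, rule="full")  # ~0.27 m
```

The halved bound shrinks the sweet-spot radius by a factor of two relative to the textbook rule, which is the practical consequence of the measurement reported above.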
Affiliation(s)
- Matthieu Kuntz
- Audio Information Processing, Technical University of Munich, 80333 Munich, Germany
- Norbert F Bischof
- Audio Information Processing, Technical University of Munich, 80333 Munich, Germany
- Bernhard U Seeber
- Audio Information Processing, Technical University of Munich, 80333 Munich, Germany
4. Higgins NC, Pupo DA, Ozmeral EJ, Eddins DA. Head movement and its relation to hearing. Front Psychol 2023; 14:1183303. [PMID: 37448716] [PMCID: PMC10338176] [DOI: 10.3389/fpsyg.2023.1183303]
Abstract
Head position at any point in time plays a fundamental role in shaping the auditory information that reaches a listener, information that continuously changes as the head moves and reorients to different listening situations. The connection between hearing science and the kinesthetics of head movement has gained interest due to technological advances that have made it increasingly feasible to provide behavioral and biological feedback to assistive listening devices that can interpret movement patterns reflecting listening intent. Increasing evidence also shows that the negative impact of hearing deficits on mobility, gait, and balance may be mitigated by prosthetic hearing device intervention. A better understanding of the relationships between head movement, full-body kinetics, and hearing health should lead to improved signal processing strategies across a range of assistive and augmented hearing devices. The purpose of this review is to introduce the wider hearing community to the kinesiology of head movement and to place it in the context of hearing and communication, with the goal of expanding the field of ecologically specific listener behavior.
Affiliation(s)
- Nathan C. Higgins
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, FL, United States
- Daniel A. Pupo
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, FL, United States
- School of Aging Studies, University of South Florida, Tampa, FL, United States
- Erol J. Ozmeral
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, FL, United States
- David A. Eddins
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, FL, United States
5. Hládek Ľ, Seeber BU. Speech Intelligibility in Reverberation is Reduced During Self-Rotation. Trends Hear 2023; 27:23312165231188619. [PMID: 37475460] [PMCID: PMC10363862] [DOI: 10.1177/23312165231188619]
Abstract
Speech intelligibility in cocktail party situations has been traditionally studied for stationary sound sources and stationary participants. Here, speech intelligibility and behavior were investigated during active self-rotation of standing participants in a spatialized speech test. We investigated if people would rotate to improve speech intelligibility, and we asked if knowing the target location would be further beneficial. Target sentences randomly appeared at one of four possible locations: 0°, ± 90°, 180° relative to the participant's initial orientation on each trial, while speech-shaped noise was presented from the front (0°). Participants responded naturally with self-rotating motion. Target sentences were presented either without (Audio-only) or with a picture of an avatar (Audio-Visual). In a baseline (Static) condition, people were standing still without visual location cues. Participants' self-orientation undershot the target location and orientations were close to acoustically optimal. Participants oriented more often in an acoustically optimal way, and speech intelligibility was higher in the Audio-Visual than in the Audio-only condition for the lateral targets. The intelligibility of the individual words in Audio-Visual and Audio-only increased during self-rotation towards the rear target, but it was reduced for the lateral targets when compared to Static, which could be mostly, but not fully, attributed to changes in spatial unmasking. Speech intelligibility prediction based on a model of static spatial unmasking considering self-rotations overestimated the participant performance by 1.4 dB. The results suggest that speech intelligibility is reduced during self-rotation, and that visual cues of location help to achieve more optimal self-rotations and better speech intelligibility.
Affiliation(s)
- Ľuboš Hládek
- Audio Information Processing, Technical University of Munich, Munich, Germany
- Bernhard U. Seeber
- Audio Information Processing, Technical University of Munich, Munich, Germany
6. Prud'homme L, Lavandier M, Best V. Investigating the role of harmonic cancellation in speech-on-speech masking. Hear Res 2022; 426:108562. [PMID: 35768309] [PMCID: PMC9722527] [DOI: 10.1016/j.heares.2022.108562]
Abstract
This study investigated the role of harmonic cancellation in the intelligibility of speech in "cocktail party" situations. While there is evidence that harmonic cancellation plays a role in the segregation of simple harmonic sounds based on fundamental frequency (F0), its utility for mixtures of speech containing non-stationary F0s and unvoiced segments is unclear. Here we focused on the energetic masking of speech targets caused by competing speech maskers. Speech reception thresholds were measured using seven maskers: speech-shaped noise, monotonized and intonated harmonic complexes, monotonized speech, noise-vocoded speech, reversed speech and natural speech. These maskers enabled an estimate of how the masking potential of speech is influenced by harmonic structure, amplitude modulation and variations in F0 over time. Measured speech reception thresholds were compared to the predictions of two computational models, with and without a harmonic cancellation component. Overall, the results suggest a minor role of harmonic cancellation in reducing energetic masking in speech mixtures.
Affiliation(s)
- Luna Prud'homme
- Univ Lyon, ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, 69518 Vaulx-en-Velin, France
- Mathieu Lavandier
- Univ Lyon, ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, 69518 Vaulx-en-Velin, France
- Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, USA
7. Prud'homme L, Lavandier M, Best V. A dynamic binaural harmonic-cancellation model to predict speech intelligibility against a harmonic masker varying in intonation, temporal envelope, and location. Hear Res 2022; 426:108535. [PMID: 35654633] [PMCID: PMC9684346] [DOI: 10.1016/j.heares.2022.108535]
Abstract
The aim of this study was to extend the harmonic-cancellation model proposed by Prud'homme et al. [J. Acoust. Soc. Am. 148 (2020) 3246-3254] to predict speech intelligibility against a harmonic masker, so that it takes into account binaural hearing, amplitude modulations in the masker and variations in masker fundamental frequency (F0) over time. This was done by segmenting the masker signal into time frames and combining the previous long-term harmonic-cancellation model with the binaural model proposed by Vicente and Lavandier [Hear. Res. 390 (2020) 107937]. The new model was tested on the data from two experiments involving harmonic complex maskers that varied in spatial location, temporal envelope and F0 contour. The interactions between the associated effects were accounted for in the model by varying the time frame duration and excluding the binaural unmasking computation when harmonic cancellation is active. Across both experiments, the correlation between data and model predictions was over 0.96, and the mean and largest absolute prediction errors were lower than 0.6 and 1.5 dB, respectively.
Affiliation(s)
- Luna Prud'homme
- ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, University Lyon, Vaulx-en-Velin 69518, France
- Mathieu Lavandier
- ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, University Lyon, Vaulx-en-Velin 69518, France
- Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, USA
8. Rennies J, Röttges S, Huber R, Hauth CF, Brand T. A joint framework for blind prediction of binaural speech intelligibility and perceived listening effort. Hear Res 2022; 426:108598. [PMID: 35995688] [DOI: 10.1016/j.heares.2022.108598]
Abstract
Speech perception is strongly affected by noise and reverberation in the listening room, and binaural processing can substantially facilitate speech perception in conditions when target speech and maskers originate from different directions. Most studies and proposed models for predicting spatial unmasking have focused on speech intelligibility. The present study introduces a model framework that predicts both speech intelligibility and perceived listening effort from the same output measure. The framework is based on a combination of a blind binaural processing stage employing a blind equalization cancelation (EC) mechanism, and a blind backend based on phoneme probability classification. Neither frontend nor backend requires any additional information, such as the source directions, the signal-to-noise ratio (SNR), or the number of sources, allowing for a fully blind perceptual assessment of binaural input signals consisting of target speech mixed with noise. The model is validated against a recent data set in which speech intelligibility and perceived listening effort were measured for a range of acoustic conditions differing in reverberation and binaural cues [Rennies and Kidd (2018), J. Acoust. Soc. Am. 144, 2147-2159]. Predictions of the proposed model are compared with a non-blind binaural model consisting of a non-blind EC stage and a backend based on the speech intelligibility index. The analyses indicated that all main trends observed in the experiments were correctly predicted by the blind model. The overall proportion of variance explained by the model for speech intelligibility (R² = 0.94) was slightly lower than for the non-blind model (R² = 0.98). For listening effort predictions, both models showed lower prediction accuracy, but still explained significant proportions of the observed variance (R² = 0.88 and R² = 0.71 for the non-blind and blind model, respectively). Closer inspection showed that the differences between data and predictions were largest for binaural conditions at high SNRs, where the perceived listening effort of human listeners tended to be underestimated by the models, specifically by the blind version.
Affiliation(s)
- Jan Rennies
- Fraunhofer IDMT, Hearing, Speech and Audio Technology and Cluster of Excellence Hearing4all, Marie-Curie-Str. 2, 26129 Oldenburg, Germany
- Saskia Röttges
- Department für Medizinische Physik und Akustik, Carl von Ossietzky Universität Oldenburg and Cluster of Excellence Hearing4all, Oldenburg, Germany
- Rainer Huber
- Fraunhofer IDMT, Hearing, Speech and Audio Technology and Cluster of Excellence Hearing4all, Marie-Curie-Str. 2, 26129 Oldenburg, Germany
- Christopher F Hauth
- Department für Medizinische Physik und Akustik, Carl von Ossietzky Universität Oldenburg and Cluster of Excellence Hearing4all, Oldenburg, Germany
- Thomas Brand
- Department für Medizinische Physik und Akustik, Carl von Ossietzky Universität Oldenburg and Cluster of Excellence Hearing4all, Oldenburg, Germany
9. Feng Y, Chen F. Nonintrusive objective measurement of speech intelligibility: A review of methodology. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2021.103204]
10. Cuevas-Rodriguez M, Gonzalez-Toledo D, Reyes-Lecuona A, Picinali L. Impact of non-individualised head related transfer functions on speech-in-noise performances within a synthesised virtual environment. J Acoust Soc Am 2021; 149:2573. [PMID: 33940900] [DOI: 10.1121/10.0004220]
Abstract
When performing binaural spatialisation, it is widely accepted that the choice of the head related transfer functions (HRTFs), and in particular the use of individually measured ones, can have an impact on localisation accuracy, externalisation, and overall realism. Yet the impact of HRTF choices on speech-in-noise performance in cocktail party-like scenarios has not been investigated in depth. This paper introduces a study where 22 participants were presented with a frontal speech target and two lateral maskers, spatialised using a set of non-individual HRTFs. Speech reception threshold (SRT) was measured for each HRTF. Furthermore, using the SRT predicted by an existing speech perception model, the measured values were compensated in an attempt to remove overall HRTF-specific benefits. Results show significant overall differences among the SRTs measured using different HRTFs, consistent with the results predicted by the model. Individual differences between participants in their SRT performance with different HRTFs were also found, but their significance was reduced after the compensation. The implications of these findings are relevant to several research areas related to spatial hearing and speech perception, suggesting that when testing speech-in-noise performance within binaurally rendered virtual environments, the choice of the HRTF for each individual should be carefully considered.
Affiliation(s)
- Maria Cuevas-Rodriguez
- Departamento de Tecnología Electrónica, Universidad de Málaga, ETSI Telecomunicación, 29010 Málaga, Spain
- Daniel Gonzalez-Toledo
- Departamento de Tecnología Electrónica, Universidad de Málaga, ETSI Telecomunicación, 29010 Málaga, Spain
- Arcadio Reyes-Lecuona
- Departamento de Tecnología Electrónica, Universidad de Málaga, ETSI Telecomunicación, 29010 Málaga, Spain
- Lorenzo Picinali
- Dyson School of Design Engineering, Imperial College London, London SW7 2DB, United Kingdom
11. Ahrens A, Cuevas-Rodriguez M, Brimijoin WO. Speech intelligibility with various head-related transfer functions: A computational modelling approach. JASA Express Lett 2021; 1:034401. [PMID: 36154562] [DOI: 10.1121/10.0003618]
Abstract
Speech intelligibility (SI) is known to be affected by the relative spatial position of target and interferers. The benefit of a spatial separation is, along with other factors, related to the head-related transfer function (HRTF). HRTFs differ between individuals, and thus the cues that affect SI might also differ. In the current study, an auditory model was employed to predict SI with various HRTFs and at different angles on the horizontal plane. The predicted SI threshold differed substantially across HRTFs. Thus, individual listeners might have different access to SI cues, depending on their HRTF.
Affiliation(s)
- Axel Ahrens
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
12. de Cheveigné A. Harmonic Cancellation-A Fundamental of Auditory Scene Analysis. Trends Hear 2021; 25:23312165211041422. [PMID: 34698574] [PMCID: PMC8552394] [DOI: 10.1177/23312165211041422]
Abstract
This paper reviews the hypothesis of harmonic cancellation, according to which an interfering sound is suppressed or canceled on the basis of its harmonicity (or periodicity in the time domain) for the purpose of auditory scene analysis. It defines the concept, discusses theoretical arguments in its favor, and reviews experimental results that support it, or not. If correct, the hypothesis may draw on time-domain processing of temporally accurate neural representations within the brainstem, as required also by the classic equalization-cancellation model of binaural unmasking. The hypothesis predicts that a target sound corrupted by interference will be easier to hear if the interference is harmonic than inharmonic, all else being equal. This prediction is borne out in a number of behavioral studies, but not all. The paper reviews those results, with the aim of understanding the inconsistencies and reaching a reliable conclusion for, or against, the hypothesis of harmonic cancellation within the auditory system.
Affiliation(s)
- Alain de Cheveigné
- Laboratoire des systèmes perceptifs, CNRS, Paris, France
- Département d’études cognitives, École normale supérieure, PSL University, Paris, France
- UCL Ear Institute, London, UK
13. Hauth CF, Berning SC, Kollmeier B, Brand T. Modeling Binaural Unmasking of Speech Using a Blind Binaural Processing Stage. Trends Hear 2020; 24:2331216520975630. [PMID: 33305690] [PMCID: PMC7734536] [DOI: 10.1177/2331216520975630]
Abstract
The equalization cancellation model is often used to predict the binaural masking level difference. Previously, its application to speech in noise required separate knowledge about the speech and noise signals to maximize the signal-to-noise ratio (SNR). Here, a novel, blind equalization cancellation model is introduced that can use the mixed signals. This approach does not require any assumptions about particular sound source directions. It uses different strategies for positive and negative SNRs, with the switching between the two steered by a blind decision stage utilizing modulation cues. The output of the model is a single-channel signal with enhanced SNR, which we analyzed using the speech intelligibility index to compare speech intelligibility predictions. In a first experiment, the model was tested on experimental data obtained in a scenario with spatially separated target and masker signals. Predicted speech recognition thresholds were in good agreement with measured speech recognition thresholds, with a root mean square error less than 1 dB. A second experiment investigated signals at positive SNRs, which was achieved using time compressed and low-pass filtered speech. The results demonstrated that binaural unmasking of speech occurs at positive SNRs and that the modulation-based switching strategy can predict the experimental results.
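The core equalization-cancellation operation underlying this abstract can be illustrated in a few lines: delay one ear's signal so the masker components align across ears, subtract, and the masker is attenuated while a differently-located target survives. The sketch below is a toy illustration of the classical non-blind EC principle, not the blind model described in the paper; all signal parameters (sample rate, delay, noise levels) are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000
n = fs  # 1 s of samples

# Diotic "target" (identical in both ears) and a masker lateralised by a
# 10-sample interaural delay; a small independent "internal noise" is
# added per ear so cancellation is imperfect, as in real listeners.
target = rng.standard_normal(n)
masker = 5.0 * rng.standard_normal(n)
d = 10  # masker interaural delay in samples (circular, for simplicity)
left = target + masker + 0.01 * rng.standard_normal(n)
right = target + np.roll(masker, d) + 0.01 * rng.standard_normal(n)

def snr_db(sig, noise):
    return 10 * np.log10(np.sum(sig**2) / np.sum(noise**2))

# Before EC: target-to-masker ratio at one ear.
snr_before = snr_db(target, masker)

# EC step: equalise (delay the left ear by the masker's ITD), then cancel.
out = np.roll(left, d) - right
# The masker cancels down to the internal noise; the target residual is
# the target minus its delayed copy (comb-filtered but still present).
target_res = np.roll(target, d) - target
masker_res = out - target_res
snr_after = snr_db(target_res, masker_res)  # large improvement over snr_before
```

With an exactly matched delay and no internal noise the masker would cancel completely; the internal-noise terms are what keep the predicted binaural advantage finite in EC-type models.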
Affiliation(s)
- Christopher F Hauth
- Medizinische Physik and Cluster of Excellence Hearing4All, Carl-von-Ossietzky Universität Oldenburg, Oldenburg, Germany
- Simon C Berning
- Medizinische Physik and Cluster of Excellence Hearing4All, Carl-von-Ossietzky Universität Oldenburg, Oldenburg, Germany
- Birger Kollmeier
- Medizinische Physik and Cluster of Excellence Hearing4All, Carl-von-Ossietzky Universität Oldenburg, Oldenburg, Germany
- Thomas Brand
- Medizinische Physik and Cluster of Excellence Hearing4All, Carl-von-Ossietzky Universität Oldenburg, Oldenburg, Germany
14. Vicente T, Lavandier M, Buchholz JM. A binaural model implementing an internal noise to predict the effect of hearing impairment on speech intelligibility in non-stationary noises. J Acoust Soc Am 2020; 148:3305. [PMID: 33261412] [DOI: 10.1121/10.0002660]
Abstract
A binaural model predicting speech intelligibility in envelope-modulated noise for normal-hearing (NH) and hearing-impaired listeners is proposed. The study shows the importance of considering an internal noise with two components relying on the individual audiogram and the level of the external stimuli. The model was optimized and verified using speech reception thresholds previously measured in three experiments involving NH and hearing-impaired listeners and sharing common methods. The anechoic target, in front of the listener, was presented simultaneously through headphones with two anechoic noise-vocoded speech maskers (VSs) either co-located with the target or spatially separated using an infinite broadband interaural level difference without crosstalk between ears. In experiment 1, two stationary noise maskers were also tested. In experiment 2, the VSs were presented at different sensation levels to vary audibility. In experiment 3, the effects of realistic interaural time and level differences were also tested. The model was applied to two datasets involving NH listeners to verify its backward compatibility. It was optimized to predict the data, leading to a correlation and mean absolute error between data and predictions above 0.93 and below 1.1 dB, respectively. The different internal noise approaches proposed in the literature to describe hearing impairment are discussed.
Affiliation(s)
- Thibault Vicente
- Université de Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue Maurice Audin, 69518 Vaulx-en-Velin Cedex, France
- Mathieu Lavandier
- Université de Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue Maurice Audin, 69518 Vaulx-en-Velin Cedex, France
- Jörg M Buchholz
- Department of Linguistics-Audiology, Australian Hearing Hub, Macquarie University, 2109 New South Wales, Australia
15. Vicente T, Lavandier M. Further validation of a binaural model predicting speech intelligibility against envelope-modulated noises. Hear Res 2020; 390:107937. [PMID: 32192940] [DOI: 10.1016/j.heares.2020.107937]
Abstract
Collin and Lavandier [J. Acoust. Soc. Am. 134, 1146-1159 (2013)] proposed a binaural model predicting speech intelligibility against envelope-modulated noises, evaluated in 24 acoustic conditions involving similar masker types. The aim of the present study was to test the model's robustness by modeling 80 additional conditions, and to evaluate the influence of its parameters using an approach inspired by variance-based sensitivity analysis. First, the data from four experiments from the literature and one specifically designed for the present study were used to evaluate the prediction performance of the model, investigate potential interactions between its parameters, and define the parameter values leading to the best predictions. A revision of the model allowed it to account for binaural sluggishness. Finally, the optimized model was tested on an additional dataset not used to define its parameters. Overall, one hundred conditions split across six experiments were modeled. Correlations between data and predictions ranged from 0.85 to 0.96 across experiments, and mean absolute prediction errors were between 0.5 and 1.4 dB.
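The two figures of merit quoted throughout these modelling abstracts, the correlation between measured and predicted thresholds and the mean absolute prediction error in dB, are straightforward to compute. A minimal sketch (the SRT values below are invented for illustration, not data from any of these studies):

```python
import numpy as np

def prediction_metrics(measured_db, predicted_db):
    """Pearson correlation and mean absolute error (dB) between
    measured and predicted speech reception thresholds."""
    m = np.asarray(measured_db, dtype=float)
    p = np.asarray(predicted_db, dtype=float)
    r = np.corrcoef(m, p)[0, 1]        # Pearson correlation coefficient
    mae = np.mean(np.abs(m - p))       # mean absolute error in dB
    return r, mae

# Invented example values:
measured = [-4.0, -6.5, -9.0, -11.0]
predicted = [-3.6, -7.2, -8.4, -11.5]
r, mae = prediction_metrics(measured, predicted)
```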
Affiliation(s)
- Thibault Vicente
- Univ Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue Maurice Audin, 69518, Vaulx-en-Velin Cedex, France.
- Mathieu Lavandier
- Univ Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue Maurice Audin, 69518, Vaulx-en-Velin Cedex, France
16
Baltzell LS, Swaminathan J, Cho AY, Lavandier M, Best V. Binaural sensitivity and release from speech-on-speech masking in listeners with and without hearing loss. J Acoust Soc Am 2020; 147:1546. [PMID: 32237845] [PMCID: PMC7060089] [DOI: 10.1121/10.0000812]
Abstract
Listeners with sensorineural hearing loss routinely experience less spatial release from masking (SRM) in speech mixtures than listeners with normal hearing. Hearing-impaired listeners have also been shown to have degraded temporal fine structure (TFS) sensitivity, a consequence of which is degraded access to interaural time differences (ITDs) contained in the TFS. Since these "binaural TFS" cues are critical for spatial hearing, it has been hypothesized that degraded binaural TFS sensitivity accounts for the limited SRM experienced by hearing-impaired listeners. In this study, speech stimuli were noise-vocoded using carriers that were systematically decorrelated across the left and right ears, thus simulating degraded binaural TFS sensitivity. Both (1) ITD sensitivity in quiet and (2) SRM in speech mixtures spatialized using ITDs (or binaural release from masking; BRM) were measured as a function of TFS interaural decorrelation in young normal-hearing and hearing-impaired listeners. This allowed for the examination of the relationship between ITD sensitivity and BRM over a wide range of ITD thresholds. This paper found that, for a given ITD sensitivity, hearing-impaired listeners experienced less BRM than normal-hearing listeners, suggesting that binaural TFS sensitivity can account for only a modest portion of the BRM deficit in hearing-impaired listeners. However, substantial individual variability was observed.
Affiliation(s)
- Lucas S Baltzell
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Jayaganesh Swaminathan
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Adrian Y Cho
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Mathieu Lavandier
- University of Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue Maurice Audin, F-69518 Vaulx-en-Velin Cedex, France
- Virginia Best
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
17
Hadley LV, Brimijoin WO, Whitmer WM. Speech, movement, and gaze behaviours during dyadic conversation in noise. Sci Rep 2019; 9:10451. [PMID: 31320658] [PMCID: PMC6639257] [DOI: 10.1038/s41598-019-46416-0]
Abstract
How do people have conversations in noise and make themselves understood? While many previous studies have investigated speaking and listening in isolation, this study focuses on the behaviour of pairs of individuals in an ecologically valid context. Specifically, we report the fine-grained dynamics of natural conversation between interlocutors of varying hearing ability (n = 30), addressing how different levels of background noise affect speech, movement, and gaze behaviours. We found that as noise increased, people spoke louder and moved closer together, although these behaviours provided relatively small acoustic benefit (0.32 dB speech level increase per 1 dB noise increase). We also found that increased noise led to shorter utterances and increased gaze to the speaker's mouth. Surprisingly, interlocutors did not make use of potentially beneficial head orientations. While participants were able to sustain conversation in noise of up to 72 dB, changes in conversation structure suggested increased difficulty at 78 dB, with a significant decrease in turn-taking success. Understanding these natural conversation behaviours could inform broader models of interpersonal communication, and be applied to the development of new communication technologies. Furthermore, comparing these findings with those from isolation paradigms demonstrates the importance of investigating social processes in ecologically valid multi-person situations.
Affiliation(s)
- Lauren V Hadley
- Hearing Sciences - Scottish Section, Division of Clinical Neuroscience, University of Nottingham, Glasgow, UK.
- W Owen Brimijoin
- Hearing Sciences - Scottish Section, Division of Clinical Neuroscience, University of Nottingham, Glasgow, UK
- William M Whitmer
- Hearing Sciences - Scottish Section, Division of Clinical Neuroscience, University of Nottingham, Glasgow, UK
18
Ahrens A, Marschall M, Dau T. Measuring and modeling speech intelligibility in real and loudspeaker-based virtual sound environments. Hear Res 2019; 377:307-317. [DOI: 10.1016/j.heares.2019.02.003]
19
Kokabi O, Brinkmann F, Weinzierl S. Prediction of speech intelligibility using pseudo-binaural room impulse responses. J Acoust Soc Am 2019; 145:EL329. [PMID: 31046300] [DOI: 10.1121/1.5099169]
Abstract
Head orientation (HO) affects better-ear listening and spatial release from masking, two key aspects of binaural speech intelligibility. To incorporate HO in speech intelligibility prediction, binaural room impulse responses (BRIRs) for every HO of interest could be used. Given the limited spectral bandwidth of speech, however, approximate representations, which can be measured more quickly, might be sufficient. A comparison was made between pseudo-BRIRs generated with a motion-tracked binaural microphone array and a first-order Ambisonics microphone using the spatial decomposition method (SDM). The accuracy of the Ambisonics/SDM approach was comparable to that of real BRIRs, indicating its suitability for speech intelligibility prediction.
Affiliation(s)
- Omid Kokabi
- Audio Communication Group, Technical University of Berlin, Einsteinufer 17c, 10587 Berlin, Germany
- Fabian Brinkmann
- Audio Communication Group, Technical University of Berlin, Einsteinufer 17c, 10587 Berlin, Germany
- Stefan Weinzierl
- Audio Communication Group, Technical University of Berlin, Einsteinufer 17c, 10587 Berlin, Germany
20
Grange JA, Culling JF, Bardsley B, Mackinney LI, Hughes SE, Backhouse SS. Turn an Ear to Hear: How Hearing-Impaired Listeners Can Exploit Head Orientation to Enhance Their Speech Intelligibility in Noisy Social Settings. Trends Hear 2018; 22:2331216518802701. [PMID: 30334495] [PMCID: PMC6196611] [DOI: 10.1177/2331216518802701]
Abstract
Turning an ear toward the talker can enhance spatial release from masking. Here, with their head free, listeners attended to speech at a gradually diminishing signal-to-noise ratio and with the noise source azimuthally separated from the speech source by 180° or 90°. Young normal-hearing adult listeners spontaneously turned an ear toward the speech source in 64% of audio-only trials, but a visible talker's face or cochlear implant (CI) use significantly reduced this head-turn behavior. All listener groups made more head movements once instructed to explore the potential benefit of head turns, and followed the speech to lower signal-to-noise ratios; unilateral CI users improved the most. In a virtual restaurant simulation with nine interfering noises or voices, hearing-impaired listeners and simulated bilateral CI users typically obtained a 1 to 3 dB head-orientation benefit from a 30° head turn away from the talker, even in diffuse interference environments. However, the advice given to U.K. CI users by many CI professionals, and the communication guidance available on the Internet, most often recommend that the CI user face the talker head on. CI users would instead benefit from guidelines that recommend they look sidelong at the talker, with their better-hearing or implanted ear oriented toward the talker.
Affiliation(s)
- Jacques A. Grange
- School of Psychology, Cardiff University, UK
- Sarah E. Hughes
- South Wales Cochlear Implant Programme, Princess of Wales Hospital, Bridgend, UK
- Steven S. Backhouse
- South Wales Cochlear Implant Programme, Princess of Wales Hospital, Bridgend, UK
21
Kokabi O, Brinkmann F, Weinzierl S. Segmentation of binaural room impulse responses for speech intelligibility prediction. J Acoust Soc Am 2018; 144:2793. [PMID: 30522312] [DOI: 10.1121/1.5078598]
Abstract
The two most important aspects of binaural speech perception, better-ear listening and spatial release from masking, can be predicted well with current binaural modeling frameworks operating on head-related impulse responses, i.e., anechoic binaural signals. To incorporate the effects of reverberation, a model extension was proposed that splits binaural room impulse responses into an early (useful) and a late (detrimental) part before they are fed into the modeling framework. More recently, an interaction between the applied splitting time, room properties, and the resulting prediction accuracy was observed. This interaction was investigated here by measuring speech reception thresholds (SRTs) in quiet with 18 normal-hearing subjects for four simulated rooms with different reverberation times and a constant room geometry. The mean error with one of the most promising binaural prediction models could be reduced by about 1 dB by adapting the applied splitting time to room acoustic parameters. This improvement in prediction accuracy can correspond to a difference of 17% in absolute intelligibility within the applied SRT measurement paradigm.
Affiliation(s)
- Omid Kokabi
- TU Berlin, Audio Communication Group, Einsteinufer 17c, 10587 Berlin, Germany
- Fabian Brinkmann
- TU Berlin, Audio Communication Group, Einsteinufer 17c, 10587 Berlin, Germany
- Stefan Weinzierl
- TU Berlin, Audio Communication Group, Einsteinufer 17c, 10587 Berlin, Germany
22
Cubick J, Buchholz JM, Best V, Lavandier M, Dau T. Listening through hearing aids affects spatial perception and speech intelligibility in normal-hearing listeners. J Acoust Soc Am 2018; 144:2896. [PMID: 30522291] [PMCID: PMC6246072] [DOI: 10.1121/1.5078582]
Abstract
Cubick and Dau [Acta Acust. Acust. 102, 547-557 (2016)] showed that speech reception thresholds (SRTs) in noise, obtained with normal-hearing listeners, were significantly higher with hearing aids (HAs) than without. Some listeners reported a change in their spatial perception of the stimuli due to the HA processing, with auditory images often being broader and closer to the head or even internalized. The current study investigated whether worse speech intelligibility with HAs might be explained by distorted spatial perception and the resulting reduced ability to spatially segregate the target speech from the interferers. SRTs were measured in normal-hearing listeners with or without HAs in the presence of three interfering talkers or speech-shaped noises. Furthermore, listeners were asked to sketch their spatial perception of the acoustic scene. Consistent with the previous study, SRTs increased with HAs. Spatial release from masking was lower with HAs than without. The effects were similar for noise and speech maskers and appeared to be accounted for by changes to energetic masking. This interpretation was supported by results from a binaural speech intelligibility model. Even though the sketches indicated a change of spatial perception with HAs, no direct link between spatial perception and segregation of talkers could be shown.
Affiliation(s)
- Jens Cubick
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Ørsteds Plads, Building 352, 2800 Kongens Lyngby, Denmark
- Jörg M Buchholz
- Department of Linguistics, Australian Hearing Hub, 16 University Avenue, Macquarie University, New South Wales 2109, Australia
- Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Mathieu Lavandier
- Univ Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue M. Audin, F-69518 Vaulx-en-Velin, France
- Torsten Dau
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Ørsteds Plads, Building 352, 2800 Kongens Lyngby, Denmark
23
Shen Y, Folkerts ML, Richards VM. Head movements while recognizing speech arriving from behind. J Acoust Soc Am 2017; 141:EL108. [PMID: 28253640] [PMCID: PMC5724624] [DOI: 10.1121/1.4976111]
Abstract
Listeners' head movements were measured during speech recognition with simultaneous maskers. Both the target and the masker were behind the listener, separated by 30°. Frequent head turns during speech recognition were observed for four of the ten listeners. For those four, head turns were more frequent at lower target-to-masker ratios (TMRs) and were oriented toward the target speech source. When the masker was competing speech, the final head-orientation angle was larger at lower TMRs. These observed head movements are not consistent with a strategy that maximizes either the target level at a single ear or the binaural unmasking of speech.
Affiliation(s)
- Yi Shen
- Department of Speech and Hearing Sciences, Indiana University Bloomington, 200 South Jordan Avenue, Bloomington, Indiana 47405-7000, USA
- Monica L Folkerts
- Department of Cognitive Sciences, University of California, Irvine, 3151 Social Science Plaza, Irvine, California 92687-5100, USA
- Virginia M Richards
- Department of Cognitive Sciences, University of California, Irvine, 3151 Social Science Plaza, Irvine, California 92687-5100, USA
24
Culling JF, Stone MA. Energetic Masking and Masking Release. Springer Handbook of Auditory Research 2017. [DOI: 10.1007/978-3-319-51662-2_3]
25
Grange JA, Culling JF. Head orientation benefit to speech intelligibility in noise for cochlear implant users and in realistic listening conditions. J Acoust Soc Am 2016; 140:4061. [PMID: 28039996] [DOI: 10.1121/1.4968515]
Abstract
Cochlear implant (CI) users suffer from elevated speech-reception thresholds and may rely on lip reading. Traditional measures of spatial release from masking quantify the speech-reception-threshold improvement with azimuthal separation of the target speaker and interferers, with the listener facing the target speaker. Substantial benefits of orienting the head away from the target speaker were predicted by a model of spatial release from masking. Audio-only and audio-visual speech-reception thresholds in normal-hearing (NH) listeners and in bilateral and unilateral CI users confirmed the model predictions of this head-orientation benefit. The benefit ranged from 2 to 5 dB for a modest 30° orientation that did not affect the lip-reading benefit. The lip-reading benefit measured 3 dB for NH listeners and 5 dB for CI users. A head-orientation benefit of approximately 2 dB was also both predicted and observed in NH listeners in realistic simulations of a restaurant listening environment. Exploiting the benefit of head orientation is thus a robust hearing tactic that would benefit both NH listeners and CI users in noisy listening conditions.
Affiliation(s)
- Jacques A Grange
- School of Psychology, Cardiff University, 70 Park Place, Cardiff CF103AT, United Kingdom
- John F Culling
- School of Psychology, Cardiff University, 70 Park Place, Cardiff CF103AT, United Kingdom
26
Tang Y, Cooke M, Fazenda BM, Cox TJ. A metric for predicting binaural speech intelligibility in stationary noise and competing speech maskers. J Acoust Soc Am 2016; 140:1858. [PMID: 27914424] [DOI: 10.1121/1.4962484]
Abstract
One criterion in the design of binaural sound scenes in audio production is the extent to which the intended speech message is correctly understood. Object-based audio broadcasting systems have permitted sound editors to gain more access to the metadata (e.g., intensity and location) of each sound source, providing better control over speech intelligibility. The current study describes and evaluates a binaural distortion-weighted glimpse proportion metric, BiDWGP, which is motivated by better-ear glimpsing and binaural masking level differences. BiDWGP predicts intelligibility from two alternative input forms: either binaural recordings or monophonic recordings from each sound source along with their locations. Two listening experiments were performed with stationary noise and competing speech, one in the presence of a single masker, the other with multiple maskers, for a variety of spatial configurations. Overall, BiDWGP with both input forms predicts listener keyword scores with correlations of 0.95 and 0.91 for single- and multi-masker conditions, respectively. When considering masker type separately, correlations rise to 0.95 and above for both types of maskers. Predictions using the two input forms are very similar, suggesting that BiDWGP can be applied to the design of sound scenes where only individual sound sources and their locations are available.
Affiliation(s)
- Yan Tang
- Acoustics Research Centre, University of Salford, Salford M5 4WT, United Kingdom
- Martin Cooke
- Ikerbasque (Basque Science Foundation), Bilbao, Spain
- Bruno M Fazenda
- Acoustics Research Centre, University of Salford, Salford M5 4WT, United Kingdom
- Trevor J Cox
- Acoustics Research Centre, University of Salford, Salford M5 4WT, United Kingdom
27
Chabot-Leclerc A, MacDonald EN, Dau T. Predicting binaural speech intelligibility using the signal-to-noise ratio in the envelope power spectrum domain. J Acoust Soc Am 2016; 140:192. [PMID: 27475146] [DOI: 10.1121/1.4954254]
Abstract
This study proposes a binaural extension to the multi-resolution speech-based envelope power spectrum model (mr-sEPSM) [Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134, 436-446]. It consists of a combination of better-ear (BE) and binaural unmasking processes, implemented as two monaural realizations of the mr-sEPSM combined with a short-term equalization-cancellation process, and uses the signal-to-noise ratio in the envelope domain (SNRenv) as the decision metric. The model requires only two parameters to be fitted per speech material and does not require an explicit frequency weighting. The model was validated against three data sets from the literature, which covered the following effects: the number of maskers, the masker types [speech-shaped noise (SSN), speech-modulated SSN, babble, and reversed speech], the masker(s) azimuths, reverberation on the target and masker, and the interaural time difference of the target and masker. The Pearson correlation coefficient between the simulated speech reception thresholds and the data across all experiments was 0.91. A model version that considered only BE processing performed similarly (correlation coefficient of 0.86) to the complete model, suggesting that BE processing could be considered sufficient to predict intelligibility in most realistic conditions.
Affiliation(s)
- Alexandre Chabot-Leclerc
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, DK-2800, Kongens Lyngby, Denmark
- Ewen N MacDonald
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, DK-2800, Kongens Lyngby, Denmark
- Torsten Dau
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, DK-2800, Kongens Lyngby, Denmark
28
Harder S, Paulsen RR, Larsen M, Laugesen S, Mihocic M, Majdak P. A framework for geometry acquisition, 3-D printing, simulation, and measurement of head-related transfer functions with a focus on hearing-assistive devices. Comput Aided Des 2016; 75-76:39-46. [PMID: 28239188] [PMCID: PMC5321480] [DOI: 10.1016/j.cad.2016.02.006]
Abstract
Individual head-related transfer functions (HRTFs) are essential in applications like the fitting of hearing-assistive devices (HADs) for providing accurate sound localization performance. Individual HRTFs are usually obtained through intricate acoustic measurements. This paper investigates the use of a three-dimensional (3D) head model for the acquisition of individual HRTFs. Two aspects were investigated: whether a 3D-printed model can replace measurements on a human listener, and whether numerical simulations can replace acoustic measurements. For this purpose, HRTFs were acoustically measured for four human listeners and for a 3D-printed head model of one of these listeners. Further, HRTFs were simulated by applying the finite element method to the 3D head model. The monaural spectral features and spectral distortions were very similar between repeated measurements and between the human and printed measurements; however, larger deviations were observed between measurement and simulation. The binaural cues were in agreement among all HRTFs of the same listener, indicating that the 3D model is able to provide localization cues potentially accessible to HAD users. Hence, the pipeline of geometry acquisition, printing, and acoustic measurement or simulation seems to be a promising step towards the in-silico design of HADs.
Affiliation(s)
- Stine Harder
- Technical University of Denmark, DTU Compute, DK-2800 Lyngby, Denmark
- Rasmus R. Paulsen
- Technical University of Denmark, DTU Compute, DK-2800 Lyngby, Denmark
- Michael Mihocic
- Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria
- Piotr Majdak
- Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria
29
Grange JA, Culling JF. The benefit of head orientation to speech intelligibility in noise. J Acoust Soc Am 2016; 139:703-712. [PMID: 26936554] [DOI: 10.1121/1.4941655]
Abstract
Spatial release from masking is traditionally measured with the speech in front. The effect of head orientation with respect to the speech direction has rarely been studied. Speech-reception thresholds (SRTs) were measured for eight head orientations and four spatial configurations. Benefits of head orientation away from the speech source of up to 8 dB were measured. These correlated with the predictions of a model based on better-ear listening and binaural unmasking (r = 0.96). Spontaneous head orientation was measured while listeners attended to long speech clips of gradually diminishing speech-to-noise ratio in a sound-deadened room. Speech was presented from the loudspeaker that initially faced the listener and noise from one of four other locations. In an undirected paradigm, listeners spontaneously turned their heads away from the speech in 56% of trials. When instructed to rotate their heads as the speech-to-noise ratio diminished, all listeners turned away from the speech and reached head orientations associated with lower SRTs. Head orientation may prove valuable for hearing-impaired listeners.
Affiliation(s)
- Jacques A Grange
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF103AT, United Kingdom
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF103AT, United Kingdom
30
Geravanchizadeh M, Fallah A. Microscopic prediction of speech intelligibility in spatially distributed speech-shaped noise for normal-hearing listeners. J Acoust Soc Am 2015; 138:4004-4015. [PMID: 26723354] [DOI: 10.1121/1.4938230]
Abstract
A binaural and psychoacoustically motivated intelligibility model, based on a well-known monaural microscopic model, is proposed. This model simulates a phoneme recognition task in the presence of spatially distributed speech-shaped noise in anechoic scenarios. In the proposed model, binaural advantage effects are considered by generating a feature vector for a dynamic-time-warping speech recognizer. This vector consists of three subvectors: two monaural subvectors to model better-ear listening, and a binaural subvector to simulate the binaural unmasking effect. The binaural unit of the model is based on equalization-cancellation theory. The model operates blindly, meaning that separate recordings of speech and noise are not required for the predictions. Speech intelligibility tests were conducted with 12 normal-hearing listeners by collecting speech reception thresholds (SRTs) in the presence of single and multiple sources of speech-shaped noise. The comparison of the model predictions with the measured binaural SRTs, and with the predictions of a macroscopic binaural model called extended equalization-cancellation, shows that this approach predicts intelligibility in anechoic scenarios with good precision. The square of the correlation coefficient (r²) and the mean absolute error between the model predictions and the measurements are 0.98 and 0.62 dB, respectively.
Affiliation(s)
- Masoud Geravanchizadeh
- Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz 5166615813, Iran
- Ali Fallah
- Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz 5166615813, Iran
31
Leclère T, Lavandier M, Culling JF. Speech intelligibility prediction in reverberation: Towards an integrated model of speech transmission, spatial unmasking, and binaural de-reverberation. J Acoust Soc Am 2015; 137:3335-3345. [PMID: 26093423] [DOI: 10.1121/1.4921028]
Abstract
Room acoustic indicators of intelligibility have focused on the effects of the temporal smearing of speech by reverberation and of masking by diffuse ambient noise. In the presence of a discrete noise source, these indicators neglect the binaural listener's ability to separate target speech from noise. Lavandier and Culling [(2010). J. Acoust. Soc. Am. 127, 387-399] proposed a model that incorporates this ability but neglects the temporal smearing of speech, so that predictions hold for near-field targets. An extended model based on useful-to-detrimental (U/D) ratios is presented here that accounts for temporal smearing, spatial unmasking, and binaural de-reverberation in reverberant environments. The influence of the model parameters was tested by comparing the model predictions with speech reception thresholds measured in three experiments from the literature. Accurate predictions were obtained by adjusting the parameters to each room. Room-independent parameters did not lead to similar performance, suggesting that a single U/D model cannot be generalized to any room. Despite this limitation, the model framework makes it possible to propose a unified interpretation of spatial unmasking, temporal smearing, and binaural de-reverberation.
Affiliation(s)
- Thibaud Leclère
- Université de Lyon, École Nationale des Travaux Publics de l'État, Laboratoire Génie Civil et Bâtiment, Rue M. Audin, 69518 Vaulx-en-Velin Cedex, France
- Mathieu Lavandier
- Université de Lyon, École Nationale des Travaux Publics de l'État, Laboratoire Génie Civil et Bâtiment, Rue M. Audin, 69518 Vaulx-en-Velin Cedex, France
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom
32
Furness DN. Abstracts of the Fourth Joint Annual Conference, Experimental and Clinical Short Papers Meetings of the British Society of Audiology. Int J Audiol 2014. [DOI: 10.3109/14992027.2014.938194]
33
Cosentino S, Marquardt T, McAlpine D, Culling JF, Falk TH. A model that predicts the binaural advantage to speech intelligibility from the mixed target and interferer signals. J Acoust Soc Am 2014; 135:796-807. [PMID: 25234888] [DOI: 10.1121/1.4861239]
Abstract
A model is presented that predicts the binaural advantage to speech intelligibility by analyzing the right and left recordings at the two ears containing mixed target and interferer signals. This auditory-inspired model implements an equalization-cancellation stage to predict the binaural unmasking (BU) component, in conjunction with a modulation-frequency estimation block to estimate the "better ear" effect (BE) component of the binaural advantage. The model's performance was compared to experimental data obtained under anechoic and reverberant conditions using a single speech-shaped noise interferer paradigm. The internal BU and BE components were compared to those of the speech intelligibility model recently proposed by Lavandier et al. [J. Acoust. Soc. Am. 131, 218-231 (2012)], which requires separate inputs for target and interferer. The data indicate that the proposed model provides comparably good predictions from a mixed-signals input under both anechoic and reverberant conditions.
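The equalization-cancellation idea behind the BU component can be illustrated in a few lines. This is a bare sketch under strong assumptions (no equalization search over delay and gain, no internal noise), not the model's actual EC stage:

```python
def ec_residual(left, right):
    """Minimal cancellation step: subtracting the ear signals removes a
    masker that is identical at both ears (N0), while an antiphasic
    target (S_pi) is reinforced rather than cancelled."""
    return [l - r for l, r in zip(left, right)]

noise = [1.0, 2.0, 3.0]     # diotic masker, same waveform at both ears
target = [0.5, -0.5, 0.25]  # antiphasic target: +s at left, -s at right
left = [n + s for n, s in zip(noise, target)]
right = [n - s for n, s in zip(noise, target)]
print(ec_residual(left, right))  # masker cancelled; 2x target remains
```

The residual contains no noise and twice the target, which is why binaural unmasking improves the effective signal-to-noise ratio in N0S(π) conditions.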
Affiliation(s)
- David McAlpine
- Ear Institute, University College London, London, United Kingdom
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Cardiff, United Kingdom
- Tiago H Falk
- Institut National de la Recherche Scientifique - Énergie Matériaux Télécommunications, University of Québec, Montreal, Canada
34
Collin B, Lavandier M. Binaural speech intelligibility in rooms with variations in spatial location of sources and modulation depth of noise interferers. J Acoust Soc Am 2013; 134:1146-1159. [PMID: 23927114] [DOI: 10.1121/1.4812248]
Abstract
Four experiments investigated the effects on speech intelligibility of reverberation, sound source locations, and amplitude modulation of the interferers. Speech reception thresholds (SRTs) were measured using headphones and stimuli that simulated real-room listening, considering one or two interferers which were stationary or speech-modulated noises. In experiment 1, SRTs for modulated noises showed little variation with increasing interferer reverberation. Reverberation might have increased masking by filling in the modulated noise gaps, but simultaneously changed the noise spectra making them less effective maskers. In experiment 2, SRTs were lower when measured using a single one-voice modulated interferer rather than a different interferer for each target sentence, suggesting that listeners could take advantage of the predictability of the interferer gaps. In experiment 3, increasing speech reverberation did not significantly affect the difference between SRTs measured with stationary and modulated noises, indicating that the ability to exploit noise modulations was still useful for temporally smeared speech. In experiment 4, spatial unmasking remained constant when applying modulations to the interferers, suggesting an independence of the abilities to exploit these modulations and the spatial separation of sources. Finally, a model predicting binaural intelligibility for modulated noises was developed and provided a good fit to the experimental data.
Affiliation(s)
- Benjamin Collin
- Université de Lyon, Ecole Nationale des Travaux Publics de l'Etat, Laboratoire Génie Civil et Bâtiment, Rue M. Audin, 69518 Vaulx-en-Velin Cedex, France
35
Abstracts of the British Society of Audiology annual conference (incorporating the Experimental and Clinical Short papers meetings). Int J Audiol 2013. [DOI: 10.3109/14992027.2013.765042]
36
Culling JF, Mansell ER. Speech intelligibility among modulated and spatially distributed noise sources. J Acoust Soc Am 2013; 133:2254-2261. [PMID: 23556593] [DOI: 10.1121/1.4794384]
Abstract
At a cocktail party, listeners are faced with multiple, spatially distributed interfering voices. The dominant interfering voice may change from moment to moment and, consequently, change in spatial location. The ability of the binaural system to deal with such a dynamic scene has not been systematically analyzed. Spatial release from masking (SRM) was measured in simple spatial scenes, simulated over headphones with a frontal speech source. For a single noise at 105°, SRM was reduced if that noise modulated (10 Hz square wave, 50% duty cycle, 20 dB modulation depth), but, for two noises in symmetrical locations, SRM increased if the noises were modulated in alternation, suggesting that the binaural system can "switch" between exploiting different spatial configurations. Experiment 2 assessed the contributions of interaural time and level differences as a function of modulation rate (1-20 Hz). Scenes were created using the original head-related impulse responses and ones that had been manipulated to isolate each cue. SRM decreased steeply with modulation rate. The combined effects of interaural time and level differences were consistent with additive contributions. The results indicate that binaural sluggishness limits the contribution of binaural switching to speech understanding at a cocktail party.
Affiliation(s)
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom.
37
Energetic and Informational Masking in a Simulated Restaurant Environment. Adv Exp Med Biol 2013; 787:511-518. [DOI: 10.1007/978-1-4614-1590-9_56]
38
George ELJ, Festen JM, Goverts ST. Effects of reverberation and masker fluctuations on binaural unmasking of speech. J Acoust Soc Am 2012; 132:1581-1591. [PMID: 22978887] [DOI: 10.1121/1.4740500]
Abstract
In daily life, listeners use two ears to understand speech in situations which typically include reverberation and non-stationary noise. In headphone experiments, the binaural benefit for speech in noise is often expressed as the difference in speech reception threshold between diotic (N(0)S(0)) and dichotic (N(0)S(π)) conditions. This binaural advantage (BA), arising from the use of inter-aural phase differences, is about 5-6 dB in stationary noise, but may be lower in everyday conditions. In the current study, BA was measured in various combinations of noise and artificially created diotic reverberation, for normal-hearing and hearing-impaired listeners. Speech-intelligibility models were applied to quantify the combined effects. Results showed that in stationary noise, diotic reverberation did not affect BA. BA was reduced in conditions where the masker fluctuated. With additional reverberation, however, it was restored. Results for both normal-hearing and hearing-impaired listeners were accounted for by assuming that binaural unmasking is only effectively realized at low instantaneous speech-to-noise ratios (SNRs). The observed BA was related to the distribution of SNRs resulting from fluctuations, reverberation, and peripheral processing. It appears that masker fluctuations and reverberation, both relevant for everyday communication, interact in their effects on binaural unmasking and need to be considered together.
Affiliation(s)
- Erwin L J George
- VU University Medical Center, ENT/Audiology EMGO+ Institute for Health and Care Research, P.O. Box 7057, 1007 MB Amsterdam, The Netherlands
39
Abstracts of the British Society of Audiology annual conference (incorporating the Experimental and Clinical Short papers meetings). Int J Audiol 2012. [DOI: 10.3109/14992027.2012.653103]
40
Lavandier M, Jelfs S, Culling JF, Watkins AJ, Raimond AP, Makin SJ. Binaural prediction of speech intelligibility in reverberant rooms with multiple noise sources. J Acoust Soc Am 2012; 131:218-231. [PMID: 22280586] [DOI: 10.1121/1.3662075]
Abstract
When speech is in competition with interfering sources in rooms, monaural indicators of intelligibility fail to take account of the listener's abilities to separate target speech from interfering sounds using the binaural system. In order to incorporate these segregation abilities and their susceptibility to reverberation, Lavandier and Culling [J. Acoust. Soc. Am. 127, 387-399 (2010)] proposed a model which combines effects of better-ear listening and binaural unmasking. A computationally efficient version of this model is evaluated here under more realistic conditions that include head shadow, multiple stationary noise sources, and real-room acoustics. Three experiments are presented in which speech reception thresholds were measured in the presence of one to three interferers using real-room listening over headphones, simulated by convolving anechoic stimuli with binaural room impulse-responses measured with dummy-head transducers in five rooms. Without fitting any parameter of the model, there was close correspondence between measured and predicted differences in threshold across all tested conditions. The model's components of better-ear listening and binaural unmasking were validated both in isolation and in combination. The computational efficiency of this prediction method allows the generation of complex "intelligibility maps" from room designs.
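The better-ear listening component combined in this model can be sketched as a per-band maximum over ear SNRs. This is a minimal illustration under assumed inputs: it takes band SNRs as given and averages them, whereas the actual model derives them from binaural room impulse responses with speech-importance weighting.

```python
def better_ear_snr_db(snr_left_db, snr_right_db):
    """Better-ear listening sketch: in each frequency band the listener
    exploits whichever ear has the higher target-to-interferer ratio;
    return the across-band average of those per-band best SNRs."""
    best = [max(l, r) for l, r in zip(snr_left_db, snr_right_db)]
    return sum(best) / len(best)

# Head shadow favours the left ear in some bands and the right in others.
print(better_ear_snr_db([0.0, -6.0, 3.0], [-3.0, 0.0, 1.0]))  # prints 1.0
```

Adding a binaural-unmasking term per band to this better-ear term is what gives the model its two validated components.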
Affiliation(s)
- Mathieu Lavandier
- Département Génie Civil et Bâtiment, Université de Lyon, Ecole Nationale des Travaux Publics de l’Etat, Rue M Audin, 69518 Vaulx-en-Velin Cedex, France.
41
Jones GL, Litovsky RY. A cocktail party model of spatial release from masking by both noise and speech interferers. J Acoust Soc Am 2011; 130:1463-1474. [PMID: 21895087] [PMCID: PMC3188967] [DOI: 10.1121/1.3613928]
Abstract
A mathematical formula for estimating spatial release from masking (SRM) in a cocktail party environment would be useful as a simpler alternative to computationally intensive algorithms and may enhance understanding of underlying mechanisms. The experiment presented herein was designed to provide a strong test of a model that divides SRM into contributions of asymmetry and angular separation [Bronkhorst (2000). Acustica 86, 117-128] and to examine whether that model can be extended to include speech maskers. Across masker types the contribution to SRM of angular separation of maskers from the target was found to grow at a diminishing rate as angular separation increased within the frontal hemifield, contrary to predictions of the model. Speech maskers differed from noise maskers in the overall magnitude of SRM and in the contribution of angular separation (both greater for speech). These results were used to develop a modified model that achieved good fits to data for noise maskers (ρ=0.93) and for speech maskers (ρ=0.94) while using the same functions to describe separation and asymmetry components of SRM for both masker types. These findings suggest that this approach can be used to accurately model SRM for speech maskers in addition to primarily "energetic" noise maskers.
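The structure of the model described above, additive contributions of angular separation (growing at a diminishing rate) and asymmetry, can be caricatured as follows. The functional forms and every constant here are invented for illustration; the paper fits its own functions to the data.

```python
import math

def srm_db(separation_deg, asymmetry_deg, gain_db=9.0, tau_deg=45.0):
    """Caricature of an additive SRM model: a separation term that grows
    at a diminishing rate (saturating exponential) plus a term
    proportional to masker-placement asymmetry. Constants are
    illustrative, not fitted values from the paper."""
    separation_term = gain_db * (1.0 - math.exp(-separation_deg / tau_deg))
    asymmetry_term = 0.05 * asymmetry_deg
    return separation_term + asymmetry_term
```

With this shape, going from 45° to 90° of separation buys less SRM than going from 0° to 45°, the diminishing-rate behaviour the experiment observed within the frontal hemifield.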
Affiliation(s)
- Gary L Jones
- Waisman Center, University of Wisconsin, 1500 Highland Avenue, Madison, Wisconsin 53705, USA.