1
Hu H, Ewert SD, Kollmeier B, Vickers D. Rate dependent neural responses of interaural-time-difference cues in fine-structure and envelope. PeerJ 2024; 12:e17104. [PMID: 38680894 PMCID: PMC11055513 DOI: 10.7717/peerj.17104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 02/22/2024] [Indexed: 05/01/2024] [Open Access]
Abstract
Advancements in cochlear implants (CIs) have led to a significant increase in bilateral CI users, especially among children. Yet, most bilateral CI users do not fully achieve the intended binaural benefit due to potential limitations in signal processing and/or surgical implant positioning. One crucial auditory cue that normal hearing (NH) listeners can benefit from is the interaural time difference (ITD), i.e., the time difference between the arrival of a sound at the two ears. ITD sensitivity is thought to rely heavily on the effective utilization of temporal fine structure (very rapid oscillations in sound). Unfortunately, most current CIs do not transmit such true fine structure. Nevertheless, bilateral CI users have demonstrated sensitivity to ITD cues delivered through envelope or interaural pulse time differences, i.e., the time gap between the pulses delivered to the two implants. However, their ITD sensitivity is significantly poorer than that of NH individuals, and it further degrades at higher CI stimulation rates, especially when the rate exceeds 300 pulses per second. The overall purpose of this research thread is to improve spatial hearing abilities in bilateral CI users. This study aims to develop electroencephalography (EEG) paradigms that can be used in clinical settings to assess and optimize the delivery of ITD cues, which are crucial for spatial hearing in everyday life. The research objective of this article was to determine the effect of CI stimulation pulse rate on ITD sensitivity, and to characterize the rate-dependent degradation in ITD perception using EEG measures. To develop protocols for bilateral CI studies, EEG responses were obtained from NH listeners using sinusoidal-amplitude-modulated (SAM) tones and filtered clicks with changes in either fine structure ITD (ITDFS) or envelope ITD (ITDENV).
Multiple EEG responses were analyzed, which included the subcortical auditory steady-state responses (ASSRs) and cortical auditory evoked potentials (CAEPs) elicited by stimulus onset, offset, and changes. Results indicated that acoustic change complex (ACC) responses elicited by ITDENV changes were significantly smaller or absent compared to those elicited by ITDFS changes. The ACC morphologies evoked by ITDFS changes were similar to onset and offset CAEPs, although the peak latencies were longest for ACC responses and shortest for offset CAEPs. The high-frequency stimuli clearly elicited subcortical ASSRs, but these were smaller than those evoked by SAM tones with lower carrier frequencies. The 40-Hz ASSRs decreased with increasing carrier frequencies. Filtered clicks elicited larger ASSRs compared to high-frequency SAM tones, with the order being 40 > 160 > 80 > 320 Hz ASSR for both stimulus types. Wavelet analysis revealed a clear interaction between detectable transient CAEPs and 40-Hz ASSRs in the time-frequency domain for SAM tones with a low carrier frequency.
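The distinction between fine-structure ITD and envelope ITD in a SAM tone can be illustrated with a minimal sketch. This is not the stimulus code used in the study: the function name `sam_tone_pair`, the sampling rate, the duration, and the modulation depth are all assumptions chosen for illustration. The right channel's carrier is delayed by `itd_fs` and its envelope by `itd_env`, while the left channel serves as the reference.

```python
import math

def sam_tone_pair(fc, fm, itd_fs=0.0, itd_env=0.0, fs=48000, dur=0.05, m=1.0):
    """Return (left, right) sample lists of a sinusoidally amplitude-modulated tone.

    fc: carrier frequency in Hz, fm: modulation frequency in Hz.
    itd_fs delays only the carrier (fine structure) of the right channel,
    itd_env delays only its envelope. All default values are illustrative.
    """
    n = int(round(fs * dur))
    left, right = [], []
    for i in range(n):
        t = i / fs
        # Envelope: 1 + m * sin(2*pi*fm*t), delayed on the right by itd_env
        env_l = 1.0 + m * math.sin(2 * math.pi * fm * t)
        env_r = 1.0 + m * math.sin(2 * math.pi * fm * (t - itd_env))
        # Carrier: sin(2*pi*fc*t), delayed on the right by itd_fs
        left.append(env_l * math.sin(2 * math.pi * fc * t))
        right.append(env_r * math.sin(2 * math.pi * fc * (t - itd_fs)))
    return left, right

# Example: 500-Hz carrier, 40-Hz modulation, 500-microsecond fine-structure ITD
left, right = sam_tone_pair(fc=500.0, fm=40.0, itd_fs=500e-6)
```

Setting only `itd_env` leaves the carrier phase identical at both ears, which corresponds to the envelope-only condition described in the abstract.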
Affiliation(s)
- Hongmei Hu
  - SOUND Lab, Cambridge Hearing Group, Department of Clinical Neuroscience, Cambridge University, Cambridge, United Kingdom
  - Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Stephan D. Ewert
  - Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Birger Kollmeier
  - Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Deborah Vickers
  - SOUND Lab, Cambridge Hearing Group, Department of Clinical Neuroscience, Cambridge University, Cambridge, United Kingdom
2
Hu H, Hartog L, Kollmeier B, Ewert SD. Corrigendum: Spectral and binaural loudness summation of equally loud narrowband signals in single-sided-deafness and bilateral cochlear implant users. Front Neurosci 2023; 16:1087671. [PMID: 36711130 PMCID: PMC9880770 DOI: 10.3389/fnins.2022.1087671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 12/15/2022] [Indexed: 01/14/2023] [Open Access]
Abstract
[This corrects the article DOI: 10.3389/fnins.2022.931748.].
Affiliation(s)
- Hongmei Hu
  - Medizinische Physik and Cluster of Excellence “Hearing4all”, Department of Medical Physics and Acoustics, Universität Oldenburg, Oldenburg, Germany
  - *Correspondence: Hongmei Hu
- Laura Hartog
  - Medizinische Physik and Cluster of Excellence “Hearing4all”, Department of Medical Physics and Acoustics, Universität Oldenburg, Oldenburg, Germany
  - Hörzentrum Oldenburg gGmbH, Oldenburg, Germany
- Birger Kollmeier
  - Medizinische Physik and Cluster of Excellence “Hearing4all”, Department of Medical Physics and Acoustics, Universität Oldenburg, Oldenburg, Germany
  - Hörzentrum Oldenburg gGmbH, Oldenburg, Germany
- Stephan D. Ewert
  - Medizinische Physik and Cluster of Excellence “Hearing4all”, Department of Medical Physics and Acoustics, Universität Oldenburg, Oldenburg, Germany
3
Steffens H, Schutte M, Ewert SD. Auditory orientation and distance estimation of sighted humans using virtual echolocation with artificial and self-generated sounds. JASA Express Lett 2022; 2:124403. [PMID: 36586958 DOI: 10.1121/10.0016403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Active echolocation of sighted humans using predefined synthetic and self-emitted sounds, as habitually used by blind individuals, was investigated. Using virtual acoustics, distance estimation and directional localization of a wall in different rooms were assessed. A virtual source was attached to either the head or hand with realistic or increased source directivity. A control condition was tested with a virtual sound source located at the wall. Untrained echolocation performance comparable to performance in the control condition was achieved on an individual level. On average, the echolocation performance was considerably lower than in the control condition; however, it benefitted from increased directivity.
Affiliation(s)
- Henning Steffens
  - Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, 26111, Germany
- Michael Schutte
  - Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, 26111, Germany
- Stephan D Ewert
  - Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, 26111, Germany
4
Biberger T, Ewert SD. Binaural detection thresholds and audio quality of speech and music signals in complex acoustic environments. Front Psychol 2022; 13:994047. [PMID: 36507051 PMCID: PMC9729260 DOI: 10.3389/fpsyg.2022.994047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Accepted: 09/26/2022] [Indexed: 11/25/2022] [Open Access]
Abstract
Everyday acoustical environments are often complex, typically comprising one attended target sound in the presence of interfering sounds (e.g., disturbing conversations) and reverberation. Here we assessed binaural detection thresholds and (supra-threshold) binaural audio quality ratings of four distortion types (spectral ripples, non-linear saturation, and intensity and spatial modifications) applied to speech, guitar, and noise targets in such complex acoustic environments (CAEs). The target and (up to) two masker sounds were either co-located as if contained in a common audio stream, or were spatially separated as if originating from different sound sources. The amount of reverberation was systematically varied. Masker and reverberation had a significant effect on the distortion-detection thresholds of speech signals. Quality ratings were affected by reverberation, whereas the effect of maskers depended on the distortion. The results suggest that detection thresholds and quality ratings for distorted speech in anechoic conditions are also valid for rooms with mild reverberation, but not for moderate reverberation. Furthermore, for spectral ripples, a significant relationship between the listeners' individual detection thresholds and quality ratings was found. The current results provide baseline data for detection thresholds and audio quality ratings of different distortions of a target sound in CAEs, supporting the future development of binaural auditory models.
Affiliation(s)
- Thomas Biberger
  - Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
5
Steffens H, Schutte M, Ewert SD. Acoustically driven orientation and navigation in enclosed spaces. J Acoust Soc Am 2022; 152:1767. [PMID: 36182293 DOI: 10.1121/10.0013702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Accepted: 08/02/2022] [Indexed: 06/16/2023]
Abstract
Awareness of space, and subsequent orientation and navigation in rooms, is dominated by the visual system. However, humans are able to extract auditory information about their surroundings from early reflections and reverberation in enclosed spaces. To better understand orientation and navigation based on acoustic cues only, three virtual corridor layouts (I-, U-, and Z-shaped) were presented using real-time virtual acoustics in a three-dimensional 86-channel loudspeaker array. Participants were seated on a rotating chair in the center of the loudspeaker array and navigated using real rotation and virtual locomotion by "teleporting" in steps on a grid in the invisible environment. A head mounted display showed control elements and the environment in a visual reference condition. Acoustical information about the environment originated from a virtual sound source at the collision point of a virtual ray with the boundaries. In different control modes, the ray was cast either in view or hand direction or in a rotating, "radar"-like fashion in 90° steps to all sides. Time to complete, number of collisions, and movement patterns were evaluated. Navigation and orientation were possible based on the direct sound with little effect of room acoustics and control mode. Underlying acoustic cues were analyzed using an auditory model.
Affiliation(s)
- Henning Steffens
  - Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
- Michael Schutte
  - Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
- Stephan D Ewert
  - Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
6
Ewert SD. A filter representation of diffraction at infinite and finite wedges. JASA Express Lett 2022; 2:092401. [PMID: 36182340 DOI: 10.1121/10.0013686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Diffraction of sound occurs at sound barriers, building and room corners in urban and indoor environments. Here, a unified parametric filter representation of the singly diffracted field at arbitrary wedges is suggested, connecting existing asymptotic and exact solutions in the framework of geometrical acoustics. Depending on the underlying asymptotic (high-frequency) solution, a combination of up to four half-order lowpass filters represents the diffracted field. Compact transfer function and impulse response expressions are proposed, providing errors below ±0.1 dB. To approximate the exact solution, a further asymptotic lowpass filter valid at low frequencies is suggested and combined with the high-frequency filter.
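As a rough illustration of what a half-order lowpass filter does, the sketch below evaluates the magnitude response of a generic filter of the form H(f) = 1 / sqrt(1 + j f / fc). This form is an assumption: the abstract does not give the article's filter parametrization, and the function name `half_order_lowpass_db` is hypothetical. Such a filter falls off at about 3 dB per octave above its cutoff, half the slope of a first-order lowpass.

```python
import math

def half_order_lowpass_db(f, fc):
    """Magnitude in dB of the assumed response H(f) = 1 / sqrt(1 + j*f/fc).

    |1 + j*x| ** (-1/2) equals (1 + x**2) ** (-1/4), so no complex
    arithmetic is needed for the magnitude.
    """
    mag = (1.0 + (f / fc) ** 2) ** -0.25
    return 20.0 * math.log10(mag)

# Attenuation at the cutoff and one octave steps far above it
for f in (1000.0, 8000.0, 16000.0, 32000.0):
    print(f, round(half_order_lowpass_db(f, fc=1000.0), 2))
```

At the cutoff frequency this response is about 1.5 dB down, compared with 3 dB for a first-order lowpass.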
Affiliation(s)
- Stephan D Ewert
  - Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
7
Hu H, Hartog L, Kollmeier B, Ewert SD. Spectral and binaural loudness summation of equally loud narrowband signals in single-sided-deafness and bilateral cochlear implant users. Front Neurosci 2022; 16:931748. [PMID: 36071716 PMCID: PMC9444060 DOI: 10.3389/fnins.2022.931748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2022] [Accepted: 07/11/2022] [Indexed: 01/31/2023] [Open Access]
Abstract
Recent studies on loudness perception of binaural broadband signals in hearing impaired listeners found large individual differences, suggesting the use of such signals in hearing aid fitting. Likewise, clinical cochlear implant (CI) fitting with narrowband/single-electrode signals might cause suboptimal loudness perception in bilateral and bimodal CI listeners. Here spectral and binaural loudness summation in normal hearing (NH) listeners, bilateral CI (biCI) users, and unilateral CI (uCI) users with normal hearing in the unaided ear was investigated to assess the relevance of binaural/bilateral fitting in CI users. To compare the three groups, categorical loudness scaling was performed for an equal categorical loudness noise (ECLN) consisting of the sum of six spectrally separated third-octave noises at equal loudness. The acoustical ECLN procedure was adapted to an equivalent procedure in the electrical domain using direct stimulation. To ensure the same broadband loudness in binaural measurements with simultaneous electrical and acoustical stimulation, a modified binaural ECLN was introduced and cross validated with self-adjusted loudness in a loudness balancing experiment. Results showed a higher (spectral) loudness summation of the six equally loud narrowband signals in the ECLN in CI users compared to NH listeners. Binaural loudness summation was found for all three listener groups (NH, uCI, and biCI). No increased binaural loudness summation could be found for the current uCI and biCI listeners compared to the NH group. In uCI users, loudness balancing between narrowband signals and single electrodes did not automatically result in a balanced loudness perception across ears, emphasizing the importance of binaural/bilateral fitting.
Affiliation(s)
- Hongmei Hu
  - Medizinische Physik and Cluster of Excellence “Hearing4all”, Department of Medical Physics and Acoustics, Universität Oldenburg, Oldenburg, Germany
  - *Correspondence: Hongmei Hu
- Laura Hartog
  - Medizinische Physik and Cluster of Excellence “Hearing4all”, Department of Medical Physics and Acoustics, Universität Oldenburg, Oldenburg, Germany
  - Hörzentrum Oldenburg gGmbH, Oldenburg, Germany
- Birger Kollmeier
  - Medizinische Physik and Cluster of Excellence “Hearing4all”, Department of Medical Physics and Acoustics, Universität Oldenburg, Oldenburg, Germany
  - Hörzentrum Oldenburg gGmbH, Oldenburg, Germany
- Stephan D. Ewert
  - Medizinische Physik and Cluster of Excellence “Hearing4all”, Department of Medical Physics and Acoustics, Universität Oldenburg, Oldenburg, Germany
8
Eurich B, Encke J, Ewert SD, Dietz M. Lower interaural coherence in off-signal bands impairs binaural detection. J Acoust Soc Am 2022; 151:3927. [PMID: 35778173 DOI: 10.1121/10.0011673] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/10/2021] [Accepted: 05/24/2022] [Indexed: 06/15/2023]
Abstract
Differences in interaural phase configuration between a target and a masker can lead to substantial binaural unmasking. This effect is decreased for masking noises with an interaural time difference (ITD). Adding a second noise with an opposing ITD in most cases further reduces binaural unmasking. Thus far, modeling of these detection thresholds required both a mechanism for internal ITD compensation and an increased filter bandwidth. An alternative explanation for the reduction is that unmasking is impaired by the lower interaural coherence in off-frequency regions caused by the second masker [Marquardt and McAlpine (2009). J. Acoust. Soc. Am. 126(6), EL177-EL182]. Based on this hypothesis, the current work proposes a quantitative multi-channel model using monaurally derived peripheral filter bandwidths and an across-channel incoherence interference mechanism. This mechanism differs from wider filters since it has no effect when the masker coherence is constant across frequency bands. Combined with a monaural energy discrimination pathway, the model predicts the differences between a single delayed noise and two opposingly delayed noises as well as four other data sets. It helps resolve the inconsistency that simulating some data requires wide filters while others require narrow filters.
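The effect of a second, opposingly delayed noise on interaural coherence can be made concrete with a small broadband sketch. It is not the multi-channel model proposed in the article: coherence is taken here as the maximum of the normalized cross-correlation over a lag range (a common definition, assumed for this example), the signals are white Gaussian noise rather than band-limited maskers, and the names `interaural_coherence` and `demo` are hypothetical.

```python
import math
import random

def interaural_coherence(left, right, max_lag):
    """Maximum absolute normalized cross-correlation within +/- max_lag samples."""
    n = len(left)
    norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # Sum only over indices where both left[i] and right[i + lag] exist
        acc = sum(left[i] * right[i + lag]
                  for i in range(max(0, -lag), min(n, n - lag)))
        best = max(best, abs(acc) / norm)
    return best

def demo(n=2000, delay=10, seed=0):
    """Coherence for one delayed noise versus two opposingly delayed noises."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n + 2 * delay)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n + 2 * delay)]
    # Single noise: the right ear lags the left ear by `delay` samples
    left1, right1 = a[delay:delay + n], a[:n]
    # Two noises: noise a lags at the right ear, noise b leads at the right ear
    left2 = [a[i + delay] + b[i + delay] for i in range(n)]
    right2 = [a[i] + b[i + 2 * delay] for i in range(n)]
    return (interaural_coherence(left1, right1, 2 * delay),
            interaural_coherence(left2, right2, 2 * delay))

single, double = demo()
print(single, double)
```

With a single delayed noise, one internal delay aligns the two ear signals almost perfectly. With two opposing delays, no single lag can align both noises at once, so the broadband coherence drops markedly.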
Affiliation(s)
- Bernhard Eurich
  - Department für Medizinische Physik und Akustik, Universität Oldenburg, 26111 Oldenburg, Germany
- Jörg Encke
  - Department für Medizinische Physik und Akustik, Universität Oldenburg, 26111 Oldenburg, Germany
- Stephan D Ewert
  - Department für Medizinische Physik und Akustik, Universität Oldenburg, 26111 Oldenburg, Germany
- Mathias Dietz
  - Department für Medizinische Physik und Akustik, Universität Oldenburg, 26111 Oldenburg, Germany
9
Abstract
Late reverberation involves the superposition of many sound reflections, approaching the properties of a diffuse sound field. Since the spatially resolved perception of individual late reflections is impossible, simplifications can potentially be made for modelling late reverberation in room acoustics simulations with reduced spatial resolution. Such simplifications are desired for interactive, real-time virtual acoustic environments with applications in hearing research and for the evaluation of hearing supportive devices. In this context, the number and spatial arrangement of loudspeakers used for playback additionally affect spatial resolution. The current study assessed the minimum number of spatially evenly distributed virtual late reverberation sources required to perceptually approximate spatially highly resolved isotropic and anisotropic late reverberation and to technically approximate a spherically isotropic sound field. The spatial resolution of the rendering was systematically reduced by using subsets of the loudspeakers of an 86-channel spherical loudspeaker array in an anechoic chamber, onto which virtual reverberation sources were mapped using vector base amplitude panning. It was tested whether listeners can distinguish lower spatial resolutions of reproduction of late reverberation from the highest achievable spatial resolution in different simulated rooms. The rendering of early reflections remained unchanged. The coherence of the sound field across a pair of microphones at ear and behind-the-ear hearing device distance was assessed to separate the effects of number of virtual sources and loudspeaker array geometry. Results show that between 12 and 24 reverberation sources are required for the rendering of late reverberation in virtual acoustic environments.
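Vector base amplitude panning, which the study used to map virtual reverberation sources onto loudspeakers, can be sketched in its simplest two-dimensional, pairwise form. The study itself used a spherical array, which calls for the three-dimensional variant with loudspeaker triplets, so the code below is only a reduced illustration. The function name `vbap_2d` and the angle convention (degrees in the horizontal plane) are assumptions.

```python
import math

def vbap_2d(source_deg, spk1_deg, spk2_deg):
    """Return power-normalized gains (g1, g2) for a loudspeaker pair.

    Solves p = g1 * l1 + g2 * l2, where p, l1 and l2 are unit vectors
    pointing to the virtual source and the two loudspeakers.
    """
    s = math.radians(source_deg)
    a = math.radians(spk1_deg)
    b = math.radians(spk2_deg)
    # Determinant of the loudspeaker base matrix [l1 l2]
    det = math.cos(a) * math.sin(b) - math.sin(a) * math.cos(b)
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    # Cramer's rule for the 2x2 system
    g1 = (math.cos(s) * math.sin(b) - math.sin(s) * math.cos(b)) / det
    g2 = (math.cos(a) * math.sin(s) - math.sin(a) * math.cos(s)) / det
    # Normalize so that g1**2 + g2**2 == 1 (constant power)
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm

# A source halfway between loudspeakers at -30 and +30 degrees
print(vbap_2d(0.0, -30.0, 30.0))
```

A source that coincides with one loudspeaker direction is reproduced by that loudspeaker alone, and a source midway between the pair receives equal gains.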
Affiliation(s)
- Christoph Kirsch
  - Medizinische Physik and Cluster of Excellence Hearing4All, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Josef Poppitz
  - Akustik and Cluster of Excellence Hearing4All, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Torben Wendt
  - Medizinische Physik and Cluster of Excellence Hearing4All, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
  - Akustik and Cluster of Excellence Hearing4All, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Steven van de Par
  - Akustik and Cluster of Excellence Hearing4All, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Stephan D Ewert
  - Medizinische Physik and Cluster of Excellence Hearing4All, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
10
Kubiak AM, Rennies J, Ewert SD, Kollmeier B. Relation between hearing abilities and preferred playback settings for speech perception in complex listening conditions. Int J Audiol 2021; 61:965-974. [PMID: 34612124 DOI: 10.1080/14992027.2021.1980233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
OBJECTIVE: This study investigated if individual preferences with respect to the trade-off between a good signal-to-noise ratio and a distortion-free speech target were stable across different masking conditions and if simple adjustment methods could be used to identify subjects as either "noise haters" or "distortion haters". DESIGN: In each masking condition, subjects could adjust the target speech level according to their preferences by employing (i) linear gain or gain at the cost of (ii) clipping distortions or (iii) compression distortions. The comparison of these processing conditions allowed investigating the preferred trade-off between distortions and noise disturbance. STUDY SAMPLE: Thirty subjects differing widely in hearing status (normal-hearing to moderately impaired) and age (23-85 years). RESULTS: High test-retest stability of individual preferences was found for all modification schemes. The preference adjustments suggested that subjects could be consistently categorised along a scale from "noise haters" to "distortion haters", and this preference trait remained stable through all maskers, spatial conditions, and types of distortions. CONCLUSIONS: Employing quick self-adjustment to collect listening preferences in complex listening conditions revealed a stable preference trait along the "noise vs. distortions" tolerance dimension. This could potentially help in fitting modern hearing aid algorithms to the individual user.
Affiliation(s)
- Aleksandra M Kubiak
  - Fraunhofer IDMT, Project Group Hearing, Speech and Audio Technology, Cluster of Excellence "Hearing4all", Oldenburg, Germany
- Jan Rennies
  - Fraunhofer IDMT, Project Group Hearing, Speech and Audio Technology, Cluster of Excellence "Hearing4all", Oldenburg, Germany
- Stephan D Ewert
  - Medizinische Physik, Cluster of Excellence Hearing4all, Carl von Ossietzky Universität, Oldenburg, Germany
- Birger Kollmeier
  - Medizinische Physik, Cluster of Excellence Hearing4all, Carl von Ossietzky Universität, Oldenburg, Germany
11
Pieper I, Mauermann M, Kollmeier B, Ewert SD. Toward an Individual Binaural Loudness Model for Hearing Aid Fitting and Development. Front Psychol 2021; 12:634943. [PMID: 34239474 PMCID: PMC8258351 DOI: 10.3389/fpsyg.2021.634943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2020] [Accepted: 05/27/2021] [Indexed: 11/25/2022] [Open Access]
Abstract
The individual loudness perception of a patient plays an important role in hearing aid satisfaction and use in daily life. Hearing aid fitting and development might benefit from individualized loudness models (ILMs), enabling better adaptation of the processing to individual needs. The central question is whether additional parameters are required for ILMs beyond non-linear cochlear gain loss and linear attenuation common to existing loudness models for the hearing impaired (HI). Here, loudness perception in eight normal hearing (NH) and eight HI listeners was measured in conditions ranging from monaural narrowband to binaural broadband, to systematically assess spectral and binaural loudness summation and their interdependence. A binaural summation stage was devised with empirical monaural loudness judgments serving as input. While NH showed binaural inhibition in line with the literature, binaural summation and its inter-subject variability were increased in HI, indicating the necessity for individualized binaural summation. Toward ILMs, a recent monaural loudness model was extended with the suggested binaural stage, and the number and type of additional parameters required to describe and to predict individual loudness were assessed. In addition to one parameter for the individual amount of binaural summation, a bandwidth-dependent monaural parameter was required to successfully account for individual spectral summation.
Affiliation(s)
- Iko Pieper
  - Medizinische Physik and Cluster of Excellence Hearing4All, Universität Oldenburg, Oldenburg, Germany
- Manfred Mauermann
  - Medizinische Physik and Cluster of Excellence Hearing4All, Universität Oldenburg, Oldenburg, Germany
- Birger Kollmeier
  - Medizinische Physik and Cluster of Excellence Hearing4All, Universität Oldenburg, Oldenburg, Germany
- Stephan D Ewert
  - Medizinische Physik and Cluster of Excellence Hearing4All, Universität Oldenburg, Oldenburg, Germany
12
Biberger T, Schepker H, Denk F, Ewert SD. Instrumental Quality Predictions and Analysis of Auditory Cues for Algorithms in Modern Headphone Technology. Trends Hear 2021; 25:23312165211001219. [PMID: 33739186 PMCID: PMC7983238 DOI: 10.1177/23312165211001219] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022] [Open Access]
Abstract
Smart headphones or hearables use different types of algorithms such as noise cancelation, feedback suppression, and sound pressure equalization to eliminate undesired sound sources or to achieve acoustical transparency. Such signal processing strategies might alter the spectral composition or interaural differences of the original sound, which might be perceived by listeners as monaural or binaural distortions and thus degrade audio quality. To evaluate the perceptual impact of these distortions, subjective quality ratings can be used, but these are time consuming and costly. Auditory-inspired instrumental quality measures can be applied with less effort and may also be helpful in identifying whether the distortions impair the auditory representation of monaural or binaural cues. Therefore, the goals of this study were (a) to assess the applicability of various monaural and binaural audio quality models to distortions typically occurring in hearables and (b) to examine the effect of those distortions on the auditory representation of spectral, temporal, and binaural cues. Results showed that the signal processing algorithms considered in this study mainly impaired (monaural) spectral cues. Consequently, monaural audio quality models that capture spectral distortions achieved the best prediction performance. A recent audio quality model that predicts monaural and binaural aspects of quality was revised based on parts of the current data involving binaural audio quality aspects, leading to improved overall performance indicated by a mean Pearson linear correlation of 0.89 between obtained and predicted ratings.
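The Pearson linear correlation used above to compare obtained and predicted ratings can be computed as in the following minimal sketch. The function name `pearson_r` and the rating values are made up for illustration and are not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Covariance and variances, without the common 1/(n-1) factor (it cancels)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical subjective ratings and model predictions
obtained = [20.0, 35.0, 50.0, 70.0, 90.0]
predicted = [25.0, 30.0, 55.0, 65.0, 85.0]
print(pearson_r(obtained, predicted))
```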
Affiliation(s)
- Thomas Biberger
  - Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
- Henning Schepker
  - Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
- Florian Denk
  - Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
- Stephan D Ewert
  - Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
13
Steffens H, van de Par S, Ewert SD. The role of early and late reflections on perception of source orientation. J Acoust Soc Am 2021; 149:2255. [PMID: 33940902 DOI: 10.1121/10.0003823] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/14/2020] [Accepted: 03/01/2021] [Indexed: 06/12/2023]
Abstract
Sound radiation of most natural sources, like human speakers or musical instruments, typically exhibits a spatial directivity pattern. This directivity contributes to the perception of sound sources in rooms, affecting the spatial energy distribution of early reflections and late diffuse reverberation. Thus, for convincing sound field reproduction and acoustics simulation, source directivity has to be considered. Whereas perceptual effects of directivity, such as source-orientation-dependent coloration, appear relevant for the direct sound and individual early reflections, it is unclear how spectral and spatial cues interact for later reflections. Better knowledge of the perceptual relevance of source orientation cues might help to simplify the acoustics simulation. Here, it is assessed as to what extent directivity of a human speaker should be simulated for early reflections and diffuse reverberation. The computationally efficient hybrid approach to simulate and auralize binaural room impulse responses [Wendt et al., J. Audio Eng. Soc. 62, 11 (2014)] was extended to simulate source directivity. Two psychoacoustic experiments assessed the listeners' ability to distinguish between different virtual source orientations when the frequency-dependent spatial directivity pattern of the source was approximated by a direction-independent average filter for different higher reflection orders. The results indicate that it is sufficient to simulate effects of source directivity in the first-order reflections.
Affiliation(s)
- Henning Steffens
  - Medizinische Physik, Universität Oldenburg, Oldenburg 26111, Germany
- Steven van de Par
  - Acoustics Group and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg 26111, Germany
- Stephan D Ewert
  - Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg 26111, Germany
14
Kubiak AM, Rennies J, Ewert SD, Kollmeier B. Prediction of individual speech recognition performance in complex listening conditions. J Acoust Soc Am 2020; 147:1379. [PMID: 32237817 DOI: 10.1121/10.0000759] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/04/2019] [Accepted: 01/31/2020] [Indexed: 06/11/2023]
Abstract
This study examined how well individual speech recognition thresholds in complex listening scenarios could be predicted by a current binaural speech intelligibility model. Model predictions were compared with experimental data measured for seven normal-hearing and 23 hearing-impaired listeners who differed widely in their degree of hearing loss, age, as well as performance in clinical speech tests. The experimental conditions included two masker types (multi-talker or two-talker maskers), and two spatial conditions (maskers co-located with the frontal target or symmetrically separated from the target). The results showed that interindividual variability could not be well predicted by a model including only individual audiograms. Predictions improved when an additional individual "proficiency factor" was derived from one of the experimental conditions or a standard speech test. Overall, the current model can predict individual performance relatively well (except in conditions high in informational masking), but the inclusion of age-related factors may lead to even further improvements.
Affiliation(s)
- Aleksandra M Kubiak
- Fraunhofer IDMT, Project Group Hearing, Speech and Audio Technology, Cluster of Excellence "Hearing4all," Oldenburg, Germany
- Jan Rennies
- Fraunhofer IDMT, Project Group Hearing, Speech and Audio Technology, Cluster of Excellence "Hearing4all," Oldenburg, Germany
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
- Birger Kollmeier
- Fraunhofer IDMT, Project Group Hearing, Speech and Audio Technology, Cluster of Excellence "Hearing4all," Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
|
15
|
Ewert SD, Paraouty N, Lorenzi C. A two-path model of auditory modulation detection using temporal fine structure and envelope cues. Eur J Neurosci 2020; 51:1265-1278. [DOI: 10.1111/ejn.13846] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 06/09/2017] [Revised: 01/18/2018] [Accepted: 01/18/2018] [Indexed: 11/30/2022]
Affiliation(s)
- Stephan D. Ewert
- Medizinische Physik and Cluster of Excellence Hearing4All, Universität Oldenburg, 26111 Oldenburg, Germany
- Nihaad Paraouty
- Laboratoire des systèmes perceptifs, Département d’études cognitives, École normale supérieure, CNRS, PSL Research University, Paris, France
- Christian Lorenzi
- Laboratoire des systèmes perceptifs, Département d’études cognitives, École normale supérieure, CNRS, PSL Research University, Paris, France
|
16
|
Biberger T, Ewert SD. The effect of room acoustical parameters on speech reception thresholds and spatial release from masking. J Acoust Soc Am 2019; 146:2188. [PMID: 31671969 DOI: 10.1121/1.5126694] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/18/2019] [Accepted: 08/30/2019] [Indexed: 06/10/2023]
Abstract
In daily life, speech intelligibility is affected by masking caused by interferers and by reverberation. For a frontal target speaker and two interfering sources symmetrically placed to either side, spatial release from masking (SRM) is observed in comparison to frontal interferers. In this case, the auditory system can make use of temporally fluctuating interaural time/phase and level differences promoting binaural unmasking (BU) and better-ear glimpsing (BEG). Reverberation affects the waveforms of the target and maskers, and the interaural differences, depending on the spatial configuration and on the room acoustical properties. In this study, the effect of room acoustics, temporal structure of the interferers, and target-masker positions on speech reception thresholds and SRM was assessed. The results were compared to an optimal better-ear glimpsing strategy to help disentangle energetic masking including effects of BU and BEG as well as informational masking (IM). In anechoic and moderately reverberant conditions, BU and BEG contributed to SRM of fluctuating speech-like maskers, while BU did not contribute in highly reverberant conditions. In highly reverberant rooms an SRM of up to 3 dB was observed for speech maskers, including effects of release from IM based on binaural cues.
Affiliation(s)
- Thomas Biberger
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
|
17
|
Denk F, Ewert SD, Kollmeier B. On the limitations of sound localization with hearing devices. J Acoust Soc Am 2019; 146:1732. [PMID: 31590539 DOI: 10.1121/1.5126521] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/02/2019] [Accepted: 08/28/2019] [Indexed: 06/10/2023]
Abstract
Limited abilities to localize sound sources and other reduced spatial hearing capabilities remain a largely unsolved issue in hearing devices like hearing aids or hear-through headphones. Hence, the impact of the microphone location, signal bandwidth, different equalization approaches, as well as processing delays in superposition with direct sound leaking through a vent was addressed in this study. A localization experiment was performed with normal-hearing subjects using individual binaural synthesis to separately assess the above-mentioned potential limiting issues for localization in the horizontal and vertical plane with linear hearing devices. To this end, listening through hearing devices was simulated utilizing transfer functions for six different microphone locations, measured both individually and on a dummy head. Results show that the microphone location is the governing factor for localization abilities with linear hearing devices, and non-optimal microphone locations have a disruptive influence on localization in the vertical domain, and an effect on lateral sound localization. Processing delays cause additional detrimental effects for lateral sound localization; and diffuse-field equalization to the open-ear response leads to better localization performance than free-field equalization. Stimuli derived from dummy head measurements are unsuited for evaluating individual localization abilities with a hearing device.
Affiliation(s)
- Florian Denk
- Medizinische Physik and Cluster of Excellence "Hearing4all," Universität Oldenburg, Küpkersweg 74, 26129 Oldenburg, Germany
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence "Hearing4all," Universität Oldenburg, Küpkersweg 74, 26129 Oldenburg, Germany
- Birger Kollmeier
- Medizinische Physik and Cluster of Excellence "Hearing4all," Universität Oldenburg, Küpkersweg 74, 26129 Oldenburg, Germany
|
18
|
Denk F, Ernst SMA, Ewert SD, Kollmeier B. Adapting Hearing Devices to the Individual Ear Acoustics: Database and Target Response Correction Functions for Various Device Styles. Trends Hear 2019; 22:2331216518779313. [PMID: 29877161 PMCID: PMC5992802 DOI: 10.1177/2331216518779313] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/17/2022] Open
Abstract
To achieve a natural sound quality when listening through hearing devices, the sound pressure at the eardrum should replicate that of the open ear, modified only by an insertion gain if desired. A target approximating this reference condition can be computed by applying an appropriate correction function to the pressure observed at the device microphone. Such Target Response Correction Functions (TRCF) can be defined based on the directionally dependent relative transfer function between the location of the hearing device microphone and the eardrum of the open ear. However, it is unclear how exactly the TRCF should be derived, and how large the benefit of individual, versus generic, correction is. We present measurements of Head-Related Transfer Functions (HRTF) at the eardrum and at 9 microphone locations of a comprehensive set of 5 hearing device styles, including 91 incidence directions, and recorded in 16 subjects and 2 dummy heads. Based on these HRTFs, individualized and generic TRCF were computed for frontal (referred to as free-field) and diffuse-field sound incidence. Spectral deviations between the computed target and listening with the open ear were evaluated using an auditory model and virtual acoustic scenes. Results indicate that a correction for diffuse-field incidence should be preferred over the free field, and individual correction functions result in notably reduced spectral deviations to open-ear listening, as compared with generic correction functions. These outcomes depend substantially on the specific device style. The HRTF database and derived TRCFs are publicly available.
Affiliation(s)
- Florian Denk
- Medizinische Physik and Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
- Stephan M A Ernst
- Medizinische Physik and Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany; ENT Clinic, University Hospital of Gießen and Marburg, Gießen, Germany
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
- Birger Kollmeier
- Medizinische Physik and Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
|
19
|
Schutte M, Ewert SD, Wiegrebe L. The percept of reverberation is not affected by visual room impression in virtual environments. J Acoust Soc Am 2019; 145:EL229. [PMID: 31067971 DOI: 10.1121/1.5093642] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/03/2018] [Accepted: 02/21/2019] [Indexed: 06/09/2023]
Abstract
Humans possess mechanisms to suppress distracting early sound reflections, summarized as the precedence effect. Recent work shows that precedence is affected by visual stimulation. This paper investigates possible effects of visual stimulation on the perception of later reflections, i.e., reverberation. In a highly immersive audio-visual virtual reality environment, subjects were asked to quantify reverberation in conditions where simultaneously presented auditory and visual stimuli either match in room identity, sound source azimuth, and sound source distance, or diverge in one of these aspects. While subjects reliably judged reverberation across acoustic environments, the visual room impression did not affect reverberation estimates.
Affiliation(s)
- Michael Schutte
- Division of Neurobiology, Department Biology II and Graduate School of Systemic Neurosciences, Ludwig-Maximilians-Universität München, Germany
- Stephan D Ewert
- Medical Physics and Cluster of Excellence Hearing4all, University of Oldenburg
- Lutz Wiegrebe
- Division of Neurobiology, Department Biology II and Graduate School of Systemic Neurosciences, Ludwig-Maximilians-Universität München, Germany
|
20
|
Denk F, Ewert SD, Kollmeier B. Spectral directional cues captured by hearing device microphones in individual human ears. J Acoust Soc Am 2018; 144:2072. [PMID: 30404454 DOI: 10.1121/1.5056173] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 06/12/2018] [Accepted: 09/11/2018] [Indexed: 06/08/2023]
Abstract
Spatial hearing abilities with hearing devices ultimately depend on how well acoustic directional cues are captured by the microphone(s) of the device. A comprehensive objective evaluation of monaural spectral directional cues captured at 9 microphone locations integrated in 5 hearing device styles is presented, utilizing a recent database of head-related transfer functions (HRTFs) that includes data from 16 human and 3 artificial ear pairs. Differences between HRTFs to the eardrum and hearing device microphones were assessed by descriptive analyses and quantitative metrics, and compared to differences between individual ears. Directional information exploited for vertical sound localization was evaluated by means of computational models. Directional information at microphone locations inside the pinna is significantly biased and qualitatively poorer compared to locations in the ear canal; behind-the-ear microphones capture almost no directional cues. These errors are expected to impair vertical sound localization, even if the new cues were optimally mapped to locations. Differences between HRTFs to the eardrum and hearing device microphones are qualitatively different from between-subject differences and can be described as a partial destruction rather than an alteration of relevant cues, although spectral difference metrics produce similar results. Dummy heads do not fully reflect the results with individual subjects.
Affiliation(s)
- Florian Denk
- Medizinische Physik and Cluster of Excellence "Hearing4all," University of Oldenburg, Küpkersweg 74, 26129 Oldenburg, Germany
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence "Hearing4all," University of Oldenburg, Küpkersweg 74, 26129 Oldenburg, Germany
- Birger Kollmeier
- Medizinische Physik and Cluster of Excellence "Hearing4all," University of Oldenburg, Küpkersweg 74, 26129 Oldenburg, Germany
|
21
|
Pieper I, Mauermann M, Oetting D, Kollmeier B, Ewert SD. Physiologically motivated individual loudness model for normal hearing and hearing impaired listeners. J Acoust Soc Am 2018; 144:917. [PMID: 30180690 DOI: 10.1121/1.5050518] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/28/2017] [Accepted: 07/27/2018] [Indexed: 06/08/2023]
Abstract
A loudness model with a central gain is suggested to improve individualized predictions of loudness scaling data from normal-hearing and hearing-impaired listeners. The current approach is based on the loudness model of Pieper et al. [(2016). J. Acoust. Soc. Am. 139, 2896], which simulated the nonlinear inner ear mechanics as a transmission-line model in a physically and physiologically plausible way. Individual hearing thresholds were simulated by a cochlear gain reduction in the transmission-line model and linear attenuation (damage of inner hair cells) prior to an internal threshold. This and similar approaches of current loudness models that characterize the individual hearing loss were shown to be insufficient to account for individual loudness perception, in particular at high stimulus levels close to the uncomfortable level. An additional parameter, termed "post gain," was introduced to improve upon the previous models. The post gain parameter amplifies the signal parts above the internal threshold and can better account for individual variations in the overall steepness of loudness functions and for variations in the uncomfortable level which are independent of the hearing loss. The post gain can be interpreted as a central gain occurring at higher stages as a result of peripheral deafferentation.
Affiliation(s)
- Iko Pieper
- Medical Physics and Cluster of Excellence Hearing4All, Universität Oldenburg, Oldenburg, D-26111, Germany
- Manfred Mauermann
- Medical Physics and Cluster of Excellence Hearing4All, Universität Oldenburg, Oldenburg, D-26111, Germany
- Dirk Oetting
- HörTech gGmbH and Cluster of Excellence Hearing4all, Oldenburg, Germany
- Birger Kollmeier
- Medical Physics and Cluster of Excellence Hearing4All, Universität Oldenburg, Oldenburg, D-26111, Germany
- Stephan D Ewert
- Medical Physics and Cluster of Excellence Hearing4All, Universität Oldenburg, Oldenburg, D-26111, Germany
|
22
|
Hu H, Dietz M, Williges B, Ewert SD. Better-ear glimpsing with symmetrically-placed interferers in bilateral cochlear implant users. J Acoust Soc Am 2018; 143:2128. [PMID: 29716260 DOI: 10.1121/1.5030918] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 06/08/2023]
Abstract
For a frontal target in spatially symmetrically placed interferers, normal hearing (NH) listeners can use "better-ear glimpsing" to select time-frequency segments with favorable signal-to-noise ratio in either ear. With an ideal monaural better-ear mask (IMBM) processing, some studies showed that NH listeners can reach similar performance as in the natural binaural listening condition, although interaural phase differences at low frequencies can further improve performance. In principle, bilateral cochlear implant (BiCI) listeners could use the same better-ear glimpsing, albeit without exploiting interaural phase differences. Speech reception thresholds of NH and BiCI listeners were measured in three interferers (speech-shaped stationary noise, nonsense speech, or single talker) either co-located with the target, symmetrically placed at ±60°, or independently presented to each ear, with and without IMBM processing. Furthermore, a bilateral noise vocoder based on the BiCI electrodogram was used in the same NH listeners. Headphone presentation and direct stimulation with head-related transfer functions for spatialization were used in NH and BiCI listeners, respectively. Compared to NH listeners, both NH listeners with vocoder and BiCI listeners showed strongly reduced binaural benefit from spatial separation. However, both groups greatly benefited from IMBM processing as part of the stimulation strategy.
Affiliation(s)
- Hongmei Hu
- Medizinische Physik, Carl von Ossietzky Universität Oldenburg and Cluster of Excellence "Hearing4all," Küpkersweg 74, 26129, Oldenburg, Germany
- Mathias Dietz
- Medizinische Physik, Carl von Ossietzky Universität Oldenburg and Cluster of Excellence "Hearing4all," Küpkersweg 74, 26129, Oldenburg, Germany
- Ben Williges
- Medizinische Physik, Carl von Ossietzky Universität Oldenburg and Cluster of Excellence "Hearing4all," Küpkersweg 74, 26129, Oldenburg, Germany
- Stephan D Ewert
- Medizinische Physik, Carl von Ossietzky Universität Oldenburg and Cluster of Excellence "Hearing4all," Küpkersweg 74, 26129, Oldenburg, Germany
|
23
|
Abstract
OBJECTIVE: Binaural cues such as interaural level differences (ILDs) are used to organise auditory perception and to segregate sound sources in complex acoustical environments. In bilaterally fitted hearing aids, dynamic-range compression operating independently at each ear potentially alters these ILDs, thus distorting binaural perception and sound source segregation. DESIGN: A binaurally-linked model-based fast-acting dynamic compression algorithm designed to approximate the normal-hearing basilar membrane (BM) input-output function in hearing-impaired listeners is suggested. A multi-center evaluation in comparison with an alternative binaural and two bilateral fittings was performed to assess the effect of binaural synchronisation on (a) speech intelligibility and (b) perceived quality in realistic conditions. STUDY SAMPLE: 30 and 12 hearing-impaired (HI) listeners were aided individually with the algorithms for both experimental parts, respectively. RESULTS: A small preference towards the proposed model-based algorithm in the direct quality comparison was found. However, no benefit of binaural synchronisation regarding speech intelligibility was found, suggesting a dominant role of the better ear in all experimental conditions. CONCLUSION: The suggested binaural synchronisation of compression algorithms showed a limited effect on the tested outcome measures; however, linking could be situationally beneficial to preserve a natural binaural perception of the acoustical environment.
Affiliation(s)
- Stephan M A Ernst
- Medizinische Physik and Cluster of Excellence Hearing4all, Carl-von-Ossietzky Universität Oldenburg, Oldenburg, Germany
- Steffen Kortlang
- Medizinische Physik and Cluster of Excellence Hearing4all, Carl-von-Ossietzky Universität Oldenburg, Oldenburg, Germany
- Giso Grimm
- Medizinische Physik and Cluster of Excellence Hearing4all, Carl-von-Ossietzky Universität Oldenburg, Oldenburg, Germany; HörTech gGmbH, Oldenburg, Germany
- Birger Kollmeier
- Medizinische Physik and Cluster of Excellence Hearing4all, Carl-von-Ossietzky Universität Oldenburg, Oldenburg, Germany; HörTech gGmbH, Oldenburg, Germany
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence Hearing4all, Carl-von-Ossietzky Universität Oldenburg, Oldenburg, Germany
|
24
|
Dietz M, Lestang JH, Majdak P, Stern RM, Marquardt T, Ewert SD, Hartmann WM, Goodman DFM. A framework for testing and comparing binaural models. Hear Res 2017; 360:92-106. [PMID: 29208336 DOI: 10.1016/j.heares.2017.11.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 06/30/2017] [Revised: 11/03/2017] [Accepted: 11/24/2017] [Indexed: 11/19/2022]
Abstract
Auditory research has a rich history of combining experimental evidence with computational simulations of auditory processing in order to deepen our theoretical understanding of how sound is processed in the ears and in the brain. Despite significant progress in the amount of detail and breadth covered by auditory models, for many components of the auditory pathway there are still different model approaches that are often not equivalent but rather in conflict with each other. Similarly, some experimental studies yield conflicting results, which has led to controversies. This can be best resolved by a systematic comparison of multiple experimental data sets and model approaches. Binaural processing is a prominent example of how the development of quantitative theories can advance our understanding of the phenomena, but there remain several unresolved questions for which competing model approaches exist. This article discusses a number of current unresolved or disputed issues in binaural modelling, as well as some of the significant challenges in comparing binaural models with each other and with the experimental data. We introduce an auditory model framework, which we believe can become a useful infrastructure for resolving some of the current controversies. It operates models over the same paradigms that are used experimentally. The core of the proposed framework is an interface that connects three components irrespective of their underlying programming language: the experiment software, an auditory pathway model, and task-dependent decision stages called artificial observers that provide the same output format as the test subject.
Affiliation(s)
- Mathias Dietz
- National Centre for Audiology, Western University, London, ON, Canada
- Jean-Hugues Lestang
- Department of Electrical and Electronic Engineering, Imperial College London, London, United Kingdom
- Piotr Majdak
- Institut für Schallforschung, Österreichische Akademie der Wissenschaften, Wien, Austria
- Stephan D Ewert
- Medizinische Physik, Universität Oldenburg, Oldenburg, Germany
- Dan F M Goodman
- Department of Electrical and Electronic Engineering, Imperial College London, London, United Kingdom
|
25
|
Abstract
OBJECTIVE: Loudness perception of binaural broadband signals, e.g., speech-shaped noise, shows large individual differences using frequency-dependent amplification which was adjusted to restore the loudness perception of monaural narrowband signals in hearing-impaired (HI) listeners. To better understand and quantify this highly individual effect, loudness perception of broadband stimuli consisting of a number of spectrally separated narrowband components which were individually adjusted to equal loudness is of interest. DESIGN: Based on categorical loudness scaling, the loudness of an equal categorical loudness noise (ECLN) consisting of six third-octave noises was assessed. For loudness categories "medium" and "very loud" the required narrowband loudness was analysed. STUDY SAMPLE: Nine normal-hearing (NH) and ten HI listeners. RESULTS: HI listeners showed lower narrowband loudness values compared to NH listeners, indicating an increased spectral loudness summation. More than 50% of the HI listeners showed higher binaural spectral loudness summation compared to NH listeners. The amount of binaural spectral loudness summation was highly correlated (r² = 0.92) with the loudness level at "very loud" of aided speech-shaped noise. CONCLUSIONS: The suggested ECLN measurement is suited to assess individual (binaural) broadband loudness in aided conditions, providing valuable information for hearing-aid fitting.
Affiliation(s)
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Dirk Oetting
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany; Project Group Hearing, Speech and Audio Technology of the Fraunhofer IDMT and Cluster of Excellence Hearing4all, Oldenburg, Germany
|
26
|
Biberger T, Ewert SD. The role of short-time intensity and envelope power for speech intelligibility and psychoacoustic masking. J Acoust Soc Am 2017; 142:1098. [PMID: 28863616 DOI: 10.1121/1.4999059] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 05/28/2023]
Abstract
The generalized power spectrum model [GPSM; Biberger and Ewert (2016). J. Acoust. Soc. Am. 140, 1023-1038], combining the "classical" concept of the power-spectrum model (PSM) and the envelope power-spectrum model (EPSM), was demonstrated to account for several psychoacoustic and speech intelligibility (SI) experiments. The PSM path of the model uses long-time power signal-to-noise ratios (SNRs), while the EPSM path uses short-time envelope power SNRs. A systematic comparison of existing SI models for several spectro-temporal manipulations of speech maskers and gender combinations of target and masker speakers [Schubotz et al. (2016). J. Acoust. Soc. Am. 140, 524-540] showed the importance of short-time power features. Conversely, Jørgensen et al. [(2013). J. Acoust. Soc. Am. 134, 436-446] demonstrated a higher predictive power of short-time envelope power SNRs than power SNRs using reverberation and spectral subtraction. Here the GPSM was extended to utilize short-time power SNRs and was shown to account for all psychoacoustic and SI data of the three mentioned studies. The best processing strategy was to exclusively use either power or envelope-power SNRs, depending on the experimental task. By analyzing both domains, the suggested model might provide a useful tool for clarifying the contribution of amplitude modulation masking and energetic masking.
Affiliation(s)
- Thomas Biberger
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
|
27
|
Ewert SD, Schubotz W, Brand T, Kollmeier B. Binaural masking release in symmetric listening conditions with spectro-temporally modulated maskers. J Acoust Soc Am 2017; 142:12. [PMID: 28764456 DOI: 10.1121/1.4990019] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/07/2023]
Abstract
Speech reception thresholds (SRTs) decrease as target and maskers are spatially separated (spatial release from masking, SRM). The current study systematically assessed how SRTs and SRM for a frontal target in a spatially symmetric masker configuration depend on spectro-temporal masker properties, the availability of short-time interaural level difference (ILD) and interaural time difference (ITD), and informational masking. Maskers ranged from stationary noise to single, interfering talkers and were modified by head-related transfer functions to provide: (i) different binaural cues (ILD, ITD, or both) and (ii) independent maskers in each ear ("infinite ILD"). Additionally, a condition was tested in which only information from short-time spectro-temporal segments of the ear with a favorable signal-to-noise ratio (better-ear glimpses) was presented. For noise-based maskers, ILD, ITD, and spectral changes related to masker location contributed similarly to SRM, while ILD cues played a larger role if temporal modulation was introduced. For speech maskers, glimpsing and perceived location contributed roughly equally and ITD contributed less. The "infinite ILD" condition might suggest better-ear glimpsing limitations resulting in a maximal SRM of 12 dB for maskers with low or absent informational masking. Comparison to binaural model predictions highlighted the importance of short-time processing and helped to clarify the contribution of the different binaural cues and mechanisms.
Affiliation(s)
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, D-26111 Oldenburg, Germany
- Wiebke Schubotz
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, D-26111 Oldenburg, Germany
- Thomas Brand
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, D-26111 Oldenburg, Germany
- Birger Kollmeier
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, D-26111 Oldenburg, Germany
|
28
|
Kortlang S, Chen Z, Gerkmann T, Kollmeier B, Hohmann V, Ewert SD. Evaluation of combined dynamic compression and single channel noise reduction for hearing aid applications. Int J Audiol 2017; 57:S43-S54. [PMID: 28355947 DOI: 10.1080/14992027.2017.1300695] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/19/2022]
Abstract
OBJECTIVE: Single-channel noise reduction (SCNR) and dynamic range compression (DRC) are important elements in hearing aids. Only relatively few studies have addressed interaction effects and typically used real hearing aids with limited knowledge about the integrated algorithms. Here the potential benefit of different combinations and integration of SCNR and DRC was systematically assessed. DESIGN: Ten different systems combining SCNR and DRC were implemented, including five serial arrangements, a parallel and two multiplicative approaches. In an instrumental evaluation, signal-to-noise ratio (SNR) improvement and spectral contrast enhancement (SCE) were assessed. Quality ratings at 0 and +6 dB SNR, and speech reception thresholds (SRTs) in noise were measured using stationary and babble noise. STUDY SAMPLE: Thirteen young normal-hearing (NH) listeners and 12 hearing-impaired (HI) listeners participated. RESULTS: In line with an increased segmental SNR and spectral contrast compared to a serial concatenation, the parallel approach significantly reduced the perceived noise annoyance for both subject groups. The proposed multiplicative approaches could partly counteract increased speech distortions introduced by DRC and achieved the best overall quality for the HI listeners. CONCLUSIONS: For high SNRs well above the individual SRT, the specific combination of SCNR and DRC is perceptually relevant and the integrative approaches were preferred.
Affiliation(s)
- Steffen Kortlang
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Zhangli Chen
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Timo Gerkmann
- Speech Signal Processing and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Birger Kollmeier
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Volker Hohmann
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
|
29
|
Hu H, Ewert SD, McAlpine D, Dietz M. Differences in the temporal course of interaural time difference sensitivity between acoustic and electric hearing in amplitude modulated stimuli. J Acoust Soc Am 2017; 141:1862. [PMID: 28372072 DOI: 10.1121/1.4977014] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 06/07/2023]
Abstract
Previous studies have shown that normal-hearing (NH) listeners' spatial perception of non-stationary interaural time differences (ITDs) is dominated by the carrier ITD during rising amplitude segments. Here, ITD sensitivity throughout the amplitude-modulation cycle in NH listeners and bilateral cochlear implant (CI) subjects is compared, the latter by means of direct stimulation of a single electrode pair. The data indicate that, while NH listeners are most sensitive to ITDs applied toward the beginning of a modulation cycle at 600 Hz, NH listeners at 200 Hz and especially bilateral CI subjects at 200 pulses per second (pps) are more sensitive to ITDs applied to the modulation maximum. This has implications for spatial-hearing in complex environments: NH listeners' dominant 600-Hz ITD information from the rising amplitude segments comprises direct sound information. The 200-pps low rate required to get ITD sensitivity in CI users results in a higher weight of pulses later in the modulation cycle where the source ITDs are more likely corrupted by reflections. This indirectly indicates that even if future binaural CI processors are able to provide perceptually exploitable ITD information, CI users will likely not get the full benefit from such pulse-based ITD cues in reverberant and other complex environments.
Affiliation(s)
- Hongmei Hu
- Medizinische Physik and Cluster of Excellence "Hearing4all," Universität Oldenburg, D-26111 Oldenburg, Germany
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence "Hearing4all," Universität Oldenburg, D-26111 Oldenburg, Germany
- David McAlpine
- Department of Linguistics, Australian Hearing Hub, Macquarie University, New South Wales 2109, Australia
- Mathias Dietz
- Medizinische Physik and Cluster of Excellence "Hearing4all," Universität Oldenburg, D-26111 Oldenburg, Germany
|
30
|
Wallaert N, Moore BCJ, Ewert SD, Lorenzi C. Sensorineural hearing loss enhances auditory sensitivity and temporal integration for amplitude modulation. J Acoust Soc Am 2017; 141:971. [PMID: 28253641 DOI: 10.1121/1.4976080] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 06/06/2023]
Abstract
Amplitude-modulation detection thresholds (AMDTs) were measured at 40 dB sensation level for listeners with mild-to-moderate sensorineural hearing loss (age: 50-64 yr) for a carrier frequency of 500 Hz and rates of 2 and 20 Hz. The number of modulation cycles, N, varied between two and nine. The data were compared with AMDTs measured for young and older normal-hearing listeners [Wallaert, Moore, and Lorenzi (2016). J. Acoust. Soc. Am. 139, 3088-3096]. As for normal-hearing listeners, AMDTs were lower for the 2-Hz than for the 20-Hz rate, and AMDTs decreased with increasing N. AMDTs were lower for hearing-impaired listeners than for normal-hearing listeners, and the effect of increasing N was greater for hearing-impaired listeners. A computational model based on the modulation-filterbank concept and a template-matching decision strategy was developed to account for the data. The psychophysical and simulation data suggest that the loss of amplitude compression in the impaired cochlea is mainly responsible for the enhanced sensitivity and temporal integration of temporal envelope cues found for hearing-impaired listeners. The data also suggest that, for AM detection, cochlear damage is associated with increased internal noise, but preserved short-term memory and decision mechanisms.
Affiliation(s)
- Nicolas Wallaert
- UMR CNRS LSP 8248, Institut d'Etude de la Cognition, Ecole normale supérieure, Paris Sciences et Lettres Research University, 29 rue d'Ulm, 75005 Paris, France
- Brian C J Moore
- Department of Experimental Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence Hearing4All, Universität Oldenburg, 26111 Oldenburg, Germany
- Christian Lorenzi
- UMR CNRS LSP 8248, Institut d'Etude de la Cognition, Ecole normale supérieure, Paris Sciences et Lettres Research University, 29 rue d'Ulm, 75005 Paris, France
|
31
|
Abstract
Human auditory perception and speech intelligibility have been successfully described based on the two concepts of spectral masking and amplitude modulation (AM) masking. The power-spectrum model (PSM) [Patterson and Moore (1986). Frequency Selectivity in Hearing, pp. 123-177] accounts for effects of spectral masking and critical bandwidth, while the envelope power-spectrum model (EPSM) [Ewert and Dau (2000). J. Acoust. Soc. Am. 108, 1181-1196] has been successfully applied to AM masking and discrimination. Both models extract the long-term (envelope) power to calculate signal-to-noise ratios (SNR). Recently, the EPSM has been applied to speech intelligibility (SI) considering the short-term envelope SNR on various time scales (multi-resolution speech-based envelope power-spectrum model; mr-sEPSM) to account for SI in fluctuating noise [Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134, 436-446]. Here, a generalized auditory model is suggested combining the classical PSM and the mr-sEPSM to jointly account for psychoacoustics and speech intelligibility. The model was extended to consider the local AM depth in conditions with slowly varying signal levels, and the relative role of long-term and short-term SNR was assessed. The suggested generalized power-spectrum model is shown to account for a large variety of psychoacoustic data and to predict speech intelligibility in various types of background noise.
Affiliation(s)
- Thomas Biberger
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
|
32
|
Schubotz W, Brand T, Kollmeier B, Ewert SD. Monaural speech intelligibility and detection in maskers with varying amounts of spectro-temporal speech features. J Acoust Soc Am 2016; 140:524. [PMID: 27475175 DOI: 10.1121/1.4955079] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 05/28/2023]
Abstract
Speech intelligibility is strongly affected by the presence of maskers. Depending on the spectro-temporal structure of the masker and its similarity to the target speech, different masking aspects can occur which are typically referred to as energetic, amplitude modulation, and informational masking. In this study speech intelligibility and speech detection were measured in maskers that vary systematically in the time-frequency domain from steady-state noise to a single interfering talker. Male and female target speech was used in combination with maskers based on speech for the same or different gender. Observed data were compared to predictions of the speech intelligibility index, extended speech intelligibility index, multi-resolution speech-based envelope-power-spectrum model, and the short-time objective intelligibility measure. The different models served as analysis tools to help distinguish between the different masking aspects. Comparison shows that overall masking can to a large extent be explained by short-term energetic masking. However, the other masking aspects (amplitude modulation and informational masking) influence speech intelligibility as well. Additionally, it was obvious that all models showed considerable deviations from the data. Therefore, the current study provides a benchmark for further evaluation of speech prediction models.
Affiliation(s)
- Wiebke Schubotz
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, D-26111 Oldenburg, Germany
- Thomas Brand
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, D-26111 Oldenburg, Germany
- Birger Kollmeier
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, D-26111 Oldenburg, Germany
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, D-26111 Oldenburg, Germany
|
33
|
Paraouty N, Ewert SD, Wallaert N, Lorenzi C. Interactions between amplitude modulation and frequency modulation processing: Effects of age and hearing loss. J Acoust Soc Am 2016; 140:121. [PMID: 27475138 DOI: 10.1121/1.4955078] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 06/06/2023]
Abstract
Frequency modulation (FM) and amplitude modulation (AM) detection thresholds were measured for a 500-Hz carrier frequency and a 5-Hz modulation rate. For AM detection, FM at the same rate as the AM was superimposed with varying FM depth. For FM detection, AM at the same rate was superimposed with varying AM depth. The target stimuli always contained both amplitude and frequency modulations, while the standard stimuli only contained the interfering modulation. Young and older normal-hearing listeners, as well as older listeners with mild-to-moderate sensorineural hearing loss were tested. For all groups, AM and FM detection thresholds were degraded in the presence of the interfering modulation. AM detection with and without interfering FM was hardly affected by either age or hearing loss. While aging had an overall detrimental effect on FM detection with and without interfering AM, there was a trend that hearing loss further impaired FM detection in the presence of AM. Several models using optimal combination of temporal-envelope cues at the outputs of off-frequency filters were tested. The interfering effects could only be predicted for hearing-impaired listeners. This indirectly supports the idea that, in addition to envelope cues resulting from FM-to-AM conversion, normal-hearing listeners use temporal fine-structure cues for FM detection.
Affiliation(s)
- Nihaad Paraouty
- Laboratoire des Systèmes Perceptifs (CNRS UMR 8248), Institut d'Etude de la Cognition, Ecole normale supérieure, Paris Sciences et Lettres Research University, 29 rue d'Ulm, 75005 Paris, France
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence Hearing4All, Universität Oldenburg, 26111 Oldenburg, Germany
- Nicolas Wallaert
- Laboratoire des Systèmes Perceptifs (CNRS UMR 8248), Institut d'Etude de la Cognition, Ecole normale supérieure, Paris Sciences et Lettres Research University, 29 rue d'Ulm, 75005 Paris, France
- Christian Lorenzi
- Laboratoire des Systèmes Perceptifs (CNRS UMR 8248), Institut d'Etude de la Cognition, Ecole normale supérieure, Paris Sciences et Lettres Research University, 29 rue d'Ulm, 75005 Paris, France
|
34
|
Pieper I, Mauermann M, Kollmeier B, Ewert SD. Physiological motivated transmission-lines as front end for loudness models. J Acoust Soc Am 2016; 139:2896. [PMID: 27250182 DOI: 10.1121/1.4949540] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 06/05/2023]
Abstract
The perception of loudness is strongly influenced by peripheral auditory processing, which calls for a physiologically correct peripheral auditory processing stage when constructing advanced loudness models. Most loudness models, however, follow a functional approach instead: a parallel auditory filter bank combined with a compression stage, followed by spectral and temporal integration. Such classical loudness models do not allow physiological measurements like otoacoustic emissions to be linked directly to properties of their auditory filterbank. However, this can be achieved with physiologically motivated transmission-line models (TLMs) of the cochlea. Here two active and nonlinear TLMs were tested as the peripheral front end of a loudness model. The TLMs are followed by a simple generic back end which performs integration of basilar-membrane "excitation" across place and time to yield a loudness estimate. The proposed model approach reaches performance similar to that of other state-of-the-art loudness models regarding the prediction of loudness in sones, equal-loudness contours (including spectral fine structure), and loudness as a function of bandwidth. The suggested model provides a powerful tool to directly connect objective measures of basilar membrane compression, such as distortion product otoacoustic emissions, and loudness in future studies.
Affiliation(s)
- Iko Pieper
- Medizinische Physik and Cluster of Excellence Hearing4All, Universität Oldenburg, D-26111 Oldenburg, Germany
- Manfred Mauermann
- Medizinische Physik and Cluster of Excellence Hearing4All, Universität Oldenburg, D-26111 Oldenburg, Germany
- Birger Kollmeier
- Medizinische Physik and Cluster of Excellence Hearing4All, Universität Oldenburg, D-26111 Oldenburg, Germany
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence Hearing4All, Universität Oldenburg, D-26111 Oldenburg, Germany
|
35
|
Schädler MR, Warzybok A, Ewert SD, Kollmeier B. A simulation framework for auditory discrimination experiments: Revealing the importance of across-frequency processing in speech perception. J Acoust Soc Am 2016; 139:2708. [PMID: 27250164 DOI: 10.1121/1.4948772] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 06/05/2023]
Abstract
A framework for simulating auditory discrimination experiments, based on an approach from Schädler, Warzybok, Hochmuth, and Kollmeier [(2015). Int. J. Audiol. 54, 100-107] which was originally designed to predict speech recognition thresholds, is extended to also predict psychoacoustic thresholds. The proposed framework is used to assess the suitability of different auditory-inspired feature sets for a range of auditory discrimination experiments that included psychoacoustic as well as speech recognition experiments in noise. The considered experiments were 2 kHz tone-in-broadband-noise simultaneous masking depending on the tone length, spectral masking with simultaneously presented tone signals and narrow-band noise maskers, and German Matrix sentence test reception threshold in stationary and modulated noise. The employed feature sets included spectro-temporal Gabor filter bank features, Mel-frequency cepstral coefficients, logarithmically scaled Mel-spectrograms, and the internal representation of the Perception Model from Dau, Kollmeier, and Kohlrausch [(1997). J. Acoust. Soc. Am. 102(5), 2892-2905]. The proposed framework was successfully employed to simulate all experiments with a common parameter set and obtain objective thresholds with fewer assumptions compared to traditional modeling approaches. Depending on the feature set, the simulated reference-free thresholds were found to agree with, and hence to predict, empirical data from the literature. Across-frequency processing was found to be crucial to accurately model the lower speech reception threshold in modulated noise conditions than in stationary noise conditions.
Affiliation(s)
- Marc René Schädler
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, D-26111 Oldenburg, Germany
- Anna Warzybok
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, D-26111 Oldenburg, Germany
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, D-26111 Oldenburg, Germany
- Birger Kollmeier
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, D-26111 Oldenburg, Germany
|
36
|
Oetting D, Hohmann V, Appell JE, Kollmeier B, Ewert SD. Spectral and binaural loudness summation for hearing-impaired listeners. Hear Res 2016; 335:179-192. [PMID: 27006003 DOI: 10.1016/j.heares.2016.03.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 07/30/2015] [Revised: 03/15/2016] [Accepted: 03/17/2016] [Indexed: 11/30/2022]
Abstract
Sensorineural hearing loss typically results in a steepened loudness function and a reduced dynamic range from elevated thresholds to uncomfortably loud levels for narrowband and broadband signals. Restoring narrowband loudness perception for hearing-impaired (HI) listeners can lead to overly loud perception of broadband signals and it is unclear how binaural presentation affects loudness perception in this case. Here, loudness perception quantified by categorical loudness scaling for nine normal-hearing (NH) and ten HI listeners was compared for signals with different bandwidth and different spectral shape in monaural and in binaural conditions. For the HI listeners, frequency- and level-dependent amplification was used to match the narrowband monaural loudness functions of the NH listeners. The average loudness functions for NH and HI listeners showed good agreement for monaural broadband signals. However, HI listeners showed substantially greater loudness for binaural broadband signals than NH listeners: on average a 14.1 dB lower level was required to reach "very loud" (range 30.8 to -3.7 dB). Overall, with narrowband loudness compensation, a given binaural loudness for broadband signals above "medium loud" was reached at systematically lower levels for HI than for NH listeners. Such increased binaural loudness summation was not found for loudness categories below "medium loud" or for narrowband signals. Large individual variations in the increased loudness summation were observed and could not be explained by the audiogram or the narrowband loudness functions.
Affiliation(s)
- Dirk Oetting
- Project Group Hearing, Speech and Audio Technology of the Fraunhofer IDMT and Cluster of Excellence Hearing4all, Oldenburg, Germany; Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
- Volker Hohmann
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
- Jens-E Appell
- Project Group Hearing, Speech and Audio Technology of the Fraunhofer IDMT and Cluster of Excellence Hearing4all, Oldenburg, Germany
- Birger Kollmeier
- Project Group Hearing, Speech and Audio Technology of the Fraunhofer IDMT and Cluster of Excellence Hearing4all, Oldenburg, Germany; Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
|
37
|
Schubotz W, Brand T, Kollmeier B, Ewert SD. The Influence of High-Frequency Envelope Information on Low-Frequency Vowel Identification in Noise. PLoS One 2016; 11:e0145610. [PMID: 26730702 PMCID: PMC4701218 DOI: 10.1371/journal.pone.0145610] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/04/2015] [Accepted: 12/07/2015] [Indexed: 11/19/2022] Open
Abstract
Vowel identification in noise using consonant-vowel-consonant (CVC) logatomes was used to investigate a possible interplay of speech information from different frequency regions. It was hypothesized that the periodicity conveyed by the temporal envelope of a high frequency stimulus can enhance the use of the information carried by auditory channels in the low-frequency region that share the same periodicity. It was further hypothesized that this acts as a strobe-like mechanism and would increase the signal-to-noise ratio for the voiced parts of the CVCs. In a first experiment, different high-frequency cues were provided to test this hypothesis, whereas a second experiment examined more closely the role of amplitude modulations and intact phase information within the high-frequency region (4–8 kHz). CVCs were either natural or vocoded speech (both limited to a low-pass cutoff-frequency of 2.5 kHz) and were presented in stationary 3-kHz low-pass filtered masking noise. The experimental results did not support the hypothesized use of periodicity information for aiding low-frequency perception.
Affiliation(s)
- Wiebke Schubotz
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Thomas Brand
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Birger Kollmeier
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Stephan D. Ewert
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
|
38
|
Kortlang S, Mauermann M, Ewert SD. Suprathreshold auditory processing deficits in noise: Effects of hearing loss and age. Hear Res 2016; 331:27-40. [DOI: 10.1016/j.heares.2015.10.004] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 05/13/2015] [Revised: 10/05/2015] [Accepted: 10/07/2015] [Indexed: 11/15/2022]
|
39
|
Abstract
Current cochlear implant (CI) strategies carry speech information via the waveform envelope in frequency subbands. CIs require efficient speech processing to maximize information transfer to the brain, especially in background noise, where the speech envelope is not robust to noise interference. In such conditions, the envelope, after decomposition into frequency bands, may be enhanced by sparse transformations, such as nonnegative matrix factorization (NMF). Here, a novel CI processing algorithm is described, which works by applying NMF to the envelope matrix (envelopogram) of 22 frequency channels in order to improve performance in noisy environments. It is evaluated for speech in eight-talker babble noise. The critical sparsity constraint parameter was first tuned using objective measures and then evaluated with subjective speech perception experiments for both normal hearing and CI subjects. Results from vocoder simulations with 10 normal hearing subjects showed that the algorithm significantly enhances speech intelligibility with the selected sparsity constraints. Results from eight CI subjects showed no significant overall improvement compared with the standard advanced combination encoder algorithm, but a trend toward improvement of word identification of about 10 percentage points at +15 dB signal-to-noise ratio (SNR) was observed. Additionally, the spread of speech perception performance was considerably reduced, from a range of 40% to 93% for the advanced combination encoder to a range of 80% to 100% for the suggested NMF coding strategy.
Affiliation(s)
- Hongmei Hu
- Institute of Sound and Vibration Research, University of Southampton, UK; Medizinische Physik, Universität Oldenburg and Cluster of Excellence "Hearing4all", Oldenburg, Germany
- Mark E Lutman
- Institute of Sound and Vibration Research, University of Southampton, UK
- Stephan D Ewert
- Medizinische Physik, Universität Oldenburg and Cluster of Excellence "Hearing4all", Oldenburg, Germany
- Guoping Li
- Institute of Sound and Vibration Research, University of Southampton, UK; The Ear Institute, Faculty of Brain Sciences, University College London, UK
- Stefan Bleeck
- Institute of Sound and Vibration Research, University of Southampton, UK
|
40
|
Kayser H, Hohmann V, Ewert SD, Kollmeier B, Anemüller J. Robust auditory localization using probabilistic inference and coherence-based weighting of interaural cues. J Acoust Soc Am 2015; 138:2635-2648. [PMID: 26627742 DOI: 10.1121/1.4932588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Robust sound source localization is performed by the human auditory system even in challenging acoustic conditions and in previously unencountered, complex scenarios. Here a computational binaural localization model is proposed that possesses mechanisms for handling of corrupted or unreliable localization cues and generalization across different acoustic situations. Central to the model is the use of interaural coherence, measured as interaural vector strength (IVS), to dynamically weight the importance of observed interaural phase (IPD) and level (ILD) differences in frequency bands up to 1.4 kHz. This is accomplished through formulation of a probabilistic model in which the ILD and IPD distributions pertaining to a specific source location are dependent on observed interaural coherence. Bayesian computation of the direction-of-arrival probability map naturally leads to coherence-weighted integration of location cues across frequency and time. Results confirm the model's validity through statistical analyses of interaural parameter values. Simulated localization experiments show that even data points with low reliability (i.e., low IVS) can be exploited to enhance localization performance. A temporal integration length of at least 200 ms is required to gain a benefit; this is in accordance with previous psychoacoustic findings on temporal integration of spatial cues in the human auditory system.
Affiliation(s)
- Hendrik Kayser
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
- Volker Hohmann
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
- Stephan D Ewert
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
- Birger Kollmeier
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
- Jörn Anemüller
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
|
41
|
Oetting D, Brand T, Ewert SD. Optimized loudness-function estimation for categorical loudness scaling data. Hear Res 2014; 316:16-27. [DOI: 10.1016/j.heares.2014.07.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 11/07/2013] [Revised: 07/03/2014] [Accepted: 07/09/2014] [Indexed: 11/25/2022]
|
42
|
Jürgens T, Ewert SD, Kollmeier B, Brand T. Prediction of consonant recognition in quiet for listeners with normal and impaired hearing using an auditory model. J Acoust Soc Am 2014; 135:1506-1517. [PMID: 24606286 DOI: 10.1121/1.4864293] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 06/03/2023]
Abstract
Consonant recognition was assessed in normal-hearing (NH) and hearing-impaired (HI) listeners in quiet as a function of speech level using a nonsense logatome test. Average recognition scores were analyzed and compared to recognition scores of a speech recognition model. In contrast to commonly used spectral speech recognition models operating on long-term spectra, a "microscopic" model operating in the time domain was used. Variations of the model (accounting for hearing impairment) and different model parameters (reflecting cochlear compression) were tested. Using these model variations this study examined whether speech recognition performance in quiet is affected by changes in cochlear compression, namely, a linearization, which is often observed in HI listeners. Consonant recognition scores for HI listeners were poorer than for NH listeners. The model accurately predicted the speech reception thresholds of the NH and most HI listeners. A partial linearization of the cochlear compression in the auditory model, while keeping audibility constant, produced higher recognition scores and improved the prediction accuracy. However, including listener-specific information about the exact form of the cochlear compression did not improve the prediction further.
Affiliation(s)
- Tim Jürgens
- Cluster of Excellence "Hearing4all," Department für Medizinische Physik und Akustik, Carl-von-Ossietzky Universität Oldenburg, Carl-von Ossietzky-Strasse 9-11, D-26111 Oldenburg, Germany
- Stephan D Ewert
- Cluster of Excellence "Hearing4all," Department für Medizinische Physik und Akustik, Carl-von-Ossietzky Universität Oldenburg, Carl-von Ossietzky-Strasse 9-11, D-26111 Oldenburg, Germany
- Birger Kollmeier
- Cluster of Excellence "Hearing4all," Department für Medizinische Physik und Akustik, Carl-von-Ossietzky Universität Oldenburg, Carl-von Ossietzky-Strasse 9-11, D-26111 Oldenburg, Germany
- Thomas Brand
- Cluster of Excellence "Hearing4all," Department für Medizinische Physik und Akustik, Carl-von-Ossietzky Universität Oldenburg, Carl-von Ossietzky-Strasse 9-11, D-26111 Oldenburg, Germany
|
43
|
Abstract
The speech-based envelope power spectrum model (sEPSM) presented by Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] estimates the envelope power signal-to-noise ratio (SNRenv) after modulation-frequency selective processing. Changes in this metric were shown to account well for changes of speech intelligibility for normal-hearing listeners in conditions with additive stationary noise, reverberation, and nonlinear processing with spectral subtraction. In the latter condition, the standardized speech transmission index [(2003). IEC 60268-16] fails. However, the sEPSM is limited to conditions with stationary interferers, due to the long-term integration of the envelope power, and cannot account for increased intelligibility typically obtained with fluctuating maskers. Here, a multi-resolution version of the sEPSM is presented where the SNRenv is estimated in temporal segments with a modulation-filter dependent duration. The multi-resolution sEPSM is demonstrated to account for intelligibility obtained in conditions with stationary and fluctuating interferers, and noisy speech distorted by reverberation or spectral subtraction. The results support the hypothesis that the SNRenv is a powerful objective metric for speech intelligibility prediction.
Affiliation(s)
- Søren Jørgensen
- Centre for Applied Hearing Research, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.
|
44
|
Dietz M, Bernstein LR, Trahiotis C, Ewert SD, Hohmann V. The effect of overall level on sensitivity to interaural differences of time and level at high frequencies. J Acoust Soc Am 2013; 134:494-502. [PMID: 23862824 PMCID: PMC3724750 DOI: 10.1121/1.4807827] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 05/16/2023]
Abstract
For high-frequency complex stimuli, detection thresholds for envelope-based interaural time differences (ITDs) decrease with overall level. Substantial heterogeneity is, however, evident among the findings concerning the rate at which thresholds decline with level. This study investigated factors affecting the influence of overall level on threshold ITDs. Thresholds were measured as a function of overall level for 4-kHz-centered "targets" in three experiments focusing, respectively, on stimulus-type (sinusoidally amplitude-modulated or "transposed" tones), modulation frequency, and details concerning low-pass noise used to mask low-frequency distortion products. Results indicated that (1) log-ITD thresholds decreased linearly with overall level; (2) slopes relating log-ITD thresholds to level did not depend significantly on stimulus type; (3) lower modulation frequencies produced greater dependencies of thresholds on overall level than did higher modulation frequencies; (4) the effect of overall level on threshold-ITDs was independent of the interaural configuration and levels of the low-pass noise maskers tested; (5) synchronously gating the low-pass noise and target produced a greater dependency of thresholds on the overall level of the target than did continuous or temporally "fringed" presentation of the noise. A fourth experiment showed that threshold interaural level differences were somewhat less affected by changes in overall level than were threshold ITDs.
Affiliation(s)
- Mathias Dietz
- Medizinische Physik, Universität Oldenburg, 26111 Oldenburg, Germany.
|
45
|
Dietz M, Wendt T, Ewert SD, Laback B, Hohmann V. Comparing the effect of pause duration on threshold interaural time differences between exponential and squared-sine envelopes (L). J Acoust Soc Am 2013; 133:1-4. [PMID: 23297875 DOI: 10.1121/1.4768876] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/01/2023]
Abstract
Two recent studies [Klein-Hennig et al., J. Acoust. Soc. Am. 129, 3856-3872 (2011); Laback et al., J. Acoust. Soc. Am. 130, 1515-1529 (2011)] independently investigated the isolated effect of pause duration on sensitivity to interaural time differences (ITDs) in the ongoing stimulus envelope. The steepness of the functions relating threshold ITD to pause duration differed considerably across the two studies. The present study, using matched carrier and modulation frequencies, directly compared threshold ITDs for the two envelope flank shapes from those studies. The results agree well when the metric of pause duration is defined on the basis of modulation depth sensitivity.
Affiliation(s)
- Mathias Dietz
- Medizinische Physik, Universität Oldenburg, Carl-von-Ossietzky Strasse 9-11, 26111 Oldenburg, Germany.
|
46
|
Abstract
The relative contributions of within-channel and across-channel processes to perceptual comodulation masking release (CMR) were investigated in the framework of an auditory processing model. A generalized version of the computational auditory signal processing and perception model [CASP; Jepsen et al., J. Acoust. Soc. Am. 124, 422-438 (2008)] was used and extended by an across-channel modulation processing stage according to Piechowiak et al. [J. Acoust. Soc. Am. 121, 2111-2126 (2007)]. Five experimental paradigms were considered: CMR with a broadband noise masker as a function of the masker spectrum level; CMR with four widely spaced flanking bands (FBs) varying in overall level; CMR with one FB varying in frequency and level relative to the on-frequency band (OFB); CMR with one FB varying in frequency; and CMR as a function of the number of FBs. The predictions suggest that at least three different mechanisms contribute to overall CMR in the considered conditions: (1) a within-channel process based on changes in the envelope characteristic due to the addition of the signal to the masker; (2) a within-channel process based on nonlinear peripheral processing of the OFB's envelope caused by the FB(s); and (3) an across-channel process that is robust across presentation levels but relatively small (2-5 dB).
Affiliation(s)
- Torsten Dau
- Centre for Applied Hearing Research, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.
|
47
|
Abstract
Second-order amplitude modulation is a relatively slow variation of the modulation depth of a first-order amplitude modulation with higher frequency. In contrast to first-order modulation, which appears as a physical component in the stimulus spectrum after half-wave rectification, second-order modulation is not necessarily demodulated by the auditory periphery. For binaural processing of second-order amplitude modulated stimuli it is unknown whether interaural time differences (ITDs) in the second-order modulation result in a lateralized percept. Thus, second-order modulation can serve as a tool to investigate whether demodulation of interaurally delayed components is a prerequisite for lateralization. In most of the psychoacoustic experiments presented here, a 25 Hz sinusoidally amplitude-modulated (SAM) 160 Hz tone was either transposed to 4 kHz by half-wave rectifying this SAM waveform before multiplication with a 4 kHz tone (TSAM), or by adding an offset before multiplication (SAMAM). The experiments revealed an inability to lateralize the SAMAM based on ITDs in the 25 Hz component, whereas subjects could lateralize the TSAM. Given that only the TSAM results in a demodulated 25 Hz component after peripheral auditory processing, this result supports the hypothesis that demodulation is a prerequisite for lateralization, which has consequences for temporal modulation processing in models of binaural interaction.
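The two stimulus constructions described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the sampling rate, the full modulation depth, the offset value, and all function names are assumptions chosen for the example.

```python
import math

FS = 48000           # sampling rate in Hz (assumed, not stated in the abstract)
F_MOD = 25.0         # modulation frequency in Hz (from the abstract)
F_LOW = 160.0        # low-frequency tone in Hz (from the abstract)
F_CARRIER = 4000.0   # high-frequency carrier in Hz (from the abstract)
OFFSET = 2.0         # hypothetical offset that keeps the SAMAM modulator non-negative


def sam_low(t):
    """25 Hz SAM 160 Hz tone with full modulation depth (depth is an assumption)."""
    return (1.0 + math.cos(2 * math.pi * F_MOD * t)) * math.sin(2 * math.pi * F_LOW * t)


def tsam(t):
    """Half-wave rectify the SAM waveform, then multiply with the 4 kHz tone."""
    return max(sam_low(t), 0.0) * math.sin(2 * math.pi * F_CARRIER * t)


def samam(t):
    """Add an offset to the SAM waveform, then multiply with the 4 kHz tone."""
    return (OFFSET + sam_low(t)) * math.sin(2 * math.pi * F_CARRIER * t)


def make(fn, duration=0.2):
    """Sample a stimulus function for `duration` seconds."""
    n = round(FS * duration)
    return [fn(i / FS) for i in range(n)]
```

In this sketch the rectification in `tsam` is the nonlinear step that places a 25 Hz component in the modulator, whereas `samam` only shifts the modulator, so the 25 Hz variation remains a second-order modulation of the 160 Hz envelope.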
Affiliation(s)
- Mathias Dietz
- Medizinische Physik, Universität Oldenburg, 26111 Oldenburg, Germany.
|
48
|
Ewert SD, Kaiser K, Kernschmidt L, Wiegrebe L. Perceptual sensitivity to high-frequency interaural time differences created by rustling sounds. J Assoc Res Otolaryngol 2011; 13:131-43. [PMID: 22124890 DOI: 10.1007/s10162-011-0303-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 10/08/2010] [Accepted: 11/03/2011] [Indexed: 10/15/2022] Open
Abstract
Interaural time differences (ITDs) can be used to localize sounds in the horizontal plane. ITDs can be extracted from either the fine structure of low-frequency sounds or from the envelopes of high-frequency sounds. Studies of the latter have included stimuli with periodic envelopes like amplitude-modulated tones or transposed stimuli, and high-pass filtered Gaussian noises. Here, four experiments are presented investigating the perceptual relevance of ITD cues in synthetic and recorded "rustling" sounds. Both share the broad long-term power spectrum with Gaussian noise but provide more pronounced envelope fluctuations than Gaussian noise, quantified by an increased waveform fourth moment, W. The current data show that the JNDs in ITD for band-pass rustling sounds tended to improve with increasing W and with increasing bandwidth when the sounds were band limited. In contrast, no influence of W on JND was observed for broadband sounds, apparently because of listeners' sensitivity to ITD in low-frequency fine structure, present in the broadband sounds. Second, it is shown that for high-frequency rustling sounds ITD JNDs can be as low as 30 μs. The third result was that the amount of dominance for ITD extraction of low frequencies decreases systematically with increasing amount of envelope fluctuations. Finally, it is shown that despite the exceptionally good envelope ITD sensitivity evident with high-frequency rustling sounds, minimum audible angles of both synthetic and recorded high-frequency rustling sounds in virtual acoustic space are still best when the angular information is mediated by interaural level differences.
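As a rough illustration of the envelope-fluctuation measure, the sketch below assumes that W is the normalized fourth moment of the waveform, i.e. the mean of x^4 divided by the squared mean of x^2; the abstract does not spell out the formula. The `sparse_noise` generator is a hypothetical stand-in for a strongly fluctuating sound and is not the rustling-sound synthesis used in the study.

```python
import random


def fourth_moment(x):
    """Normalized waveform fourth moment: mean(x^4) / mean(x^2)^2 (assumed definition)."""
    n = len(x)
    m2 = sum(v * v for v in x) / n
    m4 = sum(v ** 4 for v in x) / n
    return m4 / (m2 * m2)


def gaussian_noise(n, rng):
    """Gaussian noise samples; W is expected to be close to 3."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def sparse_noise(n, rng, density=0.1):
    """Hypothetical fluctuating signal: Gaussian samples kept with probability `density`."""
    return [rng.gauss(0.0, 1.0) if rng.random() < density else 0.0 for _ in range(n)]


rng = random.Random(0)
w_gauss = fourth_moment(gaussian_noise(20000, rng))
w_sparse = fourth_moment(sparse_noise(20000, rng))
```

Under this definition, a signal whose energy is concentrated in brief events yields a larger W than Gaussian noise with the same long-term power.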
Affiliation(s)
- Stephan D Ewert
- Medizinische Physik, Fakultät V, Universität Oldenburg, 26111, Oldenburg, Germany
|
49
|
Klein-Hennig M, Dietz M, Hohmann V, Ewert SD. The influence of different segments of the ongoing envelope on sensitivity to interaural time delays. J Acoust Soc Am 2011; 129:3856-72. [PMID: 21682409 DOI: 10.1121/1.3585847] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 05/16/2023]
Abstract
The auditory system is sensitive to interaural timing disparities in the fine structure and the envelope of sounds, each contributing important cues for lateralization. In this study, psychophysical measurements were conducted with customized envelope waveforms in order to investigate the isolated effect of different segments of a periodic, ongoing envelope on lateralization. One envelope cycle was composed of the four segments attack flank, hold duration, decay flank, and pause duration, which were independently varied to customize the envelope waveform. The envelope waveforms were applied to a 4-kHz sinusoidal carrier, and just noticeable envelope interaural time differences were measured in six normal hearing subjects. The results indicate that attack durations and pause durations prior to the attack are the most important stimulus characteristics for processing envelope timing disparities. The results were compared to predictions of three binaural lateralization models based on the normalized cross correlation coefficient. Two of the models included an additional stage to mimic neural adaptation prior to binaural interaction, involving either a single short time constant (5 ms) or a combination of five time constants up to 500 ms. It was shown that the model with the single short time constant accounted best for the data.
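The four-segment envelope described in this abstract can be sketched as below. The squared-sine flank shape, the sampling rate, the segment durations in the usage lines, and all function names are assumptions made for illustration; only the segment order and the 4-kHz carrier come from the abstract.

```python
import math


def envelope_cycle(attack_ms, hold_ms, decay_ms, pause_ms, fs=48000):
    """One envelope cycle: attack flank, hold, decay flank, pause.

    Flanks are squared-sine ramps (an assumed shape).
    """
    def n(ms):
        return round(fs * ms / 1000.0)

    na, nh, nd, npause = n(attack_ms), n(hold_ms), n(decay_ms), n(pause_ms)
    attack = [math.sin(0.5 * math.pi * i / na) ** 2 for i in range(na)]
    hold = [1.0] * nh
    decay = [math.cos(0.5 * math.pi * i / nd) ** 2 for i in range(nd)]
    pause = [0.0] * npause
    return attack + hold + decay + pause


def delay_envelope(env, itd_samples):
    """Circularly delay a periodic envelope to impose an envelope ITD on one ear."""
    k = itd_samples % len(env)
    return env[-k:] + env[:-k] if k else list(env)


def apply_carrier(env, fc=4000.0, fs=48000):
    """Multiply the envelope with a sinusoidal carrier that is identical in both ears."""
    return [e * math.sin(2 * math.pi * fc * i / fs) for i, e in enumerate(env)]


# Hypothetical durations: 5 ms per segment, three cycles, 24-sample (0.5 ms) envelope ITD.
cycle = envelope_cycle(5, 5, 5, 5)
left = apply_carrier(cycle * 3)
right = apply_carrier(delay_envelope(cycle * 3, 24))
```

Because the carrier is identical in both channels and only the envelope is shifted, the interaural disparity in this sketch is confined to the envelope; varying one segment duration at a time mirrors the independent manipulation the abstract describes.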
|
50
|
Dietz M, Ewert SD, Hohmann V. Lateralization of stimuli with independent fine-structure and envelope-based temporal disparities. J Acoust Soc Am 2009; 125:1622-1635. [PMID: 19275320 DOI: 10.1121/1.3076045] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 05/27/2023]
Abstract
Psychoacoustic experiments were conducted to investigate the role and interaction of fine-structure and envelope-based interaural temporal disparities. A computational model for the lateralization of binaural stimuli, motivated by recent physiological findings, is suggested and evaluated against the psychoacoustic data. The model is based on the independent extraction of the interaural phase difference (IPD) from the stimulus fine-structure and envelope. Sinusoidally amplitude-modulated 1-kHz tones were used in the experiments. The lateralization from either carrier (fine-structure) or modulator (envelope) IPD was matched with an interaural level difference, revealing a nearly linear dependence for both IPD types up to 135°, independent of the modulation frequency. However, if a carrier IPD was traded with an opposed modulator IPD to produce a centered sound image, a carrier IPD of 45° required the largest opposed modulator IPD. The data could be modeled assuming a population of binaural neurons with a physiological distribution of the best IPDs clustered around 45°-50°. The model was also used to predict the perceived lateralization of previously published data. Subject-dependent differences in the perceptual salience of fine-structure and envelope cues, also reported previously, could be modeled by individual weighting coefficients for the two cues.
Affiliation(s)
- Mathias Dietz
- Medizinische Physik, Universität Oldenburg, Oldenburg, Germany.
|