26
Biberger T, Ewert SD. The role of short-time intensity and envelope power for speech intelligibility and psychoacoustic masking. The Journal of the Acoustical Society of America 2017; 142:1098. [PMID: 28863616 DOI: 10.1121/1.4999059] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 05/28/2023]
Abstract
The generalized power spectrum model [GPSM; Biberger and Ewert (2016). J. Acoust. Soc. Am. 140, 1023-1038], combining the "classical" concept of the power-spectrum model (PSM) and the envelope power-spectrum model (EPSM), was demonstrated to account for several psychoacoustic and speech intelligibility (SI) experiments. The PSM path of the model uses long-time power signal-to-noise ratios (SNRs), while the EPSM path uses short-time envelope power SNRs. A systematic comparison of existing SI models for several spectro-temporal manipulations of speech maskers and gender combinations of target and masker speakers [Schubotz et al. (2016). J. Acoust. Soc. Am. 140, 524-540] showed the importance of short-time power features. Conversely, Jørgensen et al. [(2013). J. Acoust. Soc. Am. 134, 436-446] demonstrated a higher predictive power of short-time envelope power SNRs than power SNRs using reverberation and spectral subtraction. Here the GPSM was extended to utilize short-time power SNRs and was shown to account for all psychoacoustic and SI data of the three mentioned studies. The best processing strategy was to exclusively use either power or envelope-power SNRs, depending on the experimental task. By analyzing both domains, the suggested model might provide a useful tool for clarifying the contribution of amplitude modulation masking and energetic masking.
27
Ewert SD, Schubotz W, Brand T, Kollmeier B. Binaural masking release in symmetric listening conditions with spectro-temporally modulated maskers. The Journal of the Acoustical Society of America 2017; 142:12. [PMID: 28764456 DOI: 10.1121/1.4990019] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/07/2023]
Abstract
Speech reception thresholds (SRTs) decrease as target and maskers are spatially separated (spatial release from masking, SRM). The current study systematically assessed how SRTs and SRM for a frontal target in a spatially symmetric masker configuration depend on spectro-temporal masker properties, the availability of short-time interaural level difference (ILD) and interaural time difference (ITD), and informational masking. Maskers ranged from stationary noise to single, interfering talkers and were modified by head-related transfer functions to provide: (i) different binaural cues (ILD, ITD, or both) and (ii) independent maskers in each ear ("infinite ILD"). Additionally, a condition was tested in which only information from short-time spectro-temporal segments of the ear with a favorable signal-to-noise ratio (better-ear glimpses) was presented. For noise-based maskers, ILD, ITD, and spectral changes related to masker location contributed similarly to SRM, while ILD cues played a larger role if temporal modulation was introduced. For speech maskers, glimpsing and perceived location contributed roughly equally and ITD contributed less. The "infinite ILD" condition might suggest better-ear glimpsing limitations resulting in a maximal SRM of 12 dB for maskers with low or absent informational masking. Comparison to binaural model predictions highlighted the importance of short-time processing and helped to clarify the contribution of the different binaural cues and mechanisms.
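The better-ear glimpsing idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes that short-time, per-band SNRs (in dB) are already available for each ear; the function name `better_ear_glimpses` and the example values are hypothetical and are not taken from the study.

```python
def better_ear_glimpses(snr_left_db, snr_right_db):
    """Select, per time-frequency unit, the ear with the higher SNR.

    Both inputs are lists of time frames; each frame is a list of
    per-band SNRs in dB (illustrative layout, an assumption).
    Returns the selected SNRs and which ear ("L" or "R") supplied them.
    """
    selected_snr, selected_ear = [], []
    for frame_l, frame_r in zip(snr_left_db, snr_right_db):
        snr_row, ear_row = [], []
        for left, right in zip(frame_l, frame_r):
            if left >= right:
                snr_row.append(left)
                ear_row.append("L")
            else:
                snr_row.append(right)
                ear_row.append("R")
        selected_snr.append(snr_row)
        selected_ear.append(ear_row)
    return selected_snr, selected_ear


# Made-up SNRs: two time frames, two frequency bands per ear.
snr_left = [[-6.0, 3.0], [2.0, -9.0]]
snr_right = [[1.0, -4.0], [-5.0, -2.0]]
glimpse_snr, glimpse_ear = better_ear_glimpses(snr_left, snr_right)
```

This only shows the selection step; how the selected segments are resynthesized and presented to listeners is not specified here.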
28
Kortlang S, Chen Z, Gerkmann T, Kollmeier B, Hohmann V, Ewert SD. Evaluation of combined dynamic compression and single channel noise reduction for hearing aid applications. Int J Audiol 2017; 57:S43-S54. [PMID: 28355947 DOI: 10.1080/14992027.2017.1300695] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/19/2022]
Abstract
OBJECTIVE: Single-channel noise reduction (SCNR) and dynamic range compression (DRC) are important elements in hearing aids. Only relatively few studies have addressed interaction effects and typically used real hearing aids with limited knowledge about the integrated algorithms. Here the potential benefit of different combinations and integration of SCNR and DRC was systematically assessed. DESIGN: Ten different systems combining SCNR and DRC were implemented, including five serial arrangements, a parallel and two multiplicative approaches. In an instrumental evaluation, signal-to-noise ratio (SNR) improvement and spectral contrast enhancement (SCE) were assessed. Quality ratings at 0 and +6 dB SNR, and speech reception thresholds (SRTs) in noise were measured using stationary and babble noise. STUDY SAMPLE: Thirteen young normal-hearing (NH) listeners and 12 hearing-impaired (HI) listeners participated. RESULTS: In line with an increased segmental SNR and spectral contrast compared to a serial concatenation, the parallel approach significantly reduced the perceived noise annoyance for both subject groups. The proposed multiplicative approaches could partly counteract increased speech distortions introduced by DRC and achieved the best overall quality for the HI listeners. CONCLUSIONS: For high SNRs well above the individual SRT, the specific combination of SCNR and DRC is perceptually relevant and the integrative approaches were preferred.
29
Hu H, Ewert SD, McAlpine D, Dietz M. Differences in the temporal course of interaural time difference sensitivity between acoustic and electric hearing in amplitude modulated stimuli. The Journal of the Acoustical Society of America 2017; 141:1862. [PMID: 28372072 DOI: 10.1121/1.4977014] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 06/07/2023]
Abstract
Previous studies have shown that normal-hearing (NH) listeners' spatial perception of non-stationary interaural time differences (ITDs) is dominated by the carrier ITD during rising amplitude segments. Here, ITD sensitivity throughout the amplitude-modulation cycle in NH listeners and bilateral cochlear implant (CI) subjects is compared, the latter by means of direct stimulation of a single electrode pair. The data indicate that, while NH listeners are most sensitive to ITDs applied toward the beginning of a modulation cycle at 600 Hz, NH listeners at 200 Hz and especially bilateral CI subjects at 200 pulses per second (pps) are more sensitive to ITDs applied to the modulation maximum. This has implications for spatial-hearing in complex environments: NH listeners' dominant 600-Hz ITD information from the rising amplitude segments comprises direct sound information. The 200-pps low rate required to get ITD sensitivity in CI users results in a higher weight of pulses later in the modulation cycle where the source ITDs are more likely corrupted by reflections. This indirectly indicates that even if future binaural CI processors are able to provide perceptually exploitable ITD information, CI users will likely not get the full benefit from such pulse-based ITD cues in reverberant and other complex environments.
30
Wallaert N, Moore BCJ, Ewert SD, Lorenzi C. Sensorineural hearing loss enhances auditory sensitivity and temporal integration for amplitude modulation. The Journal of the Acoustical Society of America 2017; 141:971. [PMID: 28253641 DOI: 10.1121/1.4976080] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 06/06/2023]
Abstract
Amplitude-modulation detection thresholds (AMDTs) were measured at 40 dB sensation level for listeners with mild-to-moderate sensorineural hearing loss (age: 50-64 yr) for a carrier frequency of 500 Hz and rates of 2 and 20 Hz. The number of modulation cycles, N, varied between two and nine. The data were compared with AMDTs measured for young and older normal-hearing listeners [Wallaert, Moore, and Lorenzi (2016). J. Acoust. Soc. Am. 139, 3088-3096]. As for normal-hearing listeners, AMDTs were lower for the 2-Hz than for the 20-Hz rate, and AMDTs decreased with increasing N. AMDTs were lower for hearing-impaired listeners than for normal-hearing listeners, and the effect of increasing N was greater for hearing-impaired listeners. A computational model based on the modulation-filterbank concept and a template-matching decision strategy was developed to account for the data. The psychophysical and simulation data suggest that the loss of amplitude compression in the impaired cochlea is mainly responsible for the enhanced sensitivity and temporal integration of temporal envelope cues found for hearing-impaired listeners. The data also suggest that, for AM detection, cochlear damage is associated with increased internal noise, but preserved short-term memory and decision mechanisms.
31
Biberger T, Ewert SD. Envelope and intensity based prediction of psychoacoustic masking and speech intelligibility. The Journal of the Acoustical Society of America 2016; 140:1023. [PMID: 27586734 DOI: 10.1121/1.4960574] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 06/06/2023]
Abstract
Human auditory perception and speech intelligibility have been successfully described based on the two concepts of spectral masking and amplitude modulation (AM) masking. The power-spectrum model (PSM) [Patterson and Moore (1986). Frequency Selectivity in Hearing, pp. 123-177] accounts for effects of spectral masking and critical bandwidth, while the envelope power-spectrum model (EPSM) [Ewert and Dau (2000). J. Acoust. Soc. Am. 108, 1181-1196] has been successfully applied to AM masking and discrimination. Both models extract the long-term (envelope) power to calculate signal-to-noise ratios (SNR). Recently, the EPSM has been applied to speech intelligibility (SI) considering the short-term envelope SNR on various time scales (multi-resolution speech-based envelope power-spectrum model; mr-sEPSM) to account for SI in fluctuating noise [Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134, 436-446]. Here, a generalized auditory model is suggested combining the classical PSM and the mr-sEPSM to jointly account for psychoacoustics and speech intelligibility. The model was extended to consider the local AM depth in conditions with slowly varying signal levels, and the relative role of long-term and short-term SNR was assessed. The suggested generalized power-spectrum model is shown to account for a large variety of psychoacoustic data and to predict speech intelligibility in various types of background noise.
32
Schubotz W, Brand T, Kollmeier B, Ewert SD. Monaural speech intelligibility and detection in maskers with varying amounts of spectro-temporal speech features. The Journal of the Acoustical Society of America 2016; 140:524. [PMID: 27475175 DOI: 10.1121/1.4955079] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 05/28/2023]
Abstract
Speech intelligibility is strongly affected by the presence of maskers. Depending on the spectro-temporal structure of the masker and its similarity to the target speech, different masking aspects can occur which are typically referred to as energetic, amplitude modulation, and informational masking. In this study speech intelligibility and speech detection were measured in maskers that vary systematically in the time-frequency domain from steady-state noise to a single interfering talker. Male and female target speech was used in combination with maskers based on speech for the same or different gender. Observed data were compared to predictions of the speech intelligibility index, extended speech intelligibility index, multi-resolution speech-based envelope-power-spectrum model, and the short-time objective intelligibility measure. The different models served as an analysis tool to help distinguish between the different masking aspects. Comparison shows that overall masking can to a large extent be explained by short-term energetic masking. However, the other masking aspects (amplitude modulation and informational masking) influence speech intelligibility as well. Additionally, it was obvious that all models showed considerable deviations from the data. Therefore, the current study provides a benchmark for further evaluation of speech prediction models.
33
Paraouty N, Ewert SD, Wallaert N, Lorenzi C. Interactions between amplitude modulation and frequency modulation processing: Effects of age and hearing loss. The Journal of the Acoustical Society of America 2016; 140:121. [PMID: 27475138 DOI: 10.1121/1.4955078] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 06/06/2023]
Abstract
Frequency modulation (FM) and amplitude modulation (AM) detection thresholds were measured for a 500-Hz carrier frequency and a 5-Hz modulation rate. For AM detection, FM at the same rate as the AM was superimposed with varying FM depth. For FM detection, AM at the same rate was superimposed with varying AM depth. The target stimuli always contained both amplitude and frequency modulations, while the standard stimuli only contained the interfering modulation. Young and older normal-hearing listeners, as well as older listeners with mild-to-moderate sensorineural hearing loss were tested. For all groups, AM and FM detection thresholds were degraded in the presence of the interfering modulation. AM detection with and without interfering FM was hardly affected by either age or hearing loss. While aging had an overall detrimental effect on FM detection with and without interfering AM, there was a trend that hearing loss further impaired FM detection in the presence of AM. Several models using optimal combination of temporal-envelope cues at the outputs of off-frequency filters were tested. The interfering effects could only be predicted for hearing-impaired listeners. This indirectly supports the idea that, in addition to envelope cues resulting from FM-to-AM conversion, normal-hearing listeners use temporal fine-structure cues for FM detection.
34
Pieper I, Mauermann M, Kollmeier B, Ewert SD. Physiological motivated transmission-lines as front end for loudness models. The Journal of the Acoustical Society of America 2016; 139:2896. [PMID: 27250182 DOI: 10.1121/1.4949540] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 06/05/2023]
Abstract
The perception of loudness is strongly influenced by peripheral auditory processing, which calls for a physiologically correct peripheral auditory processing stage when constructing advanced loudness models. Most loudness models, however, follow a functional approach instead: a parallel auditory filter bank combined with a compression stage, followed by spectral and temporal integration. Such classical loudness models do not allow physiological measurements like otoacoustic emissions to be linked directly to properties of their auditory filterbank. However, this can be achieved with physiologically motivated transmission-line models (TLMs) of the cochlea. Here two active and nonlinear TLMs were tested as the peripheral front end of a loudness model. The TLMs are followed by a simple generic back end which performs integration of basilar-membrane "excitation" across place and time to yield a loudness estimate. The proposed model approach reaches similar performance to other state-of-the-art loudness models regarding the prediction of loudness in sones, equal-loudness contours (including spectral fine structure), and loudness as a function of bandwidth. The suggested model provides a powerful tool to directly connect objective measures of basilar membrane compression, such as distortion product otoacoustic emissions, and loudness in future studies.
35
Schädler MR, Warzybok A, Ewert SD, Kollmeier B. A simulation framework for auditory discrimination experiments: Revealing the importance of across-frequency processing in speech perception. The Journal of the Acoustical Society of America 2016; 139:2708. [PMID: 27250164 DOI: 10.1121/1.4948772] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 06/05/2023]
Abstract
A framework for simulating auditory discrimination experiments, based on an approach from Schädler, Warzybok, Hochmuth, and Kollmeier [(2015). Int. J. Audiol. 54, 100-107] which was originally designed to predict speech recognition thresholds, is extended to also predict psychoacoustic thresholds. The proposed framework is used to assess the suitability of different auditory-inspired feature sets for a range of auditory discrimination experiments that included psychoacoustic as well as speech recognition experiments in noise. The considered experiments were 2 kHz tone-in-broadband-noise simultaneous masking depending on the tone length, spectral masking with simultaneously presented tone signals and narrow-band noise maskers, and German Matrix sentence test reception threshold in stationary and modulated noise. The employed feature sets included spectro-temporal Gabor filter bank features, Mel-frequency cepstral coefficients, logarithmically scaled Mel-spectrograms, and the internal representation of the Perception Model from Dau, Kollmeier, and Kohlrausch [(1997). J. Acoust. Soc. Am. 102(5), 2892-2905]. The proposed framework was successfully employed to simulate all experiments with a common parameter set and obtain objective thresholds with fewer assumptions compared to traditional modeling approaches. Depending on the feature set, the simulated reference-free thresholds were found to agree with, and hence to predict, empirical data from the literature. Across-frequency processing was found to be crucial to accurately model the lower speech reception threshold in modulated noise conditions than in stationary noise conditions.
36
Oetting D, Hohmann V, Appell JE, Kollmeier B, Ewert SD. Spectral and binaural loudness summation for hearing-impaired listeners. Hear Res 2016; 335:179-192. [PMID: 27006003 DOI: 10.1016/j.heares.2016.03.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 07/30/2015] [Revised: 03/15/2016] [Accepted: 03/17/2016] [Indexed: 11/30/2022]
Abstract
Sensorineural hearing loss typically results in a steepened loudness function and a reduced dynamic range from elevated thresholds to uncomfortably loud levels for narrowband and broadband signals. Restoring narrowband loudness perception for hearing-impaired (HI) listeners can lead to overly loud perception of broadband signals and it is unclear how binaural presentation affects loudness perception in this case. Here, loudness perception quantified by categorical loudness scaling for nine normal-hearing (NH) and ten HI listeners was compared for signals with different bandwidth and different spectral shape in monaural and in binaural conditions. For the HI listeners, frequency- and level-dependent amplification was used to match the narrowband monaural loudness functions of the NH listeners. The average loudness functions for NH and HI listeners showed good agreement for monaural broadband signals. However, HI listeners showed substantially greater loudness for binaural broadband signals than NH listeners: on average a 14.1 dB lower level was required to reach "very loud" (range 30.8 to -3.7 dB). Overall, with narrowband loudness compensation, a given binaural loudness for broadband signals above "medium loud" was reached at systematically lower levels for HI than for NH listeners. Such increased binaural loudness summation was not found for loudness categories below "medium loud" or for narrowband signals. Large individual variations in the increased loudness summation were observed and could not be explained by the audiogram or the narrowband loudness functions.
37
Schubotz W, Brand T, Kollmeier B, Ewert SD. The Influence of High-Frequency Envelope Information on Low-Frequency Vowel Identification in Noise. PLoS One 2016; 11:e0145610. [PMID: 26730702 PMCID: PMC4701218 DOI: 10.1371/journal.pone.0145610] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/04/2015] [Accepted: 12/07/2015] [Indexed: 11/19/2022] [Open Access]
Abstract
Vowel identification in noise using consonant-vowel-consonant (CVC) logatomes was used to investigate a possible interplay of speech information from different frequency regions. It was hypothesized that the periodicity conveyed by the temporal envelope of a high frequency stimulus can enhance the use of the information carried by auditory channels in the low-frequency region that share the same periodicity. It was further hypothesized that this acts as a strobe-like mechanism and would increase the signal-to-noise ratio for the voiced parts of the CVCs. In a first experiment, different high-frequency cues were provided to test this hypothesis, whereas a second experiment examined more closely the role of amplitude modulations and intact phase information within the high-frequency region (4–8 kHz). CVCs were either natural or vocoded speech (both limited to a low-pass cutoff-frequency of 2.5 kHz) and were presented in stationary 3-kHz low-pass filtered masking noise. The experimental results did not support the hypothesized use of periodicity information for aiding low-frequency perception.
38
Kortlang S, Mauermann M, Ewert SD. Suprathreshold auditory processing deficits in noise: Effects of hearing loss and age. Hear Res 2016; 331:27-40. [DOI: 10.1016/j.heares.2015.10.004] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 05/13/2015] [Revised: 10/05/2015] [Accepted: 10/07/2015] [Indexed: 11/15/2022]
39
Hu H, Lutman ME, Ewert SD, Li G, Bleeck S. Sparse Nonnegative Matrix Factorization Strategy for Cochlear Implants. Trends Hear 2015; 19:2331216515616941. [PMID: 26721919 PMCID: PMC4771045 DOI: 10.1177/2331216515616941] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022] [Open Access]
Abstract
Current cochlear implant (CI) strategies carry speech information via the waveform envelope in frequency subbands. CIs require efficient speech processing to maximize information transfer to the brain, especially in background noise, where the speech envelope is not robust to noise interference. In such conditions, the envelope, after decomposition into frequency bands, may be enhanced by sparse transformations, such as nonnegative matrix factorization (NMF). Here, a novel CI processing algorithm is described, which works by applying NMF to the envelope matrix (envelopogram) of 22 frequency channels in order to improve performance in noisy environments. It is evaluated for speech in eight-talker babble noise. The critical sparsity constraint parameter was first tuned using objective measures and then evaluated with subjective speech perception experiments for both normal hearing and CI subjects. Results from vocoder simulations with 10 normal hearing subjects showed that the algorithm significantly enhances speech intelligibility with the selected sparsity constraints. Results from eight CI subjects showed no significant overall improvement compared with the standard advanced combination encoder algorithm, but a trend toward improvement of word identification of about 10 percentage points at +15 dB signal-to-noise ratio (SNR) was observed. Additionally, the spread of speech perception performance narrowed considerably, from 40%-93% for the advanced combination encoder to 80%-100% for the suggested NMF coding strategy.
40
Kayser H, Hohmann V, Ewert SD, Kollmeier B, Anemüller J. Robust auditory localization using probabilistic inference and coherence-based weighting of interaural cues. The Journal of the Acoustical Society of America 2015; 138:2635-2648. [PMID: 26627742 DOI: 10.1121/1.4932588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Robust sound source localization is performed by the human auditory system even in challenging acoustic conditions and in previously unencountered, complex scenarios. Here a computational binaural localization model is proposed that possesses mechanisms for handling of corrupted or unreliable localization cues and generalization across different acoustic situations. Central to the model is the use of interaural coherence, measured as interaural vector strength (IVS), to dynamically weight the importance of observed interaural phase (IPD) and level (ILD) differences in frequency bands up to 1.4 kHz. This is accomplished through formulation of a probabilistic model in which the ILD and IPD distributions pertaining to a specific source location are dependent on observed interaural coherence. Bayesian computation of the direction-of-arrival probability map naturally leads to coherence-weighted integration of location cues across frequency and time. Results confirm the model's validity through statistical analyses of interaural parameter values. Simulated localization experiments show that even data points with low reliability (i.e., low IVS) can be exploited to enhance localization performance. A temporal integration length of at least 200 ms is required to gain a benefit; this is in accordance with previous psychoacoustic findings on temporal integration of spatial cues in the human auditory system.
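As a rough illustration of the coherence measure named in this abstract, the sketch below computes a vector strength from a set of interaural phase differences and uses reliability weights in a circular mean. This is a deliberately simplified stand-in, not the probabilistic model of the paper: the function names, the plain (unwindowed) averaging, and the weighting scheme are all assumptions.

```python
import cmath
import math


def interaural_vector_strength(ipds):
    """Length of the mean unit phasor of the IPDs (in radians).

    Values near 1 indicate consistent IPDs (high coherence);
    values near 0 indicate IPDs scattered around the circle.
    """
    return abs(sum(cmath.exp(1j * p) for p in ipds)) / len(ipds)


def weighted_mean_ipd(ipds, weights):
    """Circular mean of IPDs, each weighted by a reliability value.

    Assumption: weights are nonnegative, e.g. per-frame vector strengths.
    """
    total = sum(w * cmath.exp(1j * p) for p, w in zip(ipds, weights))
    return cmath.phase(total)


# Illustrative values only.
coherent_ipds = [0.5] * 8
diffuse_ipds = [2 * math.pi * k / 8 for k in range(8)]
ivs_coherent = interaural_vector_strength(coherent_ipds)
ivs_diffuse = interaural_vector_strength(diffuse_ipds)
estimate = weighted_mean_ipd([0.5, 2.0], [1.0, 0.0])
```

The abstract describes Bayesian integration in which the cue distributions depend on the observed coherence; the weighted circular mean here only conveys the general idea of trusting reliable observations more.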
41
Oetting D, Brand T, Ewert SD. Optimized loudness-function estimation for categorical loudness scaling data. Hear Res 2014; 316:16-27. [DOI: 10.1016/j.heares.2014.07.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 11/07/2013] [Revised: 07/03/2014] [Accepted: 07/09/2014] [Indexed: 11/25/2022]
42
Jürgens T, Ewert SD, Kollmeier B, Brand T. Prediction of consonant recognition in quiet for listeners with normal and impaired hearing using an auditory model. The Journal of the Acoustical Society of America 2014; 135:1506-1517. [PMID: 24606286 DOI: 10.1121/1.4864293] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 06/03/2023]
Abstract
Consonant recognition was assessed in normal-hearing (NH) and hearing-impaired (HI) listeners in quiet as a function of speech level using a nonsense logatome test. Average recognition scores were analyzed and compared to recognition scores of a speech recognition model. In contrast to commonly used spectral speech recognition models operating on long-term spectra, a "microscopic" model operating in the time domain was used. Variations of the model (accounting for hearing impairment) and different model parameters (reflecting cochlear compression) were tested. Using these model variations this study examined whether speech recognition performance in quiet is affected by changes in cochlear compression, namely, a linearization, which is often observed in HI listeners. Consonant recognition scores for HI listeners were poorer than for NH listeners. The model accurately predicted the speech reception thresholds of the NH and most HI listeners. A partial linearization of the cochlear compression in the auditory model, while keeping audibility constant, produced higher recognition scores and improved the prediction accuracy. However, including listener-specific information about the exact form of the cochlear compression did not improve the prediction further.
43
Jørgensen S, Ewert SD, Dau T. A multi-resolution envelope-power based model for speech intelligibility. The Journal of the Acoustical Society of America 2013; 134:436-446. [PMID: 23862819 DOI: 10.1121/1.4807563] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Indexed: 05/28/2023]
Abstract
The speech-based envelope power spectrum model (sEPSM) presented by Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] estimates the envelope power signal-to-noise ratio (SNRenv) after modulation-frequency selective processing. Changes in this metric were shown to account well for changes of speech intelligibility for normal-hearing listeners in conditions with additive stationary noise, reverberation, and nonlinear processing with spectral subtraction. In the latter condition, the standardized speech transmission index [(2003). IEC 60268-16] fails. However, the sEPSM is limited to conditions with stationary interferers, due to the long-term integration of the envelope power, and cannot account for increased intelligibility typically obtained with fluctuating maskers. Here, a multi-resolution version of the sEPSM is presented where the SNRenv is estimated in temporal segments with a modulation-filter dependent duration. The multi-resolution sEPSM is demonstrated to account for intelligibility obtained in conditions with stationary and fluctuating interferers, and noisy speech distorted by reverberation or spectral subtraction. The results support the hypothesis that the SNRenv is a powerful objective metric for speech intelligibility prediction.
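The segment-wise SNRenv estimate described in this abstract can be sketched as follows. This is a minimal illustration under strong simplifications: it assumes the envelopes have already been extracted, it omits the auditory and modulation filterbanks entirely, and it uses one fixed segment length rather than a modulation-filter dependent duration. The function names and the `floor` safeguard are hypothetical.

```python
import math


def envelope_power(env):
    """AC power of an envelope, normalized by its squared mean (DC power)."""
    mean = sum(env) / len(env)
    ac_power = sum((x - mean) ** 2 for x in env) / len(env)
    return ac_power / (mean ** 2)


def snr_env(env_mix, env_noise, floor=1e-3):
    """Envelope power SNR: (P_mix - P_noise) / P_noise.

    The speech envelope power is estimated as the excess of the
    noisy-speech envelope power over the noise-alone envelope power.
    `floor` is an assumed lower limit that avoids division by zero
    and negative estimates.
    """
    p_noise = max(envelope_power(env_noise), floor)
    p_speech = max(envelope_power(env_mix) - p_noise, floor)
    return p_speech / p_noise


def segment_snr_env(env_mix, env_noise, seg_len):
    """SNRenv in consecutive, non-overlapping segments of seg_len samples."""
    n = min(len(env_mix), len(env_noise))
    return [
        snr_env(env_mix[i:i + seg_len], env_noise[i:i + seg_len])
        for i in range(0, n - seg_len + 1, seg_len)
    ]


# Synthetic envelopes: 1 s at 1 kHz with a 4-Hz modulation.
fs = 1000
t = [n / fs for n in range(fs)]
env_noise = [1.0 + 0.1 * math.sin(2 * math.pi * 4 * x) for x in t]
env_mix = [1.0 + 0.5 * math.sin(2 * math.pi * 4 * x) for x in t]
overall = snr_env(env_mix, env_noise)
per_segment = segment_snr_env(env_mix, env_noise, seg_len=250)
```

In the model described above, the segment duration depends on the modulation filter; here `seg_len` is a single free parameter chosen only for the example.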
44
Dietz M, Bernstein LR, Trahiotis C, Ewert SD, Hohmann V. The effect of overall level on sensitivity to interaural differences of time and level at high frequencies. The Journal of the Acoustical Society of America 2013; 134:494-502. [PMID: 23862824 PMCID: PMC3724750 DOI: 10.1121/1.4807827] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 05/16/2023]
Abstract
For high-frequency complex stimuli, detection thresholds for envelope-based interaural time differences (ITDs) decrease with overall level. Substantial heterogeneity is, however, evident among the findings concerning the rate at which thresholds decline with level. This study investigated factors affecting the influence of overall level on threshold ITDs. Thresholds were measured as a function of overall level for 4-kHz-centered "targets" in three experiments focusing, respectively, on stimulus-type (sinusoidally amplitude-modulated or "transposed" tones), modulation frequency, and details concerning low-pass noise used to mask low-frequency distortion products. Results indicated that (1) log-ITD thresholds decreased linearly with overall level; (2) slopes relating log-ITD thresholds to level did not depend significantly on stimulus type; (3) lower modulation frequencies produced greater dependencies of thresholds on overall level than did higher modulation frequencies; (4) the effect of overall level on threshold-ITDs was independent of the interaural configuration and levels of the low-pass noise maskers tested; (5) synchronously gating the low-pass noise and target produced a greater dependency of thresholds on the overall level of the target than did continuous or temporally "fringed" presentation of the noise. A fourth experiment showed that threshold interaural level differences were somewhat less affected by changes in overall level than were threshold ITDs.
|
45
|
Dietz M, Wendt T, Ewert SD, Laback B, Hohmann V. Comparing the effect of pause duration on threshold interaural time differences between exponential and squared-sine envelopes (L). THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2013; 133:1-4. [PMID: 23297875 DOI: 10.1121/1.4768876] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/01/2023]
Abstract
Two recent studies [Klein-Hennig et al., J. Acoust. Soc. Am. 129, 3856-3872 (2011); Laback et al., J. Acoust. Soc. Am. 130, 1515-1529 (2011)] independently investigated the isolated effect of pause duration on sensitivity to interaural time differences (ITDs) in the ongoing stimulus envelope. The steepness of the functions relating threshold ITD to pause duration differed considerably between the two studies. The present study, using matched carrier and modulation frequencies, directly compared threshold ITDs for the two envelope flank shapes used in those studies. The results agree well when the metric of pause duration is defined on the basis of modulation depth sensitivity.
|
46
|
Dau T, Piechowiak T, Ewert SD. Modeling within- and across-channel processes in comodulation masking release. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2013; 133:350-364. [PMID: 23297908 DOI: 10.1121/1.4768882] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 06/01/2023]
Abstract
The relative contributions of within-channel and across-channel processes to perceptual comodulation masking release (CMR) were investigated in the framework of an auditory processing model. A generalized version of the computational auditory signal processing and perception model [CASP; Jepsen et al., J. Acoust. Soc. Am. 124, 422-438 (2008)] was used and extended by an across-channel modulation processing stage according to Piechowiak et al. [J. Acoust. Soc. Am. 121, 2111-2126 (2007)]. Five experimental paradigms were considered: CMR with a broadband noise masker as a function of the masker spectrum level; CMR with four widely spaced flanking bands (FBs) varying in overall level; CMR with one FB varying in frequency and level relative to the on-frequency band (OFB); CMR with one FB varying in frequency; and CMR as a function of the number of FBs. The predictions suggest that at least three different mechanisms contribute to overall CMR in the considered conditions: (1) a within-channel process based on changes in the envelope characteristic due to the addition of the signal to the masker; (2) a within-channel process based on nonlinear peripheral processing of the OFB's envelope caused by the FB(s); and (3) an across-channel process that is robust across presentation levels but relatively small (2-5 dB).
|
47
|
Dietz M, Ewert SD, Hohmann V. Lateralization based on interaural differences in the second-order amplitude modulator. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 131:398-408. [PMID: 22280601 DOI: 10.1121/1.3662078] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 05/31/2023]
Abstract
Second-order amplitude modulation is a relatively slow variation of the modulation depth of a first-order amplitude modulation with higher frequency. In contrast to first-order modulation, which appears as a physical component in the stimulus spectrum after half-wave rectification, second-order modulation is not necessarily demodulated by the auditory periphery. For binaural processing of second-order amplitude modulated stimuli it is unknown whether interaural time differences (ITDs) in the second-order modulation result in a lateralized percept. Thus, second-order modulation can serve as a tool to investigate whether demodulation of interaurally delayed components is a prerequisite for lateralization. In most of the psychoacoustic experiments presented here, a 25 Hz sinusoidally amplitude-modulated (SAM) 160 Hz tone was either transposed to 4 kHz by half-wave rectifying this SAM waveform before multiplication with a 4 kHz tone (TSAM), or by adding an offset before multiplication (SAMAM). The experiments revealed an inability to lateralize the SAMAM based on ITDs in the 25 Hz component, whereas subjects could lateralize the TSAM. Given that only the TSAM results in a demodulated 25 Hz component after peripheral auditory processing, this result supports the hypothesis that demodulation is a prerequisite for lateralization, which has consequences for temporal modulation processing in models of binaural interaction.
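The two stimulus constructions named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the sampling rate, the first-order modulation depth of 1, the starting phases, and the size of the SAMAM offset (chosen here as `1 + depth` so that the modulator never goes negative) are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def second_order_stimuli(fs=48000, dur=1.0, f_mod=25.0, f_sam=160.0,
                         f_carrier=4000.0, depth=1.0):
    """Build TSAM and SAMAM waveforms from a 25 Hz SAM 160 Hz tone.

    All default values except the three frequencies are assumptions.
    """
    t = np.arange(int(fs * dur)) / fs
    # 25 Hz sinusoidally amplitude-modulated (SAM) 160 Hz tone.
    sam = (1.0 + depth * np.cos(2 * np.pi * f_mod * t)) * np.sin(2 * np.pi * f_sam * t)
    carrier = np.sin(2 * np.pi * f_carrier * t)

    # TSAM: half-wave rectify the SAM waveform, then multiply with the 4 kHz tone.
    tsam_modulator = np.maximum(sam, 0.0)
    # SAMAM: add an offset instead of rectifying (offset value is an assumption).
    samam_modulator = sam + (1.0 + depth)

    return {
        "t": t,
        "tsam_modulator": tsam_modulator,
        "samam_modulator": samam_modulator,
        "tsam": tsam_modulator * carrier,
        "samam": samam_modulator * carrier,
    }

def component_amplitude(x, freq, fs=48000):
    """Amplitude of the spectral component of x nearest to freq (in Hz)."""
    k = int(round(freq * len(x) / fs))
    return 2.0 * np.abs(np.fft.rfft(x)[k]) / len(x)

stimuli = second_order_stimuli()
print(component_amplitude(stimuli["tsam_modulator"], 25.0))
print(component_amplitude(stimuli["samam_modulator"], 25.0))
```

Under these assumptions, the rectified modulator contains a physical 25 Hz component while the offset modulator does not, which mirrors the distinction the abstract draws between the TSAM and SAMAM conditions.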
|
48
|
Ewert SD, Kaiser K, Kernschmidt L, Wiegrebe L. Perceptual sensitivity to high-frequency interaural time differences created by rustling sounds. J Assoc Res Otolaryngol 2011; 13:131-143. [PMID: 22124890 DOI: 10.1007/s10162-011-0303-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 10/08/2010] [Accepted: 11/03/2011] [Indexed: 10/15/2022]
Abstract
Interaural time differences (ITDs) can be used to localize sounds in the horizontal plane. ITDs can be extracted from either the fine structure of low-frequency sounds or from the envelopes of high-frequency sounds. Studies of the latter have included stimuli with periodic envelopes like amplitude-modulated tones or transposed stimuli, and high-pass filtered Gaussian noises. Here, four experiments are presented investigating the perceptual relevance of ITD cues in synthetic and recorded "rustling" sounds. Both share the broad long-term power spectrum with Gaussian noise but provide more pronounced envelope fluctuations than Gaussian noise, quantified by an increased waveform fourth moment, W. The current data show that the JNDs in ITD for band-pass rustling sounds tended to improve with increasing W and with increasing bandwidth when the sounds were band limited. In contrast, no influence of W on JND was observed for broadband sounds, apparently because of listeners' sensitivity to ITD in low-frequency fine structure, present in the broadband sounds. Second, it is shown that for high-frequency rustling sounds ITD JNDs can be as low as 30 μs. The third result was that the amount of dominance for ITD extraction of low frequencies decreases systematically with increasing amount of envelope fluctuations. Finally, it is shown that despite the exceptionally good envelope ITD sensitivity evident with high-frequency rustling sounds, minimum audible angles of both synthetic and recorded high-frequency rustling sounds in virtual acoustic space are still best when the angular information is mediated by interaural level differences.
|
49
|
Klein-Hennig M, Dietz M, Hohmann V, Ewert SD. The influence of different segments of the ongoing envelope on sensitivity to interaural time delays. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 129:3856-3872. [PMID: 21682409 DOI: 10.1121/1.3585847] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 05/16/2023]
Abstract
The auditory system is sensitive to interaural timing disparities in the fine structure and the envelope of sounds, each contributing important cues for lateralization. In this study, psychophysical measurements were conducted with customized envelope waveforms in order to investigate the isolated effect of different segments of a periodic, ongoing envelope on lateralization. One envelope cycle was composed of the four segments attack flank, hold duration, decay flank, and pause duration, which were independently varied to customize the envelope waveform. The envelope waveforms were applied to a 4-kHz sinusoidal carrier, and just noticeable envelope interaural time differences were measured in six normal hearing subjects. The results indicate that attack durations and pause durations prior to the attack are the most important stimulus characteristics for processing envelope timing disparities. The results were compared to predictions of three binaural lateralization models based on the normalized cross correlation coefficient. Two of the models included an additional stage to mimic neural adaptation prior to binaural interaction, involving either a single short time constant (5 ms) or a combination of five time constants up to 500 ms. It was shown that the model with the single short time constant accounted best for the data.
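As a rough illustration of the four-segment envelope described in this abstract, the following Python sketch assembles one cycle from attack, hold, decay, and pause segments and applies it to a 4-kHz carrier with an envelope-only ITD. The squared-sine flank shape, the sampling rate, the circular shift used to impose the ITD, and all function names are assumptions made for the sketch, not details taken from the abstract.

```python
import numpy as np

def envelope_cycle(attack_ms, hold_ms, decay_ms, pause_ms, fs=48000):
    """One envelope cycle: attack flank, hold, decay flank, pause."""
    def n_samples(ms):
        return int(round(ms * 1e-3 * fs))

    # Squared-sine flanks are an assumed shape for this sketch.
    rise = np.linspace(0.0, 1.0, n_samples(attack_ms), endpoint=False)
    fall = np.linspace(0.0, 1.0, n_samples(decay_ms), endpoint=False)
    attack = np.sin(0.5 * np.pi * rise) ** 2   # rises from 0 toward 1
    hold = np.ones(n_samples(hold_ms))         # stays at 1
    decay = np.cos(0.5 * np.pi * fall) ** 2    # falls from 1 toward 0
    pause = np.zeros(n_samples(pause_ms))      # stays at 0
    return np.concatenate([attack, hold, decay, pause])

def envelope_itd_stimulus(cycle, n_cycles, itd_samples, f_carrier=4000.0, fs=48000):
    """Left/right signals whose envelopes differ by itd_samples; the carrier is diotic."""
    envelope = np.tile(cycle, n_cycles)
    t = np.arange(len(envelope)) / fs
    carrier = np.sin(2 * np.pi * f_carrier * t)
    left = envelope * carrier
    # Circular shift keeps the periodic, ongoing envelope intact.
    right = np.roll(envelope, itd_samples) * carrier
    return left, right

cycle = envelope_cycle(attack_ms=5, hold_ms=5, decay_ms=5, pause_ms=5)
left, right = envelope_itd_stimulus(cycle, n_cycles=10, itd_samples=2)
```

Varying one segment duration while holding the other three fixed corresponds to the independent manipulation of segments that the abstract describes.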
|
50
|
Dietz M, Ewert SD, Hohmann V. Lateralization of stimuli with independent fine-structure and envelope-based temporal disparities. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 125:1622-1635. [PMID: 19275320 DOI: 10.1121/1.3076045] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 05/27/2023]
Abstract
Psychoacoustic experiments were conducted to investigate the role and interaction of fine-structure and envelope-based interaural temporal disparities. A computational model for the lateralization of binaural stimuli, motivated by recent physiological findings, is suggested and evaluated against the psychoacoustic data. The model is based on the independent extraction of the interaural phase difference (IPD) from the stimulus fine-structure and envelope. Sinusoidally amplitude-modulated 1-kHz tones were used in the experiments. The lateralization from either carrier (fine-structure) or modulator (envelope) IPD was matched with an interaural level difference, revealing a nearly linear dependence for both IPD types up to 135°, independent of the modulation frequency. However, if a carrier IPD was traded with an opposed modulator IPD to produce a centered sound image, a carrier IPD of 45° required the largest opposed modulator IPD. The data could be modeled assuming a population of binaural neurons with a physiological distribution of the best IPDs clustered around 45°-50°. The model was also used to predict the perceived lateralization of previously published data. Subject-dependent differences in the perceptual salience of fine-structure and envelope cues, also reported previously, could be modeled by individual weighting coefficients for the two cues.
|