1
Culling JF, D'Olne EFC, Davies BD, Powell N, Naylor PA. Practical utility of a head-mounted gaze-directed beamforming system. J Acoust Soc Am 2023; 154:3760-3768. [PMID: 38099830 DOI: 10.1121/10.0023961]
Abstract
Assistive auditory devices that enhance signal-to-noise ratio must follow the user's changing attention; errors could lead to the desired source being suppressed as noise. A method for measuring the practical benefit of attention-following speech enhancement is described and used to show a benefit for gaze-directed beamforming over natural binaural hearing. First, participants watched a recorded video conference call between two people with six additional interfering voices in different directions. The directions of the target voices corresponded to the spatial layout of their video streams. A simulated beamformer was yoked to the participant's gaze direction using an eye tracker. For the control condition, all eight voices were spatially distributed in a simulation of unaided binaural hearing. Participants completed questionnaires on the content of the conversation, scoring twice as high in the beamforming condition. Sentence-by-sentence intelligibility was then measured using new participants who viewed the same audiovisual stimulus for each isolated sentence. Participants recognized twice as many words in the beamforming condition. The results demonstrate the potential practical benefit of gaze-directed beamforming for hearing aids and illustrate how detailed intelligibility data can be retrieved from an experiment that involves behavioral engagement in an ongoing listening task.
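The paper's simulated beamformer was steered by measured gaze direction. As a rough illustration of the underlying idea only (not the authors' implementation), the sketch below steers a plain delay-and-sum beamformer toward a given azimuth; the linear array geometry, the far-field assumption, and all parameter values are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, room-temperature air

def delay_and_sum(mic_signals, mic_positions, gaze_azimuth_deg, fs):
    """Steer a delay-and-sum beamformer toward the gaze direction.

    mic_signals: (n_mics, n_samples) time-domain signals.
    mic_positions: positions along a linear array, in metres.
    gaze_azimuth_deg: steering direction, 0 = broadside.
    """
    azimuth = np.radians(gaze_azimuth_deg)
    # Far-field steering delay for each microphone relative to the array centre.
    delays = mic_positions * np.sin(azimuth) / SPEED_OF_SOUND
    n = mic_signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for sig, tau in zip(mic_signals, delays):
        # Apply the steering delay as a linear phase shift in the frequency domain,
        # so signals from the steered direction add coherently.
        spectrum = np.fft.rfft(sig) * np.exp(2j * np.pi * freqs * tau)
        out += np.fft.irfft(spectrum, n)
    return out / len(mic_signals)
```

Steering toward the gaze direction reinforces the attended voice while off-axis interferers add incoherently and are attenuated.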
Affiliation(s)
- John F Culling
- School of Psychology, Cardiff University, 70 Park Place, Cardiff CF10 3AT, United Kingdom
- Emilie F C D'Olne
- Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, United Kingdom
- Bryn D Davies
- School of Psychology, Cardiff University, 70 Park Place, Cardiff CF10 3AT, United Kingdom
- Niamh Powell
- School of Psychology, Cardiff University, 70 Park Place, Cardiff CF10 3AT, United Kingdom
- Patrick A Naylor
- Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, United Kingdom
2
Hadley LV, Culling JF. Timing of head turns to upcoming talkers in triadic conversation: Evidence for prediction of turn ends and interruptions. Front Psychol 2022; 13:1061582. [PMID: 36605274 PMCID: PMC9807761 DOI: 10.3389/fpsyg.2022.1061582]
Abstract
In conversation, people are able to listen to an utterance and respond within only a few hundred milliseconds. It takes substantially longer to prepare even a simple utterance, suggesting that interlocutors predict when the current talker will finish. But it is not only the upcoming talker who needs to anticipate the end of the prior talker's turn; listeners who are simply following the conversation could also benefit from predicting the turn end, in order to shift attention appropriately at the turn switch. In this paper, we examined whether people predict upcoming turn ends when watching conversational turns switch between others, by analysing natural conversations between triads of older adults in different levels and types of noise. The analysis focused on the observer during turn switches between the other two parties, using head orientation (i.e., head turns from one talker to the next) to identify when the observer's focus moved from one talker to the next. For non-overlapping utterances, observers started to turn to the upcoming talker before the prior talker had finished speaking in 17% of turn switches (going up to 26% when accounting for motor-planning time). For overlapping utterances, observers started to turn towards the interrupter before they interrupted in 18% of turn switches (going up to 33% when accounting for motor-planning time). The timing of head turns was more precise at lower than higher noise levels, and was not affected by noise type. These findings demonstrate that listeners in natural group conversation situations often exhibit head movements that anticipate the end of one conversational turn and the beginning of another. Furthermore, this work demonstrates the value of analysing head movement as a cue to social attention, which could be relevant for advancing communication technology such as hearing devices.
Affiliation(s)
- Lauren V. Hadley
- Hearing Sciences – Scottish Section, School of Medicine, University of Nottingham, Glasgow, United Kingdom
- John F. Culling
- School of Psychology, Cardiff University, Cardiff, United Kingdom
3
Stevenson-Hoare JO, Freeman TCA, Culling JF. The pinna enhances angular discrimination in the frontal hemifield. J Acoust Soc Am 2022; 152:2140. [PMID: 36319254 DOI: 10.1121/10.0014599]
Abstract
Human sound localization in the horizontal dimension is thought to be dominated by binaural cues, particularly interaural time delays, because monaural localization in this dimension is relatively poor. Remaining ambiguities of front versus back and up versus down are distinguished by high-frequency spectral cues generated by the pinna. The experiments in this study show that this account is incomplete. Using binaural listening throughout, the pinna substantially enhanced horizontal discrimination in the frontal hemifield, making discrimination in front better than discrimination at the rear, particularly for directions away from the median plane. Eliminating acoustic effects of the pinna by acoustically bypassing them or low-pass filtering abolished the advantage at the front without affecting the rear. Acoustic measurements revealed a pinna-induced spectral prominence that shifts smoothly in frequency as sounds move from 0° to 90° azimuth. The improved performance is discussed in terms of the monaural and binaural changes induced by the pinna.
Affiliation(s)
- Joshua O Stevenson-Hoare
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom
- Tom C A Freeman
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom
4
Mcleod RWJ, Gallagher M, Hall A, Bant SP, Culling JF. Acoustic analysis of the effect of personal protective equipment on speech understanding: lessons for clinical environments. Int J Audiol 2022:1-6. [DOI: 10.1080/14992027.2022.2070780]
Affiliation(s)
- Andy Hall
- ENT Department, University Hospital of Wales, Cardiff, UK
- Sarah P. Bant
- Audiology Department, Betsi Cadwaladr University Health Board, Bangor, UK
5
Graetzer S, Akeroyd MA, Barker J, Cox TJ, Culling JF, Naylor G, Porter E, Viveros-Muñoz R. Dataset of British English speech recordings for psychoacoustics and speech processing research: The clarity speech corpus. Data Brief 2022; 41:107951. [PMID: 35242933 PMCID: PMC8881678 DOI: 10.1016/j.dib.2022.107951]
Abstract
This paper presents the Clarity Speech Corpus, a publicly available, forty-speaker British English speech dataset. The corpus was created for the purpose of running listening tests to gauge speech intelligibility and quality in the Clarity Project, which has the goal of advancing speech signal processing by hearing aids through a series of challenges. The dataset is suitable for machine learning and other uses in speech and hearing technology, acoustics and psychoacoustics. The data comprise recordings of approximately 10,000 sentences drawn from the British National Corpus (BNC) with suitable length, words and grammatical construction for speech intelligibility testing. The collection process involved the selection of a subset of BNC sentences, the recording of these sentences by 40 British English speakers, and the processing of the recordings to create individual sentence recordings with associated transcripts and metadata.
Affiliation(s)
- Simone Graetzer
- Acoustics Research Centre, University of Salford, United Kingdom
- Michael A Akeroyd
- Hearing Sciences, Mental Health and Clinical Neurosciences, School of Medicine, University of Nottingham, United Kingdom
- Jon Barker
- Department of Computer Science, University of Sheffield, United Kingdom
- Trevor J Cox
- Acoustics Research Centre, University of Salford, United Kingdom
- John F Culling
- School of Psychology, Cardiff University, United Kingdom
- Graham Naylor
- Hearing Sciences - Scottish Section, Mental Health and Clinical Neurosciences, School of Medicine, University of Nottingham, United Kingdom
- Eszter Porter
- Hearing Sciences, Mental Health and Clinical Neurosciences, School of Medicine, University of Nottingham, United Kingdom
6
Mcleod RWJ, Culling JF. Unilateral crosstalk cancellation in normal hearing participants using bilateral bone transducers. J Acoust Soc Am 2020; 148:63. [PMID: 32752776 DOI: 10.1121/10.0001529]
Abstract
It is possible to psychophysically measure the phase and level of bone-conducted sound at the cochleae using two bone transducers (BTs) [Mcleod and Culling (2019). J. Acoust. Soc. Am. 146, 3295-3301]. The present work uses those phase and level values to create a unilateral crosstalk cancellation system and thereby improve masked thresholds. To avoid changes in the coupling of the BT to the head, testing of tone and speech reception thresholds with and without crosstalk cancellation had to be performed immediately following the measurements without adjustment of the BT. To achieve this, a faster measurement method was created. Previously measured phase and level results were interpolated to predict likely results for new test frequencies. Testing time to collect the necessary phase and level values was reduced to approximately 15 min by exploiting listeners' previous measurements. The inter-cochlear phase difference and inter-cochlear level difference were consistent between experimental sittings in the same participant but different between participants. Addition of a crosstalk cancellation signal improved tone and speech reception thresholds for tones/speech presented with one BT and noise presented on the other by an average of 12.1 dB for tones and 13.67 dB for speech.
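The cancellation principle can be sketched simply: given psychophysically measured inter-cochlear level and phase differences (ICLD/ICPD) at a test frequency, the cancellation tone is an ICLD-attenuated copy of the target tone shifted by the ICPD plus 180°. The sketch below is a minimal illustration with hypothetical parameter values, ignoring second-order crosstalk from the cancellation signal itself; it is not the study's measurement procedure.

```python
import numpy as np

def cancellation_tone(freq, duration, fs, icld_db, icpd_deg, amplitude=1.0):
    """Tone for the contralateral bone transducer that cancels crosstalk.

    icld_db / icpd_deg: measured inter-cochlear level and phase differences
    for a tone of this frequency (hypothetical values in this sketch).
    The tone is attenuated by the ICLD and shifted by the ICPD plus 180
    degrees, so crosstalk and cancellation sum to zero at the far cochlea.
    """
    t = np.arange(int(duration * fs)) / fs
    gain = amplitude * 10 ** (-icld_db / 20)
    phase = np.radians(icpd_deg) + np.pi  # invert to cancel
    return gain * np.sin(2 * np.pi * freq * t + phase)
```

In this toy model, a tone driven from one BT reaches the opposite cochlea attenuated by the ICLD and shifted by the ICPD; adding the tone above at that cochlea nulls it.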
Affiliation(s)
- Robert W J Mcleod
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom
7
Mcleod RWJ, Culling JF. Psychoacoustic measurement of phase and level for cross-talk cancellation using bilateral bone transducers: Comparison of methods. J Acoust Soc Am 2019; 146:3295. [PMID: 31795671 DOI: 10.1121/1.5131650]
Abstract
Two bone-conduction hearing aids (BCHAs) could deliver improved stereo separation using cross-talk cancellation. Sound vibrations from each BCHA would be cancelled at the contralateral cochlea by an out-of-phase signal of the same level from the ipsilateral BCHA. A method to measure the level and phase required for these cancellation signals was developed and cross-validated with an established technique that combines air- and bone-conducted sound. Three participants with normal hearing wore a bone transducer (BT) on each mastoid, along with insert earphones. Both BTs produced a pure tone, and the level and phase were adjusted in the right BT in order to cancel all perceived sound at that ear. To cross-validate, one BT was stimulated with a pure tone and participants cancelled the resultant signal at both cochleae via adjustment of the phase and level of signals from the earphones. Participants achieved cancellation using both methods between 1.5 and 8 kHz. Levels measured with each method differed by <1 dB between 3 and 5 kHz. The phase results also corresponded well for the cancelled ear (11° mean difference) but poorly for the contralateral ear (38.4° mean difference). The first method is transferable to patients with middle-ear dysfunction, but covers a limited frequency range.
Affiliation(s)
- Robert W J Mcleod
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom
8
Grange JA, Culling JF, Bardsley B, Mackinney LI, Hughes SE, Backhouse SS. Turn an Ear to Hear: How Hearing-Impaired Listeners Can Exploit Head Orientation to Enhance Their Speech Intelligibility in Noisy Social Settings. Trends Hear 2019; 22:2331216518802701. [PMID: 30334495 PMCID: PMC6196611 DOI: 10.1177/2331216518802701]
Abstract
Turning an ear toward the talker can enhance spatial release from masking. Here, with their head free, listeners attended to speech at a gradually diminishing signal-to-noise ratio and with the noise source azimuthally separated from the speech source by 180° or 90°. Young normal-hearing adult listeners spontaneously turned an ear toward the speech source in 64% of audio-only trials, but a visible talker's face or cochlear implant (CI) use significantly reduced this head-turn behavior. All listener groups made more head movements once instructed to explore the potential benefit of head turns and followed the speech to lower signal-to-noise ratios. Unilateral CI users improved the most. In a virtual restaurant simulation with nine interfering noises or voices, hearing-impaired listeners and simulated bilateral CI users typically obtained a 1 to 3 dB head-orientation benefit from a 30° head turn away from the talker. Yet in diffuse interference environments, the advice given to U.K. CI users by many CI professionals, and the communication guidance available on the Internet, is most often to face the talker head on. CI users would instead benefit from guidelines recommending that they look sidelong at the talker, with the better-hearing or implanted ear oriented toward the talker.
Affiliation(s)
- Jacques A. Grange
- School of Psychology, Cardiff University, UK
- Jacques A. Grange, School of Psychology, Cardiff University, 70 Park Place, Cardiff CF10 3AT, UK.
- Sarah E. Hughes
- South Wales Cochlear Implant Programme, Princess of Wales Hospital, Bridgend, UK
- Steven S. Backhouse
- South Wales Cochlear Implant Programme, Princess of Wales Hospital, Bridgend, UK
9
10
Abstract
The number of marketed bone-conduction hearing implants (BCHIs) has been steadily growing, with multiple percutaneous and transcutaneous devices now available. However, studies assessing efficacy often have small sample sizes and employ different assessment methodologies. Thus, there is a paucity of evidence to guide clinicians to the most appropriate device for each patient. This paper outlines audiological guidelines for the latest devices, as well as research from the most up-to-date clinical trials. We also outline the evidence base for some potentially contentious issues in the field of bone conduction, including bilateral fitting of BCHIs in those with bilateral conductive hearing loss as well as the use of BCHIs in single-sided deafness (SSD). Bilateral fitting of BCHIs has been found to significantly improve hearing thresholds in quiet and sound localization, but to give limited benefit in background noise. Studies using multiple assessment questionnaires have found strong evidence of subjective benefit for the use of BCHIs in SSD. However, there is little objective evidence of benefit for SSD patients from sound localization and speech-in-noise tests.
11
Grange JA, Culling JF, Harris NSL, Bergfeld S. Cochlear implant simulator with independent representation of the full spiral ganglion. J Acoust Soc Am 2017; 142:EL484. [PMID: 29195445 DOI: 10.1121/1.5009602]
Abstract
In cochlear implant simulation with vocoders, narrow-band carriers deliver the envelopes from each analysis band to the cochlear positions of the simulated electrodes. However, this approach does not faithfully represent the continuous nature of the spiral ganglion. The proposed "SPIRAL" vocoder simulates current spread by mixing all envelopes across many tonal carriers. SPIRAL demonstrated that the classic finding of reduced speech-intelligibility benefit with additional electrodes could be due to current spread. SPIRAL produced lower speech reception thresholds than an equivalent noise vocoder, and thresholds were stable for carrier counts between 20 and 160.
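The mixing scheme described above (electrode envelopes spread across many tonal carriers) can be sketched roughly as below. This is an illustrative reconstruction, not the published SPIRAL code: the exponential decay rate, electrode and carrier positions, and carrier frequencies are all hypothetical values.

```python
import numpy as np

def spiral_mix(envelopes, electrode_pos_mm, carrier_pos_mm,
               carrier_freqs, fs, decay_db_per_mm=8.0):
    """Mix electrode envelopes onto many tonal carriers with an
    exponential current-spread weighting (decay rate is illustrative).

    envelopes: (n_electrodes, n_samples); carrier_freqs: (n_carriers,) Hz.
    """
    # Weight of electrode k at carrier position j falls off exponentially
    # with distance along the cochlea, mimicking current spread.
    dist = np.abs(carrier_pos_mm[:, None] - electrode_pos_mm[None, :])
    weights = 10 ** (-decay_db_per_mm * dist / 20)      # (n_carriers, n_electrodes)
    mixed = weights @ envelopes                         # (n_carriers, n_samples)
    t = np.arange(envelopes.shape[1]) / fs
    carriers = np.sin(2 * np.pi * carrier_freqs[:, None] * t)
    return (mixed * carriers).sum(axis=0)
```

Because every carrier receives a weighted mix of all electrode envelopes, the simulation represents the whole ganglion continuously rather than only discrete electrode sites.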
Affiliation(s)
- Jacques A Grange
- School of Psychology, Cardiff University, CF10 3AT, Cardiff, United Kingdom
- John F Culling
- School of Psychology, Cardiff University, CF10 3AT, Cardiff, United Kingdom
- Naomi S L Harris
- School of Psychology, Cardiff University, CF10 3AT, Cardiff, United Kingdom
- Sven Bergfeld
- Department of Cognitive Neuroscience, Bielefeld University, 33615 Bielefeld, Germany
12
Mcleod RWJ, Culling JF. Measurements of inter-cochlear level and phase differences of bone-conducted sound. J Acoust Soc Am 2017; 141:3421. [PMID: 28599562 PMCID: PMC5441991 DOI: 10.1121/1.4983471]
Abstract
Bone-anchored hearing aids are a widely used method of treating conductive hearing loss, but the benefit of bilateral implantation is limited due to interaural cross-talk. The present study measured the phase and level of pure tones reaching each cochlea from a single, mastoid-placed bone transducer in normal-hearing participants. In principle, the technique could be used to implement a cross-talk cancellation system in those with bilateral bone conductors. The phase and level of probe tones presented over two insert earphones were adjusted until they canceled sound from a bone transducer (i.e., resulting in perceived silence). Testing was performed in 50-Hz steps between 0.25 and 8 kHz. Probe phase and level results were used to calculate inter-cochlear level and phase differences. The inter-cochlear phase differences of the bone-conducted sound were similar for all three participants, showing a relatively linear increase between 4 and 8 kHz. The attenuation characteristics were highly variable over the frequency range as well as between participants. This variability was thought to be related to differences in skull dynamics across the ears. Repeated measurements of cancellation phase and level at the same frequency produced good consistency across sessions from the same participant.
Affiliation(s)
- Robert W J Mcleod
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff, CF10 3AT, United Kingdom
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff, CF10 3AT, United Kingdom
13
Freeman TCA, Culling JF, Akeroyd MA, Brimijoin WO. Auditory compensation for head rotation is incomplete. J Exp Psychol Hum Percept Perform 2017; 43:371-380. [PMID: 27841453 PMCID: PMC5289217 DOI: 10.1037/xhp0000321]
Abstract
Hearing is confronted by a similar problem to vision when the observer moves. The image motion that is created remains ambiguous until the observer knows the velocity of eye and/or head. One way the visual system solves this problem is to use motor commands, proprioception, and vestibular information. These "extraretinal signals" compensate for self-movement, converting image motion into head-centered coordinates, although not always perfectly. We investigated whether the auditory system also transforms coordinates by examining the degree of compensation for head rotation when judging a moving sound. Real-time recordings of head motion were used to change the "movement gain" relating head movement to source movement across a loudspeaker array. We then determined psychophysically the gain that corresponded to a perceptually stationary source. Experiment 1 showed that the gain was small and positive for a wide range of trained head speeds. Hence, listeners perceived a stationary source as moving slightly opposite to the head rotation, in much the same way that observers see stationary visual objects move against a smooth pursuit eye movement. Experiment 2 showed the degree of compensation remained the same for sounds presented at different azimuths, although the precision of performance declined when the sound was eccentric. We discuss two possible explanations for incomplete compensation, one based on differences in the accuracy of signals encoding image motion and self-movement and one concerning statistical optimization that sacrifices accuracy for precision. We then consider the degree to which such explanations can be applied to auditory motion perception in moving listeners.
Affiliation(s)
- Michael A Akeroyd
- Medical Research Council Institute of Hearing Research, University of Nottingham
- W Owen Brimijoin
- Medical Research Council/Chief Scientist Office Institute of Hearing Research-Scottish Section, Glasgow Royal Infirmary
14
15
Grange JA, Culling JF. Head orientation benefit to speech intelligibility in noise for cochlear implant users and in realistic listening conditions. J Acoust Soc Am 2016; 140:4061. [PMID: 28039996 DOI: 10.1121/1.4968515]
Abstract
Cochlear implant (CI) users suffer from elevated speech-reception thresholds and may rely on lip reading. Traditional measures of spatial release from masking quantify speech-reception-threshold improvement with azimuthal separation of target speaker and interferers and with the listener facing the target speaker. Substantial benefits of orienting the head away from the target speaker were predicted by a model of spatial release from masking. Audio-only and audio-visual speech-reception thresholds in normal-hearing (NH) listeners and in bilateral and unilateral CI users confirmed model predictions of this head-orientation benefit. The benefit ranged from 2 to 5 dB for a modest 30° orientation that did not affect the lip-reading benefit. NH listeners' and CI users' lip-reading benefits measured 3 and 5 dB, respectively. A head-orientation benefit of ∼2 dB was also both predicted and observed in NH listeners in realistic simulations of a restaurant listening environment. Exploiting the benefit of head orientation is thus a robust hearing tactic that would benefit both NH listeners and CI users in noisy listening conditions.
Affiliation(s)
- Jacques A Grange
- School of Psychology, Cardiff University, 70 Park Place, Cardiff CF10 3AT, United Kingdom
- John F Culling
- School of Psychology, Cardiff University, 70 Park Place, Cardiff CF10 3AT, United Kingdom
16
Abstract
Speech reception thresholds (SRTs) for a target voice at the same virtual table were measured in various restaurant simulations under conditions of masking by between one and eight interferers at other tables. Results for different levels of reverberation and different simulation techniques were qualitatively similar. SRTs increased steeply with the number of interferers, reflecting progressive failure to perceptually unmask the target speech as the acoustic scene became more complex. For a single interferer, continuous noise was the most effective masker, and a single interfering voice of either gender was least effective. With two interferers, evidence of informational masking emerged as a difference in SRT between forward and reversed speech, but SRTs for all interferer types progressively converged at four and eight interferers. In a simulation based on a real room, this convergence occurred at a signal-to-noise ratio of around -5 dB.
Affiliation(s)
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff, CF10 3AT, United Kingdom
17
Abstract
Spatial release from masking is traditionally measured with speech in front. The effect of head orientation with respect to the speech direction has rarely been studied. Speech-reception thresholds (SRTs) were measured for eight head orientations and four spatial configurations. Head orientation away from the speech source yielded benefits of up to 8 dB. These benefits correlated with predictions of a model based on better-ear listening and binaural unmasking (r = 0.96). Spontaneous head orientation was then measured while listeners attended to long speech clips of gradually diminishing speech-to-noise ratio in a sound-deadened room. Speech was presented from the loudspeaker that initially faced the listener and noise from one of four other locations. In an undirected paradigm, listeners spontaneously turned their heads away from the speech in 56% of trials. When instructed to rotate their heads as the speech-to-noise ratio diminished, all listeners turned away from the speech and reached head orientations associated with lower SRTs. Head orientation may prove valuable for hearing-impaired listeners.
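The better-ear component of such models can be illustrated very simply: turning the head changes the signal-to-noise ratio at each ear, and the prediction draws on whichever ear enjoys the higher ratio. A toy sketch of that single component follows (binaural unmasking, the model's other component, is omitted, and the function name is ours, not the paper's):

```python
import numpy as np

def better_ear_snr_db(target_left, target_right, noise_left, noise_right):
    """Better-ear listening: the SNR at whichever ear gives the higher ratio.

    Inputs are time-domain signals at each ear; a full model would add a
    binaural-unmasking term on top of this better-ear term.
    """
    def snr_db(target, noise):
        return 10 * np.log10(np.mean(target ** 2) / np.mean(noise ** 2))
    return max(snr_db(target_left, noise_left),
               snr_db(target_right, noise_right))
```

Orienting the head so that one ear faces away from the noise raises that ear's SNR, which this better-ear term captures directly.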
Affiliation(s)
- Jacques A Grange
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom
18
Leclère T, Lavandier M, Culling JF. Speech intelligibility prediction in reverberation: Towards an integrated model of speech transmission, spatial unmasking, and binaural de-reverberation. J Acoust Soc Am 2015; 137:3335-3345. [PMID: 26093423 DOI: 10.1121/1.4921028]
Abstract
Room acoustic indicators of intelligibility have focused on the effects of temporal smearing of speech by reverberation and masking by diffuse ambient noise. In the presence of a discrete noise source, these indicators neglect the binaural listener's ability to separate target speech from noise. Lavandier and Culling [(2010). J. Acoust. Soc. Am. 127, 387-399] proposed a model that incorporates this ability but neglects the temporal smearing of speech, so that predictions hold only for near-field targets. An extended model based on useful-to-detrimental (U/D) ratios is presented here that accounts for temporal smearing, spatial unmasking, and binaural de-reverberation in reverberant environments. The influence of the model parameters was tested by comparing the model predictions with speech reception thresholds measured in three experiments from the literature. Accurate predictions were obtained by adjusting the parameters to each room. Room-independent parameters did not yield comparable performance, suggesting that a single U/D model cannot be generalized to any room. Despite this limitation, the model framework supports a unified interpretation of spatial unmasking, temporal smearing, and binaural de-reverberation.
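A basic useful-to-detrimental ratio can be sketched as below: energy in the room impulse response arriving within an early window after the direct sound counts as useful, while later reflections, pooled with any noise power, count as detrimental. This is a generic textbook formulation under assumed parameter values, not the paper's full model, which adds spatial-unmasking and binaural de-reverberation terms.

```python
import numpy as np

def useful_to_detrimental_db(rir, fs, early_ms=50.0, noise_power=0.0):
    """Useful-to-detrimental energy ratio for a room impulse response.

    Energy within `early_ms` of the direct sound is 'useful'; later
    reflections plus `noise_power` are 'detrimental'. The 50-ms window
    is a conventional choice, not necessarily the paper's.
    """
    direct = np.argmax(np.abs(rir))              # index of the direct sound
    split = direct + int(early_ms * fs / 1000)
    useful = np.sum(rir[:split] ** 2)
    detrimental = np.sum(rir[split:] ** 2) + noise_power
    return 10 * np.log10(useful / detrimental)
```

Fitting the early-window boundary per room is one way such a model's parameters can be "adjusted to each room", which is the dependence the abstract highlights.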
Affiliation(s)
- Thibaud Leclère
- Université de Lyon, École Nationale des Travaux Publics de l'État, Laboratoire Génie Civil et Bâtiment, Rue M. Audin, 69518 Vaulx-en-Velin Cedex, France
- Mathieu Lavandier
- Université de Lyon, École Nationale des Travaux Publics de l'État, Laboratoire Génie Civil et Bâtiment, Rue M. Audin, 69518 Vaulx-en-Velin Cedex, France
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom
19
Deroche MLD, Culling JF, Chatterjee M. Phase effects in masking by harmonic complexes: detection of bands of speech-shaped noise. J Acoust Soc Am 2014; 136:2726-2736. [PMID: 25373972 PMCID: PMC4224678 DOI: 10.1121/1.4896457]
Abstract
When phase relationships between partials of a complex masker produce highly modulated temporal envelopes on the basilar membrane, listeners may detect speech information in temporal dips in the within-channel masker envelopes. However, this source of masking release (MR) is confined to regions of unresolved masker partials, and it is unclear how much of the speech information in these regions is really needed for intelligibility. Also, other sources of MR, such as glimpsing in between resolved masker partials, may provide sufficient information from regions that disregard phase relationships. This study simplified the problem of speech recognition to a masked detection task. Target bands of speech-shaped noise were restricted to frequency regions containing either only resolved or only unresolved masker partials, as a function of masker phase relationships (sine or random), masker fundamental frequency (F0) (50, 100, or 200 Hz), and masker spectral profile (flat-spectrum or speech-shaped). Although masker phase effects could be observed in unresolved regions at F0s of 50 and 100 Hz, it was only at 50-Hz F0 that detection thresholds were ever lower in unresolved than in resolved regions, suggesting little role for envelope modulations with harmonic complexes whose F0s lie in the human voice range and at moderate levels.
Affiliation(s)
- Mickael L D Deroche
- Department of Otolaryngology, Johns Hopkins University School of Medicine, 818 Ross Research Building, 720 Rutland Avenue, Baltimore, Maryland 21205
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff, CF10 3AT, United Kingdom
- Monita Chatterjee
- Auditory Prostheses and Perception Laboratory, Boys Town National Research Hospital, 555 North 30th Street, Omaha, Nebraska 68131
20
Deroche MLD, Culling JF, Chatterjee M, Limb CJ. Roles of the target and masker fundamental frequencies in voice segregation. J Acoust Soc Am 2014; 136:1225. [PMID: 25190396] [PMCID: PMC4165228] [DOI: 10.1121/1.4890649]
Abstract
Intelligibility of a target voice improves when its fundamental frequency (F0) differs from that of a masking voice, but it remains unclear how this masking release (MR) depends on the two relative F0s. Three experiments measured speech reception thresholds (SRTs) for a target voice against different maskers. Experiment 1 evaluated the influence of target F0 itself. SRTs against white noise were elevated by at least 2 dB for a monotonized target voice compared with the unprocessed voice, but SRTs differed little for F0s between 50 and 150 Hz. In experiments 2 and 3, a MR occurred when there was a steady difference in F0 between the target voice and a stationary speech-shaped harmonic complex or a babble. However, this MR was considerably larger when the F0 of the masker was 11 semitones above the target F0 than when it was 11 semitones below. In contrast, for a fixed masker F0, the MR was similar whether the target F0 was above or below. The dependency of these MRs on the masker F0 suggests that a spectral mechanism such as glimpsing in between resolved masker partials may account for an important part of this phenomenon.
Affiliation(s)
- Mickael L D Deroche
- Department of Otolaryngology, Johns Hopkins University School of Medicine, 818 Ross Research Building, 720 Rutland Avenue, Baltimore, Maryland 21205
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff, CF10 3AT, United Kingdom
- Monita Chatterjee
- Auditory Prostheses and Perception Laboratory, Boys Town National Research Hospital, 555 N 30th Street, Omaha, Nebraska 68131
- Charles J Limb
- Department of Otolaryngology, Johns Hopkins University School of Medicine, 818 Ross Research Building, 720 Rutland Avenue, Baltimore, Maryland 21205
21
Deroche MLD, Culling JF, Chatterjee M, Limb CJ. Speech recognition against harmonic and inharmonic complexes: spectral dips and periodicity. J Acoust Soc Am 2014; 135:2873-2884. [PMID: 24815268] [PMCID: PMC4032410] [DOI: 10.1121/1.4870056]
Abstract
Speech recognition in a complex masker usually benefits from masker harmonicity, but there are several factors at work. The present study focused on two of them, glimpsing spectrally in between masker partials and periodicity within individual frequency channels. Using both a theoretical and an experimental approach, it is demonstrated that when inharmonic complexes are generated by jittering partials from their harmonic positions, there are better opportunities for spectral glimpsing in inharmonic than in harmonic maskers, and this difference is enhanced as fundamental frequency (F0) increases. As a result, measurements of masking level difference between the two maskers can be reduced, particularly at higher F0s. Using inharmonic maskers that offer similar glimpsing opportunity to harmonic maskers, it was found that the masking level difference between the two maskers varied little with F0, was influenced by periodicity of the first four partials, and could occur in low-, mid-, or high-frequency regions. Overall, the present results suggested that both spectral glimpsing and periodicity contribute to speech recognition under masking by harmonic complexes, and these effects seem independent from one another.
Affiliation(s)
- Mickael L D Deroche
- Department of Otolaryngology, Johns Hopkins University School of Medicine, 818 Ross Research Building, 720 Rutland Avenue, Baltimore, Maryland 21205
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff, CF10 3AT, United Kingdom
- Monita Chatterjee
- Auditory Prostheses and Perception Laboratory, Boys Town National Research Hospital, 555 N 30th Street, Omaha, Nebraska 68131
- Charles J Limb
- Department of Otolaryngology, Johns Hopkins University School of Medicine, 818 Ross Research Building, 720 Rutland Avenue, Baltimore, Maryland 21205
22
Tweedy RS, Culling JF. Does the signal-to-noise ratio of an interlocutor influence a speaker's vocal intensity? Comput Speech Lang 2014. [DOI: 10.1016/j.csl.2013.06.005]
23
Cosentino S, Marquardt T, McAlpine D, Culling JF, Falk TH. A model that predicts the binaural advantage to speech intelligibility from the mixed target and interferer signals. J Acoust Soc Am 2014; 135:796-807. [PMID: 25234888] [DOI: 10.1121/1.4861239]
Abstract
A model is presented that predicts the binaural advantage to speech intelligibility by analyzing the right and left recordings at the two ears containing mixed target and interferer signals. This auditory-inspired model implements an equalization-cancellation stage to predict the binaural unmasking (BU) component, in conjunction with a modulation-frequency estimation block to estimate the "better ear" effect (BE) component of the binaural advantage. The model's performance was compared to experimental data obtained under anechoic and reverberant conditions using a single speech-shaped noise interferer paradigm. The internal BU and BE components were compared to those of the speech intelligibility model recently proposed by Lavandier et al. [J. Acoust. Soc. Am. 131, 218-231 (2012)], which requires separate inputs for target and interferer. The data indicate that the proposed model provides comparably good predictions from a mixed-signals input under both anechoic and reverberant conditions.
Affiliation(s)
- David McAlpine
- Ear Institute, University College London, London, United Kingdom
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Cardiff, United Kingdom
- Tiago H Falk
- Institut National de la Recherche Scientifique-Énergie Matériaux Télécommunications, University of Québec, Montreal, Canada
24
Abstract
Speech reception thresholds were measured for a voice against two different maskers: either two concurrent voices with the same fundamental frequency (F0) or a harmonic complex with the same long-term excitation pattern and broadband temporal envelope as the masking sentences (speech-modulated buzz). All sources had steady F0s. A difference in F0 of 2 or 8 semitones provided a 5-dB benefit for buzz maskers, whereas it provided a 3- and 8-dB benefit, respectively, for masking sentences. Whether intelligibility of a voice increases abruptly with small ΔF0s or gradually toward larger ΔF0s seems to depend on the nature of the masker.
Affiliation(s)
- Mickael L D Deroche
- Department of Otolaryngology, Johns Hopkins University School of Medicine, 818 Ross Research Building, 720 Rutland Avenue, Baltimore, Maryland 21205
25
Deroche MLD, Culling JF, Chatterjee M. Phase effects in masking by harmonic complexes: speech recognition. Hear Res 2013; 306:54-62. [PMID: 24076425] [DOI: 10.1016/j.heares.2013.09.008]
Abstract
Harmonic complexes that generate highly modulated temporal envelopes on the basilar membrane (BM) mask a tone less effectively than complexes that generate relatively flat temporal envelopes, because the non-linear active gain of the BM selectively amplifies a low-level tone in the dips of a modulated masker envelope. The present study examines a similar effect in speech recognition. Speech reception thresholds (SRTs) were measured for a voice masked by harmonic complexes with partials in sine phase (SP) or in random phase (RP). The masker's fundamental frequency (F0) was 50, 100 or 200 Hz. SRTs were considerably lower for SP than for RP maskers at 50-Hz F0, but the two converged at 100-Hz F0, while at 200-Hz F0, SRTs were a little higher for SP than RP maskers. The results were similar whether the target voice was male or female and whether the masker's spectral profile was flat or speech-shaped. Although listening in the masker dips has been shown to play a large role for artificial stimuli such as Schroeder-phase complexes at high levels, it contributes weakly to speech recognition in the presence of harmonic maskers with different crest factors at more moderate sound levels (65 dB SPL).
Affiliation(s)
- Mickael L D Deroche
- Department of Otolaryngology, Johns Hopkins University School of Medicine, 818 Ross Research Building, 720 Rutland Avenue, Baltimore, MD 21205, USA.
26
Affiliation(s)
- Siwan Roberts
- Child and Adolescent Mental Health Service (CAMHS), Bangor University
27
Abstract
At a cocktail party, listeners are faced with multiple, spatially distributed interfering voices. The dominant interfering voice may change from moment to moment and, consequently, change in spatial location. The ability of the binaural system to deal with such a dynamic scene has not been systematically analyzed. Spatial release from masking (SRM) was measured in simple spatial scenes, simulated over headphones with a frontal speech source. For a single noise at 105°, SRM was reduced if that noise was modulated (10-Hz square wave, 50% duty cycle, 20-dB modulation depth), but, for two noises in symmetrical locations, SRM increased if the noises were modulated in alternation, suggesting that the binaural system can "switch" between exploiting different spatial configurations. Experiment 2 assessed the contributions of interaural time and level differences as a function of modulation rate (1-20 Hz). Scenes were created using the original head-related impulse responses and ones that had been manipulated to isolate each cue. SRM decreased steeply with modulation rate. The combined effects of interaural time and level differences were consistent with additive contributions. The results indicate that binaural sluggishness limits the contribution of binaural switching to speech understanding at a cocktail party.
Affiliation(s)
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom.
28
Lavandier M, Jelfs S, Culling JF, Watkins AJ, Raimond AP, Makin SJ. Binaural prediction of speech intelligibility in reverberant rooms with multiple noise sources. J Acoust Soc Am 2012; 131:218-231. [PMID: 22280586] [DOI: 10.1121/1.3662075]
Abstract
When speech is in competition with interfering sources in rooms, monaural indicators of intelligibility fail to take account of the listener's abilities to separate target speech from interfering sounds using the binaural system. In order to incorporate these segregation abilities and their susceptibility to reverberation, Lavandier and Culling [J. Acoust. Soc. Am. 127, 387-399 (2010)] proposed a model which combines effects of better-ear listening and binaural unmasking. A computationally efficient version of this model is evaluated here under more realistic conditions that include head shadow, multiple stationary noise sources, and real-room acoustics. Three experiments are presented in which speech reception thresholds were measured in the presence of one to three interferers using real-room listening over headphones, simulated by convolving anechoic stimuli with binaural room impulse responses measured with dummy-head transducers in five rooms. Without any fitting of model parameters, there was close correspondence between measured and predicted differences in threshold across all tested conditions. The model's components of better-ear listening and binaural unmasking were validated both in isolation and in combination. The computational efficiency of this prediction method allows the generation of complex "intelligibility maps" from room designs.
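The abstract's two-component structure (better-ear listening plus binaural unmasking, combined per frequency band) can be sketched as follows. This is a minimal illustration, not the published implementation: the per-band inputs, the additive dB-domain combination, and the band-importance weighting are assumptions for the demo.

```python
import numpy as np

def effective_snr_db(snr_left_db, snr_right_db, bmld_db, band_weights):
    """Weighted effective target-to-interferer ratio across frequency bands.

    Illustrative sketch only: takes per-band SNRs at each ear, adds a
    binaural masking level difference (BMLD) term, and averages with
    band-importance weights.
    """
    snr_l = np.asarray(snr_left_db, dtype=float)
    snr_r = np.asarray(snr_right_db, dtype=float)
    better_ear = np.maximum(snr_l, snr_r)               # better-ear listening
    unmasked = better_ear + np.asarray(bmld_db, float)  # add binaural unmasking
    w = np.asarray(band_weights, dtype=float)
    return float(np.sum(w * unmasked) / np.sum(w))

# Two toy bands: the left ear is better in band 1, the right in band 2,
# and band 1 carries 3 dB of binaural unmasking.
print(effective_snr_db([0.0, -3.0], [-6.0, 0.0], [3.0, 0.0], [1.0, 1.0]))  # 1.5
```

In a model of this family, a lower predicted speech reception threshold corresponds to a higher effective SNR; evaluating such a map over candidate listener positions is what makes room-level "intelligibility maps" cheap to compute.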
Affiliation(s)
- Mathieu Lavandier
- Département Génie Civil et Bâtiment, Université de Lyon, Ecole Nationale des Travaux Publics de l’Etat, Rue M Audin, 69518 Vaulx-en-Velin Cedex, France.
29
Deroche ML, Culling JF. Narrow noise band detection in a complex masker: masking level difference due to harmonicity. Hear Res 2011; 282:225-235. [DOI: 10.1016/j.heares.2011.07.005]
30
Abstract
Two experiments investigated listeners' ability to use a difference of two semitones in fundamental frequency (F0) to segregate a target voice from harmonic complex tones, with speech-like spectral profiles. Masker partials were in random phase (experiment 1) or in sine phase (experiment 2) and stimuli were presented over headphones. Target's and masker's harmonicity were each distorted by F0 modulation and reverberation. The F0 of each source was manipulated (monotonized or modulated by 2 semitones at 5 Hz) factorially. In addition, all sources were presented from the same location in a virtual room with controlled reverberation, assigned factorially to each source. In both experiments, speech reception thresholds increased by about 2 dB when the F0 of the masker was modulated and increased by about 6 dB when, in addition to F0 modulation, the masker was reverberant. Masker partial phases did not influence the results. The results suggest that F0-segregation relies upon the masker's harmonicity, which is disrupted by rapid modulation. This effect is compounded by reverberation. In addition, F0-segregation was found to be independent of the depth of masker envelope modulations.
Affiliation(s)
- Mickael L D Deroche
- Cochlear Implants and Psychophysics Lab, Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA.
31
Abstract
The addition of a signal in the N0Sπ binaural configuration gives rise to fluctuations in interaural phase and amplitude. Sensitivity to these individual cues was measured by applying sinusoidal amplitude modulation (AM) or quasi-frequency modulation (QFM) to a band of noise. Discrimination between interaurally in-phase and out-of-phase modulation was measured using an adaptive task for narrow bands of noise at center frequencies from 250 to 1500 Hz, for modulation rates of 2-40 Hz, and with or without flanking bands of diotic noise. Discrimination thresholds increased steeply for QFM with increasing center frequency, but increased only modestly for AM, and mainly for modulation rates below 10 Hz. Flanking bands of noise increased thresholds for AM, but had no consistent effect for QFM. The results suggest that two underlying mechanisms may support binaural unmasking: one most sensitive to interaural amplitude modulations that is susceptible to across-frequency interference, and a second, most sensitive to interaural phase modulations that is immune to such effects.
Affiliation(s)
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom. CullingJ@cf.ac.uk
32
Jelfs S, Culling JF, Lavandier M. Revision and validation of a binaural model for speech intelligibility in noise. Hear Res 2011; 275:96-104. [DOI: 10.1016/j.heares.2010.12.005]
33
Culling JF, Lewis HG. Trading of intensity and interaural coherence in dichotic pitch stimuli. J Acoust Soc Am 2010; 128:1908-1914. [PMID: 20968362] [DOI: 10.1121/1.3478853]
Abstract
When a signal is added to noise in the N0Sπ binaural configuration, a reduction in interaural coherence, ρ, occurs at the signal frequency, and increases in tone intensity decrease ρ further. Corresponding manipulations of ρ result in the perception of a phantom signal which increases in loudness as ρ decreases [Culling et al. (2001). J. Acoust. Soc. Am. 110, 1020-1029]. In the present study, a narrow sub-band of noise (462-539 Hz) embedded within a broadband (0-3 kHz) diotic noise was manipulated in both intensity and ρ in a 3-interval, odd-one-out task. In the reference intervals, ρ was zero and the spectrum was flat. In the target interval, both ρ and the intensity of the target band were incremented, giving opposing effects on loudness. Correct identification of the target interval followed a V-shape as a function of the size of intensity increment. The minimum of this function was often at chance performance, indicating that monaurally and binaurally evoked loudness were fully traded. These results show that reduction in ρ at a given frequency produces increased loudness at that frequency equivalent to up to 6 dB and consistent with an equalization-cancellation mechanism whose binaural output is strongly weighted compared to monaural excitation.
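The premise that adding an out-of-phase signal lowers ρ can be checked numerically. For independent noise n and signal s, the interaural correlation of the pair (n + s, n − s) follows the standard relation (Pn − Ps)/(Pn + Ps); the waveforms below are generic placeholders, not the experimental stimuli.

```python
import numpy as np

# Mixing an independent Sπ component into a diotic noise lowers the
# interaural correlation toward (Pn - Ps) / (Pn + Ps).
rng = np.random.default_rng(1)
n = rng.standard_normal(200_000)            # diotic noise, identical at both ears
s = rng.standard_normal(200_000)            # independent "signal" for the demo
pairs = []
for gain in (0.0, 0.5, 1.0):
    left, right = n + gain * s, n - gain * s
    rho = np.corrcoef(left, right)[0, 1]              # measured coherence
    expected = (1 - gain**2) / (1 + gain**2)          # (Pn - Ps)/(Pn + Ps)
    pairs.append((rho, expected))
```

With these sample sizes the measured and predicted values agree to within a few thousandths, illustrating why intensity and ρ can be manipulated as two sides of the same N0Sπ change.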
Affiliation(s)
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom.
34
Abstract
The spectral resolution of the binaural system was measured using a tone-detection task in a binaural analog of the notched-noise technique. Three listeners performed 2-interval, 2-alternative, forced choice tasks with a 500-ms out-of-phase signal within 500 ms of broadband masking noise consisting of an "outer" band of either interaurally uncorrelated or anticorrelated noise, and an "inner" band of interaurally correlated noise. Three signal frequencies were tested (250, 500, and 750 Hz), and the asymmetry of the filter was measured by keeping the signal at a constant frequency and moving the correlated noise band relative to the signal. Thresholds were taken for bandwidths of correlated noise ranging from 0 to 400 Hz. The equivalent rectangular bandwidth of the binaural filter was found to increase with signal frequency, and estimates tended to be larger than monaural bandwidths measured for the same listeners using equivalent techniques.
Affiliation(s)
- Andrew J Kolarik
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom
35
Abstract
In the presence of competing speech or noise, reverberation degrades speech intelligibility not only by its direct effect on the target but also by affecting the interferer. Two experiments were designed to validate a method for predicting the loss of intelligibility associated with this latter effect. Speech reception thresholds were measured under headphones, using spatially separated target sentences and speech-shaped noise interferers simulated in virtual rooms. To investigate the effect of reverberation on the interferer unambiguously, the target was always anechoic. The interferer was placed in rooms with different sizes and absorptions, and at different distances and azimuths from the listener. The interaural coherence of the interferer did not fully predict the effect of reverberation. The azimuth separation of the sources and the coloration introduced by the room also had to be taken into account. The binaural effects were modeled by computing the binaural masking level differences in the studied configurations, the monaural effects were predicted from the excitation pattern of the noises, and speech intelligibility index weightings were applied to both. These parameters were all calculated from the room impulse responses convolved with noise. A 0.95-0.97 correlation was obtained between the speech reception thresholds and their predicted value.
Affiliation(s)
- Mathieu Lavandier
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom.
36
Abstract
Speech-in-noise audiometry has potential application as a low-cost, self-screening test for sensorineural hearing loss. To realize this potential, the influence of variations in audio equipment and listening environment needs assessment. The present study assessed: 1) the frequency response and distortion produced by a wide range of commercially available audio equipment; 2) the effects of such variations upon test results with normally hearing subjects using a simple, open-set, word-identification test; 3) the effect of distortion on the speech reception threshold using digitally applied distortion; and 4) the reliability of the test in listening environments with different levels of reverberation. In addition, preliminary tests were conducted with elderly listeners. The results indicate that variations in equipment have negligible effects on speech-in-noise audiometry. The only factor that substantially elevated normally hearing listeners' thresholds was high levels of room reverberation when using loudspeaker presentation. Variations in equipment and environment thus present no significant obstacle to the development of a self-administered audiometric screening test based on speech in noise.
37
Abstract
The effect of interaural correlation (ρ) on the loudness of noisebands was measured using a loudness-matching task in naive listeners. The task involved a sequence of loudness comparisons for which the intensity of one stimulus in a given comparison was varied using a one-up-one-down adaptive rule. The task provided an estimate of the level difference (in decibels) for which two stimulus conditions have equal loudness, giving measures of loudness difference in equivalent decibel units (dB(equiv)). Concurrent adaptive tracks measured loudness differences between ρ = 1, 0, and -1 and between these binaural stimuli and the monaural case for various noisebands. For all noisebands, monaural stimuli required approximately 6 dB higher levels than ρ = 1 for equal loudness. For most noisebands, ρ = 1 and ρ = -1 were almost equal in loudness, with ρ = -1 being slightly louder in the majority of measurements, while ρ = 0 was about 2 dB(equiv) louder than ρ = 1 or ρ = -1. However, noisebands with significant high-frequency energy showed smaller differences: for 3745-4245 Hz, ρ = 0 was only about 0.85 dB(equiv) louder than ρ = ±1, and for 100-5000 Hz it was non-significantly louder (perhaps 0.7 dB(equiv)).
Affiliation(s)
- Barrie A Edmonds
- School of Psychology, Cardiff University, Cardiff, United Kingdom
38
Abstract
Four experiments measured interaural time delay (ITD) discrimination thresholds for broadband noise in the presence of masking noise of the same bandwidth as the target (0.1-3 kHz for experiments 1-3 and 0-10 kHz for experiment 4). In experiments 1-3, listeners performed interaural two-interval two-alternative forced-choice (2I-2AFC) delay discrimination tasks with stimuli composed of delayed and masking noises mixed in proportions of delayed noise ranging between 1 and 0.05. Experiments 1-3 employed interaurally correlated, anticorrelated, and uncorrelated maskers, respectively. Experiment 4 measured centering accuracy for continuous noise with a range of interaural coherences (equivalent to proportion of delayed noise) obtained by mixing delayed and interaurally uncorrelated noises. Results indicate that in the presence of an interaurally correlated masker, ITD thresholds doubled for every halving of the proportion of delayed noise power in the stimulus. This function became steeper as the masking noise changed from interaurally correlated, to uncorrelated, to anticorrelated. The results were compared to thresholds predicted by a model based on variations in the distribution of interaural phase differences of the stimulus components.
Affiliation(s)
- Andrew J Kolarik
- School of Psychology, Cardiff University, Cardiff, United Kingdom
39
Kolarik AJ, Culling JF. Measurement of the binaural temporal window using a lateralisation task. Hear Res 2008; 248:60-68. [PMID: 19111600] [DOI: 10.1016/j.heares.2008.12.001]
Abstract
Binaural temporal resolution was measured using the discrimination of brief interaural time delays (ITDs). In experiment 1, three listeners performed a 2I-2AFC, ITD-discrimination procedure. ITD changes of 8 to 1024 μs were applied to brief probe noises. These probes, with durations of 16 to 362 ms, were placed symmetrically in time within a 500-ms burst of otherwise interaurally uncorrelated noise. Psychometric functions were measured to obtain thresholds, and temporal windows were fitted to those thresholds. The best-fitting window was a symmetric roex shape (equivalent rectangular duration = 197 ms), an order of magnitude longer than monaural temporal windows, and differed substantially from windows reported by Bernstein et al. [Bernstein, L.R., Trahiotis, C., Akeroyd, M.A., Hartung, K., 2001. Sensitivity to brief changes of interaural time and interaural intensity. J. Acoust. Soc. Am. 109, 1604-1615]. Experiment 2 replicated their main experiment, comparing their ITD-detection task with a similar discrimination procedure. Thresholds in the detection conditions were significantly better than those in the discrimination condition, particularly for short probe durations, indicating the use of an additional cue at these durations for the detection task and thus undermining the assumptions made in their window fit.
40
Lavandier M, Culling JF. Speech segregation in rooms: monaural, binaural, and interacting effects of reverberation on target and interferer. J Acoust Soc Am 2008; 123:2237-2248. [PMID: 18397029] [DOI: 10.1121/1.2871943]
Abstract
Speech reception thresholds were measured in virtual rooms to investigate the influence of reverberation on speech intelligibility for spatially separated targets and interferers. The measurements were carried out under headphones, using target sentences and noise or two-voice interferers. The room simulation allowed variation of the absorption coefficient of the room surfaces independently for target and interferer. The direct-to-reverberant ratio and interaural coherence of sources were also varied independently by considering binaural and diotic listening. The main effect of reverberation on the interferer was binaural and mediated by the coherence, in agreement with binaural unmasking theories. It appeared at lower reverberation levels than the effect of reverberation on the target, which was mainly monaural and associated with the direct-to-reverberant ratio, and could be explained by the loss of amplitude modulation in the reverberant speech signals. This effect was slightly smaller when listening binaurally. Reverberation might also be responsible for a disruption of the mechanism by which the auditory system exploits fundamental frequency differences to segregate competing voices, and a disruption of the "listening in the gaps" associated with speech interferers. These disruptions may explain an interaction observed between the effects of reverberation on the targets and two-voice interferers.
Affiliation(s)
- Mathieu Lavandier
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff, CF10 3AT, United Kingdom.
41
Abstract
Three experiments investigated the roles of interaural correlation (ρ) and of the monaural power spectrum in the detection and discrimination of narrow-band-noise signals (462-539 Hz) in broadband maskers (0-3 kHz). The power and ρ of the target band were independently controlled, while the flanking noise was fixed and diotic. Experiments 1 and 2 involved ρ and power values that would be produced by specific values of signal-to-noise ratio (SNR) in the N0Sπ binaural configuration. Listeners were required to discriminate different SNRs via a 2I-FC loudness-discrimination task. At low reference SNRs, changes in ρ fully accounted for listeners' performance, but as reference SNR increased, additional energy in the target band played an increasing role. Experiment 2 showed that at these higher SNRs the combination of information from the power spectrum and ρ was superadditive and could not be explained by simple signal-detection models. The equalization-cancellation (EC) theory would explain these data using the output from interaural cancellation, Y, rather than ρ. Experiment 3 attempted to foil binaural processing by fixing either ρ or Y across intervals. Consistent with EC theory, when Y was fixed, the contribution of the binaural system appeared negligible, while fixing ρ did not have this effect.
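The equalization-cancellation idea the abstract invokes can be shown in a toy form: for an N0Sπ stimulus, subtracting the two ear signals cancels the diotic masker exactly and leaves the out-of-phase signal as the cancellation output Y. The waveforms below are generic placeholders, not the experimental noise bands.

```python
import numpy as np

# Toy EC demonstration: masker identical at both ears (N0), tone in opposite
# phase at the two ears (Sπ). Subtraction removes the masker completely.
fs = 16000
t = np.arange(0, 0.1, 1 / fs)
rng = np.random.default_rng(0)
masker = rng.standard_normal(t.size)           # N0: same waveform at both ears
signal = 0.3 * np.sin(2 * np.pi * 500 * t)     # Sπ: sign-inverted between ears
left = masker + signal
right = masker - signal
y = (left - right) / 2                          # EC output: the tone, masker-free
```

In a realistic EC model the cancellation is imperfect (internal noise, per-band equalization of interaural time and level), so Y retains some masker energy; the exact cancellation here is only the idealized limiting case.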
Affiliation(s)
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place Cardiff, CF10 3AT, United Kingdom.
42
Abstract
Speech reception thresholds were measured to investigate the influence of a room on speech segregation between a spatially separated target and interferer. The listening tests were carried out under headphones. A room simulation allowed selected positioning of the interferer and target, as well as varying the absorption coefficient of the room internal surfaces. The measurements involved target sentences and speech-shaped noise or 2-voice interferers. Four experiments revealed that speech segregation in rooms was not only dependent on the azimuth separation of sound sources, but also on their direct-to-reverberant energy ratio at the listening position. This parameter was varied for interferer and target independently. Speech intelligibility decreased as the direct-to-reverberant ratio of sources was degraded by sound reflections in the room. The influence of the direct-to-reverberant ratio of the interferer was in agreement with binaural unmasking theories, through its effect on interaural coherence. The effect on the target occurred at higher levels of reverberation and was explained by the intrinsic degradation of speech intelligibility in reverberation.
Affiliation(s)
- Mathieu Lavandier
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff, CF10 3AT, United Kingdom.
43
Abstract
Four experiments investigated the effect of the fundamental frequency (F0) contour on speech intelligibility against interfering sounds. Speech reception thresholds (SRTs) were measured for sentences with different manipulations of their F0 contours. These manipulations involved either reductions in F0 variation, or complete inversion of the F0 contour. Against speech-shaped noise, a flattened F0 contour had no significant impact on SRTs compared to a normal F0 contour; the mean SRT for the flattened contour was only 0.4 dB higher. The mean SRT for the inverted contour, however, was 1.3 dB higher than for the normal F0 contour. When the sentences were played against a single-talker interferer, the overall effect was greater, with a 2.0 dB difference between normal and flattened conditions, and 3.8 dB between normal and inverted. There was no effect of altering the F0 contour of the interferer, indicating that any abnormality of the F0 contour serves to reduce intelligibility of the target speech, but does not alter the masking produced by interfering speech. Low-pass filtering the F0 contour increased SRTs; elimination of frequencies between 2 and 4 Hz had the greatest effect. Filtering sentences with inverted contours did not have a significant effect on SRTs.
Affiliation(s)
- Christine Binns
- Department of Psychology, University of Cardiff, Tower Building, Park Place, Cardiff, CF10 3AT, Wales.
44
Abstract
Speech reception thresholds (SRTs) were measured for target speech presented concurrently with interfering speech (spoken by a different speaker). In experiment 1, the target and interferer were divided spectrally into high- and low-frequency bands and presented over headphones in three conditions: monaural, dichotic (target and interferer to different ears), and swapped (the low-frequency target band and the high-frequency interferer band were presented to one ear, while the high-frequency target band and the low-frequency interferer band were presented to the other ear). SRTs were highest in the monaural condition and lowest in the dichotic condition; SRTs in the swapped condition were intermediate. In experiment 2, two new conditions were devised such that one target band was presented in isolation to one ear while the other band was presented at the other ear with the interferer. The pattern of SRTs observed in experiment 2 suggests that performance in the swapped condition reflects the intelligibility of the target frequency bands at just one ear; the auditory system appears unable to exploit advantageous target-to-interferer ratios at different ears when segregating target speech from a competing speech interferer.
Affiliation(s)
- Barrie A Edmonds
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff CF10 3AT, United Kingdom
45
Abstract
Two experiments explored the concept of the binaural spectrogram [Culling and Colburn, J. Acoust. Soc. Am. 107, 517-527 (2000)] and its relationship to monaurally derived information. In each experiment, speech was added to noise at an adverse signal-to-noise ratio in the NoSπ binaural configuration. The resulting monaural and binaural cues were analyzed within an array of spectro-temporal bins and then resynthesized by modulating the intensity and/or interaural correlation of freshly generated noise. Experiment 1 measured the intelligibility of the resynthesized stimuli and compared them with the original NoSo and NoSπ stimuli at a fixed signal-to-noise ratio. While NoSπ stimuli were approximately 50% intelligible, each cue in isolation produced very low intelligibility, similar to the NoSo condition. The resynthesized combination produced approximately 25% intelligibility. Modulation of interaural correlation below 1.2 kHz and of amplitude above 1.2 kHz was not as effective as their combination across all frequencies. Experiment 2 measured three-point psychometric functions in which the signal-to-noise ratio of the original NoSπ stimulus was increased in 3-dB steps from the level used in experiment 1. Modulation of interaural correlation alone proved to have a flat psychometric function. The functions for NoSπ and for combined monaural and binaural cues appeared similar in slope, but shifted horizontally. The results indicate that for sentence materials, neither fluctuations in interaural correlation nor in monaural intensity are sufficient to support speech recognition at signal-to-noise ratios where 50% intelligibility is achieved in the NoSπ configuration; listeners appear to synergistically combine monaural and binaural information in this task, to some extent within the same frequency region.
Affiliation(s)
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff, CF10 3AT, United Kingdom
46
Edmonds BA, Culling JF. The spatial unmasking of speech: evidence for within-channel processing of interaural time delay. J Acoust Soc Am 2005; 117:3069-78. [PMID: 15957775 DOI: 10.1121/1.1880752] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/03/2023]
Abstract
Across-frequency processing by common interaural time delay (ITD) in spatial unmasking was investigated by measuring speech reception thresholds (SRTs) for high- and low-frequency bands of target speech presented against concurrent speech or a noise masker. Experiment 1 indicated that presenting one of these target bands with an ITD of +500 μs and the other with zero ITD (like the masker) provided some release from masking, but full binaural advantage was only measured when both target bands were given an ITD of +500 μs. Experiment 2 showed that full binaural advantage could also be achieved when the high- and low-frequency bands were presented with ITDs of equal but opposite magnitude (±500 μs). In experiment 3, the masker was also split into high- and low-frequency bands with ITDs of equal but opposite magnitude (±500 μs). The ITD of the low-frequency target band matched that of the high-frequency masking band and vice versa. SRTs indicated that, as long as the target and masker differed in ITD within each frequency band, full binaural advantage could be achieved. These results suggest that the mechanism underlying spatial unmasking exploits differences in ITD independently within each frequency channel.
Affiliation(s)
- Barrie A Edmonds
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff, CF10 3AT, United Kingdom
47
Culling JF, Hawley ML, Litovsky RY. The role of head-induced interaural time and level differences in the speech reception threshold for multiple interfering sound sources. J Acoust Soc Am 2004; 116:1057-1065. [PMID: 15376672 DOI: 10.1121/1.1772396] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/24/2023]
Abstract
Three experiments investigated the roles of interaural time differences (ITDs) and level differences (ILDs) in spatial unmasking in multi-source environments. In experiment 1, speech reception thresholds (SRTs) were measured in virtual-acoustic simulations of an anechoic environment with three interfering sound sources of either speech or noise. The target source lay directly ahead, while the three interfering sources were (1) all at the target's location (0°, 0°, 0°), (2) at locations distributed across both hemifields (−30°, 60°, 90°), (3) at locations in the same hemifield (30°, 60°, 90°), or (4) co-located in one hemifield (90°, 90°, 90°). Sounds were convolved with head-related impulse responses (HRIRs) that were manipulated to remove individual binaural cues. Three conditions used HRIRs with (1) both ILDs and ITDs, (2) only ILDs, and (3) only ITDs. The ITD-only condition produced the same pattern of results across spatial configurations as the combined cues, but with smaller differences between spatial configurations. The ILD-only condition yielded similar SRTs for the (−30°, 60°, 90°) and (0°, 0°, 0°) configurations, as expected for best-ear listening. In experiment 2, pure-tone binaural masking level differences (BMLDs) were measured at third-octave frequencies against the ITD-only, speech-shaped noise interferers of experiment 1. These BMLDs were 4-8 dB at low frequencies for all spatial configurations. In experiment 3, SRTs were measured for speech in diotic, speech-shaped noise. Noises were filtered to reduce the spectrum level at each frequency according to the BMLDs measured in experiment 2. SRTs were as low as or lower than those of the corresponding ITD-only conditions from experiment 1.
Thus, an explanation of speech understanding in complex listening environments based on the combination of best-ear listening and binaural unmasking (without involving sound-localization) cannot be excluded.
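The manipulation in experiment 3 (reducing the diotic noise's spectrum level at each frequency by the BMLD measured in experiment 2) amounts to a per-band attenuation. A minimal sketch follows; the levels and BMLD values are illustrative, not the paper's measurements:

```python
def apply_bmld_attenuation(spectrum_levels_db, bmlds_db):
    """Lower each frequency band's spectrum level by the BMLD for
    that band, so a diotic noise mimics the effective masking of
    the ITD-only binaural condition."""
    return [lvl - bmld for lvl, bmld in zip(spectrum_levels_db, bmlds_db)]

# Hypothetical flat 40-dB spectrum levels and 4-8 dB low-frequency BMLDs.
levels = [40.0, 40.0, 40.0, 40.0, 40.0]
bmlds = [8.0, 6.0, 4.0, 0.0, 0.0]
print(apply_bmld_attenuation(levels, bmlds))  # [32.0, 34.0, 36.0, 40.0, 40.0]
```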
Affiliation(s)
- John F Culling
- School of Psychology, Cardiff University, Cardiff, CF10 3YG, United Kingdom.
48
Hawley ML, Litovsky RY, Culling JF. The benefit of binaural hearing in a cocktail party: effect of location and type of interferer. J Acoust Soc Am 2004; 115:833-43. [PMID: 15000195 DOI: 10.1121/1.1639908] [Citation(s) in RCA: 290] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/22/2023]
Abstract
The "cocktail party problem" was studied using virtual stimuli whose spatial locations were generated using anechoic head-related impulse responses from the AUDIS database [Blauert et al., J. Acoust. Soc. Am. 103, 3082 (1998)]. Speech reception thresholds (SRTs) were measured for Harvard IEEE sentences presented from the front in the presence of one, two, or three interfering sources. Four types of interferer were used: (1) other sentences spoken by the same talker, (2) time-reversed sentences of the same talker, (3) speech-spectrum shaped noise, and (4) speech-spectrum shaped noise, modulated by the temporal envelope of the sentences. Each interferer was matched to the spectrum of the target talker. Interferers were placed in several spatial configurations, either coincident with or separated from the target. Binaural advantage was derived by subtracting SRTs from listening with the "better monaural ear" from those for binaural listening. For a single interferer, there was a binaural advantage of 2-4 dB for all interferer types. For two or three interferers, the advantage was 2-4 dB for noise and speech-modulated noise, and 6-7 dB for speech and time-reversed speech. These data suggest that the benefit of binaural hearing for speech intelligibility is especially pronounced when there are multiple voiced interferers at different locations from the target, regardless of spatial configuration; measurements with fewer or with other types of interferers can underestimate this benefit.
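The binaural-advantage derivation described above is a one-line calculation: the SRT with the better monaural ear minus the binaural SRT. The SRT values below are hypothetical, not data from the study:

```python
def binaural_advantage(srt_left_db, srt_right_db, srt_binaural_db):
    """Binaural advantage as defined in the abstract: the SRT obtained
    with the better monaural ear minus the binaural SRT. Lower SRTs
    mean better performance, so a positive result is a binaural benefit."""
    better_ear_srt = min(srt_left_db, srt_right_db)
    return better_ear_srt - srt_binaural_db

# Hypothetical SRTs: better ear at -12 dB, binaural at -18 dB.
print(binaural_advantage(-9.0, -12.0, -18.0))  # 6.0 dB advantage
```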
Affiliation(s)
- Monica L Hawley
- Hearing Research Center and Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
49
Abstract
Two experiments investigated the effect of reverberation on listeners' ability to perceptually segregate two competing voices. Culling et al. [Speech Commun. 14, 71-96 (1994)] found that for competing synthetic vowels, masked identification thresholds were increased by reverberation only when combined with modulation of fundamental frequency (F0). The present investigation extended this finding to running speech. Speech reception thresholds (SRTs) were measured for a male voice against a single interfering female voice within a virtual room with controlled reverberation. The two voices were either (1) co-located in virtual space at 0° azimuth or (2) separately located at ±60° azimuth. In experiment 1, target and interfering voices were either normally intonated or resynthesized with a fixed F0. In anechoic conditions, SRTs were lower for normally intonated and for spatially separated sources, while, in reverberant conditions, the SRTs were all the same. In experiment 2, additional conditions employed inverted F0 contours. Inverted F0 contours yielded higher SRTs in all conditions, regardless of reverberation. The results suggest that reverberation can seriously impair listeners' ability to exploit differences in F0 and spatial location between competing voices. The levels of reverberation employed had no effect on speech intelligibility in quiet.
Affiliation(s)
- John F Culling
- School of Psychology, Cardiff University, P.O. Box 901, Cardiff CF10 3YG, United Kingdom.
50
Abstract
Sensitivity to differences in interaural correlation was measured for 1.3-ERB-wide bands of noise using a 2IFC task at six frequencies: 250, 500, 750, 1000, 1250, and 1500 Hz. The sensitivity index, d', was measured for discriminations between a number of fixed pairs of correlation values. Cumulative d' functions were derived for each frequency and condition. The d' for discriminating any two values of correlation may be recovered from the cumulative d' function as the difference between the cumulative d' values for those two correlations. Two conditions were employed: the noisebands were either presented in isolation (narrow-band condition) or in the context of broad, contiguous flanking bands of correlated noise (fringed condition). The cumulative d' functions showed greater sensitivity to differences in correlation close to 1 than close to 0 at low frequencies, but this difference was less pronounced in the fringed condition. Also, a more linear relationship was observed when cumulative d' was plotted as a function of the equivalent signal-to-noise ratio (SNR) in dB for each correlation value, rather than directly against correlation. The equivalent SNR was the SNR at which the interaural correlation in an NoSπ stimulus would equal the interaural correlation of the noise used in the experiment. The maximum cumulative d' declined above 750 Hz. This decline was steeper for the fringed than for the narrow-band condition. For the narrow-band condition, the total cumulative d' was variable across listeners. All cumulative d' functions were closely fitted using a simple two-parameter function. The complete data sets, averaged across listeners, from the fringed and narrow-band conditions were fitted using functions to describe the changes in these parameters over frequency, in order to produce an interpolated family of curves that describe sensitivity at frequencies between those tested.
These curves predict the spectra recovered by the binaural system when complex sounds, such as speech, are masked by noise.
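The equivalent-SNR mapping used to linearize the cumulative d' functions can be sketched as the inverse of the standard NoSπ correlation-SNR relation. This is a minimal illustration assuming uncorrelated signal and noise, not the paper's code:

```python
import math

def equivalent_snr_db(rho):
    """Equivalent SNR for an interaural correlation rho: the SNR at
    which an NoSpi stimulus (correlation (1 - s)/(1 + s) for linear
    power ratio s) would match the correlation of the test noise."""
    s = (1.0 - rho) / (1.0 + rho)
    return 10.0 * math.log10(s)

# Correlation 0 corresponds to 0 dB SNR; correlations near 1 map to
# strongly negative equivalent SNRs.
print(round(equivalent_snr_db(0.0), 1))   # 0.0
print(round(equivalent_snr_db(0.99), 1))  # -23.0
```

Plotting cumulative d' against this quantity, rather than against ρ itself, compresses the region near ρ = 1 where sensitivity is greatest.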
Affiliation(s)
- J F Culling
- Department of Biomedical Engineering, Boston University, Massachusetts 02215, USA.