1
Lalonde K, Dwyer G, Bosen A, Pitts A. Impact of High- and Low-Pass Acoustic Filtering on Audiovisual Speech Redundancy and Benefit in Children. Ear Hear 2025; 46:735-746. PMID: 39787412; PMCID: PMC11996616; DOI: 10.1097/aud.0000000000001622.
Abstract
OBJECTIVES To investigate the influence of frequency-specific audibility on audiovisual benefit in children, this study examined the impact of high- and low-pass acoustic filtering on auditory-only and audiovisual word and sentence recognition in children with typical hearing. Previous studies show that visual speech provides greater access to consonant place of articulation than to other consonant features, and that low-pass filtering strongly degrades perception of acoustic consonant place of articulation. This suggests that visual speech may be particularly useful when acoustic speech is low-pass filtered, because it provides complementary information about consonant place of articulation. Therefore, we hypothesized that audiovisual benefit would be greater for low-pass filtered words than for high-pass filtered words. We also assessed whether this pattern would translate to sentence recognition.
DESIGN Children with typical hearing completed auditory-only and audiovisual tests of consonant-vowel-consonant word and sentence recognition in conditions differing in acoustic frequency content: a low-pass filtered condition in which children could access only acoustic content below 2 kHz and a high-pass filtered condition in which children could access only acoustic content above 2 kHz. They also completed a visual-only test of consonant-vowel-consonant word recognition. We analyzed word, consonant, and keyword-in-sentence recognition and consonant feature (place, voice/manner of articulation) transmission accuracy across modalities and filter conditions using binomial general linear mixed models. To assess the degree to which visual speech is complementary versus redundant with acoustic speech, we calculated the proportion of auditory-only target and response consonant pairs that can be distinguished using only visual speech and compared these values between the high-pass and low-pass filter conditions.
RESULTS In auditory-only conditions, recognition accuracy was lower for low-pass filtered consonants and consonant features than for high-pass filtered consonants and consonant features, especially consonant place of articulation. In visual-only conditions, recognition accuracy was greater for consonant place of articulation than for consonant voice/manner of articulation. In addition, auditory consonants in the low-pass filtered condition were more likely to be substituted with visually distinct consonants, meaning that there was more opportunity to use visual cues to supplement missing auditory information in the low-pass filtered condition. Audiovisual benefit for isolated whole words was greater for low-pass filtered speech than for high-pass filtered speech. No difference in audiovisual benefit between filter conditions was observed for phonemes, features, or words in sentences. Ceiling effects limit the interpretation of these nonsignificant interactions.
CONCLUSIONS For isolated word recognition, visual speech is more complementary with the acoustic cues children can access when high-frequency acoustic content is eliminated by low-pass filtering than when low-frequency acoustic content is eliminated by high-pass filtering. This decreased auditory-visual phonetic redundancy is accompanied by larger audiovisual benefit. In contrast, audiovisual benefit for sentence recognition did not differ between low-pass and high-pass filtered speech. This might reflect ceiling effects in audiovisual conditions or a smaller contribution of auditory-visual phonetic redundancy to audiovisual benefit for connected speech. These results from children with typical hearing suggest that some variance in audiovisual benefit among children who are hard of hearing may depend in part on frequency-specific audibility.
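As a rough illustration of the stimulus manipulation described in this abstract, the sketch below generates the two listening conditions (acoustic content restricted to below or above 2 kHz) with zero-phase Butterworth filters. The 2 kHz cutoff comes from the abstract; the filter type, order, and input file name are assumptions, not details reported by the authors.

```python
# Sketch: generating low-pass (<2 kHz) and high-pass (>2 kHz) stimulus conditions
# like those described in the abstract. The 2 kHz cutoff comes from the study;
# the filter type, order, and file name are illustrative assumptions.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

CUTOFF_HZ = 2000.0

def filter_condition(x, fs, mode):
    """Apply a steep zero-phase Butterworth filter ('low' or 'high')."""
    sos = butter(8, CUTOFF_HZ, btype=mode, fs=fs, output="sos")
    return sosfiltfilt(sos, x)

fs, speech = wavfile.read("cvc_word.wav")               # hypothetical stimulus file
speech = speech.astype(np.float64)
low_pass_stim = filter_condition(speech, fs, "low")     # acoustic content below 2 kHz
high_pass_stim = filter_condition(speech, fs, "high")   # acoustic content above 2 kHz
```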
Affiliation(s)
- Kaylah Lalonde
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska, USA
2
Bosen AK, Wasiuk PA, Calandruccio L, Buss E. Frequency importance for sentence recognition in co-located noise, co-located speech, and spatially separated speech. J Acoust Soc Am 2024; 156:3275-3284. PMID: 39545745; DOI: 10.1121/10.0034412.
Abstract
Frequency importance functions quantify the contribution of spectral frequencies to perception. Frequency importance has been well-characterized for speech recognition in quiet and steady-state noise. However, it is currently unknown whether frequency importance estimates generalize to more complex conditions such as listening in a multi-talker masker or when targets and maskers are spatially separated. Here, frequency importance was estimated by quantifying associations between local target-to-masker ratios at the output of an auditory filterbank and keyword recognition accuracy for sentences. Unlike traditional methods used to measure frequency importance, this technique estimates frequency importance without modifying the acoustic properties of the target or masker. Frequency importance was compared across sentences in noise and a two-talker masker, as well as sentences in a two-talker masker that was either co-located with or spatially separated from the target. Results indicate that frequency importance depends on masker type and spatial configuration. Frequencies above 5 kHz had lower importance and frequencies between 600 and 1900 Hz had higher importance in the presence of a two-talker masker relative to a noise masker. Spatial separation increased the importance of frequencies between 600 Hz and 5 kHz. Thus, frequency importance functions vary across listening conditions.
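The frequency-importance technique summarized above relates local target-to-masker ratios at the output of an auditory filterbank to keyword recognition accuracy. The sketch below illustrates the general idea with a plain Butterworth band-pass bank and a logistic regression on synthetic data; the band edges, filterbank, and statistical model are stand-ins, not the study's actual implementation.

```python
# Sketch: relating per-band target-to-masker ratios (TMRs) to keyword accuracy.
# A plain Butterworth band-pass bank stands in for the auditory filterbank, and
# the band edges, regression model, and synthetic data are all assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt
from sklearn.linear_model import LogisticRegression

BAND_EDGES_HZ = [(100, 600), (600, 1900), (1900, 5000), (5000, 8000)]  # illustrative bands

def band_tmr_db(target, masker, fs):
    """Per-band target-to-masker ratio (dB) for one target + masker mixture."""
    tmrs = []
    for lo, hi in BAND_EDGES_HZ:
        sos = butter(4, (lo, hi), btype="bandpass", fs=fs, output="sos")
        t_rms = np.sqrt(np.mean(sosfiltfilt(sos, target) ** 2))
        m_rms = np.sqrt(np.mean(sosfiltfilt(sos, masker) ** 2))
        tmrs.append(20 * np.log10(t_rms / m_rms))
    return tmrs

# Synthetic demo: 50 "keywords", each with a noise target and noise masker, and
# random correct/incorrect scores. The fitted coefficients index how strongly
# each band's TMR predicts accuracy, a proxy for relative frequency importance.
rng = np.random.default_rng(0)
fs = 22050
X = np.array([band_tmr_db(rng.standard_normal(fs), rng.standard_normal(fs), fs)
              for _ in range(50)])
y = rng.integers(0, 2, size=50)
importance = LogisticRegression().fit(X, y).coef_[0]
print(dict(zip([f"{lo}-{hi} Hz" for lo, hi in BAND_EDGES_HZ], np.round(importance, 3))))
```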
Affiliation(s)
- Adam K Bosen
- Boys Town National Research Hospital, Center for Hearing Research, Omaha, Nebraska 68131, USA
- Peter A Wasiuk
- Department of Communication Disorders, Southern Connecticut State University, New Haven, Connecticut 06515, USA
- Lauren Calandruccio
- Department of Psychological Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
3
Gößwein JA, Chalupper J, Kohl M, Kinkel M, Kollmeier B, Rennies J. Evaluation of a semi-supervised self-adjustment fine-tuning procedure for hearing aids for asymmetrical hearing loss. Int J Audiol 2024:1-12. PMID: 39368963; DOI: 10.1080/14992027.2024.2406884.
Abstract
OBJECTIVE This study investigated a previously evaluated self-adjustment procedure with respect to its applicability to asymmetrical hearing loss (AHL). Self-adjusted settings were evaluated for speech recognition in noise and sound preference.
DESIGN Participants could adjust the left and right hearing aids separately using a two-dimensional user interface. Two different adjustment sequences were tested. Realistic everyday sound scenes were presented in a laboratory environment. The between-ear difference in speech recognition in noise was tested in two spatial conditions, unaided as well as with the prescriptive formula and with the self-adjusted setting.
STUDY SAMPLE Nineteen experienced hearing aid users (median age 76 years) with different degrees of AHL participated in this study.
RESULTS Participants adjusted a steeper gain slope across frequency in the worse ear than in the better ear. The two adjustment sequences resulted in significantly different adjustment durations and gain settings. The between-ear difference in speech recognition in noise did not change with self-adjustment. Overall, group-mean effect sizes were small relative to the parameter space.
CONCLUSIONS The adjustment procedure can also be used by hearing aid users with AHL to find a potentially preferred gain setting.
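The abstract describes a two-dimensional user interface for adjusting each hearing aid and reports steeper self-adjusted gain slopes in the worse ear. The sketch below shows one plausible mapping from two UI coordinates to per-frequency gain offsets (overall level and spectral tilt); the parameterization, ranges, and frequency grid are illustrative assumptions, not the procedure used in the study.

```python
# Sketch: one plausible mapping from a 2-D user control (x = overall gain,
# y = spectral tilt) to per-frequency gain offsets for one ear. The specific
# parameterization and ranges are assumptions, not the study's procedure.
import numpy as np

AUDIO_FREQS_HZ = np.array([250, 500, 1000, 2000, 4000, 8000])

def gain_offsets_db(x, y, max_gain_db=12.0, max_tilt_db_per_oct=4.0):
    """Map UI coordinates in [-1, 1] to gain offsets (dB) at each audiometric frequency."""
    octaves_re_1k = np.log2(AUDIO_FREQS_HZ / 1000.0)
    overall = x * max_gain_db                        # shift all frequencies together
    tilt = y * max_tilt_db_per_oct * octaves_re_1k   # tip gain up or down across frequency
    return overall + tilt

# Example: a steeper high-frequency slope adjusted in the worse ear than the better ear.
print(gain_offsets_db(x=0.3, y=0.8))   # worse ear: stronger positive tilt
print(gain_offsets_db(x=0.2, y=0.2))   # better ear: flatter adjustment
```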
Affiliation(s)
- Jonathan Albert Gößwein
- Fraunhofer Institute for Digital Media Technology (IDMT), Oldenburg Branch for Hearing, Speech, and Audio Technology (HSA) and Cluster of Excellence "Hearing4all", Oldenburg, Germany
- Josef Chalupper
- Advanced Bionics, European Research Center, Hannover, Germany
- Manuel Kohl
- Advanced Bionics, European Research Center, Hannover, Germany
- Martin Kinkel
- KIND GmbH & Co. KG, Research and Development, Großburgwedel, Germany
- Birger Kollmeier
- Fraunhofer Institute for Digital Media Technology (IDMT), Oldenburg Branch for Hearing, Speech, and Audio Technology (HSA) and Cluster of Excellence "Hearing4all", Oldenburg, Germany
- Department of Medical Physics and Acoustics, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Jan Rennies
- Fraunhofer Institute for Digital Media Technology (IDMT), Oldenburg Branch for Hearing, Speech, and Audio Technology (HSA) and Cluster of Excellence "Hearing4all", Oldenburg, Germany
4
Ananthanarayana RM, Buss E, Monson BB. Band importance for speech-in-speech recognition in the presence of extended high-frequency cues. J Acoust Soc Am 2024; 156:1202-1213. PMID: 39158325; PMCID: PMC11335358; DOI: 10.1121/10.0028269.
Abstract
Band importance functions for speech-in-noise recognition, typically determined in the presence of steady background noise, indicate a negligible role for extended high frequencies (EHFs; 8-20 kHz). However, recent findings indicate that EHF cues support speech recognition in multi-talker environments, particularly when the masker has reduced EHF levels relative to the target. This scenario can occur in natural auditory scenes when the target talker is facing the listener, but the maskers are not. In this study, we measured the importance of five bands from 40 to 20 000 Hz for speech-in-speech recognition by notch-filtering the bands individually. Stimuli consisted of a female target talker recorded from 0° and a spatially co-located two-talker female masker recorded either from 0° or 56.25°, simulating a masker either facing the listener or facing away, respectively. Results indicated peak band importance in the 0.4-1.3 kHz band and a negligible effect of removing the EHF band in the facing-masker condition. However, in the non-facing condition, the peak was broader and EHF importance was higher and comparable to that of the 3.3-8.3 kHz band in the facing-masker condition. These findings suggest that EHFs contain important cues for speech recognition in listening conditions with mismatched talker head orientations.
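Band importance was measured here by notch-filtering each of five bands individually. The sketch below removes one band with a zero-phase band-stop filter; the filter design and the synthetic input are assumptions, and the band edges only loosely follow those named in the abstract.

```python
# Sketch: removing one frequency band from a stimulus with a band-stop (notch)
# filter, as in the band-importance method described above. Filter order/type
# and the synthetic test signal are illustrative assumptions; the band edges
# loosely follow those named in the abstract.
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 44100
BANDS_HZ = [(40, 400), (400, 1300), (1300, 3300), (3300, 8300), (8300, 20000)]

def notch_band(x, fs, band):
    lo, hi = band
    sos = butter(6, (lo, hi), btype="bandstop", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

x = np.random.default_rng(0).standard_normal(FS)   # stand-in for a speech mixture
no_ehf = notch_band(x, FS, BANDS_HZ[-1])           # remove the 8.3-20 kHz (EHF) band
```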
Affiliation(s)
- Rohit M Ananthanarayana
- Department of Speech and Hearing Science, University of Illinois Urbana-Champaign, Champaign, Illinois 61820, USA
- Emily Buss
- Department of Otolaryngology/HNS, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Brian B Monson
- Department of Speech and Hearing Science, University of Illinois Urbana-Champaign, Champaign, Illinois 61820, USA
- Department of Biomedical and Translational Sciences, Carle Illinois College of Medicine, University of Illinois Urbana-Champaign, Champaign, Illinois 61820, USA
5
Jorgensen E, Wu YH. Effects of entropy in real-world noise on speech perception in listeners with normal hearing and hearing loss. J Acoust Soc Am 2023; 154:3627-3643. PMID: 38051522; PMCID: PMC10699887; DOI: 10.1121/10.0022577.
Abstract
Hearing aids show more benefit in traditional laboratory speech-in-noise tests than in real-world noisy environments. Real-world noise comprises a large range of acoustic properties that vary randomly and rapidly between and within environments, making quantifying real-world noise and using it in experiments and clinical tests challenging. One approach is to use acoustic features and statistics to quantify acoustic properties of real-world noise and control for them or measure their relationship to listening performance. In this study, the complexity of real-world noise from different environments was quantified using entropy in both the time and frequency domains. A distribution of noise segments ranging from low to high entropy was extracted. Using a trial-by-trial design, listeners with normal hearing and hearing loss (in aided and unaided conditions) repeated back sentences embedded in these noise segments. Entropy significantly affected speech perception, with a larger effect of entropy in the time domain than in the frequency domain, a larger effect for listeners with normal hearing than for listeners with hearing loss, and a larger effect for listeners with hearing loss in the aided than in the unaided condition. Speech perception also differed between most environment types. Combining entropy with the environment type improved predictions of speech perception beyond the environment type alone.
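The noise-complexity metric in this study is entropy computed in the time and frequency domains. The sketch below shows two simple versions, spectral entropy of a Welch spectrum and Shannon entropy of the amplitude-envelope distribution, applied to a synthetic segment; the exact formulations used by the authors may differ.

```python
# Sketch: two simple entropy measures for a noise segment -- spectral entropy
# (frequency domain) and entropy of the amplitude-envelope distribution (time
# domain). The exact formulations in the study may differ; this is illustrative.
import numpy as np
from scipy.signal import welch, hilbert

def shannon_entropy(p):
    p = p / np.sum(p)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def spectral_entropy(x, fs):
    _, psd = welch(x, fs=fs, nperseg=1024)
    return shannon_entropy(psd)

def envelope_entropy(x, n_bins=64):
    env = np.abs(hilbert(x))                 # amplitude envelope
    hist, _ = np.histogram(env, bins=n_bins)
    return shannon_entropy(hist)

fs = 16000
noise = np.random.default_rng(0).standard_normal(fs)  # stand-in for a real-world segment
print(spectral_entropy(noise, fs), envelope_entropy(noise))
```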
Affiliation(s)
- Erik Jorgensen
- Department of Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Yu-Hsiang Wu
- Department of Communication Sciences and Disorders, University of Iowa, Iowa City, Iowa 52242, USA
6
Wasiuk PA, Calandruccio L, Oleson JJ, Buss E. Predicting speech-in-speech recognition: Short-term audibility and spatial separation. J Acoust Soc Am 2023; 154:1827-1837. PMID: 37728286; DOI: 10.1121/10.0021069.
Abstract
Quantifying the factors that predict variability in speech-in-speech recognition represents a fundamental challenge in auditory science. Stimulus factors associated with energetic and informational masking (IM) modulate variability in speech-in-speech recognition, but energetic effects can be difficult to estimate in spectro-temporally dynamic speech maskers. The current experiment characterized the effects of short-term audibility and differences in target and masker location (or perceived location) on the horizontal plane for sentence recognition in two-talker speech. Thirty young adults with normal hearing (NH) participated. Speech reception thresholds and keyword recognition at a fixed signal-to-noise ratio (SNR) were measured in each spatial condition. Short-term audibility for each keyword was quantified using a glimpsing model. Results revealed that speech-in-speech recognition depended on the proportion of audible glimpses available in the target + masker keyword stimulus in each spatial condition, even across stimuli presented at a fixed global SNR. Short-term audibility requirements were greater for colocated than spatially separated speech-in-speech recognition, and keyword recognition improved more rapidly as a function of increases in target audibility with spatial separation. Results indicate that spatial cues enhance glimpsing efficiency in competing speech for young adults with NH and provide a quantitative framework for estimating IM for speech-in-speech recognition in different spatial configurations.
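Short-term audibility was quantified here with a glimpsing model, i.e., the proportion of audible glimpses in the target + masker keyword stimulus. The sketch below computes the proportion of spectro-temporal tiles whose local SNR exceeds a criterion; the STFT settings, the 3 dB criterion, and the synthetic signals are assumptions rather than the study's exact model.

```python
# Sketch: proportion of "glimpses" -- spectro-temporal tiles where the target
# exceeds the masker by a local SNR criterion. STFT settings and the criterion
# are assumptions; the study's glimpsing model may differ in detail.
import numpy as np
from scipy.signal import stft

def glimpse_proportion(target, masker, fs, criterion_db=3.0):
    _, _, T = stft(target, fs=fs, nperseg=512)
    _, _, M = stft(masker, fs=fs, nperseg=512)
    local_snr_db = 20 * np.log10(np.abs(T) + 1e-12) - 20 * np.log10(np.abs(M) + 1e-12)
    return np.mean(local_snr_db > criterion_db)

fs = 16000
rng = np.random.default_rng(0)
target = rng.standard_normal(fs)   # stand-in for the target keyword
masker = rng.standard_normal(fs)   # stand-in for the two-talker masker
print(glimpse_proportion(target, masker, fs))
```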
Affiliation(s)
- Peter A Wasiuk
- Department of Communication Disorders, 493 Fitch Street, Southern Connecticut State University, New Haven, Connecticut 06515, USA
- Lauren Calandruccio
- Department of Psychological Sciences, 11635 Euclid Avenue, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Jacob J Oleson
- Department of Biostatistics, 145 North Riverside Drive N300, College of Public Health, University of Iowa, Iowa City, Iowa 52242, USA
- Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, 170 Manning Drive, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
7
Shen Y, Langley L. Spectral weighting for sentence recognition in steady-state and amplitude-modulated noise. JASA Express Lett 2023; 3:2887651. PMID: 37125871; PMCID: PMC10155216; DOI: 10.1121/10.0017934.
Abstract
Spectral weights in octave-frequency bands from 0.25 to 4 kHz were estimated for speech-in-noise recognition using two sentence materials (i.e., the IEEE and AzBio sentences). The masking noise was either unmodulated or sinusoidally amplitude-modulated at 8 Hz. The estimated spectral weights did not vary significantly across two test sessions and were similar for the two sentence materials. Amplitude-modulating the masker increased the weight at 2 kHz and decreased the weight at 0.25 kHz, which may support an upward shift in spectral weights for temporally fluctuating maskers.
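The abstract does not detail how the spectral weights were estimated, so the sketch below shows a generic correlational approach that is common in spectral-weighting studies: regress trial-by-trial correctness on the per-band SNRs that occurred on each trial. All data are synthetic, and this should not be read as the authors' procedure.

```python
# Sketch: a generic correlational estimate of spectral weights -- regress
# trial-by-trial correctness on the per-band SNRs that occurred on each trial.
# This is a common approach in spectral-weighting studies, not necessarily the
# exact procedure used here; all data below are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

OCTAVE_BANDS_HZ = [250, 500, 1000, 2000, 4000]

rng = np.random.default_rng(1)
n_trials = 400
band_snr_db = rng.normal(0, 4, size=(n_trials, len(OCTAVE_BANDS_HZ)))  # per-trial band SNRs
true_weights = np.array([0.1, 0.3, 0.5, 0.8, 0.4])                     # synthetic listener
p_correct = 1 / (1 + np.exp(-(band_snr_db @ true_weights) / 4))
correct = rng.random(n_trials) < p_correct

est = LogisticRegression().fit(band_snr_db, correct).coef_[0]
weights = est / np.sum(np.abs(est))    # normalized relative spectral weights
print(dict(zip(OCTAVE_BANDS_HZ, np.round(weights, 2))))
```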
Affiliation(s)
- Yi Shen
- Department of Speech and Hearing Sciences, University of Washington, 1417 Northeast 42nd Street, Seattle, Washington 98105-6246, USA
- Lauren Langley
- Department of Speech and Hearing Sciences, University of Washington, 1417 Northeast 42nd Street, Seattle, Washington 98105-6246, USA
8
Wasiuk PA, Buss E, Oleson JJ, Calandruccio L. Predicting speech-in-speech recognition: Short-term audibility, talker sex, and listener factors. J Acoust Soc Am 2022; 152:3010. PMID: 36456289; DOI: 10.1121/10.0015228.
Abstract
Speech-in-speech recognition can be challenging, and listeners vary considerably in their ability to accomplish this complex auditory-cognitive task. Variability in performance can be related to intrinsic listener factors as well as stimulus factors associated with energetic and informational masking. The current experiments characterized the effects of short-term audibility of the target, differences in target and masker talker sex, and intrinsic listener variables on sentence recognition in two-talker speech and speech-shaped noise. Participants were young adults with normal hearing. Each condition included the adaptive measurement of speech reception thresholds, followed by testing at a fixed signal-to-noise ratio (SNR). Short-term audibility for each keyword was quantified using a computational glimpsing model for target+masker mixtures. Scores on a psychophysical task of auditory stream segregation predicted speech recognition, with stronger effects for speech-in-speech than speech-in-noise. Both speech-in-speech and speech-in-noise recognition depended on the proportion of audible glimpses available in the target+masker mixture, even across stimuli presented at the same global SNR. Short-term audibility requirements varied systematically across stimuli, providing an estimate of the greater informational masking for speech-in-speech than speech-in-noise recognition and quantifying informational masking for matched and mismatched talker sex.
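Each condition in this study included adaptive measurement of speech reception thresholds before fixed-SNR testing. The sketch below implements a generic 1-down/1-up SNR track that converges near 50% correct with a simulated listener; the tracking rule, step size, and scoring are assumptions, not necessarily the procedure used here.

```python
# Sketch: a simple 1-down/1-up adaptive track for a speech reception threshold
# (SRT), converging near 50% sentence correct. The rule, step size, and the
# simulated listener below are illustrative assumptions.
import numpy as np

def run_srt_track(prob_correct_at, start_snr_db=4.0, step_db=2.0, n_trials=30, seed=0):
    rng = np.random.default_rng(seed)
    snr = start_snr_db
    reversal_snrs, last_direction = [], 0
    for _ in range(n_trials):
        correct = rng.random() < prob_correct_at(snr)
        direction = -1 if correct else +1        # harder after correct, easier after error
        if last_direction and direction != last_direction:
            reversal_snrs.append(snr)
        last_direction = direction
        snr += direction * step_db
    return np.mean(reversal_snrs[-6:])           # SRT = mean of the final reversals

# Simulated listener: a logistic psychometric function with an SRT near -2 dB SNR.
listener = lambda snr_db: 1 / (1 + np.exp(-(snr_db + 2.0) / 1.5))
print(run_srt_track(listener))
```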
Affiliation(s)
- Peter A Wasiuk
- Department of Psychological Sciences, 11635 Euclid Avenue, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, 170 Manning Drive, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Jacob J Oleson
- Department of Biostatistics, 145 North Riverside Drive, University of Iowa, Iowa City, Iowa 52242, USA
- Lauren Calandruccio
- Department of Psychological Sciences, 11635 Euclid Avenue, Case Western Reserve University, Cleveland, Ohio 44106, USA
9
Monson BB, Buss E. On the use of the TIMIT, QuickSIN, NU-6, and other widely used bandlimited speech materials for speech perception experiments. J Acoust Soc Am 2022; 152:1639. PMID: 36182310; PMCID: PMC9473723; DOI: 10.1121/10.0013993.
Abstract
The use of spectrally degraded speech signals deprives listeners of acoustic information that is useful for speech perception. Several popular speech corpora, recorded decades ago, have spectral degradations, including limited extended high-frequency (EHF) (>8 kHz) content. Although frequency content above 8 kHz is often assumed to play little or no role in speech perception, recent research suggests that EHF content in speech can have a significant beneficial impact on speech perception under a wide range of natural listening conditions. This paper provides an analysis of the spectral content of popular speech corpora used for speech perception research to highlight the potential shortcomings of using bandlimited speech materials. Two corpora analyzed here, the TIMIT and NU-6, have substantial low-frequency spectral degradation (<500 Hz) in addition to EHF degradation. We provide an overview of the phenomena potentially missed by using bandlimited speech signals, and the factors to consider when selecting stimuli that are sensitive to these effects.
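The corpus analysis described above amounts to inspecting long-term spectra for missing low- and extended high-frequency energy. The sketch below computes a long-term average spectrum with Welch's method and reports band levels below 500 Hz and above 8 kHz relative to the broadband level; the file name is hypothetical and a mono recording is assumed.

```python
# Sketch: checking a corpus recording for bandwidth limitations by computing a
# long-term average spectrum and summarizing level in the low-frequency (<500 Hz)
# and extended high-frequency (>8 kHz) regions. The file name is hypothetical
# and the recording is assumed to be mono.
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

def band_level_db(freqs, psd, lo_hz, hi_hz):
    band = (freqs >= lo_hz) & (freqs < hi_hz)
    return 10 * np.log10(np.sum(psd[band]) + 1e-20)

fs, x = wavfile.read("corpus_sentence.wav")      # hypothetical corpus file
x = x.astype(np.float64)
freqs, psd = welch(x, fs=fs, nperseg=4096)

broadband = band_level_db(freqs, psd, 40, fs / 2)
print("Low-freq (<500 Hz) re broadband:", band_level_db(freqs, psd, 40, 500) - broadband, "dB")
print("EHF (>8 kHz) re broadband:", band_level_db(freqs, psd, 8000, fs / 2) - broadband, "dB")
```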
Affiliation(s)
- Brian B Monson
- Department of Speech and Hearing Science, University of Illinois Urbana-Champaign, Champaign, Illinois 61820, USA
- Emily Buss
- Department of Otolaryngology/HNS, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27514, USA
10
Buss E, Miller MK, Leibold LJ. Maturation of Speech-in-Speech Recognition for Whispered and Voiced Speech. J Speech Lang Hear Res 2022; 65:3117-3128. PMID: 35868232; PMCID: PMC9911131; DOI: 10.1044/2022_jslhr-21-00620.
Abstract
PURPOSE Some speech recognition data suggest that children rely less on voice pitch and harmonicity to support auditory scene analysis than adults. Two experiments evaluated development of speech-in-speech recognition using voiced speech and whispered speech, which lacks the harmonic structure of voiced speech.
METHOD Listeners were 5- to 7-year-olds and adults with normal hearing. Targets were monosyllabic words organized into three-word sets that differ in vowel content. Maskers were two-talker or one-talker streams of speech. Targets and maskers were recorded by different female talkers in both voiced and whispered speaking styles. For each masker, speech reception thresholds (SRTs) were measured in all four combinations of target and masker speech, including matched and mismatched speaking styles for the target and masker.
RESULTS Children performed more poorly than adults overall. For the two-talker masker, this age effect was smaller for the whispered target and masker than for the other three conditions. Children's SRTs in this condition were predominantly positive, suggesting that they may have relied on a wholistic listening strategy rather than segregating the target from the masker. For the one-talker masker, age effects were consistent across the four conditions. Reduced informational masking for the one-talker masker could be responsible for differences in age effects for the two maskers. A benefit of mismatching the target and masker speaking style was observed for both target styles in the two-talker masker and for the voiced targets in the one-talker masker.
CONCLUSIONS These results provide no compelling evidence that young school-age children and adults are differentially sensitive to the cues present in voiced and whispered speech. Both groups benefit from mismatches in speaking style under some conditions. These benefits could be due to a combination of reduced perceptual similarity, harmonic cancelation, and differences in energetic masking.
Affiliation(s)
- Emily Buss
- Department of Otolaryngology-Head and Neck Surgery, University of North Carolina at Chapel Hill
- Margaret K. Miller
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE
- Lori J. Leibold
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE