1. Kursun B, Shola C, Cunio IE, Langley L, Shen Y. Variability of Preference-Based Adjustments on Hearing Aid Frequency-Gain Response. Journal of Speech, Language, and Hearing Research 2025; 68:2006-2025. [PMID: 40036873] [DOI: 10.1044/2024_jslhr-24-00215]
Abstract
PURPOSE: Although users can customize the frequency-gain response of hearing aids, the variability in their individual adjustments remains a concern. This study investigated the within-subject variability in the gain adjustments made within a single self-adjustment procedure.
METHOD: Two experiments were conducted with 20 older adults with mild-to-severe hearing loss. Participants used a two-dimensional touchscreen to adjust hearing aid amplification across six frequency bands (0.25-8 kHz) while listening to continuous speech in background noise. The two experiments tested two user interface designs that differed in their control-to-gain maps. For each participant, the statistical properties of 30 repeated gain adjustments within a single self-adjustment procedure were analyzed.
RESULTS: When participants made multiple gain adjustments, their preferred gain settings showed the highest variability in the 4- and 8-kHz bands and the lowest variability in the 1- and 2-kHz bands, suggesting that midfrequency bands are weighted more heavily in their preferences than high frequencies. Additionally, significant correlations were observed between the preferred gains for the 0.25- and 0.5-kHz bands, for the 0.5- and 1-kHz bands, and for the 4- and 8-kHz bands. Lastly, for most participants the standard error of the preferred gain decreased with an increasing number of trials, at a rate close to, though slightly shallower than, what would be expected for an invariant mean preference, suggesting convergent estimation of the underlying preference across trials.
CONCLUSION: Self-adjustments of frequency-gain profiles are informative about the underlying preference; however, the contributions from various frequency bands are neither equal nor independent.
SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.28405397
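The convergence check described in the RESULTS can be illustrated with a toy simulation; this is a minimal sketch under assumed numbers (the gain mean, spread, and trial count are invented), not the authors' analysis.

```python
import numpy as np

rng = np.random.default_rng(5)

# 30 simulated gain adjustments (dB) drawn around a fixed preferred gain;
# the mean and spread are arbitrary illustrative values.
adjustments_db = rng.normal(3.0, 2.0, size=30)

# For an invariant mean preference, the standard error of the running mean
# should fall with trial count n at a log-log slope of about -0.5; a
# shallower empirical slope suggests the preference itself is drifting.
n = np.arange(2, 31)
se = np.array([adjustments_db[:k].std(ddof=1) / np.sqrt(k) for k in n])
slope = np.polyfit(np.log(n), np.log(se), 1)[0]
print(round(slope, 2))  # expect a value near -0.5
```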
Affiliation(s)
- Bertan Kursun, Department of Speech and Hearing Sciences, University of Washington, Seattle
- Chemay Shola, Department of Speech and Hearing Sciences, University of Washington, Seattle
- Isabella E Cunio, Department of Speech and Hearing Sciences, University of Washington, Seattle
- Lauren Langley, Department of Speech and Hearing Sciences, University of Washington, Seattle
- Yi Shen, Department of Speech and Hearing Sciences, University of Washington, Seattle
2. Bosen AK, Wasiuk PA, Calandruccio L, Buss E. Frequency importance for sentence recognition in co-located noise, co-located speech, and spatially separated speech. The Journal of the Acoustical Society of America 2024; 156:3275-3284. [PMID: 39545745] [DOI: 10.1121/10.0034412]
Abstract
Frequency importance functions quantify the contribution of spectral frequencies to perception. Frequency importance has been well-characterized for speech recognition in quiet and steady-state noise. However, it is currently unknown whether frequency importance estimates generalize to more complex conditions such as listening in a multi-talker masker or when targets and maskers are spatially separated. Here, frequency importance was estimated by quantifying associations between local target-to-masker ratios at the output of an auditory filterbank and keyword recognition accuracy for sentences. Unlike traditional methods used to measure frequency importance, this technique estimates frequency importance without modifying the acoustic properties of the target or masker. Frequency importance was compared across sentences in noise and a two-talker masker, as well as sentences in a two-talker masker that was either co-located with or spatially separated from the target. Results indicate that frequency importance depends on masker type and spatial configuration. Frequencies above 5 kHz had lower importance and frequencies between 600 and 1900 Hz had higher importance in the presence of a two-talker masker relative to a noise masker. Spatial separation increased the importance of frequencies between 600 Hz and 5 kHz. Thus, frequency importance functions vary across listening conditions.
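The regression approach described above can be sketched in miniature. This is a toy version, not the study's analysis: the trial data, band count, and hidden weights are simulated placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials, n_bands = 2000, 5

# Simulated per-trial target-to-masker ratios (dB) at the filterbank output.
band_tmr_db = rng.normal(0.0, 6.0, size=(n_trials, n_bands))

# Simulated keyword accuracy driven by a hidden weighting of the bands.
true_weights = np.array([0.10, 0.30, 0.35, 0.20, 0.05])
p_correct = 1.0 / (1.0 + np.exp(-(band_tmr_db @ true_weights - 0.5)))
correct = rng.binomial(1, p_correct)

# Associating local TMRs with recognition accuracy: the normalized regression
# coefficients serve as the relative frequency-importance estimate.
model = LogisticRegression().fit(band_tmr_db, correct)
weights = model.coef_.ravel()
print(np.round(weights / weights.sum(), 3))
```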
Affiliation(s)
- Adam K Bosen, Boys Town National Research Hospital, Center for Hearing Research, Omaha, Nebraska 68131, USA
- Peter A Wasiuk, Department of Communication Disorders, Southern Connecticut State University, New Haven, Connecticut 06515, USA
- Lauren Calandruccio, Department of Psychological Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Emily Buss, Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
3. Ananthanarayana RM, Buss E, Monson BB. Band importance for speech-in-speech recognition in the presence of extended high-frequency cues. The Journal of the Acoustical Society of America 2024; 156:1202-1213. [PMID: 39158325] [PMCID: PMC11335358] [DOI: 10.1121/10.0028269]
Abstract
Band importance functions for speech-in-noise recognition, typically determined in the presence of steady background noise, indicate a negligible role for extended high frequencies (EHFs; 8-20 kHz). However, recent findings indicate that EHF cues support speech recognition in multi-talker environments, particularly when the masker has reduced EHF levels relative to the target. This scenario can occur in natural auditory scenes when the target talker is facing the listener, but the maskers are not. In this study, we measured the importance of five bands from 40 to 20 000 Hz for speech-in-speech recognition by notch-filtering the bands individually. Stimuli consisted of a female target talker recorded from 0° and a spatially co-located two-talker female masker recorded either from 0° or 56.25°, simulating a masker either facing the listener or facing away, respectively. Results indicated peak band importance in the 0.4-1.3 kHz band and a negligible effect of removing the EHF band in the facing-masker condition. However, in the non-facing condition, the peak was broader and EHF importance was higher and comparable to that of the 3.3-8.3 kHz band in the facing-masker condition. These findings suggest that EHFs contain important cues for speech recognition in listening conditions with mismatched talker head orientations.
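The notch-filtering manipulation named in this abstract can be sketched as follows. The five band edges follow the bands mentioned (40 Hz to 20 kHz); the sampling rate, filter order, and placeholder signal are assumptions, not the study's settings.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def notch_band(x, fs, lo, hi, order=6):
    """Zero-phase Butterworth band-stop filter removing one frequency band."""
    sos = butter(order, [lo, hi], btype="bandstop", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

fs = 44100
edges = [(40, 400), (400, 1300), (1300, 3300), (3300, 8300), (8300, 20000)]
speech = np.random.randn(2 * fs)  # placeholder for a recorded sentence

# One stimulus per condition, each with a different band notched out; band
# importance is then inferred from the drop in recognition relative to the
# full-band condition.
notched = {band: notch_band(speech, fs, *band) for band in edges}
```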
Affiliation(s)
- Rohit M Ananthanarayana, Department of Speech and Hearing Science, University of Illinois Urbana-Champaign, Champaign, Illinois 61820, USA
- Emily Buss, Department of Otolaryngology/HNS, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Brian B Monson, Department of Speech and Hearing Science, University of Illinois Urbana-Champaign, Champaign, Illinois 61820, USA; Department of Biomedical and Translational Sciences, Carle Illinois College of Medicine, University of Illinois Urbana-Champaign, Champaign, Illinois 61820, USA
4. Fan J, Williamson DS. From the perspective of perceptual speech quality: The robustness of frequency bands to noise. The Journal of the Acoustical Society of America 2024; 155:1916-1927. [PMID: 38456734] [DOI: 10.1121/10.0025272]
Abstract
Speech quality is a central focus of speech-related research and is frequently studied together with speech intelligibility, another essential measure. However, whereas band-level contributions to perceptual speech intelligibility have been studied extensively, speech quality has not been analyzed as thoroughly. In this paper, an approach inspired by the Multiple Stimuli With Hidden Reference and Anchor (MUSHRA) paradigm was proposed to study the robustness of individual frequency bands to noise, using perceptual speech quality as the measure. Speech signals were filtered into thirty-two frequency bands, and real-world noise was added at different signal-to-noise ratios. Robustness-to-noise indices for the individual frequency bands were calculated from human-rated perceptual quality scores assigned to the reconstructed noisy speech signals. Trends in the results suggest that the mid-frequency region is less robust to noise in terms of perceptual speech quality. These findings suggest that future research aiming to improve speech quality should pay closer attention to the mid-frequency region of the speech signal.
Affiliation(s)
- Junyi Fan, Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA
- Donald S Williamson, Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA
5. Buss E, Miller MK, Leibold LJ. Maturation of Speech-in-Speech Recognition for Whispered and Voiced Speech. Journal of Speech, Language, and Hearing Research 2022; 65:3117-3128. [PMID: 35868232] [PMCID: PMC9911131] [DOI: 10.1044/2022_jslhr-21-00620]
Abstract
PURPOSE: Some speech recognition data suggest that children rely less on voice pitch and harmonicity to support auditory scene analysis than adults. Two experiments evaluated development of speech-in-speech recognition using voiced speech and whispered speech, which lacks the harmonic structure of voiced speech.
METHOD: Listeners were 5- to 7-year-olds and adults with normal hearing. Targets were monosyllabic words organized into three-word sets that differ in vowel content. Maskers were two-talker or one-talker streams of speech. Targets and maskers were recorded by different female talkers in both voiced and whispered speaking styles. For each masker, speech reception thresholds (SRTs) were measured in all four combinations of target and masker speech, including matched and mismatched speaking styles for the target and masker.
RESULTS: Children performed more poorly than adults overall. For the two-talker masker, this age effect was smaller for the whispered target and masker than for the other three conditions. Children's SRTs in this condition were predominantly positive, suggesting that they may have relied on a holistic listening strategy rather than segregating the target from the masker. For the one-talker masker, age effects were consistent across the four conditions. Reduced informational masking for the one-talker masker could be responsible for differences in age effects for the two maskers. A benefit of mismatching the target and masker speaking style was observed for both target styles in the two-talker masker and for the voiced targets in the one-talker masker.
CONCLUSIONS: These results provide no compelling evidence that young school-age children and adults are differentially sensitive to the cues present in voiced and whispered speech. Both groups benefit from mismatches in speaking style under some conditions. These benefits could be due to a combination of reduced perceptual similarity, harmonic cancellation, and differences in energetic masking.
Affiliation(s)
- Emily Buss, Department of Otolaryngology-Head and Neck Surgery, University of North Carolina at Chapel Hill
- Margaret K. Miller, Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE
- Lori J. Leibold, Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE
6. Shen Y, Yun D, Liu Y. Individualized estimation of the Speech Intelligibility Index for short sentences: Test-retest reliability. The Journal of the Acoustical Society of America 2020; 148:1647. [PMID: 33003860] [PMCID: PMC7511242] [DOI: 10.1121/10.0001994]
Abstract
The speech intelligibility index (SII) model was modified to allow individualized parameters. These parameters included the relative weights of speech cues in five octave-frequency bands ranging from 0.25 to 4 kHz, i.e., the band importance function, and the transfer function that allows the SII to generate predictions on speech-recognition scores. A Bayesian adaptive procedure, the quick-band-importance-function (qBIF) procedure, was utilized to enable efficient estimation of the SII parameters from individual listeners. In two experiments, the SII parameters were estimated for 30 normal-hearing adults using Institute of Electrical and Electronics Engineers (IEEE) sentences at speech levels of 55, 65, and 75 dB sound pressure level (in Experiment I) and for 15 hearing-impaired (HI) adult listeners using amplified IEEE or AzBio sentences (in Experiment II). In both experiments, even without prior training, the estimated model parameters showed satisfactory reliability between two runs of the qBIF procedure at least one week apart. For the HI listeners, inter-listener variability in most estimated SII parameters was larger than intra-listener variability of the qBIF procedure.
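For context, the standard SII combines band audibility with band-importance weights, and a transfer function maps the index to a predicted recognition score. The sketch below shows the basic computation; the weights, SNRs, and transfer-function parameters are illustrative assumptions, not the fitted values from this study.

```python
import numpy as np

def sii(snr_db, importance):
    """Basic SII: band audibilities (clipped from SNR, ANSI S3.5 style)
    weighted by band-importance values that sum to 1."""
    audibility = np.clip((np.asarray(snr_db, dtype=float) + 15.0) / 30.0, 0.0, 1.0)
    return float(np.dot(importance, audibility))

def transfer(s, slope=8.0, midpoint=0.35):
    """Hypothetical logistic transfer function mapping SII to proportion correct."""
    return 1.0 / (1.0 + np.exp(-slope * (s - midpoint)))

# Five octave bands (0.25-4 kHz) with individualized importance weights.
importance = np.array([0.10, 0.20, 0.30, 0.25, 0.15])
snr_db = [5.0, 0.0, -3.0, 2.0, 10.0]
print(round(transfer(sii(snr_db, importance)), 3))
```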
Affiliation(s)
- Yi Shen, Department of Speech and Hearing Sciences, University of Washington, 1417 Northeast 42nd Street, Seattle, Washington 98105-6246, USA
- Donghyeon Yun, Department of Speech, Language and Hearing Sciences, Indiana University Bloomington, 200 South Jordan Avenue, Bloomington, Indiana 47405, USA
- Yi Liu, Department of Speech, Language and Hearing Sciences, Indiana University Bloomington, 200 South Jordan Avenue, Bloomington, Indiana 47405, USA
7. Bernstein JGW, Venezia JH, Grant KW. Auditory and auditory-visual frequency-band importance functions for consonant recognition. The Journal of the Acoustical Society of America 2020; 147:3712. [PMID: 32486805] [DOI: 10.1121/10.0001301]
Abstract
The relative importance of individual frequency regions for speech intelligibility has been firmly established for broadband auditory-only (AO) conditions. Yet, speech communication often takes place face-to-face. This study tested the hypothesis that under auditory-visual (AV) conditions, where visual information is redundant with high-frequency auditory cues, lower frequency regions will increase in relative importance compared to AO conditions. Frequency band-importance functions for consonants were measured for eight hearing-impaired and four normal-hearing listeners. Speech was filtered into four 1/3-octave bands each separated by an octave to minimize energetic masking. On each trial, the signal-to-noise ratio (SNR) in each band was selected randomly from a 10-dB range. AO and AV band-importance functions were estimated using three logistic-regression analyses: a primary model relating performance to the four independent SNRs; a control model that also included band-interaction terms; and a different set of four control models, each examining one band at a time. For both listener groups, the relative importance of the low-frequency bands increased under AV conditions, consistent with earlier studies using isolated speech bands. All three analyses showed similar results, indicating the absence of cross-band interactions. These results suggest that accurate prediction of AV speech intelligibility may require different frequency-importance functions than for AO conditions.
Affiliation(s)
- Joshua G W Bernstein, National Military Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, 4954 North Palmer Road, Bethesda, Maryland 20889, USA
- Jonathan H Venezia, Veterans Affairs Loma Linda Healthcare System, 11201 Benton Street, Loma Linda, California 92357, USA
- Ken W Grant, National Military Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, 4954 North Palmer Road, Bethesda, Maryland 20889, USA
8. Du Y, Shen Y, Wu X, Chen J. The effect of speech material on the band importance function for Mandarin Chinese. The Journal of the Acoustical Society of America 2019; 146:445. [PMID: 31370645] [PMCID: PMC7273514] [DOI: 10.1121/1.5116691]
Abstract
Speech material influences the relative contributions of different frequency regions to intelligibility for English. In the current study, whether a similar effect of speech material is present for Mandarin Chinese was investigated. Speech recognition was measured using three speech materials in Mandarin, including disyllabic words, nonsense sentences, and meaningful sentences. These materials differed from one another in terms of the amount of contextual information and word frequency. The band importance function (BIF), as defined under the Speech Intelligibility Index (SII) framework, was used to quantify the contributions across frequency regions. The BIFs for the three speech materials were estimated from 16 adults who were native speakers of Mandarin. A Bayesian adaptive procedure was used to efficiently estimate the octave-frequency BIFs for the three materials for each listener. As the amount of contextual information increased, low-frequency bands (e.g., 250 and 500 Hz) became more important for speech recognition, consistent with English. The BIF was flatter for Mandarin than for comparable English speech materials. Introducing the language- and material-specific BIFs to the SII model led to improved predictions of Mandarin speech-recognition performance. Results suggested the necessity of developing material-specific BIFs for Mandarin.
Affiliation(s)
- Yufan Du, Department of Machine Intelligence, Peking University, Beijing, China
- Yi Shen, Department of Speech and Hearing Sciences, Indiana University Bloomington, 200 South Jordan Avenue, Bloomington, Indiana 47405, USA
- Xihong Wu, Department of Machine Intelligence, Peking University, Beijing, China
- Jing Chen, Department of Machine Intelligence, Peking University, Beijing, China
9. Shen Y, Kern AB. An Analysis of Individual Differences in Recognizing Monosyllabic Words Under the Speech Intelligibility Index Framework. Trends in Hearing 2018. [PMID: 29532711] [PMCID: PMC5858685] [DOI: 10.1177/2331216518761773]
Abstract
Individual differences in the recognition of monosyllabic words, either in isolation (NU6 test) or in sentence context (SPIN test), were investigated under the theoretical framework of the speech intelligibility index (SII). An adaptive psychophysical procedure, namely the quick-band-importance-function procedure, was developed to enable the fitting of the SII model to individual listeners. Using this procedure, the band importance function (i.e., the relative weights of speech information across the spectrum) and the link function relating the SII to recognition scores can be simultaneously estimated while requiring only 200 to 300 trials of testing. Octave-frequency band importance functions and link functions were estimated separately for NU6 and SPIN materials from 30 normal-hearing listeners who were naïve to speech recognition experiments. For each type of speech material, considerable individual differences in the spectral weights were observed in some but not all frequency regions. At frequencies where the greatest intersubject variability was found, the spectral weights were correlated between the two speech materials, suggesting that the variability in spectral weights reflected listener-originated factors.
Affiliation(s)
- Yi Shen, Department of Speech and Hearing Sciences, Indiana University Bloomington, Bloomington, IN, USA
- Allison B Kern, Department of Speech and Hearing Sciences, Indiana University Bloomington, Bloomington, IN, USA
10. Yoho SE, Bosen AK. Individualized frequency importance functions for listeners with sensorineural hearing loss. The Journal of the Acoustical Society of America 2019; 145:822. [PMID: 30823788] [PMCID: PMC6375730] [DOI: 10.1121/1.5090495]
Abstract
The Speech Intelligibility Index includes a series of frequency importance functions for calculating the estimated intelligibility of speech under various conditions. Until recently, techniques to derive frequency importance required averaging data over a group of listeners, thus hindering the ability to observe individual differences due to factors such as hearing loss. In the current study, the "random combination strategy" [Bosen and Chatterjee (2016). J. Acoust. Soc. Am. 140, 3718-3727] was used to derive frequency importance functions for individual hearing-impaired listeners, and normal-hearing participants for comparison. Functions were measured by filtering sentences to contain only random subsets of frequency bands on each trial, and regressing speech recognition against the presence or absence of bands across trials. Results show that the contribution of each band to speech recognition was inversely proportional to audiometric threshold in that frequency region, likely due to reduced audibility, even though stimuli were shaped to compensate for each individual's hearing loss. The results presented in this paper demonstrate that this method is sensitive to factors that alter the shape of frequency importance functions within individuals with hearing loss, which could be used to characterize the impact of audibility or other factors related to suprathreshold deficits or hearing aid processing strategies.
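A minimal sketch of the "random combination strategy" as described: band presence is randomized across trials, and recognition is regressed on band presence or absence. The trial counts, scores, and underlying weights below are simulated placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_trials, n_bands = 500, 10

# Each trial keeps a random subset of bands (1 = present, 0 = absent).
present = rng.integers(0, 2, size=(n_trials, n_bands))

# Simulated per-trial recognition scores: additive band contributions plus noise.
true_importance = np.linspace(0.02, 0.20, n_bands)
scores = present @ true_importance + rng.normal(0.0, 0.05, n_trials)

# Regressing scores on band presence/absence recovers each band's contribution,
# which serves as the individualized frequency-importance estimate.
coefs = LinearRegression().fit(present, scores).coef_
print(np.round(coefs / coefs.sum(), 3))
```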
Affiliation(s)
- Sarah E Yoho, Department of Communicative Disorders and Deaf Education, Utah State University, Logan, Utah 84322, USA
- Adam K Bosen, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
11. Yoho SE, Apoux F, Healy EW. The noise susceptibility of various speech bands. The Journal of the Acoustical Society of America 2018; 143:2527. [PMID: 29716288] [PMCID: PMC5927964] [DOI: 10.1121/1.5034172]
Abstract
The degrading influence of noise on various critical bands of speech was assessed. A modified version of the compound method [Apoux and Healy (2012) J. Acoust. Soc. Am. 132, 1078-1087] was employed to establish this noise susceptibility for each speech band. Noise was added to the target speech band at various signal-to-noise ratios to determine the amount of noise required to reduce the contribution of that band by 50%. It was found that noise susceptibility is not equal across the speech spectrum, as is commonly assumed and incorporated into modern indexes. Instead, the signal-to-noise ratio required to equivalently impact various speech bands differed by as much as 13 dB. This noise susceptibility formed an irregular pattern across frequency, despite the use of multi-talker speech materials designed to reduce the potential influence of a particular talker's voice. But basic trends in the pattern of noise susceptibility across the spectrum emerged. Further, no systematic relationship was observed between noise susceptibility and speech band importance. It is argued here that susceptibility to noise and band importance are different phenomena, and that this distinction may be underappreciated in previous works.
Affiliation(s)
- Sarah E Yoho, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Frédéric Apoux, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Eric W Healy, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
12. Yoho SE, Healy EW, Youngdahl CL, Barrett TS, Apoux F. Speech-material and talker effects in speech band importance. The Journal of the Acoustical Society of America 2018; 143:1417. [PMID: 29604719] [PMCID: PMC5851785] [DOI: 10.1121/1.5026787]
Abstract
Band-importance functions created using the compound method [Apoux and Healy (2012). J. Acoust. Soc. Am. 132, 1078-1087] provide more detail than those generated using the ANSI technique, necessitating and allowing a re-examination of the influences of speech material and talker on the shape of the band-importance function. More specifically, the detailed functions may reflect, to a larger extent, acoustic idiosyncrasies of the individual talker's voice. Twenty-one-band functions were created using standard speech materials and recordings by different talkers. The band-importance functions representing the same speech-material type produced by different talkers were found to be more similar to one another than functions representing the same talker producing different speech-material types. Thus, the primary finding was the relative strength of a speech-material effect and weakness of a talker effect. This speech-material effect extended to other materials in the same broad class (different sentence corpora) despite considerable differences in the specific materials. Characteristics of individual talkers' voices were not readily apparent in the functions, and the talker effect was restricted to more global aspects of talker (i.e., gender). Finally, the use of multiple talkers diminished any residual effect of the talker.
Affiliation(s)
- Sarah E Yoho, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Eric W Healy, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Carla L Youngdahl, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Tyson S Barrett, Department of Psychology, Utah State University, Logan, Utah 84322, USA
- Frédéric Apoux, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
13. Lee S, Mendel LL. Derivation of frequency importance functions for the AzBio sentences. The Journal of the Acoustical Society of America 2017; 142:3416. [PMID: 29289102] [DOI: 10.1121/1.5014056]
Abstract
Although the AzBio test is well validated, has effective standardization data available, and is highly recommended for cochlear implant (CI) evaluation, no attempt had been made to derive a Frequency Importance Function (FIF) for its stimuli. This study derived FIFs for the AzBio sentence lists using listeners with normal hearing. Traditional procedures described by Studebaker and Sherbecoe [(1991). J. Speech. Lang. Hear. Res. 34, 427-438] were applied for this purpose. Participants with normal hearing listened to a large number of AzBio sentences that were high- and low-pass filtered and presented in speech-spectrum-shaped noise at various signal-to-noise ratios. Frequency weights for the AzBio sentences were greatest in the 1.5 to 2 kHz frequency regions, as is the case with other speech materials. A cross-procedure comparison was conducted between the traditional procedure [Studebaker and Sherbecoe (1991)] and the nonlinear optimization procedure [Kates (2013). J. Acoust. Soc. Am. 134, EL459-EL464]. Subsequent analyses related speech recognition scores for the AzBio sentences to the Speech Intelligibility Index (SII). These findings provide empirically derived FIFs for the AzBio test that can be used in future studies. It is anticipated that the accuracy of predicting SIIs for CI patients will improve when these derived FIFs are used for the AzBio test.
Affiliation(s)
- Sungmin Lee, School of Communication Sciences & Disorders, University of Memphis, 4055 North Park Loop, Memphis, Tennessee 38152, USA
- Lisa Lucks Mendel, School of Communication Sciences & Disorders, University of Memphis, 4055 North Park Loop, Memphis, Tennessee 38152, USA
14. Bosen AK, Chatterjee M. Band importance functions of listeners with cochlear implants using clinical maps. The Journal of the Acoustical Society of America 2016; 140:3718. [PMID: 27908046] [PMCID: PMC5392084] [DOI: 10.1121/1.4967298]
Abstract
Band importance functions estimate the relative contribution of individual acoustic frequency bands to speech intelligibility. Previous studies of band importance in listeners with cochlear implants have used experimental maps and direct stimulation. Here, band importance was estimated for clinical maps with acoustic stimulation. Listeners with cochlear implants had band importance functions that relied more heavily on lower frequencies and showed less cross-listener consistency than in listeners with normal hearing. The intersubject variability observed here indicates that averaging band importance functions across listeners with cochlear implants, as has been done in previous studies, may not be meaningful. Additionally, band importance functions of listeners with normal hearing for vocoded speech that either did or did not simulate spread of excitation were not different from one another, suggesting that additional factors beyond spread of excitation are necessary to account for changes in band importance in listeners with cochlear implants.
Affiliation(s)
- Adam K Bosen, Boys Town National Research Hospital, 555 North 30th Street, Omaha, Nebraska 68131, USA
- Monita Chatterjee, Boys Town National Research Hospital, 555 North 30th Street, Omaha, Nebraska 68131, USA
15. Mandel MI, Yoho SE, Healy EW. Measuring time-frequency importance functions of speech with bubble noise. The Journal of the Acoustical Society of America 2016; 140:2542. [PMID: 27794278] [PMCID: PMC6910024] [DOI: 10.1121/1.4964102]
Abstract
Listeners can reliably perceive speech in noisy conditions, but it is not well understood what specific features of speech they use to do this. This paper introduces a data-driven framework to identify the time-frequency locations of these features. Using the same speech utterance mixed with many different noise instances, the framework is able to compute the importance of each time-frequency point in the utterance to its intelligibility. The mixtures have approximately the same global signal-to-noise ratio at each frequency, but very different recognition rates. The difference between these intelligible vs unintelligible mixtures is the alignment between the speech and spectro-temporally modulated noise, providing different combinations of "glimpses" of speech in each mixture. The current results reveal the locations of these important noise-robust phonetic features in a restricted set of syllables. Classification models trained to predict whether individual mixtures are intelligible based on the location of these glimpses can generalize to new conditions, successfully predicting the intelligibility of novel mixtures. They are able to generalize to novel noise instances, novel productions of the same word by the same talker, novel utterances of the same word spoken by different talkers, and, to some extent, novel consonants.
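The data-driven framework can be sketched as a classification problem over glimpse locations. Everything below is a hypothetical stand-in (random glimpse patterns and labels, invented array names and sizes); it shows the shape of the analysis, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n_mix, n_time, n_freq = 400, 50, 32

# Hypothetical bubble-noise data for one utterance: glimpses[i, p] = 1 where
# time-frequency point p was audible through noise instance i, and
# intelligible[i] = 1 if listeners recognized mixture i (random here).
glimpses = rng.integers(0, 2, size=(n_mix, n_time * n_freq))
intelligible = rng.integers(0, 2, size=n_mix)

# A regularized classifier predicting intelligibility from glimpse locations;
# large positive weights mark time-frequency points important to recognition.
clf = LogisticRegression(penalty="l2", C=0.1, max_iter=1000).fit(glimpses, intelligible)
importance_map = clf.coef_.reshape(n_time, n_freq)
print(importance_map.shape)
```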
Affiliation(s)
- Michael I Mandel, Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA
- Sarah E Yoho, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Eric W Healy, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
16. Humes LE, Kidd GR. Speech recognition for multiple bands: Implications for the Speech Intelligibility Index. The Journal of the Acoustical Society of America 2016; 140:2019. [PMID: 27914446] [PMCID: PMC6909976] [DOI: 10.1121/1.4962539]
Abstract
The Speech Intelligibility Index (SII) assumes additivity of the importance of acoustically independent bands of speech. To further evaluate this assumption, open-set speech recognition was measured for words and sentences, in quiet and in noise, when the speech stimuli were presented to the listener in selected frequency bands. The filter passbands were constructed from various combinations of 20 bands having equivalent (0.05) importance in the SII framework. This permitted the construction of a variety of equal-SII band patterns that were then evaluated by nine different groups of young adults with normal hearing. For monosyllabic words, a similar dependence on band pattern was observed for SII values of 0.4, 0.5, and 0.6 in both quiet and noise conditions. Specifically, band patterns concentrated toward the lower and upper frequency range tended to yield significantly lower scores than those more evenly sampling a broader frequency range. For all stimuli and test conditions, equal SII values did not yield equal performance. Because the spectral distortions of speech evaluated here may not commonly occur in everyday listening conditions, this finding does not necessarily represent a serious deficit for the application of the SII. These findings, however, challenge the band-independence assumption of the theory underlying the SII.
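The equal-SII construction can be sketched directly: with 20 bands worth 0.05 each, any k presented bands give a nominal SII of 0.05k under the additivity assumption. The particular pattern choices below are illustrative, not the study's nine test patterns.

```python
import numpy as np

rng = np.random.default_rng(2)
n_bands, band_importance, target_sii = 20, 0.05, 0.5
k = int(round(target_sii / band_importance))  # 10 bands per pattern

# Two equal-SII patterns: one concentrated at the low end of the spectrum,
# one sampling the full frequency range evenly at random.
low_heavy = np.zeros(n_bands, dtype=int)
low_heavy[:k] = 1
spread = np.zeros(n_bands, dtype=int)
spread[rng.choice(n_bands, k, replace=False)] = 1

# Under additivity both nominal SIIs are 0.5; the study asked whether such
# patterns nevertheless yield different recognition scores.
print(low_heavy.sum() * band_importance, spread.sum() * band_importance)
```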
Affiliation(s)
- Larry E Humes, Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana 47405-7002, USA
- Gary R Kidd, Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana 47405-7002, USA
17. Apoux F, Youngdahl CL, Yoho SE, Healy EW. Dual-carrier processing to convey temporal fine structure cues: Implications for cochlear implants. The Journal of the Acoustical Society of America 2015; 138:1469-1480. [PMID: 26428784] [PMCID: PMC4575322] [DOI: 10.1121/1.4928136]
Abstract
Speech intelligibility in noise can be degraded by using vocoder processing to alter the temporal fine structure (TFS). Here it is argued that this degradation is not attributable to the loss of speech information potentially present in the TFS. Instead it is proposed that the degradation results from the loss of sound-source segregation information when two or more carriers (i.e., TFS) are substituted with only one as a consequence of vocoder processing. To demonstrate this segregation role, vocoder processing involving two carriers, one for the target and one for the background, was implemented. Because this approach does not preserve the speech TFS, it may be assumed that any improvement in intelligibility can only be a consequence of the preserved carrier duality and associated segregation cues. Three experiments were conducted using this "dual-carrier" approach. All experiments showed substantial sentence intelligibility in noise improvements compared to traditional single-carrier conditions. In several conditions, the improvement was so substantial that intelligibility approximated that for unprocessed speech in noise. A foreseeable and potentially promising implication for the dual-carrier approach involves implementation into cochlear implant speech processors, where it may provide the TFS cues necessary to segregate speech from noise.
Affiliation(s)
- Frédéric Apoux, Speech Psychoacoustics Laboratory, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Carla L Youngdahl, Speech Psychoacoustics Laboratory, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Sarah E Yoho, Speech Psychoacoustics Laboratory, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Eric W Healy, Speech Psychoacoustics Laboratory, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
18. Oosthuizen DJJ, Hanekom JJ. Fuzzy information transmission analysis for continuous speech features. The Journal of the Acoustical Society of America 2015; 137:1983-1994. [PMID: 25920849] [DOI: 10.1121/1.4916198]
Abstract
Feature information transmission analysis (FITA) estimates information transmitted by an acoustic feature by assigning tokens to categories according to the feature under investigation and comparing within-category to between-category confusions. FITA was initially developed for categorical features (e.g., voicing) for which the category assignments arise from the feature definition. When used with continuous features (e.g., formants), it may happen that pairs of tokens in different categories are more similar than pairs of tokens in the same category. The estimated transmitted information may be sensitive to category boundary location and the selected number of categories. This paper proposes a fuzzy approach to FITA that provides a smoother transition between categories and compares its sensitivity to grouping parameters with that of the traditional approach. The fuzzy FITA was found to be sufficiently robust to boundary location to allow automation of category boundary selection. Traditional and fuzzy FITA were found to be sensitive to the number of categories. This is inherent to the mechanism of isolating a feature by dividing tokens into categories, so that transmitted information values calculated using different numbers of categories should not be compared. Four categories are recommended for continuous features when twelve tokens are used.
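The fuzzy-category idea can be sketched as follows: tokens receive graded membership in each category, and transmitted information is computed from a membership-weighted confusion matrix. The feature values, category count, membership function, and confusion counts are all hypothetical; this illustrates the general idea, not the paper's exact weighting scheme.

```python
import numpy as np

def fuzzy_memberships(values, centers, width):
    """Gaussian membership of each token's feature value in each category
    (rows normalized to sum to 1), giving smooth category transitions."""
    d = (values[:, None] - centers[None, :]) / width
    m = np.exp(-0.5 * d**2)
    return m / m.sum(axis=1, keepdims=True)

def transmitted_information(joint):
    """Mutual information (bits) from a joint stimulus-response matrix."""
    joint = joint / joint.sum()
    px, py = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

# Hypothetical: 12 tokens' F2 values (Hz), 4 categories (as recommended), and
# a token-level confusion matrix; fuzzy weights replace hard assignment.
f2 = np.array([900., 1100., 1250., 1400., 1600., 1750.,
               1900., 2100., 2250., 2400., 2600., 2800.])
M = fuzzy_memberships(f2, np.linspace(f2.min(), f2.max(), 4), width=250.0)
token_confusions = np.eye(12) * 5 + 1          # illustrative counts
joint = M.T @ token_confusions @ M             # category-level fuzzy confusions
print(round(transmitted_information(joint), 3))
```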
Affiliation(s)
- Dirk J J Oosthuizen, Department of Electrical, Electronic and Computer Engineering, University of Pretoria, University Road, Pretoria 0002, South Africa
- Johan J Hanekom, Department of Electrical, Electronic and Computer Engineering, University of Pretoria, University Road, Pretoria 0002, South Africa
19. Healy EW, Yoho SE, Apoux F. Band importance for sentences and words reexamined. The Journal of the Acoustical Society of America 2013; 133:463-473. [PMID: 23297918] [PMCID: PMC3548885] [DOI: 10.1121/1.4770246]
Abstract
Band-importance functions were created using the "compound" technique [Apoux and Healy, J. Acoust. Soc. Am. 132, 1078-1087 (2012)] that accounts for the multitude of synergistic and redundant interactions that take place among speech bands. Functions were created for standard recordings of the speech perception in noise (SPIN) sentences and the Central Institute for the Deaf (CID) W-22 words using 21 critical-band divisions and steep filtering to eliminate the influence of filter slopes. On a given trial, a band of interest was presented along with four other bands having spectral locations determined randomly on each trial. In corresponding trials, the band of interest was absent and only the four other bands were present. The importance of the band of interest was determined by the difference between paired band-present and band-absent trials. Because the locations of the other bands changed randomly from trial to trial, various interactions occurred between the band of interest and other speech bands which provided a general estimate of band importance. Obtained band-importance functions differed substantially from those currently available for identical speech recordings. In addition to differences in the overall shape of the functions, especially for the W-22 words, a complex microstructure was observed in which the importance of adjacent frequency bands often varied considerably. This microstructure may result in better predictive power of the current functions.
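The paired band-present/band-absent logic of the compound technique can be illustrated with a toy simulation. The additive scoring model, noise levels, and importance profile below are invented assumptions that stand in for listeners' actual recognition scores.

```python
import numpy as np

rng = np.random.default_rng(3)
n_bands, n_pairs = 21, 200
true_importance = np.hanning(n_bands) / np.hanning(n_bands).sum()  # illustrative

# For each band of interest: draw four random companion bands, score once with
# the band of interest present and once absent, and average the paired
# differences; this difference estimates the band's importance.
importance_est = np.zeros(n_bands)
for b in range(n_bands):
    diffs = []
    for _ in range(n_pairs):
        others = rng.choice(np.delete(np.arange(n_bands), b), 4, replace=False)
        base = true_importance[others].sum()
        score_absent = base + rng.normal(0.0, 0.02)
        score_present = base + true_importance[b] + rng.normal(0.0, 0.02)
        diffs.append(score_present - score_absent)
    importance_est[b] = np.mean(diffs)
print(np.round(importance_est, 3))
```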
Affiliation(s)
- Eric W Healy, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA