1
Kursun B, Shola C, Cunio IE, Langley L, Shen Y. Variability of Preference-Based Adjustments on Hearing Aid Frequency-Gain Response. Journal of Speech, Language, and Hearing Research 2025; 68:2006-2025. [PMID: 40036873] [DOI: 10.1044/2024_jslhr-24-00215]
Abstract
PURPOSE: Although users can customize the frequency-gain response of their hearing aids, the variability of these individual adjustments remains a concern. This study investigated the within-subject variability of gain adjustments made within a single self-adjustment procedure.
METHOD: Two experiments were conducted with 20 older adults with mild-to-severe hearing loss. Participants used a two-dimensional touchscreen to adjust hearing aid amplification across six frequency bands (0.25-8 kHz) while listening to continuous speech in background noise. The two experiments tested two user interface designs that differed in their control-to-gain map. For each participant, the statistical properties of 30 repeated gain adjustments within a single self-adjustment procedure were analyzed.
RESULTS: When participants made multiple gain adjustments, their preferred gain settings showed the highest variability in the 4- and 8-kHz bands and the lowest variability in the 1- and 2-kHz bands, suggesting that midfrequency bands are weighted more heavily in their preferences than high-frequency bands. Additionally, significant correlations were observed between the preferred gains in the 0.25- and 0.5-kHz bands, in the 0.5- and 1-kHz bands, and in the 4- and 8-kHz bands. Lastly, for most participants the standard error of the preferred gain decreased with an increasing number of trials at a rate close to, but slightly shallower than, that expected for an invariant mean preference, suggesting convergent estimation of the underlying preference across trials.
CONCLUSION: Self-adjustments of frequency-gain profiles are informative about the underlying preference; however, the contributions of the various frequency bands are neither equal nor independent.
SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.28405397
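The convergence result in this abstract rests on how the standard error of repeated adjustments shrinks with the number of trials: for an invariant mean preference it should fall roughly as one over the square root of the trial count. The sketch below is a hypothetical simulation (not the study's data or analysis) that computes that standard error from simulated adjustments and checks the log-log slope against the expected value of about -0.5.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical: 30 repeated gain adjustments (dB) in one frequency band,
# scattered around an invariant mean preference of +6 dB with a 4 dB spread.
adjustments = rng.normal(loc=6.0, scale=4.0, size=30)

# Standard error of the mean preferred gain after the first n trials.
n_trials = np.arange(2, adjustments.size + 1)
sem = np.array([adjustments[:n].std(ddof=1) / np.sqrt(n) for n in n_trials])

# Slope of log(SEM) vs. log(n); approximately -0.5 when the mean preference is invariant.
slope = np.polyfit(np.log(n_trials), np.log(sem), 1)[0]
print(f"log-log slope of SEM vs. trials: {slope:.2f} (expected ~ -0.5)")
```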
Affiliation(s)
- Bertan Kursun
- Department of Speech and Hearing Sciences, University of Washington, Seattle
- Chemay Shola
- Department of Speech and Hearing Sciences, University of Washington, Seattle
- Isabella E Cunio
- Department of Speech and Hearing Sciences, University of Washington, Seattle
- Lauren Langley
- Department of Speech and Hearing Sciences, University of Washington, Seattle
- Yi Shen
- Department of Speech and Hearing Sciences, University of Washington, Seattle
2
Bosen AK, Wasiuk PA, Calandruccio L, Buss E. Frequency importance for sentence recognition in co-located noise, co-located speech, and spatially separated speech. The Journal of the Acoustical Society of America 2024; 156:3275-3284. [PMID: 39545745] [DOI: 10.1121/10.0034412]
Abstract
Frequency importance functions quantify the contribution of spectral frequencies to perception. Frequency importance has been well-characterized for speech recognition in quiet and steady-state noise. However, it is currently unknown whether frequency importance estimates generalize to more complex conditions such as listening in a multi-talker masker or when targets and maskers are spatially separated. Here, frequency importance was estimated by quantifying associations between local target-to-masker ratios at the output of an auditory filterbank and keyword recognition accuracy for sentences. Unlike traditional methods used to measure frequency importance, this technique estimates frequency importance without modifying the acoustic properties of the target or masker. Frequency importance was compared across sentences in noise and a two-talker masker, as well as sentences in a two-talker masker that was either co-located with or spatially separated from the target. Results indicate that frequency importance depends on masker type and spatial configuration. Frequencies above 5 kHz had lower importance and frequencies between 600 and 1900 Hz had higher importance in the presence of a two-talker masker relative to a noise masker. Spatial separation increased the importance of frequencies between 600 Hz and 5 kHz. Thus, frequency importance functions vary across listening conditions.
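The estimation approach described here relates trial-by-trial keyword accuracy to the local target-to-masker ratio (TMR) in each filterbank channel. A minimal way to sketch that idea is a logistic regression whose normalized coefficients serve as relative importance weights; the simulated TMR matrix, the weight values, and the use of scikit-learn's LogisticRegression are illustrative assumptions, and the authors' actual statistical model may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed inputs: tmr[i, j] = target-to-masker ratio (dB) in band j on trial i,
# correct[i] = 1 if the keyword on trial i was recognized, else 0.
rng = np.random.default_rng(1)
n_trials, n_bands = 400, 6
tmr = rng.normal(0.0, 6.0, size=(n_trials, n_bands))
true_weights = np.array([0.10, 0.30, 0.30, 0.20, 0.07, 0.03])   # hypothetical importance
p_correct = 1 / (1 + np.exp(-(tmr @ true_weights - 0.5)))
correct = (rng.random(n_trials) < p_correct).astype(int)

# Regress accuracy on per-band TMR; normalized coefficients estimate relative importance.
model = LogisticRegression().fit(tmr, correct)
importance = model.coef_.ravel() / model.coef_.sum()
print(np.round(importance, 2))
```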
Affiliation(s)
- Adam K Bosen
- Boys Town National Research Hospital, Center for Hearing Research, Omaha, Nebraska 68131, USA
- Peter A Wasiuk
- Department of Communication Disorders, Southern Connecticut State University, New Haven, Connecticut 06515, USA
- Lauren Calandruccio
- Department of Psychological Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
3
Sun L, Ping L, Fan X, Wang J, Chen X. Simulator Verification Is Potentially Beneficial for the Fitting of Softband Bone Conduction Hearing Devices in Young Children. Otology & Neurotology 2024; 45:e500-e508. [PMID: 38924037] [DOI: 10.1097/mao.0000000000004245]
Abstract
HYPOTHESIS: The current study employed a skull-simulator verification method to assess whether the output of softband bone conduction hearing devices (BCHDs) at the manufacturer's default settings deviated widely from the target determined by the fitting formula.
BACKGROUND: Real ear analysis is used in many institutions to verify the fitting of air conduction hearing devices. This procedure, however, has not been used in the fitting of BCHDs, largely because of the difficulty of measuring the output these devices deliver to the temporal bone. Although skull simulators are available, they have not been used clinically to measure BCHD output.
MATERIALS AND METHODS: This prospective, single-center study enrolled 42 subjects, aged 3 months to 10 years, with microtia-atresia-associated mild-to-severe bilateral conductive hearing loss. Hearing sensitivity was evaluated behaviorally by pure tone audiometry (PTA) in 22 subjects 4 years or older (the PTA group) and by auditory brainstem response (ABR) in 20 subjects younger than 4 years (the ABR group). After the subjects had worn the prescribed softband BCHDs for 6 months, their dial level (DL) thresholds were reassessed with their own BCHDs configured with zero gain across all frequencies, functioning solely as a bone vibrator. These DL thresholds were entered into the fitting formula, desired sensation level-bone conduction devices (DSL-BCD) for children, to obtain the target values of BCHD output. The simulator output of each BCHD programmed at the manufacturer's default setting was measured in response to speech presented at 55, 65, and 80 dB SPL, followed by gain adjustment based on the differences between the simulator output and the target. The aided speech intelligibility index (SII) was measured before and after the gain adjustment.
RESULTS: The softband BCHDs at the manufacturer's settings generally had lower output than the prescribed target values. This difference was larger at low frequencies and low levels. Across the 12 points tested (four frequencies from 500 to 4000 Hz at each of three levels), 22 (52.3%) and 42 (100%) of the BCHDs had deviations of +7 and +5 dB, respectively, at one or more points. The gain adjustments reduced the deviations and improved the SII values at the two lower speech presentation levels.
CONCLUSION: The simulator output of softband BCHDs at the manufacturer's settings may deviate substantially from the formula-prescribed targets. Objective output verification should be considered a beneficial step in BCHD fitting and is recommended when applicable.
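The verification step described above amounts to comparing measured simulator output against DSL-BCD targets at each frequency and presentation level, then adjusting gain by the difference. The sketch below illustrates only that bookkeeping with hypothetical numbers and an assumed 5-dB flag criterion; it is not the clinical software or the actual targets used in the study.

```python
import numpy as np

freqs_hz = [500, 1000, 2000, 4000]
levels_db_spl = [55, 65, 80]

# Hypothetical values: rows = frequencies, columns = presentation levels.
target_output = np.array([[62, 70, 82],
                          [64, 72, 84],
                          [66, 74, 85],
                          [60, 68, 80]], dtype=float)       # formula-prescribed targets (dB)
measured_output = target_output - np.array([[8, 6, 3],
                                             [7, 5, 2],
                                             [5, 4, 2],
                                             [6, 5, 3]])     # output at default settings (dB)

deviation = target_output - measured_output   # positive = device under target
needs_adjustment = deviation >= 5             # assumed criterion for flagging a point
for i, f in enumerate(freqs_hz):
    for j, lvl in enumerate(levels_db_spl):
        if needs_adjustment[i, j]:
            print(f"{f} Hz @ {lvl} dB SPL: increase gain by {deviation[i, j]:.0f} dB")
```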
Affiliation(s)
- Le Sun
- Department of Otolaryngology
- Lu Ping
- Department of General Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Jian Wang
- School of Communication Science and Disorders, Dalhousie University, Halifax, Canada
4
Ueda K, Doan LLD, Takeichi H. Checkerboard and interrupted speech: Intelligibility contrasts related to factor-analysis-based frequency bands. The Journal of the Acoustical Society of America 2023; 154:2010-2020. [PMID: 37782122] [DOI: 10.1121/10.0021165]
Abstract
It has been shown that the intelligibility of checkerboard speech stimuli, in which speech signals are periodically interrupted in time and frequency, varies drastically with the combination of the number of frequency bands (2-20) and segment duration (20-320 ms). However, the effects of the number of frequency bands between 4 and 20 and of the frequency division parameters on intelligibility have been largely unknown. Here, we show that speech intelligibility was lowest for four-band checkerboard speech stimuli, except at the 320-ms segment duration, followed, in order of increasing intelligibility, by temporally interrupted speech stimuli and eight-band checkerboard speech stimuli (N = 19 and 20). At the same time, U-shaped intelligibility curves were observed for four-band and possibly eight-band checkerboard speech stimuli. Furthermore, different frequency division parameters resulted in small but significant intelligibility differences at the 160- and 320-ms segment durations for four-band checkerboard speech stimuli. These results suggest that the four factor-analysis-based frequency bands, representing groups of critical bands whose speech power fluctuations correlate with one another, work as speech cue channels essential for speech perception. Moreover, a probability summation model for perceptual units, consisting of a sub-unit process and a supra-unit process that receives the outputs of the speech cue channels, may account for the U-shaped intelligibility curves.
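The probability summation idea invoked in the final sentence can be illustrated generically: if each speech cue channel independently delivers its cue with some probability, the chance that at least one channel succeeds is one minus the product of the individual failure probabilities. The sketch below, with made-up channel probabilities, shows only this generic rule; the sub-unit/supra-unit structure of the authors' model is not reproduced here.

```python
def probability_summation(channel_probs):
    """Probability that at least one of several independent channels
    delivers the cue, given each channel's success probability."""
    p_fail = 1.0
    for p in channel_probs:
        p_fail *= (1.0 - p)
    return 1.0 - p_fail

# Hypothetical per-channel cue probabilities for four factor-analysis-based bands.
print(probability_summation([0.3, 0.5, 0.4, 0.2]))   # -> 0.832
```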
Affiliation(s)
- Kazuo Ueda
- Department of Acoustic Design, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Linh Le Dieu Doan
- Human Science Course, Graduate School of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Hiroshige Takeichi
- Open Systems Information Science Team, Advanced Data Science Project (ADSP), RIKEN Information R&D and Strategy Headquarters (R-IH), RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
5
Ueda K, Takeichi H, Wakamiya K. Auditory grouping is necessary to understand interrupted mosaic speech stimuli. The Journal of the Acoustical Society of America 2022; 152:970. [PMID: 36050149] [PMCID: PMC9553289] [DOI: 10.1121/10.0013425]
Abstract
The intelligibility of interrupted speech stimuli is known to be almost perfect when the segment duration is shorter than 80 ms, which means that the interrupted segments are perceptually organized into a coherent stream under this condition. However, why listeners can successfully group the interrupted segments into a coherent stream has been largely unknown. Here, we show that the intelligibility of mosaic speech, in which the original speech was segmented in frequency and time and noise-vocoded with the average power in each unit, was largely reduced by periodic interruption. At the same time, intelligibility could be recovered by promoting auditory grouping of the interrupted segments, by stretching the segments up to 40 ms and reducing the gaps, provided that the number of frequency bands was sufficient (≥ 4) and the original segment duration was 40 ms or less. The interruption was devastating for mosaic speech stimuli, very likely because the removal of periodicity and temporal fine structure by mosaicking prevented successful auditory grouping of the interrupted segments.
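Mosaic speech, as described here, replaces each time-frequency unit with noise carrying that unit's average power. The sketch below illustrates one simplified way to do this with a short-time Fourier transform and random phase; the stand-in waveform, band edges, segment durations, and vocoder details are all illustrative assumptions, not the stimulus-generation code used in the study.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
speech = np.sin(2 * np.pi * 440 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))  # stand-in for speech

# Time-frequency segmentation via a short-time Fourier transform.
freqs, frames, spec = stft(speech, fs=fs, nperseg=512)

# Replace each coarse time-frequency unit with noise at the unit's average power
# (assumed parameters: 4 frequency bands, 5 STFT frames per time segment).
n_bands, frames_per_segment = 4, 5
band_edges = np.linspace(0, len(freqs), n_bands + 1, dtype=int)
rng = np.random.default_rng(0)
mosaic = np.zeros_like(spec)
for b in range(n_bands):
    lo, hi = band_edges[b], band_edges[b + 1]
    for start in range(0, spec.shape[1], frames_per_segment):
        cell = spec[lo:hi, start:start + frames_per_segment]
        avg_mag = np.sqrt(np.mean(np.abs(cell) ** 2))            # average power of the unit
        random_phase = np.exp(1j * rng.uniform(0, 2 * np.pi, size=cell.shape))
        mosaic[lo:hi, start:start + frames_per_segment] = avg_mag * random_phase

_, mosaic_speech = istft(mosaic, fs=fs, nperseg=512)             # noise-like mosaic version
```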
Affiliation(s)
- Kazuo Ueda
- Department of Human Science, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Hiroshige Takeichi
- Open Systems Information Science Team, Advanced Data Science Project (ADSP), RIKEN Information Research and Development and Strategy Headquarters (R-IH), RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Kohei Wakamiya
- Department of Communication Design Science, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
6
Buss E, Bosen A. Band importance for speech-in-speech recognition. JASA Express Letters 2021; 1:084402. [PMID: 34661194] [PMCID: PMC8499852] [DOI: 10.1121/10.0005762]
Abstract
Predicting masked speech perception typically relies on estimates of the spectral distribution of cues supporting recognition. Current methods for estimating band importance for speech-in-noise use filtered stimuli. These methods are not appropriate for speech-in-speech because filtering can modify stimulus features affecting auditory stream segregation. Here, band importance is estimated by quantifying the relationship between speech recognition accuracy for full-spectrum speech and the target-to-masker ratio by channel at the output of an auditory filterbank. Preliminary results provide support for this approach and indicate that frequencies below 2 kHz may contribute more to speech recognition in two-talker speech than in speech-shaped noise.
Affiliation(s)
- Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Adam Bosen
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
7
Ueda K, Kawakami R, Takeichi H. Checkerboard speech vs interrupted speech: Effects of spectrotemporal segmentation on intelligibility. JASA Express Letters 2021; 1:075204. [PMID: 36154646] [DOI: 10.1121/10.0005600]
Abstract
The intelligibility of interrupted speech (interrupted over time) and checkerboard speech (interrupted over time-by-frequency), both of which retained half of the original speech, was examined. The intelligibility of interrupted speech stimuli decreased as segment duration increased. Twenty-band checkerboard speech stimuli yielded nearly 100% intelligibility irrespective of segment duration, whereas with 2 and 4 frequency bands a trough of 35%-40% appeared at the 160-ms segment duration. Mosaic speech stimuli (power averaged over each time-frequency unit) yielded generally poor intelligibility (≤ 10%). The results reveal the limitations of the underlying auditory organization for speech cues scattered across the time-frequency domain.
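The checkerboard manipulation contrasted here with simple interruption can be pictured as a binary time-frequency mask in which adjacent cells alternate between passed and silenced, so each band retains half the signal. The sketch below builds such a mask under assumed parameters (two bands, ten 160-ms segments); the band-splitting filters and onset/offset ramps used for the real stimuli are omitted.

```python
import numpy as np

def checkerboard_mask(n_bands, n_segments):
    """1 = time-frequency cell passed, 0 = cell silenced, alternating in both dimensions."""
    rows = np.arange(n_bands)[:, None]
    cols = np.arange(n_segments)[None, :]
    return ((rows + cols) % 2).astype(float)

# Two bands, 160-ms segments over a 1.6-s utterance -> a 2 x 10 checkerboard.
mask = checkerboard_mask(n_bands=2, n_segments=10)
print(mask)   # each band keeps half of the time segments, offset between bands
```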
Affiliation(s)
- Kazuo Ueda
- Department of Human Science, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Riina Kawakami
- Department of Acoustic Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Hiroshige Takeichi
- Computational Engineering Applications Unit, R&D, ISC, RIKEN, 2-1 Hirosawa, Wako 351-0198, Japan
8
Fogerty D, Sevich VA, Healy EW. Spectro-temporal glimpsing of speech in noise: Regularity and coherence of masking patterns reduces uncertainty and increases intelligibility. The Journal of the Acoustical Society of America 2020; 148:1552. [PMID: 33003879] [PMCID: PMC7500957] [DOI: 10.1121/10.0001971]
Abstract
Adverse listening conditions involve glimpses of spectro-temporal speech information. This study investigated whether the acoustic organization of the spectro-temporal masking pattern affects speech glimpsing in "checkerboard" noise. The regularity and coherence of the masking pattern were varied. Regularity was reduced by randomizing the spectral or temporal gating of the masking noise. Coherence involved the spectral alignment of frequency bands across time or the temporal alignment of gated onsets/offsets across frequency bands. Experiment 1 investigated the effect of spectral or temporal coherence. Experiment 2 investigated independent and combined factors of regularity and coherence. Performance was best in spectro-temporally modulated noise having larger glimpses. Generally, performance also improved as the regularity and coherence of masker fluctuations increased, with regularity having a stronger effect than coherence. An acoustic glimpsing model suggested that the effect of regularity (but not coherence) could be partially attributed to the availability of glimpses retained after energetic masking. Performance tended to be better with maskers that were spectrally coherent than with maskers that were temporally coherent. Overall, performance was best when the spectro-temporal masking pattern imposed even spectral sampling and minimal temporal uncertainty, indicating that listeners use reliable masking patterns to aid spectro-temporal speech glimpsing.
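The acoustic glimpsing analysis mentioned in this abstract can be approximated with the common glimpse-proportion metric: the fraction of time-frequency cells whose local target-to-masker ratio exceeds a criterion and therefore survives energetic masking. The sketch below computes that proportion from assumed target and masker power spectrograms; the 3-dB criterion and the random spectrograms are illustrative, not the authors' model.

```python
import numpy as np

def glimpse_proportion(target_power, masker_power, criterion_db=3.0):
    """Fraction of time-frequency cells where the local target-to-masker
    ratio exceeds the criterion (cells likely to survive energetic masking)."""
    local_tmr_db = 10 * np.log10(target_power / masker_power)
    return np.mean(local_tmr_db > criterion_db)

# Hypothetical power spectrograms (bands x frames) for target speech and gated noise.
rng = np.random.default_rng(2)
target = rng.gamma(shape=2.0, scale=1.0, size=(32, 200))
masker = rng.gamma(shape=2.0, scale=1.5, size=(32, 200))
print(f"glimpse proportion: {glimpse_proportion(target, masker):.2f}")
```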
Affiliation(s)
- Daniel Fogerty
- Department of Communication Sciences and Disorders, University of South Carolina, 1705 College Street, Columbia, South Carolina 29208, USA
- Victoria A Sevich
- Department of Speech and Hearing Science, The Ohio State University, 1070 Carmack Road, Columbus, Ohio 43210, USA
- Eric W Healy
- Department of Speech and Hearing Science, The Ohio State University, 1070 Carmack Road, Columbus, Ohio 43210, USA
9
Shen Y, Yun D, Liu Y. Individualized estimation of the Speech Intelligibility Index for short sentences: Test-retest reliability. The Journal of the Acoustical Society of America 2020; 148:1647. [PMID: 33003860] [PMCID: PMC7511242] [DOI: 10.1121/10.0001994]
Abstract
The speech intelligibility index (SII) model was modified to allow individualized parameters. These parameters included the relative weights of speech cues in five octave-frequency bands ranging from 0.25 to 4 kHz, i.e., the band importance function, and the transfer function that allows the SII to generate predictions on speech-recognition scores. A Bayesian adaptive procedure, the quick-band-importance-function (qBIF) procedure, was utilized to enable efficient estimation of the SII parameters from individual listeners. In two experiments, the SII parameters were estimated for 30 normal-hearing adults using Institute of Electrical and Electronics Engineers (IEEE) sentences at speech levels of 55, 65, and 75 dB sound pressure level (in Experiment I) and for 15 hearing-impaired (HI) adult listeners using amplified IEEE or AzBio sentences (in Experiment II). In both experiments, even without prior training, the estimated model parameters showed satisfactory reliability between two runs of the qBIF procedure at least one week apart. For the HI listeners, inter-listener variability in most estimated SII parameters was larger than intra-listener variability of the qBIF procedure.
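Under the SII framework used here, intelligibility is summarized as an importance-weighted sum of band audibilities, which a transfer function then maps to a predicted recognition score. The sketch below shows that two-stage structure; the band weights, audibility values, and logistic transfer-function form are illustrative assumptions, and the qBIF estimation procedure itself is not implemented.

```python
import numpy as np

def sii(band_importance, band_audibility):
    """Speech intelligibility index as an importance-weighted sum of band audibilities."""
    w = np.asarray(band_importance, dtype=float)
    a = np.clip(np.asarray(band_audibility, dtype=float), 0.0, 1.0)
    return float(np.sum(w / w.sum() * a))

def transfer_function(s, slope=8.0, midpoint=0.35):
    """Map SII to a predicted proportion-correct score (assumed logistic form)."""
    return 1.0 / (1.0 + np.exp(-slope * (s - midpoint)))

# Hypothetical individualized weights for octave bands 0.25-4 kHz and their audibilities.
importance = [0.10, 0.18, 0.26, 0.28, 0.18]
audibility = [1.00, 0.90, 0.70, 0.45, 0.20]
s = sii(importance, audibility)
print(f"SII = {s:.2f}, predicted score = {transfer_function(s):.2f}")
```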
Affiliation(s)
- Yi Shen
- Department of Speech and Hearing Sciences, University of Washington, 1417 Northeast 42nd Street, Seattle, Washington 98105-6246, USA
- Donghyeon Yun
- Department of Speech, Language and Hearing Sciences, Indiana University Bloomington, 200 South Jordan Avenue, Bloomington, Indiana 47405, USA
- Yi Liu
- Department of Speech, Language and Hearing Sciences, Indiana University Bloomington, 200 South Jordan Avenue, Bloomington, Indiana 47405, USA
10
Du Y, Shen Y, Wu X, Chen J. The effect of speech material on the band importance function for Mandarin Chinese. The Journal of the Acoustical Society of America 2019; 146:445. [PMID: 31370645] [PMCID: PMC7273514] [DOI: 10.1121/1.5116691]
Abstract
Speech material influences the relative contributions of different frequency regions to intelligibility for English. The current study investigated whether a similar effect of speech material is present for Mandarin Chinese. Speech recognition was measured using three speech materials in Mandarin: disyllabic words, nonsense sentences, and meaningful sentences. These materials differed from one another in the amount of contextual information and in word frequency. The band importance function (BIF), as defined under the Speech Intelligibility Index (SII) framework, was used to quantify the contributions across frequency regions. The BIFs for the three speech materials were estimated from 16 adults who were native speakers of Mandarin. A Bayesian adaptive procedure was used to efficiently estimate the octave-frequency BIFs for the three materials for each listener. As the amount of contextual information increased, low-frequency bands (e.g., 250 and 500 Hz) became more important for speech recognition, consistent with English. The BIF was flatter for Mandarin than for comparable English speech materials. Introducing the language- and material-specific BIFs into the SII model led to improved predictions of Mandarin speech-recognition performance. These results suggest the necessity of developing material-specific BIFs for Mandarin.
Affiliation(s)
- Yufan Du
- Department of Machine Intelligence, Peking University, Beijing, China
- Yi Shen
- Department of Speech and Hearing Sciences, Indiana University Bloomington, 200 South Jordan Avenue, Bloomington, Indiana 47405, USA
- Xihong Wu
- Department of Machine Intelligence, Peking University, Beijing, China
- Jing Chen
- Department of Machine Intelligence, Peking University, Beijing, China
11
Shen Y, Kern AB. An Analysis of Individual Differences in Recognizing Monosyllabic Words Under the Speech Intelligibility Index Framework. Trends in Hearing 2018. [PMID: 29532711] [PMCID: PMC5858685] [DOI: 10.1177/2331216518761773]
Abstract
Individual differences in the recognition of monosyllabic words, either in isolation (NU6 test) or in sentence context (SPIN test), were investigated under the theoretical framework of the speech intelligibility index (SII). An adaptive psychophysical procedure, namely the quick-band-importance-function procedure, was developed to enable the fitting of the SII model to individual listeners. Using this procedure, the band importance function (i.e., the relative weights of speech information across the spectrum) and the link function relating the SII to recognition scores can be simultaneously estimated while requiring only 200 to 300 trials of testing. Octave-frequency band importance functions and link functions were estimated separately for NU6 and SPIN materials from 30 normal-hearing listeners who were naïve to speech recognition experiments. For each type of speech material, considerable individual differences in the spectral weights were observed in some but not all frequency regions. At frequencies where the greatest intersubject variability was found, the spectral weights were correlated between the two speech materials, suggesting that the variability in spectral weights reflected listener-originated factors.
Affiliation(s)
- Yi Shen
- Department of Speech and Hearing Sciences, Indiana University Bloomington, Bloomington, IN, USA
- Allison B Kern
- Department of Speech and Hearing Sciences, Indiana University Bloomington, Bloomington, IN, USA
12
Whitmal NA. Effects of vowel context and discriminability on band independence in nonsense syllable recognition. The Journal of the Acoustical Society of America 2018; 144:678. [PMID: 30180683] [DOI: 10.1121/1.5049375]
Abstract
The Speech Intelligibility Index algorithm (ANSI S3.5-1997) models cues in disjoint frequency bands for consonants and vowels as additive, independent contributions to intelligibility. Data from other studies examining only consonants in single-vowel nonsense stimuli exhibit synergistic and redundant band contributions that challenge the band-independence assumption. The present study tested the hypotheses that (a) band independence is present for multi-vowel stimuli, and (b) dependent band contributions are artifacts of confounding stimulus administration and testing methods. Data were measured in two experiments in which subjects identified filtered nonsense consonant-vowel-consonant syllables using a variety of randomly selected vowels. The measured data were used in simulations that further characterized the range of subject responses. Results of testing and simulation suggest that, where present, band independence is fostered by low broadband error, high vowel diversity, and high vowel discriminability. Synergistic band contributions were observed for confusable vowels that were most susceptible to filtering; redundant contributions were observed for the least susceptible vowels. Implications for intelligibility prediction and enhancement are discussed.
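The band-independence assumption under test can be stated compactly with a product rule: if two disjoint bands contribute independently, the probability of missing a syllable when both are presented equals the product of the miss probabilities for each band alone, so combined scores above or below that prediction indicate synergy or redundancy. The sketch below uses hypothetical scores; this probability-summation formulation is one common way to express band independence and is not necessarily the exact SII-based formulation used in the article.

```python
def independent_prediction(p_low, p_high):
    """Predicted recognition for combined bands if their contributions are independent."""
    return 1.0 - (1.0 - p_low) * (1.0 - p_high)

# Hypothetical scores for the low-pass band alone, the high-pass band alone, and both combined.
p_low, p_high, p_combined = 0.40, 0.35, 0.78
p_pred = independent_prediction(p_low, p_high)
if p_combined > p_pred:
    verdict = "synergistic"
elif p_combined < p_pred:
    verdict = "redundant"
else:
    verdict = "independent"
print(f"predicted {p_pred:.2f}, observed {p_combined:.2f} -> {verdict}")
```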
Affiliation(s)
- Nathaniel A Whitmal
- Department of Communication Disorders, University of Massachusetts, Amherst, Massachusetts 01003, USA
13
Fogerty D, Carter BL, Healy EW. Glimpsing speech in temporally and spectro-temporally modulated noise. The Journal of the Acoustical Society of America 2018; 143:3047. [PMID: 29857753] [PMCID: PMC5966311] [DOI: 10.1121/1.5038266]
Abstract
Speech recognition in fluctuating maskers is influenced by the spectro-temporal properties of the noise. Three experiments examined different temporal and spectro-temporal noise properties. Experiment 1 replicated previous work by highlighting maximum performance at a temporal gating rate of 4-8 Hz. Experiment 2 involved spectro-temporal glimpses. Performance was best with the largest glimpses, and performance with small glimpses approached that for continuous noise matched to the average level of the modulated noise. Better performance occurred with periodic than with random spectro-temporal glimpses. Finally, time and frequency for spectro-temporal glimpses were dissociated in Experiment 3. Larger spectral glimpses were more beneficial than smaller ones, and minimum performance was observed at a gating rate of 4-8 Hz. The current results involving continuous speech in gated noise (slower and larger glimpses most advantageous) run counter to several results involving gated and/or filtered speech, where a larger number of smaller speech samples is often advantageous. This is because mechanisms of masking dominate, negating the advantages of better speech-information sampling. It is suggested that spectro-temporal glimpsing combines temporal glimpsing with additional processes of simultaneous masking and uncomodulation, and that continuous speech in gated noise is a better model for real-world glimpsing than is gated and/or filtered speech.
Affiliation(s)
- Daniel Fogerty
- Department of Communication Sciences and Disorders, University of South Carolina, 1224 Sumter Street, Columbia, South Carolina 29208, USA
- Brittney L Carter
- Department of Speech and Hearing Science, The Ohio State University, 1070 Carmack Road, Columbus, Ohio 43210, USA
- Eric W Healy
- Department of Speech and Hearing Science, The Ohio State University, 1070 Carmack Road, Columbus, Ohio 43210, USA