1
Rennies J, Warzybok A, Kollmeier B, Brand T. Spatio-temporal Integration of Speech Reflections in Hearing-Impaired Listeners. Trends Hear 2022; 26:23312165221143901. PMID: 36537084; PMCID: PMC9772954; DOI: 10.1177/23312165221143901.
Abstract
Speech recognition in rooms requires the temporal integration of reflections that arrive with a certain delay after the direct sound. It is commonly assumed that there is a temporal window of about 50-100 ms during which reflections can be integrated with the direct sound, while later reflections are detrimental to speech intelligibility. This concept was challenged in a recent study employing binaural room impulse responses (RIRs) with systematically varied interaural phase differences (IPDs) and amplitudes of the direct sound and a variable number of reflections delayed by up to 200 ms. When amplitude or IPD favored late RIR components, normal-hearing (NH) listeners appeared to be capable of focusing on these components rather than on the preceding direct sound, which contrasted with the common concept of considering early RIR components as useful and late components as detrimental. The present study investigated speech intelligibility in the same conditions in hearing-impaired (HI) listeners. The data indicate that HI listeners were generally less able to "ignore" the direct sound than NH listeners when the most useful information was confined to late RIR components. Some HI listeners showed a remarkable inability to integrate across multiple reflections and to optimally "shift" their temporal integration window, quite unlike NH listeners. This effect was most pronounced in conditions requiring spatial and temporal integration and could provide new challenges for individual prediction models of binaural speech intelligibility.
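For context on the "useful vs. detrimental" split that this study challenges, the sketch below computes the conventional early-to-late energy ratio (C50) of a room impulse response, which counts energy before a fixed 50 ms boundary as useful and later energy as harmful. The function name, the toy impulse response, and the boundary value are illustrative assumptions, not the stimuli or analysis used in the paper.

```python
import numpy as np

def early_late_ratio(rir, fs, boundary_ms=50.0):
    """Conventional early-to-late energy ratio (C50 for a 50 ms boundary):
    reflections before the boundary count as 'useful', later ones as
    'detrimental'. The paper questions this fixed split."""
    n_boundary = int(fs * boundary_ms / 1000)
    early = np.sum(rir[:n_boundary] ** 2)
    late = np.sum(rir[n_boundary:] ** 2)
    return 10 * np.log10(early / (late + 1e-12))

# toy RIR: direct sound plus a single strong reflection delayed by 200 ms
fs = 16000
rir = np.zeros(fs)
rir[0] = 1.0                 # direct sound
rir[int(0.2 * fs)] = 0.8     # late reflection (counted here as detrimental)
print(f"C50 = {early_late_ratio(rir, fs):.1f} dB")
```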
Affiliation(s)
- Jan Rennies
- Fraunhofer Institute for Digital Media Technology IDMT, Project Group Hearing, Speech and Audio Technology, Oldenburg, Germany; Cluster of Excellence Hearing4all, Oldenburg, Germany. Correspondence: Jan Rennies, Fraunhofer Institute for Digital Media Technology IDMT, Department for Hearing, Speech and Audio Technology, Marie-Curie-Str. 2, 26129 Oldenburg, Niedersachsen, Germany.
- Anna Warzybok
- Medical Physics Group, Department für Medizinische Physik und Akustik, Oldenburg, Germany; Cluster of Excellence Hearing4all, Oldenburg, Germany
- Birger Kollmeier
- Fraunhofer Institute for Digital Media Technology IDMT, Project Group Hearing, Speech and Audio Technology, Oldenburg, Germany; Medical Physics Group, Department für Medizinische Physik und Akustik, Oldenburg, Germany; Cluster of Excellence Hearing4all, Oldenburg, Germany
- Thomas Brand
- Medical Physics Group, Department für Medizinische Physik und Akustik, Oldenburg, Germany; Cluster of Excellence Hearing4all, Oldenburg, Germany
2
Rennies J, Röttges S, Huber R, Hauth CF, Brand T. A joint framework for blind prediction of binaural speech intelligibility and perceived listening effort. Hear Res 2022; 426:108598. PMID: 35995688; DOI: 10.1016/j.heares.2022.108598.
Abstract
Speech perception is strongly affected by noise and reverberation in the listening room, and binaural processing can substantially facilitate speech perception in conditions in which target speech and maskers originate from different directions. Most studies and proposed models for predicting spatial unmasking have focused on speech intelligibility. The present study introduces a model framework that predicts both speech intelligibility and perceived listening effort from the same output measure. The framework is based on a combination of a blind binaural processing stage employing a blind equalization cancelation (EC) mechanism, and a blind backend based on phoneme probability classification. Neither the frontend nor the backend requires any additional information, such as the source directions, the signal-to-noise ratio (SNR), or the number of sources, allowing for a fully blind perceptual assessment of binaural input signals consisting of target speech mixed with noise. The model is validated against a recent data set in which speech intelligibility and perceived listening effort were measured for a range of acoustic conditions differing in reverberation and binaural cues [Rennies and Kidd (2018), J. Acoust. Soc. Am. 144, 2147-2159]. Predictions of the proposed model are compared with a non-blind binaural model consisting of a non-blind EC stage and a backend based on the speech intelligibility index. The analyses indicated that all main trends observed in the experiments were correctly predicted by the blind model. The overall proportion of variance explained by the model (R² = 0.94) for speech intelligibility was slightly lower than for the non-blind model (R² = 0.98). For listening effort predictions, both models showed lower prediction accuracy, but still explained significant proportions of the observed variance (R² = 0.88 and R² = 0.71 for the non-blind and blind models, respectively). Closer inspection showed that the differences between data and predictions were largest for binaural conditions at high SNRs, where the perceived listening effort of human listeners tended to be underestimated by the models, specifically by the blind version.
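The model's frontend builds on the equalization-cancellation (EC) principle. As a rough illustration of that principle only, the sketch below finds the interaural gain and delay that best cancel a masker and reports the resulting SNR benefit; it is an oracle (non-blind) toy with synthetic signals and made-up parameter values, not the blind EC stage or phoneme-probability backend described in the paper.

```python
import numpy as np

def ec_snr_gain(target_l, target_r, masker_l, masker_r, fs, max_delay_ms=10):
    """Brute-force EC illustration: find the delay and gain applied to the
    right-ear signal that best cancel the masker when subtracted from the
    left ear, then report SNR before and after cancellation (oracle access
    to the separated target and masker, i.e. non-blind)."""
    best = (0, 1.0, np.inf)
    max_lag = int(fs * max_delay_ms / 1000)
    for lag in range(-max_lag, max_lag + 1):
        m_r = np.roll(masker_r, lag)
        # least-squares gain minimizing residual masker energy
        g = np.dot(masker_l, m_r) / (np.dot(m_r, m_r) + 1e-12)
        resid = masker_l - g * m_r
        e = np.sum(resid ** 2)
        if e < best[2]:
            best = (lag, g, e)
    lag, g, _ = best
    t_ec = target_l - g * np.roll(target_r, lag)
    m_ec = masker_l - g * np.roll(masker_r, lag)
    snr_in = 10 * np.log10(np.sum(target_l ** 2) / np.sum(masker_l ** 2))
    snr_ec = 10 * np.log10(np.sum(t_ec ** 2) / (np.sum(m_ec ** 2) + 1e-12))
    return snr_in, snr_ec

# toy example: diotic target, masker lateralized by a ~0.7 ms interaural delay
fs = 16000
rng = np.random.default_rng(0)
target = rng.standard_normal(fs)
masker = rng.standard_normal(fs)
itd = int(0.0007 * fs)
print(ec_snr_gain(target, target, masker, np.roll(masker, itd), fs))
```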
Affiliation(s)
- Jan Rennies
- Fraunhofer IDMT, Hearing, Speech and Audio Technology and Cluster of Excellence Hearing4all, Marie-Curie-Str. 2, 26129 Oldenburg, Germany.
- Saskia Röttges
- Department für Medizinische Physik und Akustik, Carl von Ossietzky Universität Oldenburg and Cluster of Excellence Hearing4all, Oldenburg, Germany
- Rainer Huber
- Fraunhofer IDMT, Hearing, Speech and Audio Technology and Cluster of Excellence Hearing4all, Marie-Curie-Str. 2, 26129 Oldenburg, Germany
- Christopher F Hauth
- Department für Medizinische Physik und Akustik, Carl von Ossietzky Universität Oldenburg and Cluster of Excellence Hearing4all, Oldenburg, Germany
- Thomas Brand
- Department für Medizinische Physik und Akustik, Carl von Ossietzky Universität Oldenburg and Cluster of Excellence Hearing4all, Oldenburg, Germany
3
Liu Y, Quan Q. AI Recognition Method of Pronunciation Errors in Oral English Speech with the Help of Big Data for Personalized Learning. Journal of Information & Knowledge Management 2022. DOI: 10.1142/s0219649222400287.
Abstract
Judgments of pronunciation errors in spoken English are often made without careful consideration, and such errors strongly affect personalized learning. Building a data set of pronunciation errors is also not an easy task. To address these obstacles, an artificial-intelligence method for recognizing pronunciation errors in spoken English, supported by big data, is proposed for personalized learning. The method takes the average pronunciation level of standard speech as the basis for judging pronunciation errors and evaluates word pronunciation and usage in terms of attributes such as speaking rate, articulation, and semantics. Within a Hidden Markov Model (HMM) framework for speech recognition, the Viterbi algorithm and an improved posterior-probability algorithm are used to recognize the learner's utterances automatically. By segmenting and scoring basic speech units, the system provides English learners with reliable feedback on their pronunciation, corrects errors, and gives guidance according to the judgment results. The results indicate that the intelligent recognition method for personalized learning can reduce the error rate and improve the accuracy of error detection, allowing artificial intelligence (AI) to correct English learners' pronunciation errors.
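The abstract names the Viterbi algorithm within an HMM framework but gives no implementation details. Below is a generic, textbook Viterbi decoder for a discrete HMM with made-up toy parameters; it stands in for the decoding step only and is not the authors' system.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state path for a discrete HMM (log domain).
    obs: observation indices; pi: initial probs; A: transitions; B: emissions."""
    n_states = len(pi)
    T = len(obs)
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    delta = np.zeros((T, n_states))          # best log-probability per state
    psi = np.zeros((T, n_states), dtype=int)  # backpointers
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA          # from-state x to-state
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(n_states)] + logB[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

# toy example: two "phone" states, three acoustic observation symbols
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2, 2], pi, A, B))
```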
Affiliation(s)
- Yanqing Liu
- Weinan Normal University, Weinan 714099, P. R. China
- Qiaoli Quan
- Weinan Normal University, Weinan 714099, P. R. China
4
Lutfi RA, Rodriguez B, Lee J. The Listener Effect in Multitalker Speech Segregation and Talker Identification. Trends Hear 2021; 25:23312165211051886. PMID: 34693853; PMCID: PMC8544763; DOI: 10.1177/23312165211051886.
Abstract
Over six decades ago, Cherry (1953) drew attention to what he called the "cocktail-party problem": the challenge of segregating the speech of one talker from others speaking at the same time. The problem has been actively researched ever since, but for all this time one observation has eluded explanation. It is the wide variation in performance of individual listeners. That variation was replicated here for four major experimental factors known to impact performance: differences in task (talker segregation vs. identification), differences in the voice features of talkers (pitch vs. location), differences in the voice similarity and uncertainty of talkers (informational masking), and the presence or absence of linguistic cues. The effect of these factors on the segregation of naturally spoken sentences and synthesized vowels was largely eliminated in psychometric functions relating the performance of individual listeners to that of an ideal observer, d′_ideal. The effect of listeners remained as differences in the slopes of the functions (fixed effect) with little within-listener variability in the estimates of slope (random effect). The results make a case for considering the listener a factor in multitalker segregation and identification equal in status to any major experimental variable.
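The analysis relates individual performance to an ideal observer via d′. As a rough illustration of that bookkeeping only, the sketch below computes d′ from response counts and fits the slope of listener d′ against d′_ideal; the counts, d′ values, and function name are hypothetical, not data or code from the study.

```python
import numpy as np
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' from response counts, with a small correction so that proportions
    of exactly 0 or 1 do not produce infinite z-scores."""
    h = (hits + 0.5) / (hits + misses + 1.0)
    f = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(h) - norm.ppf(f)

# hypothetical counts for one listener in four conditions of increasing difficulty
counts = [(90, 10, 20, 80), (80, 20, 25, 75), (70, 30, 35, 65), (60, 40, 45, 55)]
d_listener = np.array([d_prime(*c) for c in counts])

# hypothetical ideal-observer d' for the same four conditions
d_ideal = np.array([3.0, 2.0, 1.2, 0.6])

# slope of listener d' vs. ideal d' (least squares through the origin):
# a per-listener "efficiency" of the kind the psychometric functions capture
slope = np.sum(d_ideal * d_listener) / np.sum(d_ideal ** 2)
print(f"d' per condition: {np.round(d_listener, 2)}, efficiency slope: {slope:.2f}")
```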
Affiliation(s)
- Robert A. Lutfi
- Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
- Correspondence: Robert A. Lutfi, Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620.
- Briana Rodriguez
- Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
- Jungmee Lee
- Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
5
Kubiak AM, Rennies J, Ewert SD, Kollmeier B. Relation between hearing abilities and preferred playback settings for speech perception in complex listening conditions. Int J Audiol 2021; 61:965-974. PMID: 34612124; DOI: 10.1080/14992027.2021.1980233.
Abstract
OBJECTIVE: This study investigated whether individual preferences with respect to the trade-off between a good signal-to-noise ratio and a distortion-free speech target were stable across different masking conditions, and whether simple adjustment methods could be used to identify subjects as either "noise haters" or "distortion haters". DESIGN: In each masking condition, subjects could adjust the target speech level according to their preferences by employing (i) linear gain, or gain at the cost of (ii) clipping distortions or (iii) compression distortions. The comparison of these processing conditions allowed the preferred trade-off between distortions and noise disturbance to be investigated. STUDY SAMPLE: Thirty subjects differing widely in hearing status (normal-hearing to moderately impaired) and age (23-85 years). RESULTS: High test-retest stability of individual preferences was found for all modification schemes. The preference adjustments suggested that subjects could be consistently categorised along a scale from "noise haters" to "distortion haters", and this preference trait remained stable across all maskers, spatial conditions, and types of distortions. CONCLUSIONS: Quick self-adjustment of listening preferences in complex listening conditions revealed a stable preference trait along the "noise vs. distortions" tolerance dimension. This could potentially help in fitting modern hearing aid algorithms to the individual user.
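As a rough illustration of the three adjustment schemes named in the design (linear gain, gain with clipping, gain with compression), the sketch below applies each to a toy signal. The signal, the hard-clipping limit, and the static compressor parameters are assumptions for illustration and do not reproduce the study's actual processing.

```python
import numpy as np

def linear_gain(x, gain_db):
    # distortion-free level increase
    return x * 10 ** (gain_db / 20)

def clipped_gain(x, gain_db, limit=1.0):
    # extra level at the cost of hard-clipping distortion
    return np.clip(linear_gain(x, gain_db), -limit, limit)

def compressed_gain(x, gain_db, threshold=0.5, ratio=4.0):
    # extra level at the cost of static compression above a threshold
    y = linear_gain(x, gain_db)
    over = np.abs(y) > threshold
    y[over] = np.sign(y[over]) * (threshold + (np.abs(y[over]) - threshold) / ratio)
    return y

# toy "speech" signal normalized to +/-1, raised by 12 dB under each scheme
x = np.sin(2 * np.pi * 150 * np.arange(0, 0.05, 1 / 16000))
for f in (linear_gain, clipped_gain, compressed_gain):
    y = f(x, 12)
    print(f.__name__, f"peak = {np.max(np.abs(y)):.2f}")
```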
Affiliation(s)
- Aleksandra M Kubiak
- Fraunhofer IDMT, Project Group Hearing, Speech and Audio Technology, Cluster of Excellence "Hearing4all", Oldenburg, Germany
- Jan Rennies
- Fraunhofer IDMT, Project Group Hearing, Speech and Audio Technology, Cluster of Excellence "Hearing4all", Oldenburg, Germany
- Stephan D Ewert
- Medizinische Physik, Cluster of Excellence Hearing4all, Carl von Ossietzky Universität, Oldenburg, Germany
- Birger Kollmeier
- Medizinische Physik, Cluster of Excellence Hearing4all, Carl von Ossietzky Universität, Oldenburg, Germany
6
Souza MRFD, Iorio MCM. Speech Intelligibility Index and the Ling 6(HL) test: correlations in pediatric hearing aid users. Codas 2021; 33:e20200094. PMID: 34378761; DOI: 10.1590/2317-1782/20202020094.
Abstract
PURPOSE: To evaluate speech audibility in schoolchildren who use hearing aids and to correlate the Speech Intelligibility Index (SII) with phoneme detection. METHODS: Twenty-two children and adolescents who use hearing aids underwent audiological evaluation, in situ verification (from which the SII was obtained with and without hearing aids), and measurement of phoneme detection thresholds with the Ling-6(HL) test. RESULTS: The average SII was 25.1 without hearing aids and 68.9 with amplification (p < 0.001). The free-field phoneme detection thresholds, in dB HL, were /m/ = 29.9, /u/ = 29.5, /a/ = 35.5, /i/ = 30.8, /∫/ = 44.2, and /s/ = 44.9 without amplification, and /m/ = 13.0, /u/ = 11.5, /a/ = 14.3, /i/ = 15.4, /∫/ = 20.4, and /s/ = 23.1 with amplification (p < 0.001). There was a negative correlation between the SII and the thresholds of all phonemes without hearing aids (p ≤ 0.001) and between the SII and the /s/ threshold with hearing aids (p = 0.036). CONCLUSION: The detection thresholds for all phonemes are lower with hearing aids than without them. There is a negative correlation between the SII and the thresholds of all phonemes without hearing aids, and between the SII and the detection threshold of /s/ with hearing aids.
7
Lavandier M, Mason CR, Baltzell LS, Best V. Individual differences in speech intelligibility at a cocktail party: A modeling perspective. J Acoust Soc Am 2021; 150:1076. PMID: 34470293; PMCID: PMC8561716; DOI: 10.1121/10.0005851.
Abstract
This study aimed at predicting individual differences in speech reception thresholds (SRTs) in the presence of symmetrically placed competing talkers for young listeners with sensorineural hearing loss. An existing binaural model incorporating the individual audiogram was revised to handle severe hearing losses by (a) taking as input the target speech level at SRT in a given condition and (b) introducing a floor in the model to limit extreme negative better-ear signal-to-noise ratios. The floor value was first set using SRTs measured with stationary and modulated noises. The model was then used to account for individual variations in SRTs found in two previously published data sets that used speech maskers. The model accounted well for the variation in SRTs across listeners with hearing loss, based solely on differences in audibility. When considering listeners with normal hearing, the model could predict the best SRTs, but not the poorer SRTs, suggesting that other factors limit performance when audibility (as measured with the audiogram) is not compromised.
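Revision (b), the floor on extreme negative better-ear SNRs, can be illustrated in a few lines. The band levels, the floor value, and the function name below are assumptions for illustration only; they are not the published model's structure or parameters.

```python
import numpy as np

def better_ear_snr(target_db_left, target_db_right,
                   masker_db_left, masker_db_right, floor_db=-20.0):
    """Per-band better-ear SNR with a floor limiting extreme negative values,
    mirroring revision (b) described in the abstract (illustrative only)."""
    snr_left = np.asarray(target_db_left) - np.asarray(masker_db_left)
    snr_right = np.asarray(target_db_right) - np.asarray(masker_db_right)
    be_snr = np.maximum(snr_left, snr_right)   # pick the better ear per band
    return np.maximum(be_snr, floor_db)        # apply the floor

# hypothetical band levels (dB) for a frontal target and a lateral masker
print(better_ear_snr([60, 55, 50], [60, 55, 50], [70, 90, 85], [65, 68, 72]))
```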
Affiliation(s)
- Mathieu Lavandier
- Univ. Lyon, ENTPE, Laboratoire de Tribologie et Dynamique des Systèmes UMR 5513, Rue Maurice Audin, F-69518 Vaulx-en-Velin Cedex, France
- Christine R Mason
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Lucas S Baltzell
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
8
Gallun FJ. Impaired Binaural Hearing in Adults: A Selected Review of the Literature. Front Neurosci 2021; 15:610957. PMID: 33815037; PMCID: PMC8017161; DOI: 10.3389/fnins.2021.610957.
Abstract
Despite over 100 years of study, there are still many fundamental questions about binaural hearing that remain unanswered, including how impairments of binaural function are related to the mechanisms of binaural hearing. This review focuses on a number of studies that are fundamental to understanding what is known about the effects of peripheral hearing loss, aging, traumatic brain injury, strokes, brain tumors, and multiple sclerosis (MS) on binaural function. The literature reviewed makes clear that while each of these conditions has the potential to impair the binaural system, the specific abilities of a given patient cannot be known without performing multiple behavioral and/or neurophysiological measurements of binaural sensitivity. Future work in this area has the potential to bring awareness of binaural dysfunction to patients and clinicians as well as a deeper understanding of the mechanisms of binaural hearing, but it will require the integration of clinical research with animal and computational modeling approaches.
Affiliation(s)
- Frederick J. Gallun
- Oregon Hearing Research Center, Oregon Health and Science University, Portland, OR, United States
9
Lutfi RA, Rodriguez B, Lee J, Pastore T. A test of model classes accounting for individual differences in the cocktail-party effect. J Acoust Soc Am 2020; 148:4014. PMID: 33379927; PMCID: PMC7775115; DOI: 10.1121/10.0002961.
Abstract
Listeners differ widely in the ability to follow the speech of a single talker in a noisy crowd, what is called the cocktail-party effect. Differences may arise from any one, or a combination, of factors associated with auditory sensitivity, selective attention, working memory, and the decision making required for effective listening. The present study attempts to narrow the possibilities by grouping explanations into model classes based on model predictions for the types of errors that distinguish better from poorer performing listeners in a vowel segregation and talker identification task. Two model classes are considered: those for which the errors are predictably tied to the voice variation of talkers (decision weight models) and those for which the errors occur largely independently of this variation (internal noise models). Regression analyses of trial-by-trial responses, for different tasks and task demands, show overwhelmingly that the latter type of error is responsible for the performance differences among listeners. The results are inconsistent with models that attribute the performance differences to differences in the reliance listeners place on relevant voice features in their decisions. The results are consistent instead with models for which largely stimulus-independent, stochastic processes cause information loss at different stages of auditory processing.
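As a sketch of the trial-by-trial regression logic described here, the code below simulates a listener whose decision variable is a weighted sum of two voice features plus internal noise, then recovers relative decision weights with logistic regression (a stand-in for a probit analysis). All feature values, weights, and noise levels are made up, and the procedure is only loosely modeled on the paper's analysis, not a reproduction of it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_trials = 2000

# per-trial perturbations of two voice features (e.g., pitch and location)
X = rng.standard_normal((n_trials, 2))

# simulated listener: weighted feature sum plus internal (stimulus-independent) noise
true_weights = np.array([1.0, 0.4])
internal_noise_sd = 1.5
decision_var = X @ true_weights + internal_noise_sd * rng.standard_normal(n_trials)
response = (decision_var > 0).astype(int)

# trial-by-trial regression: fitted coefficients scale with the decision weights
# divided by the internal-noise SD, so weaker coefficients (for the same
# stimulus variance) indicate more internal noise
model = LogisticRegression().fit(X, response)
w = model.coef_[0]
print("estimated relative weights:", w / np.abs(w).sum())
print("true relative weights:     ", true_weights / true_weights.sum())
```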
Affiliation(s)
- Robert A Lutfi
- Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620, USA
- Briana Rodriguez
- Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620, USA
- Jungmee Lee
- Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620, USA
- Torben Pastore
- Spatial Hearing Lab, College of Health Solutions, Arizona State University, Tempe, Arizona 85281, USA