1
Shahidi LK, Collins LM, Mainsah BO. Objective intelligibility measurement of reverberant vocoded speech for normal-hearing listeners: Towards facilitating the development of speech enhancement algorithms for cochlear implants. The Journal of the Acoustical Society of America 2024; 155:2151-2168. [PMID: 38501923 PMCID: PMC10959555 DOI: 10.1121/10.0025285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Accepted: 02/24/2024] [Indexed: 03/20/2024]
Abstract
Cochlear implant (CI) recipients often struggle to understand speech in reverberant environments. Speech enhancement algorithms could restore speech perception for CI listeners by removing reverberant artifacts from the CI stimulation pattern. Listening studies, either with cochlear-implant recipients or normal-hearing (NH) listeners using a CI acoustic model, provide a benchmark for speech intelligibility improvements conferred by the enhancement algorithm but are costly and time consuming. To reduce the associated costs during algorithm development, speech intelligibility could be estimated offline using objective intelligibility measures. Previous evaluations of objective measures that considered CIs primarily assessed the combined impact of noise and reverberation and employed highly accurate enhancement algorithms. To facilitate the development of enhancement algorithms, we evaluate twelve objective measures in reverberant-only conditions characterized by a gradual reduction of reverberant artifacts, simulating the performance of an enhancement algorithm during development. Measures are validated against the performance of NH listeners using a CI acoustic model. To enhance compatibility with reverberant CI-processed signals, measure performance was assessed after modifying the reference signal and spectral filterbank. Measures leveraging the speech-to-reverberant ratio, cepstral distance and, after modifying the reference or filterbank, envelope correlation are strong predictors of intelligibility for reverberant CI-processed speech.
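To illustrate the envelope-correlation idea mentioned in the abstract, the following is a minimal sketch in plain Python. It is not one of the twelve measures evaluated in the paper: the function names, the moving-average envelope, and the `window` parameter are all assumptions chosen for illustration.

```python
import math


def moving_average_envelope(signal, window):
    """Crude temporal envelope: full-wave rectify, then smooth with a
    causal moving average of up to `window` samples (hypothetical choice)."""
    rectified = [abs(x) for x in signal]
    envelope = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            # Drop the sample that just left the averaging window.
            running -= rectified[i - window]
        envelope.append(running / min(i + 1, window))
    return envelope


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # Correlation is undefined for a constant envelope.
    return cov / math.sqrt(var_a * var_b)


def envelope_correlation(reference, degraded, window=32):
    """Correlate the envelopes of a reference and a degraded signal."""
    if len(reference) != len(degraded):
        raise ValueError("signals must have the same length")
    return pearson(
        moving_average_envelope(reference, window),
        moving_average_envelope(degraded, window),
    )
```

In this sketch a value near 1 means the degraded envelope closely tracks the reference envelope. Published measures of this family typically operate per frequency band and per short time segment, which this example omits.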
Affiliation(s)
- Lidea K Shahidi
  - Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina 27701, USA
- Leslie M Collins
  - Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina 27701, USA
- Boyla O Mainsah
  - Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina 27701, USA
2
Cashaback JGA, Allen JL, Chou AHY, Lin DJ, Price MA, Secerovic NK, Song S, Zhang H, Miller HL. NSF DARE-transforming modeling in neurorehabilitation: a patient-in-the-loop framework. J Neuroeng Rehabil 2024; 21:23. [PMID: 38347597 PMCID: PMC10863253 DOI: 10.1186/s12984-024-01318-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 01/25/2024] [Indexed: 02/15/2024] [Open access]
Abstract
In 2023, the National Science Foundation (NSF) and the National Institutes of Health (NIH) brought together engineers, scientists, and clinicians by sponsoring a conference on computational modelling in neurorehabilitation. To facilitate multidisciplinary collaborations and improve patient care, in this perspective piece we identify where and how computational modelling can support neurorehabilitation. To address the where, we developed a patient-in-the-loop framework that uses multiple and/or continual measurements to update diagnostic and treatment model parameters, treatment type, and treatment prescription, with the goal of maximizing clinically-relevant functional outcomes. This patient-in-the-loop framework has several key features: (i) it includes diagnostic and treatment models, (ii) it is clinically-grounded with the International Classification of Functioning, Disability and Health (ICF) and patient involvement, (iii) it uses multiple or continual data measurements over time, and (iv) it is applicable to a range of neurological and neurodevelopmental conditions. To address the how, we identify the state of the art and highlight promising avenues of future research across the realms of sensorimotor adaptation, neuroplasticity, musculoskeletal, and sensory & pain computational modelling. We also discuss both the importance of and how to perform model validation, as well as challenges to overcome when implementing computational models within a clinical setting. The patient-in-the-loop approach offers a unifying framework to guide multidisciplinary collaboration between computational and clinical stakeholders in the field of neurorehabilitation.
Affiliation(s)
- Joshua G A Cashaback
  - Biomedical Engineering, Mechanical Engineering, Kinesiology and Applied Physiology, Biomechanics and Movement Science Program, Interdisciplinary Neuroscience Graduate Program, University of Delaware, 540 S College Ave, Newark, DE, 19711, USA
- Jessica L Allen
  - Department of Mechanical Engineering, University of Florida, Gainesville, USA
- David J Lin
  - Division of Neurocritical Care and Stroke Service, Department of Neurology, Center for Neurotechnology and Neurorecovery, Massachusetts General Hospital, Harvard Medical School, Boston, USA
  - Department of Veterans Affairs, Center for Neurorestoration and Neurotechnology, Rehabilitation Research and Development Service, Providence, USA
- Mark A Price
  - Department of Mechanical and Industrial Engineering, Department of Kinesiology, University of Massachusetts Amherst, Amherst, USA
- Natalija K Secerovic
  - School of Electrical Engineering, The Mihajlo Pupin Institute, University of Belgrade, Belgrade, Serbia
  - Laboratory for Neuroengineering, Institute for Robotics and Intelligent Systems ETH Zürich, Zurich, Switzerland
- Seungmoon Song
  - Mechanical and Industrial Engineering, Northeastern University, Boston, USA
- Haohan Zhang
  - Department of Mechanical Engineering, University of Utah, Salt Lake City, USA
- Haylie L Miller
  - School of Kinesiology, University of Michigan, 830 N University Ave, Ann Arbor, MI, 48109, USA
3
Hansen TA, O’Leary RM, Svirsky MA, Wingfield A. Self-pacing ameliorates recall deficit when listening to vocoded discourse: a cochlear implant simulation. Front Psychol 2023; 14:1225752. [PMID: 38054180 PMCID: PMC10694252 DOI: 10.3389/fpsyg.2023.1225752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Accepted: 11/07/2023] [Indexed: 12/07/2023] [Open access]
Abstract
Introduction: In spite of its apparent ease, comprehension of spoken discourse represents a complex linguistic and cognitive operation. The difficulty of such an operation can increase when the speech is degraded, as is the case with cochlear implant users. However, the additional challenges imposed by degraded speech may be mitigated to some extent by the linguistic context and pace of presentation. Methods: An experiment is reported in which young adults with age-normal hearing recalled discourse passages heard with clear speech or with noise-band vocoding used to simulate the sound of speech produced by a cochlear implant. Passages were varied in inter-word predictability and presented either without interruption or in a self-pacing format that allowed the listener to control the rate at which the information was delivered. Results: Results showed that discourse heard with clear speech was better recalled than discourse heard with vocoded speech, discourse with a higher average inter-word predictability was better recalled than discourse with a lower average inter-word predictability, and self-paced passages were recalled better than those heard without interruption. Of special interest was the semantic hierarchy effect: the tendency for listeners to show better recall for main ideas than mid-level information or detail from a passage as an index of listeners' ability to understand the meaning of a passage. The data revealed a significant effect of inter-word predictability, in that passages with lower predictability had an attenuated semantic hierarchy effect relative to higher-predictability passages. Discussion: Results are discussed in terms of broadening cochlear implant outcome measures beyond current clinical measures that focus on single-word and sentence repetition.
Affiliation(s)
- Thomas A. Hansen
  - Department of Psychology, Brandeis University, Waltham, MA, United States
- Ryan M. O’Leary
  - Department of Psychology, Brandeis University, Waltham, MA, United States
- Mario A. Svirsky
  - Department of Otolaryngology, NYU Langone Medical Center, New York, NY, United States
- Arthur Wingfield
  - Department of Psychology, Brandeis University, Waltham, MA, United States
4
Effect of interaural electrode insertion depth difference and independent band selection on sentence recognition in noise and spatial release from masking in simulated bilateral cochlear implant listening. Eur Arch Otorhinolaryngol 2023; 280:3209-3217. [PMID: 36695909 DOI: 10.1007/s00405-023-07845-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Accepted: 01/17/2023] [Indexed: 01/26/2023]
Abstract
PURPOSE: Inter-aural insertion depth difference (IEDD) in bilateral cochlear implant (BiCI) with continuous interleaved sampling (CIS) processing is known to reduce the recognition of speech in noise and spatial release from masking (SRM). However, the independent channel selection in the 'n-of-m' sound coding strategy might have a different effect on speech recognition and SRM when compared to the effects of IEDD in CIS-based findings. This study aimed to investigate the effect of bilateral 'n-of-m' processing strategy and interaural electrode insertion depth difference on speech recognition in noise and SRM under conditions that simulated bilateral cochlear implant listening. METHODS: Five young adults with normal hearing sensitivity participated in the study. The target sentences were spatially filtered to originate from 0° and the masker was spatially filtered at 0°, 15°, 37.5°, and 90° using the Oldenburg head-related transfer function database for behind the ear microphone. A 22-channel sine wave vocoder processing based on 'n-of-m' processing was applied to the spatialized target-masker mixture, in each ear. The perceptual experiment involved a test of speech recognition in noise under one co-located condition (target and masker at 0°) and three spatially separated conditions (target at 0°, masker at 15°, 37.5°, or 90° to the right ear). RESULTS: The results were analyzed using a three-way repeated measure analysis of variance (ANOVA). The effect of interaural insertion depth difference (F(2, 8) = 3.145, p = 0.098, η² = 0.007) and spatial separation between target and masker (F(3, 12) = 1.239, p = 0.339, η² = 0.004) on speech recognition in noise was not significant. CONCLUSIONS: Speech recognition in noise and SRM were not affected by IEDD ≤ 3 mm. Bilateral 'n-of-m' processing resulted in reduced speech recognition in noise and SRM.
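The 'n-of-m' selection described in the abstract can be sketched as follows. This is a hedged illustration only: the function name `select_n_of_m`, the tie-breaking rule, and the example values are assumptions, and the abstract does not state the value of n used in the study.

```python
def select_n_of_m(band_envelopes, n):
    """Keep the n largest of the m band-envelope values in one analysis
    frame and zero the rest. Ties go to the lower band index (an
    assumption made here, not taken from the study)."""
    m = len(band_envelopes)
    if not 0 < n <= m:
        raise ValueError("n must satisfy 0 < n <= m")
    # Indices of the n largest envelope values in this frame.
    ranked = sorted(range(m), key=lambda k: band_envelopes[k], reverse=True)
    keep = set(ranked[:n])
    return [v if k in keep else 0.0 for k, v in enumerate(band_envelopes)]


# Independent selection per ear: each ear ranks its own bands, so the
# two ears may retain different bands in the same frame.
left_frame = [0.1, 0.9, 0.3, 0.7]   # hypothetical envelope values, m = 4
right_frame = [0.8, 0.2, 0.6, 0.1]
left_selected = select_n_of_m(left_frame, 2)
right_selected = select_n_of_m(right_frame, 2)
```

With these made-up frames the left ear keeps bands 1 and 3 while the right ear keeps bands 0 and 2, which shows how independent selection can leave the two ears with non-matching sets of active bands.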
5
Gauer J, Nagathil A, Eckel K, Belomestny D, Martin R. A versatile deep-neural-network-based music preprocessing and remixing scheme for cochlear implant listeners. The Journal of the Acoustical Society of America 2022; 151:2975. [PMID: 35649910 DOI: 10.1121/10.0010371] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/01/2021] [Accepted: 04/13/2022] [Indexed: 06/15/2023]
Abstract
While cochlear implants (CIs) have proven to restore speech perception to a remarkable extent, access to music remains difficult for most CI users. In this work, a methodology for the design of deep learning-based signal preprocessing strategies that simplify music signals and emphasize rhythmic information is proposed. It combines harmonic/percussive source separation and deep neural network (DNN) based source separation in a versatile source mixture model. Two different neural network architectures were assessed with regard to their applicability for this task. The method was evaluated with instrumental measures and in two listening experiments for both network architectures and six mixing presets. Normal-hearing subjects rated the signal quality of the processed signals compared to the original both with and without a vocoder which provides an approximation of the auditory perception in CI listeners. Four combinations of remix models and DNNs have been selected for an evaluation with vocoded signals and were all rated significantly better in comparison to the unprocessed signal. In particular, the two best-performing remix networks are promising candidates for further evaluation in CI listeners.
Affiliation(s)
- Johannes Gauer
  - Institute of Communication Acoustics, Ruhr-Universität Bochum, Bochum, Germany
- Anil Nagathil
  - Institute of Communication Acoustics, Ruhr-Universität Bochum, Bochum, Germany
- Kai Eckel
  - Institute of Communication Acoustics, Ruhr-Universität Bochum, Bochum, Germany
- Denis Belomestny
  - Faculty of Mathematics, Universität Duisburg-Essen, Essen, Germany
- Rainer Martin
  - Institute of Communication Acoustics, Ruhr-Universität Bochum, Bochum, Germany
6
Sagi E, Azadpour M, Neukam J, Capach NH, Svirsky MA. Reducing interaural tonotopic mismatch preserves binaural unmasking in cochlear implant simulations of single-sided deafness. The Journal of the Acoustical Society of America 2021; 150:2316. [PMID: 34717490 PMCID: PMC8637719 DOI: 10.1121/10.0006446] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/20/2021] [Revised: 09/02/2021] [Accepted: 09/08/2021] [Indexed: 06/13/2023]
Abstract
Binaural unmasking, a key feature of normal binaural hearing, can refer to the improved intelligibility of masked speech by adding masking that facilitates perceived separation of target and masker. A question relevant for cochlear implant users with single-sided deafness (SSD-CI) is whether binaural unmasking can still be achieved if the additional masking is spectrally degraded and shifted. CIs restore some aspects of binaural hearing to these listeners, although binaural unmasking remains limited. Notably, these listeners may experience a mismatch between the frequency information perceived through the CI and that perceived by their normal hearing ear. Employing acoustic simulations of SSD-CI with normal hearing listeners, the present study confirms a previous simulation study that binaural unmasking is severely limited when interaural frequency mismatch between the input frequency range and simulated place of stimulation exceeds 1-2 mm. The present study also shows that binaural unmasking is largely retained when the input frequency range is adjusted to match simulated place of stimulation, even at the expense of removing low-frequency information. This result bears implications for the mechanisms driving the type of binaural unmasking of the present study and for mapping the frequency range of the CI speech processor in SSD-CI users.
Affiliation(s)
- Elad Sagi
  - Department of Otolaryngology-Head & Neck Surgery, New York University Grossman School of Medicine, 550 First Avenue, New York, New York 10016, USA
- Mahan Azadpour
  - Department of Otolaryngology-Head & Neck Surgery, New York University Grossman School of Medicine, 550 First Avenue, New York, New York 10016, USA
- Jonathan Neukam
  - Department of Otolaryngology-Head & Neck Surgery, New York University Grossman School of Medicine, 550 First Avenue, New York, New York 10016, USA
- Nicole Hope Capach
  - Department of Otolaryngology-Head & Neck Surgery, New York University Grossman School of Medicine, 550 First Avenue, New York, New York 10016, USA
- Mario A Svirsky
  - Department of Otolaryngology-Head & Neck Surgery, New York University Grossman School of Medicine, 550 First Avenue, New York, New York 10016, USA