1. Cychosz M, Winn MB, Goupell MJ. How to vocode: Using channel vocoders for cochlear-implant research. J Acoust Soc Am 2024; 155:2407-2437. [PMID: 38568143; PMCID: PMC10994674; DOI: 10.1121/10.0025274]
Abstract
The channel vocoder has become a useful tool for understanding the impact of specific forms of auditory degradation, particularly the spectral and temporal degradation that reflects cochlear-implant processing. Vocoders have many parameters that allow researchers to answer questions about cochlear-implant processing in ways that overcome some logistical complications of controlling for factors in individual cochlear implant users. However, vocoder implementations vary so widely that the term "vocoder" alone is not specific enough to describe the signal processing used in these experiments. Misunderstanding vocoder parameters can result in experimental confounds or unexpected stimulus distortions. This paper highlights the signal processing parameters that should be specified when describing vocoder construction. It also provides guidance on how to choose vocoder parameters within perception experiments, given the experimenter's goals and research questions, so as to avoid common signal processing mistakes. Throughout, we assume that experimenters are interested in vocoders with the specific goal of better understanding cochlear implants.
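The channel-vocoder pipeline this paper describes (band-split the input, extract each band's amplitude envelope, modulate a carrier per band, sum) can be sketched minimally as below. This is an illustrative simplification, not the authors' recipe: the naive DFT brick-wall filters, moving-average envelope smoother, band edges, and geometric-centre sine carriers are all assumptions chosen for brevity; real implementations use proper filter banks and calibrated envelope low-pass filters.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT; adequate for short illustration signals."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N).real
            for n in range(N)]

def bandpass(x, fs, lo, hi):
    """Brick-wall bandpass: zero every DFT bin outside [lo, hi) Hz."""
    X = dft(x)
    N = len(X)
    for k in range(N):
        f = min(k, N - k) * fs / N  # analogue frequency of bin k (mirrored)
        if not (lo <= f < hi):
            X[k] = 0
    return idft(X)

def envelope(x, win):
    """Crude envelope: full-wave rectify, then moving-average smooth."""
    rect = [abs(v) for v in x]
    return [sum(rect[max(0, n - win + 1):n + 1]) / win for n in range(len(rect))]

def sine_vocode(x, fs, edges, win=16):
    """edges: band boundaries in Hz, e.g. [100, 500, 1500, 4000]."""
    out = [0.0] * len(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass(x, fs, lo, hi), win)
        fc = math.sqrt(lo * hi)  # sine carrier at the geometric band centre
        for n in range(len(x)):
            out[n] += env[n] * math.sin(2 * math.pi * fc * n / fs)
    return out
```

Swapping the sine carrier for band-limited noise turns this into a noise vocoder; as the paper stresses, every one of these choices (filter slopes, envelope cut-off, carrier type, band edges) must be reported.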
Affiliation(s)
- Margaret Cychosz
- Department of Linguistics, University of California, Los Angeles, Los Angeles, California 90095, USA
- Matthew B Winn
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Matthew J Goupell
- Department of Hearing and Speech Sciences, University of Maryland, College Park, College Park, Maryland 20742, USA
2. Hegde M, Nazzi T, Cabrera L. An auditory perspective on phonological development in infancy. Front Psychol 2024; 14:1321311. [PMID: 38327506; PMCID: PMC10848800; DOI: 10.3389/fpsyg.2023.1321311]
Abstract
Introduction The auditory system encodes the phonetic features of languages by processing spectro-temporal modulations in speech, which can be described at two time scales: relatively slow amplitude variations over time (AM, further distinguished into the slowest components, <8-16 Hz, and faster components, 16-500 Hz), and frequency modulations (FM, oscillating at higher rates of roughly 600 Hz to 10 kHz). While adults require only the slowest AM cues to identify and discriminate speech sounds, infants have been shown to also require faster AM cues (>8-16 Hz) for similar tasks. Methods Using an observer-based psychophysical method, this study measured the ability of typical-hearing 6-month-olds, 10-month-olds, and adults to detect a change in the vowel or consonant features of consonant-vowel syllables when temporal modulations are selectively degraded. Two acoustically degraded conditions were designed, replacing FM cues with pure tones in 32 frequency bands and then extracting AM cues in each frequency band with two different low-pass cut-off frequencies: (1) half the bandwidth (Fast AM condition), (2) <8 Hz (Slow AM condition). Results In the Fast AM condition, with reduced FM cues, 85% of 6-month-olds, 72.5% of 10-month-olds, and 100% of adults successfully categorized phonemes. Among participants who passed the Fast AM condition, 67% of 6-month-olds, 75% of 10-month-olds, and 95% of adults passed the Slow AM condition. Furthermore, across the three age groups, the proportion of participants able to detect the phonetic category change did not differ between the vowel and consonant conditions. However, age-related differences were observed for vowel categorization: while the 6- and 10-month-old groups did not differ from one another, both differed from adults. Moreover, for consonant categorization, 10-month-olds were more affected by the acoustic temporal degradation than 6-month-olds, showing a greater decline in detection success rates between the Fast AM and Slow AM conditions. Discussion The degradation of FM and faster AM cues (>8 Hz) appears to strongly affect consonant processing at 10 months of age. These findings suggest that between 6 and 10 months, infants show different developmental trajectories in the perceptual weighting of temporal acoustic cues for vowel and consonant processing, possibly linked to phonological attunement.
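The study's slow-versus-fast AM manipulation comes down to low-passing each band's envelope at different cut-offs. A crude rectify-and-smooth sketch of that step is below; the one-pole filter and every signal parameter (sampling rate, modulation rate, cut-offs) are illustrative assumptions, not the study's actual processing chain.

```python
import math

def one_pole_lowpass(x, fs, fc):
    """First-order IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * fc / fs)
    y, out = 0.0, []
    for v in x:
        y += a * (v - y)
        out.append(y)
    return out

def am_cues(band_signal, fs, cutoff_hz):
    """Crude AM (envelope) extraction: full-wave rectify, then low-pass.
    cutoff_hz <8 keeps only slow AM; a higher cutoff keeps fast AM too."""
    return one_pole_lowpass([abs(v) for v in band_signal], fs, cutoff_hz)
```

On a 1 kHz tone amplitude-modulated at 40 Hz, an 8 Hz cut-off (Slow AM) largely flattens the modulation that a wide cut-off (Fast AM) preserves, which is exactly the cue contrast the infants were tested on.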
Affiliation(s)
- Monica Hegde
- Integrative Neuroscience and Cognition Center (INCC-UMR 8002), Université Paris Cité-CNRS, Paris, France
3. Levin M, Zaltz Y. Voice Discrimination in Quiet and in Background Noise by Simulated and Real Cochlear Implant Users. J Speech Lang Hear Res 2023; 66:5169-5186. [PMID: 37992412; DOI: 10.1044/2023_jslhr-23-00019]
Abstract
PURPOSE Cochlear implant (CI) users demonstrate poor voice discrimination (VD) in quiet conditions based on the speaker's fundamental frequency (fo) and formant frequencies (i.e., vocal-tract length [VTL]). Our purpose was to examine, via acoustic CI simulations and real CI hearing, how background noise at levels that still permit good speech recognition thresholds (SRTs) affects VD. METHOD Forty-eight normal-hearing (NH) listeners who listened via noise-excited (n = 20) or sinewave (n = 28) vocoders and 10 prelingually deaf CI users (i.e., whose hearing loss began before language acquisition) participated in the study. First, the signal-to-noise ratio (SNR) yielding a 70.7%-correct SRT was assessed using an adaptive sentence-in-noise test. Next, the CI simulation listeners performed 12 adaptive VD assessments: six in quiet, two with each cue (fo, VTL, fo + VTL), and six amid speech-shaped noise. The CI participants performed six VD assessments: one with each cue, in quiet and amid noise. The SNR at VD testing was 5 dB above the individual's SRT in noise (SRTn +5 dB). RESULTS Results showed the following: (a) Better VD was achieved via the noise-excited than the sinewave vocoder, with the noise-excited vocoder better mimicking CI VD; (b) background noise had a limited negative effect on VD, and only for the CI simulation listeners; and (c) there was a significant association between SNR at testing and VTL VD only for the CI simulation listeners. CONCLUSIONS For NH listeners who listen to CI simulations, noise that allows good SRTs can nevertheless impede VD, probably because VD depends more on bottom-up sensory processing. Conversely, for prelingually deaf CI users, noise that allows good SRTs hardly affects VD, suggesting that they rely strongly on bottom-up processing for both VD and speech recognition.
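The 70.7%-correct point targeted by the adaptive test is the convergence point of the classic two-down/one-up transformed staircase (Levitt, 1971). A minimal sketch follows; the `respond` callback, starting level, step size, and reversal count are hypothetical choices for illustration, not the study's exact tracking rules.

```python
def staircase_2down1up(respond, start, step, n_reversals=8):
    """Track the 70.7%-correct point: the level drops after two
    consecutive correct responses and rises after each error.
    respond(level) -> True if the (simulated) listener is correct.
    Returns the mean level over the recorded reversals."""
    level, correct_run, direction = start, 0, 0
    reversals = []
    while len(reversals) < n_reversals:
        if respond(level):
            correct_run += 1
            if correct_run == 2:          # two in a row: make it harder
                correct_run = 0
                if direction == +1:       # direction change = reversal
                    reversals.append(level)
                direction = -1
                level -= step
        else:                             # one error: make it easier
            correct_run = 0
            if direction == -1:
                reversals.append(level)
            direction = +1
            level += step
    return sum(reversals) / len(reversals)
```

With a deterministic listener who is correct at or above level 10, the track oscillates between 8 and 10 and the estimate lands just below the true threshold, as expected for this rule.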
Affiliation(s)
- Michal Levin
- Department of Communication Disorders, The Stanley Steyer School of Health Professions, Faculty of Medicine, Tel Aviv University, Israel
- Yael Zaltz
- Department of Communication Disorders, The Stanley Steyer School of Health Professions, Faculty of Medicine, Tel Aviv University, Israel
- Sagol School of Neuroscience, Tel Aviv University, Israel
4. Cychosz M, Xu K, Fu QJ. Effects of spectral smearing on speech understanding and masking release in simulated bilateral cochlear implants. PLoS One 2023; 18:e0287728. [PMID: 37917727; PMCID: PMC10621938; DOI: 10.1371/journal.pone.0287728]
Abstract
Differences in spectro-temporal degradation may explain some of the variability in cochlear implant users' speech outcomes. The present study employs vocoder simulations with typically hearing listeners to evaluate how differences in the degree of channel interaction across ears affect spatial speech recognition. Speech recognition thresholds and spatial release from masking were measured in 16 normal-hearing subjects listening to simulated bilateral cochlear implants. Sixteen-channel sine-vocoded speech simulated limited, broad, or mixed channel interaction across ears, in dichotic and diotic target-masker conditions. Thresholds were highest with broad channel interaction in both ears and improved when channel interaction decreased, first in one ear and again in both ears. Masking release was apparent across conditions. Results from this simulation study show that channel interaction may impact speech recognition more than masking release, with implications for the effects of channel interaction on cochlear implant users' speech recognition outcomes.
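One generic way to simulate limited versus broad channel interaction in a vocoder is to mix each channel's envelope with its neighbours' envelopes using weights that decay with channel distance, mimicking current spread. The exponential dB-per-channel weighting below is an assumption for illustration, not the smearing method this study used (which manipulated analysis/synthesis filter slopes).

```python
def smear_envelopes(envs, spread_db_per_channel):
    """Mix each channel's envelope with its neighbours using weights
    that fall off by spread_db_per_channel for each channel of distance.
    A steep fall-off simulates limited interaction; a shallow one, broad.
    envs: list of channels, each a list of envelope samples."""
    n_ch, n_t = len(envs), len(envs[0])
    out = []
    for i in range(n_ch):
        mixed = [0.0] * n_t
        total_w = 0.0
        for j in range(n_ch):
            w = 10.0 ** (-abs(i - j) * spread_db_per_channel / 20.0)
            total_w += w
            for t in range(n_t):
                mixed[t] += w * envs[j][t]
        out.append([v / total_w for v in mixed])  # normalise the weights
    return out
```

With a shallow decay, energy from one channel leaks much further across the array, which is the interaction the thresholds above are sensitive to.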
Affiliation(s)
- Margaret Cychosz
- Department of Linguistics, University of California, Los Angeles, Los Angeles, CA, United States of America
- Kevin Xu
- Department of Head and Neck Surgery, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States of America
- Qian-Jie Fu
- Department of Head and Neck Surgery, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States of America
5. Chiossi JSC, Patou F, Ng EHN, Faulkner KF, Lyxell B. Phonological discrimination and contrast detection in pupillometry. Front Psychol 2023; 14:1232262. [PMID: 38023001; PMCID: PMC10646334; DOI: 10.3389/fpsyg.2023.1232262]
Abstract
Introduction The perception of phonemes is guided by both low-level acoustic cues and high-level linguistic context. However, differentiating between these two types of processing can be challenging. In this study, we explore the utility of pupillometry as a tool to investigate both low- and high-level processing of phonological stimuli, with a particular focus on its ability to capture novelty detection and cognitive processing during speech perception. Methods Pupillometric traces were recorded from a sample of 22 Danish-speaking adults, with self-reported normal hearing, while performing two phonological-contrast perception tasks: a nonword discrimination task, which included minimal-pair combinations specific to the Danish language, and a nonword detection task involving the detection of phonologically modified words within sentences. The study explored the perception of contrasts in both unprocessed speech and degraded speech input, processed with a vocoder. Results No difference in peak pupil dilation was observed when the contrast occurred between two isolated nonwords in the nonword discrimination task. For unprocessed speech, higher peak pupil dilations were measured when phonologically modified words were detected within a sentence compared to sentences without the nonwords. For vocoded speech, higher peak pupil dilation was observed for sentence stimuli, but not for the isolated nonwords, although performance decreased similarly for both tasks. Conclusion Our findings demonstrate the complexity of pupil dynamics in the presence of acoustic and phonological manipulation. Pupil responses seemed to reflect higher-level cognitive and lexical processing related to phonological perception rather than low-level perception of acoustic cues. However, the incorporation of multiple talkers in the stimuli, coupled with the relatively low task complexity, may have affected the pupil dilation.
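The peak-pupil-dilation measure reported here is conventionally computed relative to a pre-stimulus baseline: subtract the mean pupil size over a short baseline window, then take the maximum of what remains. A minimal sketch, with a placeholder sampling rate and baseline duration rather than this study's settings:

```python
def peak_pupil_dilation(trace, fs, baseline_s=0.5):
    """Baseline-corrected peak dilation: subtract the mean of the first
    baseline_s seconds, then return the maximum of the remaining trace."""
    n_base = max(1, int(baseline_s * fs))
    baseline = sum(trace[:n_base]) / n_base
    return max(v - baseline for v in trace[n_base:])
```

Real pipelines additionally blink-interpolate and low-pass the trace before this step; those stages are omitted here.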
Affiliation(s)
- Julia S. C. Chiossi
- Oticon A/S, Smørum, Denmark
- Department of Special Needs Education, University of Oslo, Oslo, Norway
- Elaine Hoi Ning Ng
- Oticon A/S, Smørum, Denmark
- Department of Behavioural Sciences and Learning, Linnaeus Centre HEAD, Swedish Institute for Disability Research, Linköping University, Linköping, Sweden
- Björn Lyxell
- Department of Special Needs Education, University of Oslo, Oslo, Norway
6. Yang J, Sidhu J, Totino G, McKim S, Xu L. Accent rating of vocoded foreign-accented speech by native listeners. JASA Express Lett 2023; 3:095204. [PMID: 37747319; DOI: 10.1121/10.0020989]
Abstract
This study examined accent rating of speech samples collected from 12 Mandarin-accented English talkers and two native English talkers. The speech samples were processed with noise- and tone-vocoders at 1, 2, 4, 8, and 16 channels. The accentedness of the vocoded and unprocessed signals was judged by 53 native English listeners on a 9-point scale. The foreign-accented talkers were judged as having a less strong accent in the vocoded conditions than in the unprocessed condition. The native talkers and foreign-accented talkers with varying degrees of accentedness demonstrated different patterns of accent rating changes as a function of the number of channels.
Affiliation(s)
- Jing Yang
- Communication Sciences and Disorders, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin 53201, USA
- Jaskirat Sidhu
- Communication Sciences and Disorders, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin 53201, USA
- Gabrielle Totino
- Hearing, Speech and Language Sciences, Ohio University, Athens, Ohio 45701, USA
- Sarah McKim
- Hearing, Speech and Language Sciences, Ohio University, Athens, Ohio 45701, USA
- Li Xu
- Hearing, Speech and Language Sciences, Ohio University, Athens, Ohio 45701, USA
7. Koupka G, Okalidou A, Nicolaidis K, Constantinidis J, Kyriafinis G, Menexes G. Voice Onset Time of Greek Stops Productions by Greek Children with Cochlear Implants and Normal Hearing. Folia Phoniatr Logop 2023; 76:109-126. [PMID: 37497950; DOI: 10.1159/000533133]
Abstract
INTRODUCTION Research on voice onset time (VOT) production of stops in children with cochlear implants (CI) versus normal hearing (NH) has reported conflicting results, and the effects of age and place of articulation on VOT have not been examined for children with CI. The purpose of this study was to examine VOT production by Greek-speaking children with CI in comparison to NH controls, with a focus on the effects of age, type of stimuli, and place of articulation. METHODS Participants were 24 children with CI aged 2;8 to 13;3 years and 24 age- and gender-matched children with NH. Words were elicited via a picture-naming task, and nonwords were elicited via a fast-mapping procedure. RESULTS For voiced stops, children with CI showed longer VOT than children with NH, whereas VOT for voiceless stops was similar to that of NH peers. In both voiced and voiceless stops, VOT differed as a function of age and place of articulation across groups. Differences as a function of stimulus type were noted only for voiced stops across groups. CONCLUSIONS For the voiced stop consonants, which demand more articulatory effort, VOT production in children with CI was longer than in children with NH. For the voiceless stop consonants, VOT production in children with CI is acquired at a young age.
Affiliation(s)
- Georgia Koupka
- Educational and Social Policy, University of Macedonia, Thessaloniki, Greece
- Areti Okalidou
- Educational and Social Policy, University of Macedonia, Thessaloniki, Greece
- Katerina Nicolaidis
- Theoretical and Applied Linguistics, School of English, Aristotle University, Thessaloniki, Greece
- Jannis Constantinidis
- 1st Otorhinolaryngology Clinic, AHEPA Hospital, Thessaloniki, Greece
- Georgios Kyriafinis
- 1st Otorhinolaryngology Clinic, AHEPA Hospital, Thessaloniki, Greece
- George Menexes
- Faculty of Agriculture, Forestry and Natural Environment, Aristotle University, Thessaloniki, Greece
8. Sinha R, Azadpour M. Employing Deep Learning Model to Evaluate Speech Information in Acoustic Simulations of Auditory Implants. Research Square 2023:rs.3.rs-3085032 [preprint]. [PMID: 37461629; PMCID: PMC10350124; DOI: 10.21203/rs.3.rs-3085032/v1]
Abstract
Acoustic simulations have played a prominent role in the development of speech processing and sound coding strategies for auditory neural implant devices. Traditionally evaluated using human subjects, acoustic simulations have been used to model the impact of implant signal processing as well as individual anatomy/physiology on speech perception. However, human subject testing is time-consuming, costly, and subject to individual variability. In this study, we propose a novel approach to performing simulations of auditory implants. Rather than using actual human participants, we utilized an advanced deep-learning speech recognition model to simulate the effects of several important signal processing and psychophysical/physiological factors on speech perception. Simulation conditions were produced by varying the number of spectral bands, input frequency range, envelope cut-off frequency, envelope dynamic range, and envelope quantization. Our results demonstrate that the deep-learning model exhibits human-like robustness to simulation parameters in quiet and in noise, closely resembling existing human subject results. This approach is not only significantly quicker and less expensive than traditional human studies, but it also eliminates individual human variables such as attention and learning. Our findings pave the way for efficient and accurate evaluation of auditory implant simulations, aiding the future development of auditory neural prosthesis technologies.
Affiliation(s)
- Rahul Sinha
- New York University Grossman School of Medicine
9. Sinha R, Azadpour M. Employing Deep Learning Model to Evaluate Speech Information in Vocoder Simulations of Auditory Implants. bioRxiv 2023:2023.05.23.541843 [preprint]. [PMID: 37292787; PMCID: PMC10245887; DOI: 10.1101/2023.05.23.541843]
Abstract
Vocoder simulations have played a crucial role in the development of sound coding and speech processing techniques for auditory implant devices. Vocoders have been extensively used to model the effects of implant signal processing as well as individual anatomy and physiology on speech perception of implant users. Traditionally, such simulations have been conducted on human subjects, which can be time-consuming and costly. In addition, perception of vocoded speech varies significantly across individual subjects, and can be significantly affected by small amounts of familiarization or exposure to vocoded sounds. In this study, we propose a novel method that differs from traditional vocoder studies. Rather than using actual human participants, we use a speech recognition model to examine the influence of vocoder-simulated cochlear implant processing on speech perception. We used the OpenAI Whisper, a recently developed advanced open-source deep learning speech recognition model. The Whisper model's performance was evaluated on vocoded words and sentences in both quiet and noisy conditions with respect to several vocoder parameters such as number of spectral bands, input frequency range, envelope cut-off frequency, envelope dynamic range, and number of discriminable envelope steps. Our results indicate that the Whisper model exhibited human-like robustness to vocoder simulations, with performance closely mirroring that of human subjects in response to modifications in vocoder parameters. Furthermore, this proposed method has the advantage of being far less expensive and quicker than traditional human studies, while also being free from inter-individual variability in learning abilities, cognitive factors, and attentional states. Our study demonstrates the potential of employing advanced deep learning models of speech recognition in auditory prosthesis research.
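Scoring an ASR model such as Whisper on vocoded words and sentences ultimately reduces to comparing its transcripts against reference transcripts, typically by word error rate. A self-contained sketch of that scoring step is below (running Whisper itself requires the openai-whisper package and audio files, and is not shown; whether this study used WER specifically is not stated in the abstract):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference words,
    computed as word-level Levenshtein edit distance."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)
```

Sweeping a vocoder parameter (e.g., number of bands) and plotting WER per condition then gives the machine analogue of a human psychometric curve.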
10. de la Cruz-Pavía I, Eloy C, Perrineau-Hecklé P, Nazzi T, Cabrera L. Consonant bias in adult lexical processing under acoustically degraded listening conditions. JASA Express Lett 2023; 3:2892558. [PMID: 37220232; DOI: 10.1121/10.0019576]
Abstract
Consonants facilitate lexical processing across many languages, including French. This study investigates whether acoustic degradation affects this phonological bias in an auditory lexical decision task. French words were processed using an eight-band vocoder, degrading their frequency modulations (FM) while preserving original amplitude modulations (AM). Adult French natives were presented with these French words, preceded by similarly processed pseudoword primes sharing their vowels, consonants, or neither. Results reveal a consonant bias in the listeners' accuracy and response times, despite the reduced spectral and FM information. These degraded conditions resemble current cochlear-implant processors, and attest to the robustness of this phonological bias.
Affiliation(s)
- Irene de la Cruz-Pavía
- Department of Linguistics and Basque Studies, Universidad del País Vasco/Euskal Herriko Unibertsitatea, Vitoria-Gasteiz 01006, Spain
- Coraline Eloy
- Integrative Neuroscience and Cognition Center, Université Paris Cité, Centre National de la Recherche Scientifique, Paris 75006, France
- Paula Perrineau-Hecklé
- Integrative Neuroscience and Cognition Center, Université Paris Cité, Centre National de la Recherche Scientifique, Paris 75006, France
- Thierry Nazzi
- Integrative Neuroscience and Cognition Center, Université Paris Cité, Centre National de la Recherche Scientifique, Paris 75006, France
- Laurianne Cabrera
- Integrative Neuroscience and Cognition Center, Université Paris Cité, Centre National de la Recherche Scientifique, Paris 75006, France
11. Alvarez F, Kipping D, Nogueira W. A computational model to simulate spectral modulation and speech perception experiments of cochlear implant users. Front Neuroinform 2023; 17:934472. [PMID: 37006637; PMCID: PMC10061543; DOI: 10.3389/fninf.2023.934472]
Abstract
Speech understanding in cochlear implant (CI) users presents large intersubject variability that may be related to different aspects of the peripheral auditory system, such as the electrode-nerve interface and neural health. This variability makes it more challenging to prove differences in performance between CI sound coding strategies in regular clinical studies; computational models, however, can help assess the speech performance of CI users in an environment where all of these physiological aspects are controlled. In this study, differences in performance between three variants of the HiRes Fidelity 120 (F120) sound coding strategy are studied with a computational model. The computational model consists of (i) a processing stage with the sound coding strategy, (ii) a three-dimensional electrode-nerve interface that accounts for auditory nerve fiber (ANF) degeneration, (iii) a population of phenomenological ANF models, and (iv) a feature extractor algorithm to obtain the internal representation (IR) of the neural activity. As the back-end, the simulation framework for auditory discrimination experiments (FADE) was chosen. Two experiments relevant to speech understanding were performed: one measuring spectral modulation thresholds (SMTs) and the other speech reception thresholds (SRTs). These experiments included three neural health conditions (healthy ANFs, and moderate and severe ANF degeneration). The F120 was configured to use sequential stimulation (F120-S) and simultaneous stimulation with two (F120-P) or three (F120-T) simultaneously active channels. Simultaneous stimulation causes electric interaction that smears the spectrotemporal information transmitted to the ANFs, and it has been hypothesized to lead to even worse information transmission under poor neural health conditions. In general, worse neural health led to worse predicted performance; nevertheless, the detriment was small compared to clinical data. Results of the SRT experiments indicated that performance with simultaneous stimulation, especially F120-T, was more affected by neural degeneration than performance with sequential stimulation. Results of the SMT experiments showed no significant difference in performance. Although the proposed model in its current state is able to perform SMT and SRT experiments, it is not yet reliable for predicting real CI users' performance. Improvements related to the ANF model, feature extraction, and predictor algorithm are discussed.
Affiliation(s)
- Franklin Alvarez
- Medizinische Hochschule Hannover, Hannover, Germany
- Cluster of Excellence “Hearing4All”, Hannover, Germany
- Daniel Kipping
- Medizinische Hochschule Hannover, Hannover, Germany
- Cluster of Excellence “Hearing4All”, Hannover, Germany
- Waldo Nogueira
- Medizinische Hochschule Hannover, Hannover, Germany
- Cluster of Excellence “Hearing4All”, Hannover, Germany
- *Correspondence: Waldo Nogueira
12. Zhou N, Shi X, Dixit O, Firszt JB, Holden TA. Relationship between electrode position and temporal modulation sensitivity in cochlear implant users: Are close electrodes always better? Heliyon 2023; 9:e12467. [PMID: 36852047; PMCID: PMC9958279; DOI: 10.1016/j.heliyon.2022.e12467]
Abstract
Temporal modulation sensitivity has been studied extensively in cochlear implant (CI) users due to its strong correlation with speech recognition outcomes. Previous studies reported that temporal modulation detection thresholds (MDTs) vary across the tonotopic axis and attributed this variation to patchy neural survival; however, measures interpreted as correlates of neural health in animal models also depend on electrode position in humans. Nonetheless, the relationship between MDTs and electrode location has not been explored. We tested 13 ears for the effect of distance on modulation sensitivity, specifically asking whether electrodes closer to the modiolus are universally beneficial. Participants were postlingually deafened users of Cochlear Nucleus CIs. The distance of each electrode from the medial wall (MW) of the cochlea and from the mid-modiolar axis (MMA) was measured from computerized tomography (CT) scans. The distance measures were correlated with slopes of spatial tuning curves measured on selected electrodes to investigate whether electrode position accounts, at least in part, for the width of neural excitation. Consistent with previous findings, electrode position explained 24% of the variance in the slopes of the spatial tuning curves. All functioning electrodes were also measured for MDTs. Five ears showed a positive correlation between MDTs and at least one distance measure across the array, six ears showed negative correlations, and the remaining two showed no relationship. The ears showing positive MDT-distance correlations, and thus benefiting from electrodes being close to the neural elements, were those that performed better on the two speech recognition measures, i.e., speech reception thresholds (SRTs) and recognition of AzBio sentences. These results suggest that ears able to take advantage of proximal electrode placement are likely to have better speech recognition outcomes. Previous histological studies of humans demonstrated that speech recognition is correlated with spiral ganglion cell counts. Alternatively, ears with good speech recognition outcomes may have good overall neural health, a precondition for close electrodes to produce the spatially confined neural excitation patterns that facilitate modulation sensitivity. These findings suggest that methods to reduce channel interaction, e.g., perimodiolar electrode arrays or current focusing, may benefit only a subgroup of CI users, and that estimating neural survival preoperatively is important for choosing the most appropriate electrode array type (perimodiolar vs. lateral wall) for optimal implant function.
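The MDT-distance and tuning-slope relationships reported here are bivariate correlations, for which Pearson's r can be computed directly. A minimal sketch follows; the example arrays in the usage check are hypothetical illustrations, not study data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between paired measurements,
    e.g., per-electrode distance (x) against per-electrode MDT (y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A positive r here would correspond to the paper's "close electrodes help" pattern (shorter distance, lower MDT), a negative r to the opposite.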
Affiliation(s)
- Ning Zhou
- Department of Communication Sciences and Disorders, East Carolina University, Greenville, NC 27834, USA
- Xuyang Shi
- Department of Communication Sciences and Disorders, East Carolina University, Greenville, NC 27834, USA
- Omkar Dixit
- Department of Communication Sciences and Disorders, East Carolina University, Greenville, NC 27834, USA
- Jill B Firszt
- Department of Otolaryngology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Timothy A Holden
- Department of Otolaryngology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
13. Mishra S, Dash TK, Panda G. Speech phoneme and spectral smearing based non-invasive COVID-19 detection. Front Artif Intell 2023; 5:1035805. [PMID: 36686850; PMCID: PMC9847386; DOI: 10.3389/frai.2022.1035805]
Abstract
COVID-19 is a deadly viral infection that mainly affects the nasopharyngeal and oropharyngeal cavities before reaching the lungs. Early detection followed by immediate treatment can potentially reduce lung invasion and decrease fatality. Recently, several COVID-19 detection methods have been proposed using cough and breath sounds. However, little work has examined the use of phoneme analysis and smearing of the audio signal in COVID-19 detection. In this paper, this problem is addressed and speech samples are classified into COVID-19-positive and healthy audio samples. Additionally, a grouping of the phonemes based on reference classification accuracies is proposed for effective and faster detection of the disease at a primary stage. The Mel and Gammatone cepstral coefficients and their derivatives are used as features for five standard machine learning classifiers. It is observed that the generalized additive model provides the highest accuracy, 97.22%, for the phoneme grouping "/t/ /r/ /n/ /g/ /l/." This smearing-based phoneme classification technique could also be used in the future for detection of other speech-related diseases.
Affiliation(s)
- Soumya Mishra
- Department of Electronics and Communication Engineering, C. V. Raman Global University, Bhubaneswar, India
- Tusar Kanti Dash
- Department of Electronics and Communication Engineering, C. V. Raman Global University, Bhubaneswar, India
- Ganapati Panda
- Department of Electronics and Communication Engineering, C. V. Raman Global University, Bhubaneswar, India
14
Muacevic A, Adler JR, Chu TSM, Chan J. The 100 Most-Cited Manuscripts in Hearing Implants: A Bibliometrics Analysis. Cureus 2023; 15:e33711. [PMID: 36793822 PMCID: PMC9925031 DOI: 10.7759/cureus.33711] [Accepted: 01/12/2023] [Indexed: 01/13/2023]
Abstract
The aim of the study was to characterise the most frequently cited articles on the topic of hearing implants. A systematic search was carried out using the Thomson Reuters Web of Science Core Collection database. Eligibility criteria restricted the results to primary studies and reviews published from 1970 to 2022 in English dealing primarily with hearing implants. Data including the authors, year of publication, journal, country of origin, number of citations and average number of citations per year were extracted, as well as the impact factors and five-year impact factors of the journals publishing the articles. The top 100 papers were published across 23 journals and were cited 23,139 times. The most-cited and most influential article describes the first use of the continuous interleaved sampling (CIS) strategy utilised in all modern cochlear implants. More than half of the studies on the list were produced by authors from the United States, and the journal Ear and Hearing had both the greatest number of articles and the greatest number of total citations. To conclude, this research serves as a guide to the most influential articles on hearing implants, with the caveat that bibliometric analyses focus primarily on citation counts.
15
Xie D, Luo J, Chao X, Li J, Liu X, Fan Z, Wang H, Xu L. Relationship Between the Ability to Detect Frequency Changes or Temporal Gaps and Speech Perception Performance in Post-lingual Cochlear Implant Users. Front Neurosci 2022; 16:904724. [PMID: 35757528 PMCID: PMC9213807 DOI: 10.3389/fnins.2022.904724] [Received: 03/25/2022] [Accepted: 05/17/2022] [Indexed: 12/03/2022]
Abstract
Previous studies that used modulation stimuli to compare the effects of frequency resolution and temporal resolution on cochlear implant (CI) users' speech perception have failed to reach a consistent conclusion. In this study, frequency change detection and temporal gap detection were used to measure the frequency and temporal resolution of CI users, respectively. Psychophysical and neurophysiological methods were used simultaneously to investigate the effects of both on speech perception in post-lingual CI users. We examined the effects of psychophysical results [frequency change detection threshold (FCDT) and gap detection threshold (GDT)] and acoustic change complex (ACC) responses (evoked threshold, latency, and amplitude of ACCs induced by frequency changes or temporal gaps) on speech perception [recognition of monosyllabic words, disyllabic words, and sentences in quiet, and sentence recognition threshold (SRT) in noise]. Thirty-one adult post-lingual CI users of Mandarin Chinese were enrolled. The stimuli used to evoke ACCs to frequency changes were 800-ms pure tones (base frequency 1,000 Hz); the frequency change occurred at the midpoint of the tone, with six change magnitudes (0, 2, 5, 10, 20, and 50%). Silent gaps of different durations (0, 5, 10, 20, 50, and 100 ms) were inserted in the middle of 800-ms white noise to evoke ACCs to temporal gaps. The FCDT and GDT were obtained with two-alternative forced-choice procedures. The results showed no significant correlation between CI hearing thresholds and speech perception. In multiple regression analyses of the simultaneous influence of psychophysical measures and ACC responses on speech perception, GDT significantly predicted every speech perception index, and the ACC amplitude evoked by the temporal gap significantly predicted disyllabic-word recognition in quiet and SRT in noise. We conclude that when the abilities to detect frequency changes and temporal gaps are considered together, frequency change detection may have no significant effect on speech perception, whereas temporal gap detection significantly predicts it.
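Forced-choice thresholds such as the FCDT and GDT are typically tracked with an adaptive staircase. The abstract does not specify the rule, so the following is a generic 2-down/1-up sketch with a simulated listener, not the authors' procedure (the `true_threshold_ms`, step factor, and reversal count are assumptions):

```python
import random

def two_down_one_up(start_gap_ms=100.0, step_factor=2.0, n_reversals=8,
                    true_threshold_ms=10.0, seed=1):
    """Generic 2-down/1-up adaptive track (converges near 70.7% correct).
    The listener is simulated: always correct above a hypothetical true
    threshold, at chance (50%) below it."""
    rng = random.Random(seed)
    gap, correct_run, direction = start_gap_ms, 0, -1
    reversals = []
    while len(reversals) < n_reversals:
        correct = gap >= true_threshold_ms or rng.random() < 0.5
        if correct:
            correct_run += 1
            if correct_run == 2:          # two correct in a row -> harder
                correct_run = 0
                if direction == +1:       # track was going up: a reversal
                    reversals.append(gap)
                direction = -1
                gap = max(gap / step_factor, 0.5)
        else:                             # one wrong -> easier
            correct_run = 0
            if direction == -1:           # track was going down: a reversal
                reversals.append(gap)
            direction = +1
            gap *= step_factor
    # Threshold estimate: mean of the last six reversal points
    tail = reversals[-6:]
    return sum(tail) / len(tail)

print(round(two_down_one_up(), 2), "ms")
```

Real experiments usually shrink the step size after the first few reversals; that refinement is omitted here for brevity.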
Affiliation(s)
- Dianzhao Xie
- Department of Otolaryngology-Head and Neck Surgery, Shandong Provincial ENT Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Jianfen Luo
- Department of Otolaryngology-Head and Neck Surgery, Shandong Provincial ENT Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Xiuhua Chao
- Department of Otolaryngology-Head and Neck Surgery, Shandong Provincial ENT Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Jinming Li
- Department of Otolaryngology-Head and Neck Surgery, Shandong Provincial ENT Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Xianqi Liu
- Department of Otolaryngology-Head and Neck Surgery, Shandong Provincial ENT Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Zhaomin Fan
- Department of Otolaryngology-Head and Neck Surgery, Shandong Provincial ENT Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Haibo Wang
- Department of Otolaryngology-Head and Neck Surgery, Shandong Provincial ENT Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Lei Xu
- Department of Otolaryngology-Head and Neck Surgery, Shandong Provincial ENT Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
16
Jahn KN, Arenberg JG, Horn DL. Spectral Resolution Development in Children With Normal Hearing and With Cochlear Implants: A Review of Behavioral Studies. J Speech Lang Hear Res 2022; 65:1646-1658. [PMID: 35201848 PMCID: PMC9499384 DOI: 10.1044/2021_jslhr-21-00307] [Received: 06/02/2021] [Revised: 09/09/2021] [Accepted: 12/01/2021] [Indexed: 06/14/2023]
Abstract
PURPOSE This review article provides a theoretical overview of the development of spectral resolution in children with normal hearing (cNH) and in those who use cochlear implants (CIs), with an emphasis on methodological considerations. The aim was to identify key directions for future research on spectral resolution development in children with CIs. METHOD A comprehensive literature review was conducted to summarize and synthesize previously published behavioral research on spectral resolution development in normal and impaired auditory systems. CONCLUSIONS In cNH, performance on spectral resolution tasks continues to improve through the teenage years and is likely driven by gradual maturation of across-channel intensity resolution. A small but growing body of evidence from children with CIs suggests a more complex relationship between spectral resolution development, patient demographics, and the quality of the CI electrode-neuron interface. Future research should aim to distinguish between the effects of patient-specific variables and the underlying physiology on spectral resolution abilities in children of all ages who are hard of hearing and use auditory prostheses.
Affiliation(s)
- Kelly N. Jahn
- Department of Speech, Language, and Hearing, School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson
- Callier Center for Communication Disorders, The University of Texas at Dallas
- Julie G. Arenberg
- Department of Otolaryngology – Head and Neck Surgery, Harvard Medical School, Boston, MA
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear, Boston
- David L. Horn
- Virginia Merrill Bloedel Hearing Research Center, Department of Otolaryngology – Head and Neck Surgery, University of Washington, Seattle
- Division of Otolaryngology, Seattle Children's Hospital, WA
17
Shader MJ, Kwon BJ, Gordon-Salant S, Goupell MJ. Open-Set Phoneme Recognition Performance With Varied Temporal Cues in Younger and Older Cochlear Implant Users. J Speech Lang Hear Res 2022; 65:1196-1211. [PMID: 35133853 PMCID: PMC9150732 DOI: 10.1044/2021_jslhr-21-00299] [Received: 05/29/2021] [Revised: 09/20/2021] [Accepted: 11/12/2021] [Indexed: 06/14/2023]
Abstract
PURPOSE The goal of this study was to investigate the effect of age on phoneme recognition performance in which the stimuli varied in the amount of temporal information available in the signal. Chronological age is increasingly recognized as a factor that can limit the amount of benefit an individual can receive from a cochlear implant (CI). Central auditory temporal processing deficits in older listeners may contribute to the performance gap between younger and older CI users on recognition of phonemes varying in temporal cues. METHOD Phoneme recognition was measured at three stimulation rates (500, 900, and 1800 pulses per second) and two envelope modulation frequencies (50 Hz and unfiltered) in 20 CI participants ranging in age from 27 to 85 years. Speech stimuli were multiple word pairs differing in temporal contrasts and were presented via direct stimulation of the electrode array using an eight-channel continuous interleaved sampling strategy. Phoneme recognition performance was evaluated at each stimulation rate condition using both envelope modulation frequencies. RESULTS Duration of deafness was the strongest subject-level predictor of phoneme recognition, with participants with longer durations of deafness having poorer performance overall. Chronological age did not predict performance for any stimulus condition. Additionally, duration of deafness interacted with envelope filtering. Participants with shorter durations of deafness were able to take advantage of higher frequency envelope modulations, while participants with longer durations of deafness were not. CONCLUSIONS Age did not significantly predict phoneme recognition performance. In contrast, longer durations of deafness were associated with a reduced ability to utilize available temporal information within the signal to improve phoneme recognition performance.
Affiliation(s)
- Maureen J. Shader
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN
- Matthew J. Goupell
- Department of Hearing and Speech Sciences, University of Maryland, College Park
18
Zheng Z, Li K, Feng G, Guo Y, Li Y, Xiao L, Liu C, He S, Zhang Z, Qian D, Feng Y. Relative Weights of Temporal Envelope Cues in Different Frequency Regions for Mandarin Vowel, Consonant, and Lexical Tone Recognition. Front Neurosci 2021; 15:744959. [PMID: 34924928 PMCID: PMC8678109 DOI: 10.3389/fnins.2021.744959] [Received: 07/22/2021] [Accepted: 11/15/2021] [Indexed: 12/04/2022]
Abstract
Objectives: Mandarin-speaking users of cochlear implants (CIs) perform more poorly than their English-speaking counterparts. This may be because present CI speech coding schemes are largely based on English. This study aimed to evaluate the relative contributions of temporal envelope (E) cues to Mandarin phoneme (vowel and consonant) and lexical tone recognition, to inform speech coding schemes specific to Mandarin. Design: Eleven normal-hearing subjects were studied using acoustic temporal E cues that were extracted from 30 continuous frequency bands between 80 and 7,562 Hz using the Hilbert transform and divided into five frequency regions. Percent-correct recognition scores were obtained with acoustic E cues presented in three, four, and five frequency regions, and their relative weights were calculated using the least-squares approach. Results: For stimuli with three, four, and five frequency regions, percent-correct scores for vowel recognition using E cues were 50.43–84.82%, 76.27–95.24%, and 96.58%, respectively; for consonant recognition, 35.49–63.77%, 67.75–78.87%, and 87.87%; and for lexical tone recognition, 60.80–97.15%, 73.16–96.87%, and 96.73%. From frequency region 1 to frequency region 5, the mean weights in vowel recognition were 0.17, 0.31, 0.22, 0.18, and 0.12, respectively; in consonant recognition, 0.10, 0.16, 0.18, 0.23, and 0.33; and in lexical tone recognition, 0.38, 0.18, 0.14, 0.16, and 0.14. Conclusion: The region that contributed most to vowel recognition was Region 2 (502–1,022 Hz), which contains first-formant (F1) information; Region 5 (3,856–7,562 Hz) contributed most to consonant recognition; and Region 1 (80–502 Hz), which contains fundamental-frequency (F0) information, contributed most to lexical tone recognition.
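The least-squares weighting used in studies like this one regresses per-condition recognition scores on indicators of which frequency regions were present, then normalizes the coefficients to sum to one. A pure-Python sketch under that interpretation (the presence/score data below are invented, and this is not the study's code):

```python
def solve(A, b):
    """Solve a small square linear system by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivot
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * x for a, x in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def band_weights(presence, scores):
    """Least-squares band importance: regress scores on 0/1 band-presence
    indicators (plus an intercept) via the normal equations, then
    normalize the band coefficients so the weights sum to 1."""
    X = [[1.0] + [float(p) for p in row] for row in presence]
    k, n = len(X[0]), len(X)
    XtX = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
           for i in range(k)]
    Xty = [sum(X[r][i] * scores[r] for r in range(n)) for i in range(k)]
    beta = solve(XtX, Xty)[1:]  # drop the intercept term
    total = sum(beta)
    return [w / total for w in beta]

# Hypothetical data: which of 3 bands were present in each condition,
# and the percent-correct score obtained in that condition
presence = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1),
            (1, 0, 0), (0, 1, 0), (0, 0, 1)]
scores = [70, 60, 55, 85, 40, 35, 25]
print([round(w, 2) for w in band_weights(presence, scores)])
```

The resulting weights express each band's relative contribution to the scores across conditions.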
Affiliation(s)
- Zhong Zheng
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China; Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Keyi Li
- Sydney Institute of Language and Commerce, Shanghai University, Shanghai, China
- Gang Feng
- Department of Graduate, The First Affiliated Hospital of Jinzhou Medical University, Jinzhou, China
- Yang Guo
- Ear, Nose, and Throat Institute and Otorhinolaryngology Department, Eye and ENT Hospital of Fudan University, Shanghai, China
- Yinan Li
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China; Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Lili Xiao
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China; Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Chengqi Liu
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China; Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Shouhuan He
- Department of Otolaryngology, Qingpu Branch of Zhongshan Hospital Affiliated to Fudan University, Shanghai, China
- Zhen Zhang
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China; Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Di Qian
- Department of Otolaryngology, Shenzhen Longhua District People's Hospital, Shenzhen, China
- Yanmei Feng
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China; Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
19
Patro C, Kreft HA, Wojtczak M. The search for correlates of age-related cochlear synaptopathy: Measures of temporal envelope processing and spatial release from speech-on-speech masking. Hear Res 2021; 409:108333. [PMID: 34425347 PMCID: PMC8424701 DOI: 10.1016/j.heares.2021.108333] [Received: 09/02/2020] [Revised: 07/17/2021] [Accepted: 08/04/2021] [Indexed: 01/13/2023]
Abstract
Older adults often experience difficulties understanding speech in adverse listening conditions. It has been suggested that for listeners with normal and near-normal audiograms, these difficulties may, at least in part, arise from age-related cochlear synaptopathy. The aim of this study was to assess if performance on auditory tasks relying on temporal envelope processing reveal age-related deficits consistent with those expected from cochlear synaptopathy. Listeners aged 20 to 66 years were tested using a series of psychophysical, electrophysiological, and speech-perception measures using stimulus configurations that promote coding by medium- and low-spontaneous-rate auditory-nerve fibers. Cognitive measures of executive function were obtained to control for age-related cognitive decline. Results from the different tests were not significantly correlated with each other despite a presumed reliance on common mechanisms involved in temporal envelope processing. Only gap-detection thresholds for a tone in noise and spatial release from speech-on-speech masking were significantly correlated with age. Increasing age was related to impaired cognitive executive function. Multivariate regression analyses showed that individual differences in hearing sensitivity, envelope-based measures, and scores from nonauditory cognitive tests did not significantly contribute to the variability in spatial release from speech-on-speech masking for small target/masker spatial separation, while age was a significant contributor.
Affiliation(s)
- Chhayakanta Patro
- Department of Psychology, University of Minnesota, N640 Elliott Hall, 75 East River Parkway, Minneapolis, MN 55455, USA.
- Heather A Kreft
- Department of Psychology, University of Minnesota, N640 Elliott Hall, 75 East River Parkway, Minneapolis, MN 55455, USA
- Magdalena Wojtczak
- Department of Psychology, University of Minnesota, N640 Elliott Hall, 75 East River Parkway, Minneapolis, MN 55455, USA
20
Zheng Z, Li K, Guo Y, Wang X, Xiao L, Liu C, He S, Feng G, Feng Y. The Relative Weight of Temporal Envelope Cues in Different Frequency Regions for Mandarin Disyllabic Word Recognition. Front Neurosci 2021; 15:670192. [PMID: 34335156 PMCID: PMC8320289 DOI: 10.3389/fnins.2021.670192] [Received: 02/20/2021] [Accepted: 06/14/2021] [Indexed: 11/13/2022]
Abstract
Objectives Acoustic temporal envelope (E) cues containing speech information are distributed across the entire frequency spectrum. To provide a theoretical basis for the signal coding of hearing devices, we examined the relative weight of E cues in different frequency regions for Mandarin disyllabic word recognition in quiet. Design E cues were extracted from 30 continuous frequency bands within the range of 80 to 7,562 Hz using Hilbert decomposition and assigned to five frequency regions from low to high. Disyllabic word recognition scores of 20 normal-hearing participants were obtained using the E cues available in two, three, or four frequency regions. The relative weights of the five frequency regions were calculated using a least-squares approach. Results Participants correctly identified 3.13-38.13%, 27.50-83.13%, or 75.00-93.13% of words when presented with two, three, or four frequency regions, respectively. Increasing the number of frequency regions improved recognition scores and decreased the magnitude of the differences in scores between combinations, suggesting a synergistic effect among E cues from different frequency regions. The mean weights of E cues in frequency regions 1-5 were 0.31, 0.19, 0.26, 0.22, and 0.02, respectively. Conclusion For Mandarin disyllabic words, E cues in frequency regions 1 (80-502 Hz) and 3 (1,022-1,913 Hz) contributed more to word recognition than other regions, while frequency region 5 (3,856-7,562 Hz) contributed little.
Affiliation(s)
- Zhong Zheng
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China; Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Keyi Li
- Sydney Institute of Language and Commerce, Shanghai University, Shanghai, China
- Yang Guo
- Ear, Nose, and Throat Institute and Otorhinolaryngology Department, Eye and ENT Hospital of Fudan University, Shanghai, China
- Xinrong Wang
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China; Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Lili Xiao
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China; Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Chengqi Liu
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China; Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Shouhuan He
- Department of Otolaryngology, Qingpu Branch of Zhongshan Hospital Affiliated to Fudan University, Shanghai, China
- Gang Feng
- The First Affiliated Hospital of Jinzhou Medical University, Jinzhou, China
- Yanmei Feng
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China; Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
21
Individual Variability in Recalibrating to Spectrally Shifted Speech: Implications for Cochlear Implants. Ear Hear 2021; 42:1412-1427. [PMID: 33795617 DOI: 10.1097/aud.0000000000001043] [Indexed: 11/26/2022]
Abstract
OBJECTIVES Cochlear implant (CI) recipients are at a severe disadvantage compared with normal-hearing listeners in distinguishing consonants that differ by place of articulation because the key relevant spectral differences are degraded by the implant. One component of that degradation is the upward shifting of spectral energy that occurs with a shallow insertion depth of a CI. The present study aimed to systematically measure the effects of spectral shifting on word recognition and phoneme categorization by specifically controlling the amount of shifting and using stimuli whose identification specifically depends on perceiving frequency cues. We hypothesized that listeners would be biased toward perceiving phonemes that contain higher-frequency components because of the upward frequency shift and that intelligibility would decrease as spectral shifting increased. DESIGN Normal-hearing listeners (n = 15) heard sine wave-vocoded speech with simulated upward frequency shifts of 0, 2, 4, and 6 mm of cochlear space to simulate shallow CI insertion depth. Stimuli included monosyllabic words and /b/-/d/ and /ʃ/-/s/ continua that varied systematically by formant frequency transitions or frication noise spectral peaks, respectively. Recalibration to spectral shifting was operationally defined as shifting perceptual acoustic-phonetic mapping commensurate with the spectral shift, that is, adjusting frequency expectations for both phonemes upward so that a perceptual distinction remains, rather than hearing all upward-shifted phonemes as the higher-frequency member of the pair. RESULTS For moderate amounts of spectral shifting, group data suggested a general "halfway" recalibration, but individual data suggested a notably different conclusion: half of the listeners recalibrated fully, while the other half were entirely unable to categorize shifted speech with any reliability. No participant showed a pattern intermediate to these two extremes. Word intelligibility decreased with greater amounts of spectral shifting, again showing loose clusters of better- and poorer-performing listeners. Phonetic analysis of word errors revealed that certain cues (place and manner of articulation) were more susceptible to being compromised by a frequency shift, while voicing was robust to spectral shifting. CONCLUSIONS Shifting the frequency spectrum of speech has systematic effects that are in line with known properties of speech acoustics, but the ensuing difficulties cannot be predicted from tonotopic mismatch alone. Difficulties are subject to substantial individual differences in the capacity to adjust acoustic-phonetic mapping. These results help to explain why speech recognition in CI listeners cannot be fully predicted by peripheral factors like electrode placement and spectral resolution; even among listeners with functionally equivalent auditory input, there is an additional factor of simply being able or unable to flexibly adjust acoustic-phonetic mapping. This individual variability could motivate precise treatment approaches guided by an individual's relative reliance on wideband frequency representation (even if mismatched) or limited frequency coverage whose tonotopy is preserved.
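Millimetre shifts of this kind correspond to cochlear place via the Greenwood frequency-position function: map a frequency to its place, move the place toward the base, and map back. A sketch using the commonly cited human Greenwood parameters (F = 165.4(10^(0.06x) − 0.88), x in mm from the apex), assumed here rather than taken from the paper:

```python
import math

A, ALPHA, K = 165.4, 0.06, 0.88  # commonly used human Greenwood parameters

def place_to_freq(x_mm):
    """Greenwood map: distance from apex (mm) -> characteristic frequency (Hz)."""
    return A * (10 ** (ALPHA * x_mm) - K)

def freq_to_place(f_hz):
    """Inverse Greenwood map: frequency (Hz) -> distance from apex (mm)."""
    return math.log10(f_hz / A + K) / ALPHA

def shift_frequency(f_hz, shift_mm):
    """Frequency delivered when the band centred at f_hz is presented
    shift_mm closer to the base (simulating a shallow insertion)."""
    return place_to_freq(freq_to_place(f_hz) + shift_mm)

for shift_mm in (0, 2, 4, 6):  # the shift amounts used in the study
    print(shift_mm, "mm ->", round(shift_frequency(1000.0, shift_mm)), "Hz")
```

Because the map is exponential in place, a fixed shift in millimetres produces a larger shift in hertz at higher frequencies.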
22
Yoon YS, Boren CM, Diaz B. Effect of Realistic Test Conditions on Spectral and Temporal Processing in Normal-Hearing Listeners. Am J Audiol 2021; 30:160-169. [PMID: 33621127 DOI: 10.1044/2020_aja-20-00120] [Indexed: 11/09/2022]
Abstract
Purpose To measure the effect of testing condition (soundproof booth vs. quiet room), test order, and number of test sessions on spectral and temporal processing in normal-hearing (NH) listeners. Method Thirty-two adult NH listeners participated in three experiments. In all three experiments, the stimuli were presented to the left ear at the subjects' most comfortable level through headphones, and all tests were administered in an adaptive three-alternative forced-choice paradigm. Experiment 1 compared the effect of soundproof-booth and quiet-room test conditions on the amplitude modulation detection threshold and the modulation frequency discrimination threshold at each of five modulation frequencies. Experiment 2 compared the effect of two test orders on frequency discrimination thresholds under the quiet-room condition: thresholds were first measured with four pure tones presented in ascending and descending order, and then in counterbalanced order. In Experiment 3, the amplitude discrimination threshold under the quiet-room condition was assessed three times to determine the effect of the number of test sessions, and the thresholds were compared across sessions. Results There was no significant effect of test environment. Test order was an important variable for frequency discrimination, particularly between piano tones and pure tones. There was also no significant difference across test sessions. Conclusions These results suggest that a controlled test environment may not be required for spectral and temporal assessment of NH listeners. In a quiet test environment, a single outcome measure is sufficient, but test orders should be counterbalanced.
Affiliation(s)
- Yang-Soo Yoon
- Department of Communication Sciences and Disorders, Baylor University, Waco, TX
- Brianna Diaz
- Department of Speech, Language and Hearing Sciences, Texas Tech University Health Sciences Center, Lubbock
23
Jahn KN, DeVries L, Arenberg JG. Recovery from forward masking in cochlear implant listeners: Effects of age and the electrode-neuron interface. J Acoust Soc Am 2021; 149:1633. [PMID: 33765782 PMCID: PMC8267874 DOI: 10.1121/10.0003623] [Received: 07/17/2020] [Revised: 02/12/2021] [Accepted: 02/12/2021] [Indexed: 06/12/2023]
Abstract
Older adults exhibit deficits in auditory temporal processing relative to younger listeners. These age-related temporal processing difficulties may be further exacerbated in older adults with cochlear implants (CIs) when CI electrodes poorly interface with their target auditory neurons. The aim of this study was to evaluate the potential interaction between chronological age and the estimated quality of the electrode-neuron interface (ENI) on psychophysical forward masking recovery, a measure that reflects single-channel temporal processing abilities. Fourteen CI listeners (ages 15 to 88 years) with Advanced Bionics devices participated. Forward masking recovery was assessed on two channels in each ear (i.e., the channels with the lowest and highest signal detection thresholds). Results indicated that the rate of forward masking recovery declined with advancing age, and that the effect of age was more pronounced on channels estimated to interface poorly with the auditory nerve. These findings indicate that the quality of the ENI can influence the time course of forward masking recovery for older CI listeners. Channel-to-channel variability in the ENI likely interacts with central temporal processing deficits secondary to auditory aging, warranting further study of programming and rehabilitative approaches tailored to older listeners.
Affiliation(s)
- Kelly N Jahn
- Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington 98105, USA
- Lindsay DeVries
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Julie G Arenberg
- Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington 98105, USA
24
A Cross-Language Comparison of Sentence Recognition Using American English and Mandarin Chinese HINT and AzBio Sentences. Ear Hear 2020; 42:405-413. [PMID: 32826510 DOI: 10.1097/aud.0000000000000938] [Indexed: 11/25/2022]
Abstract
OBJECTIVES The aim of this study was to perform a cross-language comparison of two commonly used sentence-recognition materials (i.e., Hearing in Noise Test [HINT] and AzBio) in American English (AE) and Mandarin Chinese (MC). DESIGNS Sixty normal-hearing, native English-speaking and 60 normal-hearing, native Chinese-speaking young adults were recruited to participate in three experiments. In each experiment, the subjects were tested in their native language. In experiments I and II, noise and tone vocoders were used to process the HINT and AzBio sentences, respectively. The number of channels varied from 1 to 9, with an envelope cutoff frequency of 160 Hz. In experiment III, the AE AzBio and the MC HINT sentences were tested in speech-shaped noise at various signal to noise ratios (i.e., -20, -15, -10, -5, and 0 dB). The performance-intensity functions of sentence recognition using the two sets of sentence materials were compared. RESULTS Results of experiments I and II using vocoder processing indicated that the AE and MC versions of HINT and AzBio sentences differed in level of difficulty. The AE version yielded higher recognition performance than the MC version for both HINT and AzBio sentences. The type of vocoder processing (i.e., tone and noise vocoders) produced little differences in sentence-recognition performance in both languages. Incidentally, the AE AzBio sentences and the MC HINT sentences had similar recognition performance under vocoder processing. Such similarity was further confirmed under noise conditions in experiment III, where the performance-intensity functions of the two sets of sentences were closely matched. CONCLUSIONS The HINT and AzBio sentence materials developed in AE and MC differ in level of difficulty. The AE AzBio and the MC HINT sentence materials are similar in level of difficulty. 
In cross-language comparative research, the MC HINT and the AE AzBio sentences should be chosen for the respective language as the target sentence-recognition test materials.
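The vocoder manipulation described above (1 to 9 channels, 160 Hz envelope cutoff) can be sketched as a minimal channel vocoder. This is an illustrative sketch only: the band edges, filter order, Hilbert-envelope method, and noise carrier are common conventions, not the exact implementation used in the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels=8, lo=80.0, hi=6000.0, env_cutoff=160.0):
    """Minimal noise vocoder: split into bands, extract low-passed
    envelopes, and resynthesize with envelope-modulated noise carriers."""
    # Log-spaced band edges between lo and hi (one common convention)
    edges = np.geomspace(lo, hi, n_channels + 1)
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(len(x))
    env_sos = butter(4, env_cutoff, btype="lowpass", fs=fs, output="sos")
    out = np.zeros_like(x)
    for k in range(n_channels):
        band_sos = butter(4, [edges[k], edges[k + 1]],
                          btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, x)
        # Envelope: magnitude of the analytic signal, low-passed at env_cutoff
        env = np.maximum(sosfiltfilt(env_sos, np.abs(hilbert(band))), 0.0)
        # Modulate a noise carrier band-limited to the same channel
        out += env * sosfiltfilt(band_sos, noise)
    return out
```

Swapping the band-limited noise carrier for a tone at each channel's center frequency gives the tone-vocoder counterpart used in the same experiments.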
|
25
|
DiNino M, Arenberg JG, Duchen ALR, Winn MB. Effects of Age and Cochlear Implantation on Spectrally Cued Speech Categorization. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2020; 63:2425-2440. [PMID: 32552327 PMCID: PMC7838840 DOI: 10.1044/2020_jslhr-19-00127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/06/2019] [Revised: 08/12/2019] [Accepted: 03/30/2020] [Indexed: 06/11/2023]
Abstract
Purpose Weighting of acoustic cues for perceiving place-of-articulation speech contrasts was measured to determine the separate and interactive effects of age and use of cochlear implants (CIs). It has been found that adults with normal hearing (NH) show reliance on fine-grained spectral information (e.g., formants), whereas adults with CIs show reliance on broad spectral shape (e.g., spectral tilt). In question was whether children with NH and CIs would demonstrate the same patterns as adults, or show differences based on ongoing maturation of hearing and phonetic skills. Method Children and adults with NH and with CIs categorized a /b/-/d/ speech contrast based on two orthogonal spectral cues. Among CI users, phonetic cue weights were compared to vowel identification scores and Spectral-Temporally Modulated Ripple Test thresholds. Results NH children and adults both relied relatively more on the fine-grained formant cue and less on the broad spectral tilt cue compared to participants with CIs. However, early-implanted children with CIs better utilized the formant cue compared to adult CI users. Formant cue weights correlated with CI participants' vowel recognition and in children, also related to Spectral-Temporally Modulated Ripple Test thresholds. Adults and child CI users with very poor phonetic perception showed additive use of the two cues, whereas those with better and/or more mature cue usage showed a prioritized trading relationship, akin to NH listeners. Conclusions Age group and hearing modality can influence phonetic cue-weighting patterns. Results suggest that simple nonlexical categorization tests correlate with more general speech recognition skills of children and adults with CIs.
Affiliation(s)
- Mishaela DiNino
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA
| | - Julie G. Arenberg
- Massachusetts Eye and Ear, Harvard Medical School Department of Otolaryngology, Boston
| | | | - Matthew B. Winn
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis
| |
|
26
|
Müller V, Klünter HD, Fürstenberg D, Walger M, Lang-Roth R. Comparison of the Effects of Two Cochlear Implant Fine Structure Coding Strategies on Speech Perception. Am J Audiol 2020; 29:226-235. [PMID: 32464082 DOI: 10.1044/2020_aja-19-00110] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022] Open
Abstract
Purpose This study aims to investigate the effect of upgrading from the fine structure processing (FSP) coding strategy to the novel fine structure strategy "FS4" in adults with cochlear implants manufactured by MED-EL GmbH (Innsbruck, Austria). Method A crossover, double-blinded study was conducted for 12 weeks. Twelve adult participants were randomly assigned to two groups. During the first 6-week test interval, one group continued to use their everyday FSP strategy, whereas the other group was upgraded to the FS4 strategy. In the second 6-week interval, the two groups switched coding strategies. Speech perception was measured at the end of each test interval with the Oldenburg Sentence Test and the Göttingen Sentence Test. Participants completed the Speech, Spatial and Qualities of Hearing Scale at the end of each test interval and a simple preference test at the end of the study. Results There was no significant difference in speech perception test results obtained with the Oldenburg Sentence Test and the Göttingen Sentence Test, either in quiet or in noise. Participants' Speech, Spatial and Qualities of Hearing Scale self-evaluation and preference test results showed that the two coding strategies had similar effects on their hearing perception. No clear preference for either of the strategies was found. Conclusions Speech perception test results and the participants' level of satisfaction were similar for the two FS coding strategies. Despite differences in the presentation of temporal fine structure between FSP and FS4, a clear benefit of the newer FS4 strategy could not be shown.
Affiliation(s)
- Verena Müller
- Department of Otorhinolaryngology, Head and Neck Surgery and Cochlear Implant Center, Faculty of Medicine, University of Cologne, Germany
| | - Heinz Dieter Klünter
- Department of Otorhinolaryngology, Head and Neck Surgery and Cochlear Implant Center, Faculty of Medicine, University of Cologne, Germany
| | - Dirk Fürstenberg
- Department of Otorhinolaryngology, Head and Neck Surgery and Cochlear Implant Center, Faculty of Medicine, University of Cologne, Germany
| | - Martin Walger
- Department of Otorhinolaryngology, Head and Neck Surgery and Cochlear Implant Center, Faculty of Medicine, University of Cologne, Germany
| | - Ruth Lang-Roth
- Department of Otorhinolaryngology, Head and Neck Surgery and Cochlear Implant Center, Faculty of Medicine, University of Cologne, Germany
| |
|
27
|
Ortiz-Mantilla S, Realpe-Bonilla T, Benasich AA. Early Interactive Acoustic Experience with Non-speech Generalizes to Speech and Confers a Syllabic Processing Advantage at 9 Months. Cereb Cortex 2020; 29:1789-1801. [PMID: 30722000 PMCID: PMC6418390 DOI: 10.1093/cercor/bhz001] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2018] [Revised: 12/04/2018] [Accepted: 01/07/2019] [Indexed: 12/19/2022] Open
Abstract
During early development, the infant brain is highly plastic and sensory experiences modulate emerging cortical maps, enhancing processing efficiency as infants set up key linguistic precursors. Early interactive acoustic experience (IAE) with spectrotemporally modulated non-speech has been shown to facilitate optimal acoustic processing and generalizes to novel non-speech sounds at 7 months of age. Here we demonstrate that effects of non-speech IAE endure well beyond the immediate training period and robustly generalize to speech processing. Infants who received non-speech IAE differed at 9 months of age from both naïve controls and those with only passive acoustic exposure, demonstrating broad modulation of oscillatory dynamics. For the standard syllable, increased high-gamma (>70 Hz) power within auditory cortices indicates that IAE fosters native speech processing, facilitating establishment of phonemic representations. The higher left beta power seen may reflect increased linking of sensory information and corresponding articulatory patterns, while bilateral decreases in theta power suggest more mature automatized speech processing, as fewer neuronal resources were allocated to process syllabic information. For the deviant syllable, left-lateralized gamma (<70 Hz) enhancement suggests IAE promotes phonemic-related discrimination abilities. Theta power increases in right auditory cortex, known for favoring slow-rate decoding, imply that IAE facilitates the more demanding processing of the sporadic deviant syllable.
Affiliation(s)
- Silvia Ortiz-Mantilla
- Center for Molecular & Behavioral Neuroscience, Rutgers University-Newark, 197 University Avenue, Newark, NJ, USA
| | - Teresa Realpe-Bonilla
- Center for Molecular & Behavioral Neuroscience, Rutgers University-Newark, 197 University Avenue, Newark, NJ, USA
| | - April A Benasich
- Center for Molecular & Behavioral Neuroscience, Rutgers University-Newark, 197 University Avenue, Newark, NJ, USA
| |
|
28
|
Jahn KN, Arenberg JG. Polarity Sensitivity in Pediatric and Adult Cochlear Implant Listeners. Trends Hear 2020; 23:2331216519862987. [PMID: 31373266 PMCID: PMC6681263 DOI: 10.1177/2331216519862987] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open
Abstract
Modeling data suggest that sensitivity to the polarity of an electrical stimulus may reflect the integrity of the peripheral processes of the spiral ganglion neurons. Specifically, better sensitivity to anodic (positive) current than to cathodic (negative) current could indicate peripheral process degeneration or demyelination. The goal of this study was to characterize polarity sensitivity in pediatric and adult cochlear implant listeners (41 ears). Relationships between polarity sensitivity at threshold and (a) polarity sensitivity at suprathreshold levels, (b) age-group, (c) preimplantation duration of deafness, and (d) phoneme perception were determined. Polarity sensitivity at threshold was defined as the difference in single-channel behavioral thresholds measured in response to each of two triphasic pulses, where the central high-amplitude phase was either cathodic or anodic. Lower thresholds in response to anodic than to cathodic pulses may suggest peripheral process degeneration. On the majority of electrodes tested, threshold and suprathreshold sensitivity was lower for anodic than for cathodic stimulation; however, dynamic range was often larger for cathodic than for anodic stimulation. Polarity sensitivity did not differ between child- and adult-implanted listeners. Adults with long preimplantation durations of deafness tended to have better sensitivity to anodic pulses on channels that were estimated to interface poorly with the auditory nerve; this was not observed in the child-implanted group. Across subjects, duration of deafness predicted phoneme perception performance. The results of this study suggest that subject- and electrode-dependent differences in polarity sensitivity may assist in developing customized cochlear implant programming interventions for child- and adult-implanted listeners.
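The triphasic pulse manipulation described above can be sketched as follows. The half-amplitude outer phases and the equal per-phase sample counts are illustrative assumptions, used only to show how a charge-balanced pulse with a selectable central polarity is constructed.

```python
import numpy as np

def triphasic_pulse(phase_samples, amp, central_polarity=+1):
    """Charge-balanced triphasic pulse: two half-amplitude outer phases of
    opposite sign flank a high-amplitude central phase whose polarity
    (+1 anodic, -1 cathodic) is the quantity manipulated in the study."""
    outer = -central_polarity * 0.5 * amp * np.ones(phase_samples)
    central = central_polarity * amp * np.ones(phase_samples)
    return np.concatenate([outer, central, outer])
```

Because the two outer phases together carry the same charge as the central phase with opposite sign, the pulse sums to zero net charge regardless of the chosen polarity.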
Affiliation(s)
- Kelly N Jahn
- Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
| | - Julie G Arenberg
- Massachusetts Eye and Ear, Department of Otolaryngology, Harvard Medical School, Boston, MA, USA
| |
|
29
|
Spectral-Temporal Trade-Off in Vocoded Sentence Recognition: Effects of Age, Hearing Thresholds, and Working Memory. Ear Hear 2020; 41:1226-1235. [PMID: 32032222 DOI: 10.1097/aud.0000000000000840] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
OBJECTIVES Cochlear implant (CI) signal processing degrades the spectral components of speech. This requires CI users to rely primarily on temporal cues, specifically, amplitude modulations within the temporal envelope, to recognize speech. Auditory temporal processing ability for envelope modulations worsens with advancing age, which may put older CI users at a disadvantage compared with younger users. To evaluate how potential age-related limitations for processing temporal envelope modulations impact spectrally degraded sentence recognition, noise-vocoded sentences were presented to younger and older normal-hearing listeners in quiet. Envelope modulation rates were varied from 10 to 500 Hz by adjusting the low-pass filter cutoff frequency (LPF). The goal of this study was to evaluate if age impacts recognition of noise-vocoded speech and if this age-related limitation existed for a specific range of envelope modulation rates. DESIGN Noise-vocoded sentence recognition in quiet was measured as a function of number of spectral channels (4, 6, 8, and 12 channels) and LPF (10, 20, 50, 75, 150, 375, and 500 Hz) in 15 younger normal-hearing listeners and 15 older near-normal-hearing listeners. Hearing thresholds and working memory were assessed to determine the extent to which these factors were related to recognition of noise-vocoded sentences. RESULTS Younger listeners achieved significantly higher sentence recognition scores than older listeners overall. Performance improved in both groups as the number of spectral channels and LPF increased. As the number of spectral channels increased, the differences in sentence recognition scores between groups decreased. A spectral-temporal trade-off was observed in both groups in which performance in the 8- and 12-channel conditions plateaued with lower-frequency amplitude modulations compared with the 4- and 6-channel conditions. 
There was no interaction between age group and LPF, suggesting that both groups obtained similar improvements in performance with increasing LPF. The lack of an interaction between age and LPF may be due to the nature of the task of recognizing sentences in quiet. Audiometric thresholds were the only significant predictor of vocoded sentence recognition. Although performance on the working memory task declined with advancing age, working memory scores did not predict sentence recognition. CONCLUSIONS Younger listeners outperformed older listeners for recognizing noise-vocoded sentences in quiet. The negative impact of age was reduced when ample spectral information was available. Age-related limitations for recognizing vocoded sentences were not affected by the temporal envelope modulation rate of the signal, but instead, appear to be related to a generalized task limitation or to reduced audibility of the signal.
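The LPF manipulation described above amounts to low-pass filtering each channel's temporal envelope at the chosen cutoff before resynthesis. A minimal sketch of that step, where the filter order and the toy modulation frequencies are illustrative choices rather than the study's values:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass_envelope(env, fs, cutoff):
    # Limit the envelope modulation rate by low-pass filtering at `cutoff` Hz
    sos = butter(4, cutoff, btype="lowpass", fs=fs, output="sos")
    return np.maximum(sosfiltfilt(sos, env), 0.0)

fs = 16000
t = np.arange(fs) / fs
# Toy envelope with a slow (4 Hz) and a fast (200 Hz) modulation component
env = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
slow_only = lowpass_envelope(env, fs, 10.0)   # removes the 200 Hz component
both = lowpass_envelope(env, fs, 500.0)       # retains both components
```

Raising the cutoff from 10 to 500 Hz, as in the study's LPF conditions, progressively restores the faster amplitude modulations within each channel.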
|
30
|
Stone MA, Prendergast G, Canavan S. Measuring access to high-modulation-rate envelope speech cues in clinically fitted auditory prostheses. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:1284. [PMID: 32113270 DOI: 10.1121/10.0000673] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/26/2019] [Accepted: 01/15/2020] [Indexed: 06/10/2023]
Abstract
The signal processing used to increase intelligibility within the hearing-impaired listener introduces distortions in the modulation patterns of a signal. Trade-offs have to be made between improved audibility and the loss of fidelity. Acoustic hearing impairment can cause reduced access to temporal fine structure (TFS), while cochlear implant processing, used to treat profound hearing impairment, has reduced ability to convey TFS, hence forcing greater reliance on modulation cues. Target speech mixed with a competing talker was split into 8-22 frequency channels. From each channel, separate low-rate (EmodL, <16 Hz) and high-rate (EmodH, <300 Hz) versions of the envelope modulation were extracted, which resulted in low or high intelligibility, respectively. The EmodL modulations were preserved in channel valleys and cross-faded to EmodH in channel peaks. The cross-faded signal modulated a tone carrier in each channel. The modulated carriers were summed across channels and presented to hearing aid (HA) and cochlear implant users. Their ability to access high-rate modulation cues and the dynamic range of this access were assessed. Clinically fitted hearing aids resulted in 10% lower intelligibility than simulated high-quality aids. Encouragingly, cochlear implantees were able to extract high-rate information over a dynamic range similar to that for the HA users.
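The valley/peak cross-fade described above can be sketched as a level-dependent blend of the two envelope versions within each channel. The percentile thresholds used here to define "valley" and "peak" are placeholder assumptions, not the paper's actual criterion:

```python
import numpy as np

def level_weight(env, valley_pct=20, peak_pct=80):
    # Map instantaneous channel level to a 0-1 fade weight:
    # 0 at/below the valley percentile, 1 at/above the peak percentile.
    lo, hi = np.percentile(env, [valley_pct, peak_pct])
    return np.clip((env - lo) / (hi - lo), 0.0, 1.0)

def crossfade_envelopes(env_lo, env_hi, w):
    # Blend: low-rate envelope (EmodL) in valleys, high-rate (EmodH) in peaks
    return (1.0 - w) * env_lo + w * env_hi
```

The blended envelope then modulates the channel's tone carrier, so high-rate modulation information is only available during the higher-level portions of the signal.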
Affiliation(s)
- Michael A Stone
- Manchester Centre for Audiology and Deafness, School of Health Sciences, University of Manchester, M13 9PL, United Kingdom
| | - Garreth Prendergast
- Manchester Centre for Audiology and Deafness, School of Health Sciences, University of Manchester, M13 9PL, United Kingdom
| | - Shanelle Canavan
- Manchester Centre for Audiology and Deafness, School of Health Sciences, University of Manchester, M13 9PL, United Kingdom
| |
|
31
|
Winn MB. Accommodation of gender-related phonetic differences by listeners with cochlear implants and in a variety of vocoder simulations. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:174. [PMID: 32006986 PMCID: PMC7341679 DOI: 10.1121/10.0000566] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/30/2019] [Revised: 12/06/2019] [Accepted: 12/13/2019] [Indexed: 06/01/2023]
Abstract
Speech perception requires accommodation of a wide range of acoustic variability across talkers. A classic example is the perception of "sh" and "s" fricative sounds, which are categorized according to spectral details of the consonant itself, and also by the context of the voice producing it. Because women's and men's voices occupy different frequency ranges, a listener is required to make a corresponding adjustment of acoustic-phonetic category space for these phonemes when hearing different talkers. This pattern is commonplace in everyday speech communication, and yet might not be captured in accuracy scores for whole words, especially when word lists are spoken by a single talker. Phonetic accommodation for fricatives "s" and "sh" was measured in 20 cochlear implant (CI) users and in a variety of vocoder simulations, including those with noise carriers with and without peak picking, simulated spread of excitation, and pulsatile carriers. CI listeners showed strong phonetic accommodation as a group. Each vocoder produced phonetic accommodation except the 8-channel noise vocoder, despite its historically good match with CI users in word intelligibility. Phonetic accommodation is largely independent of linguistic factors and thus might offer information complementary to speech intelligibility tests which are partially affected by language processing.
Affiliation(s)
- Matthew B Winn
- Department of Speech & Hearing Sciences, University of Minnesota, 164 Pillsbury Drive Southeast, Minneapolis, Minnesota 55455, USA
| |
|
32
|
Casaponsa A, Sohoglu E, Moore DR, Füllgrabe C, Molloy K, Amitay S. Does training with amplitude modulated tones affect tone-vocoded speech perception? PLoS One 2019; 14:e0226288. [PMID: 31881550 PMCID: PMC6934405 DOI: 10.1371/journal.pone.0226288] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2019] [Accepted: 11/22/2019] [Indexed: 11/17/2022] Open
Abstract
Temporal-envelope cues are essential for successful speech perception. We asked here whether training on stimuli containing temporal-envelope cues without speech content can improve the perception of spectrally-degraded (vocoded) speech in which the temporal envelope (but not the temporal fine structure) is mainly preserved. Two groups of listeners were trained on different amplitude-modulation (AM) based tasks, either AM detection or AM-rate discrimination (21 blocks of 60 trials during two days, 1260 trials; frequency range: 4 Hz, 8 Hz, and 16 Hz), while an additional control group did not undertake any training. Consonant identification in vocoded vowel-consonant-vowel stimuli was tested before and after training on the AM tasks (or at an equivalent time interval for the control group). Following training, only the trained groups showed a significant improvement in the perception of vocoded speech, but the improvement did not significantly differ from that observed for controls. Thus, we do not find convincing evidence that this amount of training with temporal-envelope cues without speech content provides significant benefit for vocoded speech intelligibility. Alternative training regimens using vocoded speech along the linguistic hierarchy should be explored.
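The AM training stimuli described above can be sketched as sinusoidally amplitude-modulated tones. The carrier frequency, duration, and full modulation depth below are arbitrary illustrative values, not the study's stimulus parameters:

```python
import numpy as np

def am_tone(fs, dur, fc, fm, depth):
    # Sinusoidally amplitude-modulated tone: carrier fc Hz,
    # modulation rate fm Hz, modulation depth in [0, 1].
    t = np.arange(int(fs * dur)) / fs
    return (1.0 + depth * np.sin(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)

fs = 16000
# Stimuli at the three modulation rates used in training
stimuli = {fm: am_tone(fs, 0.5, fc=1000.0, fm=fm, depth=1.0) for fm in (4, 8, 16)}
```

AM detection asks whether `depth > 0` can be heard at all, while AM-rate discrimination asks listeners to tell two values of `fm` apart at a fixed depth.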
Affiliation(s)
- Aina Casaponsa
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- Department of Linguistics and English Language, Lancaster University, Lancaster, England, United Kingdom
| | - Ediz Sohoglu
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
| | - David R. Moore
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
| | - Christian Füllgrabe
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
| | - Katharine Molloy
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
| | - Sygal Amitay
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
| |
|
33
|
Gianakas SP, Winn MB. Lexical bias in word recognition by cochlear implant listeners. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:3373. [PMID: 31795696 PMCID: PMC6948217 DOI: 10.1121/1.5132938] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/31/2019] [Revised: 10/04/2019] [Accepted: 10/14/2019] [Indexed: 06/03/2023]
Abstract
When hearing an ambiguous speech sound, listeners show a tendency to perceive it as a phoneme that would complete a real word, rather than completing a nonsense word. For example, a sound that could be heard as either /b/ or /ɡ/ is perceived as /b/ when followed by "_ack" but perceived as /ɡ/ when followed by "_ap." Because the target sound is acoustically identical across both environments, this effect demonstrates the influence of top-down lexical processing in speech perception. Degradations in the auditory signal were hypothesized to render speech stimuli more ambiguous, and therefore promote increased lexical bias. Stimuli included three speech continua that varied by spectral cues of varying speeds, including stop formant transitions (fast), fricative spectra (medium), and vowel formants (slow). Stimuli were presented to listeners with cochlear implants (CIs), and also to listeners with normal hearing with clear spectral quality, or with varying amounts of spectral degradation using a noise vocoder. Results indicated an increased lexical bias effect with degraded speech and for CI listeners, for whom the effect size was related to segment duration. This method can probe an individual's reliance on top-down processing even at the level of simple lexical/phonetic perception.
Affiliation(s)
- Steven P Gianakas
- Department of Speech-Language-Hearing Sciences, University of Minnesota, 164 Pillsbury Drive SE, Minneapolis, Minnesota 55455, USA
| | - Matthew B Winn
- Department of Speech-Language-Hearing Sciences, University of Minnesota, 164 Pillsbury Drive SE, Minneapolis, Minnesota 55455, USA
| |
|
34
|
Cabrera L, Liu HM, Granjon L, Kao C, Tsao FM. Discrimination and identification of lexical tones and consonants in Mandarin-speaking children using cochlear implants. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:2291. [PMID: 31671989 DOI: 10.1121/1.5126941] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/08/2019] [Accepted: 09/03/2019] [Indexed: 06/10/2023]
Abstract
Mandarin-speaking adults using cochlear implants (CI) experience more difficulties in perceiving lexical tones than consonants. This problem may result from the fact that CIs provide relatively sufficient temporal envelope information for consonant perception in quiet environments, but do not convey the fine spectro-temporal information considered to be necessary for accurate pitch perception. Another possibility is that Mandarin speakers with post-lingual hearing loss have developed language-specific use of these acoustic cues, impeding lexical tone processing under CI conditions. To investigate this latter hypothesis, syllable discrimination and word identification abilities for Mandarin consonants (place and manner) and lexical-tone contrasts (tones 1 vs 3 and 1 vs 2) were measured in 15 Mandarin-speaking children using CIs and age-matched children with normal hearing (NH). In the discrimination task, only children using CIs exhibited significantly lower scores for consonant place contrasts compared to other contrasts, including lexical tones. In the word identification task, children using CIs showed lower performance for all contrasts compared to children with NH, but they both showed specific difficulties with tone 1 vs 2 contrasts. This study suggests that Mandarin-speaking children using CIs are able to discriminate and identify lexical tones and, perhaps more surprisingly, have more difficulties when discriminating consonants.
Affiliation(s)
- Laurianne Cabrera
- Integrative Neuroscience and Cognition Center, Université Paris Descartes, 45 rue des saints-pères, 75006, Paris, France
| | - Huei-Mei Liu
- Department of Special Education, National Taiwan Normal University, 162, Section 1, Heping E. Road, Taipei City 106, Taiwan
| | - Lionel Granjon
- Integrative Neuroscience and Cognition Center, Université Paris Descartes, 45 rue des saints-pères, 75006, Paris, France
| | - Chieh Kao
- Department of Psychology, National Taiwan University, Number 1, Section 4, Roosevelt Road, Taipei 10617, Taiwan
| | - Feng-Ming Tsao
- Department of Psychology, National Taiwan University, Number 1, Section 4, Roosevelt Road, Taipei 10617, Taiwan
| |
|
35
|
Reducing Simulated Channel Interaction Reveals Differences in Phoneme Identification Between Children and Adults With Normal Hearing. Ear Hear 2019; 40:295-311. [PMID: 29927780 DOI: 10.1097/aud.0000000000000615] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
OBJECTIVES Channel interaction, the stimulation of overlapping populations of auditory neurons by distinct cochlear implant (CI) channels, likely limits the speech perception performance of CI users. This study examined the role of vocoder-simulated channel interaction in the ability of children with normal hearing (cNH) and adults with normal hearing (aNH) to recognize spectrally degraded speech. The primary aim was to determine the interaction between number of processing channels and degree of simulated channel interaction on phoneme identification performance as a function of age for cNH and to relate those findings to aNH and to CI users. DESIGN Medial vowel and consonant identification of cNH (age 8-17 years) and young aNH were assessed under six (for children) or nine (for adults) different conditions of spectral degradation. Stimuli were processed using a noise-band vocoder with 8, 12, and 15 channels and synthesis filter slopes of 15 (aNH only), 30, and 60 dB/octave (all NH subjects). Steeper filter slopes (larger numbers) simulated less electrical current spread and, therefore, less channel interaction. Spectrally degraded performance of the NH listeners was also compared with the unprocessed phoneme identification of school-aged children and adults with CIs. RESULTS Spectrally degraded phoneme identification improved as a function of age for cNH. For vowel recognition, cNH exhibited an interaction between the number of processing channels and vocoder filter slope, whereas aNH did not. Specifically, for cNH, increasing the number of processing channels only improved vowel identification in the steepest filter slope condition. Additionally, cNH were more sensitive to changes in filter slope. As the filter slopes increased, cNH continued to receive vowel identification benefit beyond where aNH performance plateaued or reached ceiling. 
For all NH participants, consonant identification improved with increasing filter slopes but was unaffected by the number of processing channels. Although cNH made more phoneme identification errors overall, their phoneme error patterns were similar to aNH. Furthermore, consonant identification of adults with CI was comparable to aNH listening to simulations with shallow filter slopes (15 dB/octave). Vowel identification of earlier-implanted pediatric ears was better than that of later-implanted ears and more comparable to cNH listening in conditions with steep filter slopes (60 dB/octave). CONCLUSIONS Recognition of spectrally degraded phonemes improved when simulated channel interaction was reduced, particularly for children. cNH showed an interaction between number of processing channels and filter slope for vowel identification. The differences observed between cNH and aNH suggest that identification of spectrally degraded phonemes continues to improve through adolescence and that children may benefit from reduced channel interaction beyond where adult performance has plateaued. Comparison to CI users suggests that early implantation may facilitate development of better phoneme discrimination.
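The synthesis-filter-slope manipulation described above can be illustrated with a simple spectral weighting function that rolls off at a given dB/octave from each channel's center frequency. This gain model is a sketch of the channel-interaction idea, not the vocoder's actual synthesis filters:

```python
import numpy as np

def channel_weights(freqs, cf, slope_db_per_oct):
    # Linear gain applied to energy at `freqs` relative to a channel centered
    # at `cf`, rolling off at `slope_db_per_oct` on either side. Shallower
    # slopes pass more energy into neighboring channels, simulating greater
    # current spread (more channel interaction).
    octaves = np.abs(np.log2(np.asarray(freqs, dtype=float) / cf))
    return 10.0 ** (-slope_db_per_oct * octaves / 20.0)
```

With a 15 dB/octave slope, a channel one octave away still receives about 18% of the energy; at 60 dB/octave that falls to 0.1%, which is why steeper slopes approximate reduced channel interaction.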
|
36
|
Eipert L, Selle A, Klump GM. Uncertainty in location, level and fundamental frequency results in informational masking in a vowel discrimination task for young and elderly subjects. Hear Res 2019; 377:142-152. [DOI: 10.1016/j.heares.2019.03.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/07/2018] [Revised: 03/15/2019] [Accepted: 03/18/2019] [Indexed: 10/27/2022]
|
37
|
Tamati TN, Janse E, Başkent D. Perceptual Discrimination of Speaking Style Under Cochlear Implant Simulation. Ear Hear 2019; 40:63-76. [PMID: 29742545 PMCID: PMC6319584 DOI: 10.1097/aud.0000000000000591] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2016] [Accepted: 03/12/2018] [Indexed: 11/26/2022]
Abstract
OBJECTIVES Real-life, adverse listening conditions involve a great deal of speech variability, including variability in speaking style. Depending on the speaking context, talkers may use a more casual, reduced speaking style or a more formal, careful speaking style. Attending to fine-grained acoustic-phonetic details characterizing different speaking styles facilitates the perception of the speaking style used by the talker. These acoustic-phonetic cues are poorly encoded in cochlear implants (CIs), potentially rendering the discrimination of speaking style difficult. As a first step to characterizing CI perception of real-life speech forms, the present study investigated the perception of different speaking styles in normal-hearing (NH) listeners with and without CI simulation. DESIGN The discrimination of three speaking styles (conversational reduced speech, speech from retold stories, and carefully read speech) was assessed using a speaking style discrimination task in two experiments. NH listeners classified sentence-length utterances, produced in one of the three styles, as either formal (careful) or informal (conversational). Utterances were presented with unmodified speaking rates in experiment 1 (31 NH, young adult Dutch speakers) and with modified speaking rates set to the average rate across all utterances in experiment 2 (28 NH, young adult Dutch speakers). In both experiments, acoustic noise-vocoder simulations of CIs were used to produce 12-channel (CI-12) and 4-channel (CI-4) vocoder simulation conditions, in addition to a no-simulation condition without CI simulation. RESULTS In both experiments 1 and 2, NH listeners were able to reliably discriminate the speaking styles without CI simulation. However, this ability was reduced under CI simulation. In experiment 1, participants showed poor discrimination of speaking styles under CI simulation. 
Listeners used speaking rate as a cue to make their judgements, even though it was not a reliable cue to speaking style in the study materials. In experiment 2, without differences in speaking rate among speaking styles, listeners showed better discrimination of speaking styles under CI simulation, using additional cues to complete the task. CONCLUSIONS The findings from the present study demonstrate that perceiving differences in three speaking styles under CI simulation is a difficult task because some important cues to speaking style are not fully available in these conditions. While some cues like speaking rate are available, this information alone may not always be a reliable indicator of a particular speaking style. Some other reliable speaking style cues, such as degraded acoustic-phonetic information and variability in speaking rate within an utterance, may be available but less salient. However, as in experiment 2, listeners' perception of speaking styles may be modified if they are constrained or trained to use these additional cues, which were more reliable in the context of the present study. Taken together, these results suggest that dealing with speech variability in real-life listening conditions may be a challenge for CI users.
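The acoustic noise-vocoder CI simulation used in this study is, in essence, a standard channel vocoder. A minimal sketch follows; the filterbank shape, filter orders, and 160-Hz envelope cutoff are illustrative assumptions, not the parameters of the published study.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def noise_vocode(x, fs, n_channels=12, lo=100.0, hi=8000.0, env_cutoff=160.0):
    """Minimal noise-excited channel vocoder (illustrative parameters).

    Each log-spaced analysis band's low-pass-filtered Hilbert envelope
    modulates band-limited noise filtered into the same band.
    """
    hi = min(hi, 0.45 * fs)                            # keep edges below Nyquist
    edges = np.geomspace(lo, hi, n_channels + 1)       # log-spaced band edges
    env_sos = butter(4, env_cutoff, btype="low", fs=fs, output="sos")
    rng = np.random.default_rng(0)
    out = np.zeros(len(x))
    for k in range(n_channels):
        band_sos = butter(4, [edges[k], edges[k + 1]], btype="band",
                          fs=fs, output="sos")
        band = sosfilt(band_sos, x)                    # analysis band
        env = sosfilt(env_sos, np.abs(hilbert(band)))  # smoothed Hilbert envelope
        carrier = sosfilt(band_sos, rng.standard_normal(len(x)))
        out += np.maximum(env, 0.0) * carrier          # re-synthesize band
    return out
```

Setting `n_channels=12` or `n_channels=4` corresponds to the CI-12 and CI-4 conditions above: fewer channels mean coarser spectral resolution while the within-band envelopes are retained.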
Collapse
Affiliation(s)
- Terrin N. Tamati
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Research School of Behavioral and Cognitive Neurosciences, Graduate School of Medical Sciences, University of Groningen, Groningen, The Netherlands
| | - Esther Janse
- Centre for Language Studies, Radboud University Nijmegen, Nijmegen, The Netherlands
| | - Deniz Başkent
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Research School of Behavioral and Cognitive Neurosciences, Graduate School of Medical Sciences, University of Groningen, Groningen, The Netherlands
| |
Collapse
|
38
|
DiNino M, Arenberg JG. Age-Related Performance on Vowel Identification and the Spectral-temporally Modulated Ripple Test in Children With Normal Hearing and With Cochlear Implants. Trends Hear 2019; 22:2331216518770959. [PMID: 29708065 PMCID: PMC5949928 DOI: 10.1177/2331216518770959] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Children’s performance on psychoacoustic tasks improves with age, but inadequate auditory input may delay this maturation. Cochlear implant (CI) users receive a degraded auditory signal with reduced frequency resolution compared with normal, acoustic hearing; thus, immature auditory abilities may contribute to the variation among pediatric CI users’ speech recognition scores. This study investigated relationships between age-related variables, spectral resolution, and vowel identification scores in prelingually deafened, early-implanted children with CIs compared with normal hearing (NH) children. All participants performed vowel identification and the Spectral-temporally Modulated Ripple Test (SMRT). Vowel stimuli for NH children were vocoded to simulate the reduced spectral resolution of CI hearing. Age positively predicted NH children’s vocoded vowel identification scores, but time with the CI was a stronger predictor of vowel recognition and SMRT performance of children with CIs. For both groups, SMRT thresholds were related to vowel identification performance, analogous to previous findings in adults. Sequential information analysis of vowel feature perception indicated greater transmission of duration-related information compared with formant features in both groups of children. In addition, the amount of F2 information transmitted predicted SMRT thresholds in children with NH and with CIs. Comparisons between the two CIs of bilaterally implanted children revealed disparate task performance levels and information transmission values within the same child. These findings indicate that adequate auditory experience contributes to auditory perceptual abilities of pediatric CI users. Further, factors related to individual CIs may be more relevant to psychoacoustic task performance than are the overall capabilities of the child.
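Spectral-ripple stimuli like those in the SMRT can be generated as log-spaced tones with a sinusoidal level profile along the log-frequency axis. The sketch below is a simplified static ripple, not the published SMRT implementation (which also drifts the ripple phase over time); tone count, depth, and band limits are assumptions.

```python
import numpy as np

def spectral_ripple_noise(fs=22050, dur=0.5, ripples_per_octave=2.0,
                          depth_db=20.0, f_lo=100.0, f_hi=8000.0,
                          n_tones=200, phase=0.0, seed=0):
    """Static spectral-ripple stimulus: log-spaced tones whose levels
    follow a sinusoid along the log-frequency axis. Denser ripples
    demand finer spectral resolution to detect."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * dur)) / fs
    freqs = np.geomspace(f_lo, f_hi, n_tones)
    octaves = np.log2(freqs / f_lo)                    # tone position in octaves
    level_db = (depth_db / 2) * np.sin(
        2 * np.pi * ripples_per_octave * octaves + phase)
    amps = 10.0 ** (level_db / 20.0)
    x = np.zeros_like(t)
    for f, a in zip(freqs, amps):                      # random starting phases
        x += a * np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
    return x / np.max(np.abs(x))
```

Raising `ripples_per_octave` until the listener can no longer distinguish rippled from flat spectra yields a threshold analogous to the SMRT scores discussed above.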
Collapse
Affiliation(s)
- Mishaela DiNino
- Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
| | - Julie G Arenberg
- Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
| |
Collapse
|
39
|
Reybrouck M, Podlipniak P. Preconceptual Spectral and Temporal Cues as a Source of Meaning in Speech and Music. Brain Sci 2019; 9:E53. [PMID: 30832292 PMCID: PMC6468545 DOI: 10.3390/brainsci9030053] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2019] [Revised: 02/18/2019] [Accepted: 02/26/2019] [Indexed: 11/24/2022] Open
Abstract
This paper explores the importance of preconceptual meaning in speech and music, stressing the role of affective vocalizations as a common ancestral instrument in communicative interactions. Speech and music are sensory-rich stimuli, both at the level of production and perception, which involve different body channels, mainly the face and the voice. However, this bimodal approach has been challenged as being too restrictive. A broader conception argues for an action-oriented embodied approach that stresses the reciprocity between multisensory processing and articulatory-motor routines. There is, however, a distinction between language and music, with the latter being largely unable to function referentially. Contrary to the centrifugal tendency of language to direct the attention of the receiver away from the text or speech proper, music is centripetal in directing the listener's attention to the auditory material itself. Sound, therefore, can be considered as the meeting point between speech and music and the question can be raised as to the shared components between the interpretation of sound in the domain of speech and music. In order to answer these questions, this paper elaborates on the following topics: (i) The relationship between speech and music with a special focus on early vocalizations in humans and non-human primates; (ii) the transition from sound to meaning in speech and music; (iii) the role of emotion and affect in early sound processing; (iv) vocalizations and nonverbal affect bursts in communicative sound comprehension; and (v) the acoustic features of affective sound with a special emphasis on temporal and spectrographic cues as parts of speech prosody and musical expressiveness.
Collapse
Affiliation(s)
- Mark Reybrouck
- Musicology Research Group, KU Leuven-University of Leuven, 3000 Leuven, Belgium and IPEM-Department of Musicology, Ghent University, 9000 Ghent, Belgium.
| | - Piotr Podlipniak
- Institute of Musicology, Adam Mickiewicz University in Poznań, ul. Umultowska 89D, 61-614 Poznań, Poland.
| |
Collapse
|
40
|
Stone MA, Visram A, Harte JM, Munro KJ. A Set of Time-and-Frequency-Localized Short-Duration Speech-Like Stimuli for Assessing Hearing-Aid Performance via Cortical Auditory-Evoked Potentials. Trends Hear 2019; 23:2331216519885568. [PMID: 31858885 PMCID: PMC6967206 DOI: 10.1177/2331216519885568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2019] [Revised: 08/27/2019] [Accepted: 09/23/2019] [Indexed: 11/17/2022] Open
Abstract
Short-duration speech-like stimuli, for example, excised from running speech, can be used in the clinical setting to assess the integrity of the human auditory pathway at the level of the cortex. Modeling of the cochlear response to these stimuli demonstrated an imprecision in the location of the spectrotemporal energy, giving rise to uncertainty as to what part of a stimulus, and when, caused any evoked electrophysiological response. This article reports the development and assessment of four short-duration, limited-bandwidth stimuli centered at low, mid, mid-high, and high frequencies, suitable for free-field delivery and, in addition, reproduction via hearing aids. The durations were determined by the British Society of Audiology recommended procedure for measuring Cortical Auditory-Evoked Potentials. The levels and bandwidths were chosen via a computational model to produce uniform cochlear excitation over a width exceeding that likely in a worst-case hearing-impaired listener. These parameters produce robustness against errors in insertion gains, and variation in frequency responses, due to transducer imperfections, room modes, and age-related variation in meatal resonances. The parameter choice predicts large spectral separation between adjacent stimuli on the cochlea. Analysis of the signals processed by examples of recent digital hearing aids mostly shows similar levels of gain applied to each stimulus, independent of whether the stimulus was presented in isolation, in bursts, continuously, or embedded in continuous speech. These stimuli seem to be suitable for measuring hearing-aided Cortical Auditory-Evoked Potentials and have the potential to be of benefit in the clinical setting.
Collapse
Affiliation(s)
- Michael A. Stone
- Manchester Centre for Audiology and Deafness, School of Health Sciences, University of Manchester, UK
- Manchester University Hospitals NHS Foundation Trust, UK
| | - Anisa Visram
- Manchester Centre for Audiology and Deafness, School of Health Sciences, University of Manchester, UK
- Manchester University Hospitals NHS Foundation Trust, UK
| | - James M. Harte
- Interacoustics Research Unit, c/o Technical University of Denmark, Lyngby, Denmark
| | - Kevin J. Munro
- Manchester Centre for Audiology and Deafness, School of Health Sciences, University of Manchester, UK
- Manchester University Hospitals NHS Foundation Trust, UK
| |
Collapse
|
41
|
Rasetshwane DM, Raybine DA, Kopun JG, Gorga MP, Neely ST. Influence of Instantaneous Compression on Recognition of Speech in Noise with Temporal Dips. J Am Acad Audiol 2018; 30:16-30. [PMID: 30461387 DOI: 10.3766/jaaa.16165] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]
Abstract
BACKGROUND In listening environments with background noise that fluctuates in level, listeners with normal hearing can "glimpse" speech during dips in the noise, resulting in better speech recognition in fluctuating noise than in steady noise at the same overall level (referred to as masking release). Listeners with sensorineural hearing loss show less masking release. Amplification can improve masking release but not to the same extent that it does for listeners with normal hearing. PURPOSE The purpose of this study was to compare masking release for listeners with sensorineural hearing loss obtained with an experimental hearing-aid signal-processing algorithm with instantaneous compression (referred to as a suppression hearing aid, SHA) to masking release obtained with fast compression. The suppression hearing aid mimics effects of normal cochlear suppression, i.e., the reduction in the response to one sound by the simultaneous presentation of another sound. RESEARCH DESIGN A within-participant design with repeated measures across test conditions was used. STUDY SAMPLE Participants included 29 adults with mild-to-moderate sensorineural hearing loss and 21 adults with normal hearing. INTERVENTION Participants with sensorineural hearing loss were fitted with simulators for SHA and a generic hearing aid (GHA) with fast (but not instantaneous) compression (5 ms attack and 50 ms release times) and no suppression. Gain was prescribed using either an experimental method based on categorical loudness scaling (CLS) or the Desired Sensation Level (DSL) algorithm version 5a, resulting in a total of four processing conditions: CLS-GHA, CLS-SHA, DSL-GHA, and DSL-SHA. DATA COLLECTION All participants listened to consonant-vowel-consonant nonwords in the presence of temporally-modulated and steady noise. An adaptive-tracking procedure was used to determine the signal-to-noise ratio required to obtain 29% and 71% correct. 
Measurements were made with amplification for participants with sensorineural hearing loss and without amplification for participants with normal hearing. ANALYSIS Repeated-measures analysis of variance was used to determine the influence of within-participant factors of noise type and, for participants with sensorineural hearing loss, processing condition on masking release. Pearson correlational analysis was used to assess the effect of age on masking release for participants with sensorineural hearing loss. RESULTS Statistically significant masking release was observed for listeners with sensorineural hearing loss for 29% correct, but not for 71% correct. However, the amount of masking release was less than masking release for participants with normal hearing. There were no significant differences among the amplification conditions for participants with sensorineural hearing loss. CONCLUSIONS The results suggest that amplification with either instantaneous or fast compression resulted in similar masking release for listeners with sensorineural hearing loss. However, the masking release was less for participants with hearing loss than it was for those with normal hearing.
Collapse
Affiliation(s)
| | - David A Raybine
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE
| | - Judy G Kopun
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE
| | - Michael P Gorga
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE
| | - Stephen T Neely
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE
| |
Collapse
|
42
|
Archer-Boyd AW, Southwell RV, Deeks JM, Turner RE, Carlyon RP. Development and validation of a spectro-temporal processing test for cochlear-implant listeners. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 144:2983. [PMID: 30522311 PMCID: PMC6805218 DOI: 10.1121/1.5079636] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/09/2018] [Accepted: 11/01/2018] [Indexed: 06/06/2023]
Abstract
Psychophysical tests of spectro-temporal resolution may aid the evaluation of methods for improving hearing by cochlear implant (CI) listeners. Here the STRIPES (Spectro-Temporal Ripple for Investigating Processor EffectivenesS) test is described and validated. Like speech, the test requires both spectral and temporal processing to perform well. Listeners discriminate between complexes of sine sweeps which increase or decrease in frequency; difficulty is controlled by changing the stimulus spectro-temporal density. Care was taken to minimize extraneous cues, forcing listeners to perform the task only on the direction of the sweeps. Vocoder simulations with normal hearing listeners showed that the STRIPES test was sensitive to the number of channels and temporal information fidelity. An evaluation with CI listeners compared a standard processing strategy with one having very wide filters, thereby spectrally blurring the stimulus. Psychometric functions were monotonic for both strategies and five of six participants performed better with the standard strategy. An adaptive procedure revealed significant differences, all in favour of the standard strategy, at the individual listener level for six of eight CI listeners. Subsequent measures validated a faster version of the test, and showed that STRIPES could be performed by recently implanted listeners having no experience of psychophysical testing.
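A STRIPES-style stimulus can be sketched as overlapping exponential sine sweeps whose onset spacing controls spectro-temporal density. This is a minimal illustration, not the published implementation; all parameters (sweep span, density rule, frequency range) are assumptions.

```python
import numpy as np

def sweep_complex(direction="up", density=3.0, dur=1.0, fs=16000,
                  f_lo=250.0, f_hi=4000.0):
    """Complex of overlapping exponential sine sweeps. `density` sets how
    many sweeps are active at once; the listener's task is to report the
    sweep direction (up vs down)."""
    sweep_dur = dur / 2                        # each sweep spans half the stimulus
    m = int(fs * sweep_dur)
    tau = np.arange(m) / m                     # 0..1 within one sweep
    k = np.log(f_hi / f_lo)
    # exponential glide f_lo -> f_hi; phase is the integral of f(t)
    phase = 2 * np.pi * f_lo * sweep_dur * (np.exp(k * tau) - 1) / k
    seg = np.sin(phase)
    if direction == "down":
        seg = seg[::-1]                        # time-reverse for a down-glide
    n = int(fs * dur)
    out = np.zeros(n)
    step = max(1, int(m / density))            # denser = more overlapping sweeps
    for onset in range(0, n, step):
        stop = min(n, onset + m)
        out[onset:stop] += seg[: stop - onset]
    return out / np.max(np.abs(out))
```

Increasing `density` packs the sweeps closer together, removing incidental single-sweep cues and forcing a judgement based on the overall spectro-temporal direction, which is the adaptive dimension in the test described above.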
Collapse
Affiliation(s)
- Alan W. Archer-Boyd
- MRC Cognition & Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom
| | - Rosy V. Southwell
- MRC Cognition & Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom
| | - John M. Deeks
- MRC Cognition & Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom
| | - Richard E. Turner
- MRC Cognition & Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom
| | - Robert P. Carlyon
- MRC Cognition & Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom
| |
Collapse
|
43
|
Frequency specificity of amplitude envelope patterns in noise-vocoded speech. Hear Res 2018; 367:169-181. [DOI: 10.1016/j.heares.2018.06.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/12/2017] [Revised: 06/03/2018] [Accepted: 06/08/2018] [Indexed: 11/22/2022]
|
44
|
Souza P, Wright R, Gallun F, Reinhart P. Reliability and Repeatability of the Speech Cue Profile. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2018; 61:2126-2137. [PMID: 30073277 PMCID: PMC6198918 DOI: 10.1044/2018_jslhr-h-17-0341] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/09/2017] [Revised: 01/13/2018] [Accepted: 04/08/2018] [Indexed: 05/26/2023]
Abstract
PURPOSE Researchers have long noted speech recognition variability that is not explained by the pure-tone audiogram. Previous work (Souza, Wright, Blackburn, Tatman, & Gallun, 2015) demonstrated that a small number of listeners with sensorineural hearing loss utilized different types of acoustic cues to identify speechlike stimuli, specifically the extent to which the participant relied upon spectral (or temporal) information for identification. Consistent with recent calls for data rigor and reproducibility, the primary aims of this study were to replicate the pattern of cue use in a larger cohort and to verify stability of the cue profiles over time. METHOD Cue-use profiles were measured for adults with sensorineural hearing loss using a syllable identification task consisting of synthetic speechlike stimuli in which spectral and temporal dimensions were manipulated along continua. For the first set, a static spectral shape varied from alveolar to palatal, and a temporal envelope rise time varied from affricate to fricative. For the second set, formant transitions varied from labial to alveolar and a temporal envelope rise time varied from approximant to stop. A discriminant feature analysis was used to determine to what degree spectral and temporal information contributed to stimulus identification. A subset of participants completed a 2nd visit using the same stimuli and procedures. RESULTS When spectral information was static, most participants were more influenced by spectral than by temporal information. When spectral information was dynamic, participants demonstrated a balanced distribution of cue-use patterns, with nearly equal numbers of individuals influenced by spectral or temporal cues. Individual cue profile was repeatable over a period of several months. 
CONCLUSION In combination with previously published data, these results indicate that listeners with sensorineural hearing loss are influenced by different cues to identify speechlike sounds and that those patterns are stable over time.
Collapse
Affiliation(s)
- Pamela Souza
- Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL
- Knowles Hearing Center, Northwestern University, Evanston, IL
| | - Richard Wright
- Department of Linguistics, University of Washington, Seattle
| | - Frederick Gallun
- National Center for Rehabilitative Auditory Research, Portland VA Medical Center, Oregon
- Otolaryngology–Head and Neck Surgery, Oregon Health and Science University, Portland
| | - Paul Reinhart
- Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL
| |
Collapse
|
45
|
Age-Related Differences in the Processing of Temporal Envelope and Spectral Cues in a Speech Segment. Ear Hear 2018; 38:e335-e342. [PMID: 28562426 DOI: 10.1097/aud.0000000000000447] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
OBJECTIVES As people age, they experience reduced temporal processing abilities. This results in poorer ability to understand speech, particularly for degraded input signals. Cochlear implants (CIs) convey speech information via the temporal envelopes of a spectrally degraded input signal. Because there is an increasing number of older CI users, there is a need to understand how temporal processing changes with age. Therefore, the goal of this study was to quantify age-related reduction in temporal processing abilities when attempting to discriminate words based on temporal envelope information from spectrally degraded signals. DESIGN Younger normal-hearing (YNH) and older normal-hearing (ONH) participants were presented a continuum of speech tokens that varied in silence duration between phonemes (0 to 60 ms in 10-ms steps), and were asked to identify whether the stimulus was perceived more as the word "dish" or "ditch." Stimuli were vocoded using tonal carriers. The number of channels (1, 2, 4, 8, 16, and unprocessed) and temporal envelope low-pass filter cutoff frequency (50 and 400 Hz) were systematically varied. RESULTS For the unprocessed conditions, the YNH participants perceived the word ditch for smaller silence durations than the ONH participants, indicating that aging affects temporal processing abilities. There was no difference in performance between the unprocessed and 16-channel, 400-Hz vocoded stimuli. Decreasing the number of spectral channels caused decreased ability to distinguish dish and ditch. Decreasing the envelope cutoff frequency also caused decreased ability to distinguish dish and ditch. The overall pattern of results revealed that reductions in spectral and temporal information had a relatively larger effect on the ONH participants compared with the YNH participants. CONCLUSIONS Aging reduces the ability to utilize brief temporal cues in speech segments. 
Reducing spectral information, as occurs in a channel vocoder and in CI speech processing strategies, forces participants to use temporal envelope information; however, older participants are less capable of utilizing this information. These results suggest that providing as much spectral and temporal speech information as possible would benefit older CI users relatively more than younger CI users. In addition, the present findings help set expectations of clinical outcomes for speech understanding performance by adult CI users as a function of age.
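The dish/ditch continuum described in the design is built by inserting silent closures of increasing duration into a single recorded token. A sketch, assuming the closure point (`gap_idx`) has been marked by hand:

```python
import numpy as np

def silence_continuum(word, fs, gap_idx, gaps_ms=(0, 10, 20, 30, 40, 50, 60)):
    """Build a silence-duration continuum by inserting 0-60 ms of silence
    (10-ms steps) at a marked sample index between phonemes.

    `gap_idx` is an assumption for illustration; in practice it would be
    placed at the fricative onset of the recorded "dish" token.
    """
    stimuli = []
    for ms in gaps_ms:
        gap = np.zeros(round(fs * ms / 1000))  # ms of silence in samples
        stimuli.append(np.concatenate([word[:gap_idx], gap, word[gap_idx:]]))
    return stimuli
```

Each continuum step would then be vocoded (with the chosen channel count and envelope cutoff) before presentation, so the silent-gap cue must survive the envelope smoothing to be heard as the stop closure in "ditch".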
Collapse
|
46
|
Wiinberg A, Zaar J, Dau T. Effects of Expanding Envelope Fluctuations on Consonant Perception in Hearing-Impaired Listeners. Trends Hear 2018; 22:2331216518775293. [PMID: 29756553 PMCID: PMC5954573 DOI: 10.1177/2331216518775293] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
This study examined the perceptual consequences of three speech enhancement schemes based on multiband nonlinear expansion of temporal envelope fluctuations between 10 and 20 Hz: (a) "idealized" envelope expansion of the speech before the addition of stationary background noise, (b) envelope expansion of the noisy speech, and (c) envelope expansion of only those time-frequency segments of the noisy speech that exhibited signal-to-noise ratios (SNRs) above -10 dB. Linear processing was considered as a reference condition. The performance was evaluated by measuring consonant recognition and consonant confusions in normal-hearing and hearing-impaired listeners using consonant-vowel nonsense syllables presented in background noise. Envelope expansion of the noisy speech showed no significant effect on the overall consonant recognition performance relative to linear processing. In contrast, SNR-based envelope expansion of the noisy speech improved the overall consonant recognition performance equivalent to a 1- to 2-dB improvement in SNR, mainly by improving the recognition of some of the stop consonants. The effect of the SNR-based envelope expansion was similar to the effect of envelope-expanding the clean speech before the addition of noise.
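One way to realize multiband expansion of 10-20 Hz envelope fluctuations is sketched below. The band count, filter orders, and the additive expansion rule are assumptions for illustration, not the algorithm used in the study.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def expand_envelope(x, fs, n_bands=8, lo=100.0, hi=8000.0, gain=1.0):
    """Boost 10-20 Hz envelope fluctuations in each band (illustrative).

    In each band, the 10-20 Hz component of the Hilbert envelope is boosted
    and the band signal is rescaled by expanded/original envelope, which
    deepens modulations in that fluctuation range.
    """
    hi = min(hi, 0.45 * fs)                            # keep edges below Nyquist
    edges = np.geomspace(lo, hi, n_bands + 1)
    mod_sos = butter(2, [10.0, 20.0], btype="band", fs=fs, output="sos")
    out = np.zeros(len(x))
    for k in range(n_bands):
        band_sos = butter(4, [edges[k], edges[k + 1]], btype="band",
                          fs=fs, output="sos")
        band = sosfilt(band_sos, x)
        env = np.abs(hilbert(band)) + 1e-12            # avoid divide-by-zero
        fluct = sosfilt(mod_sos, env)                  # 10-20 Hz envelope component
        env_exp = np.maximum(env + gain * fluct, 0.0)
        out += band * (env_exp / env)                  # apply expansion gain
    return out
```

The SNR-based variant in condition (c) would apply this per-band gain only in time-frequency segments estimated to exceed -10 dB SNR, leaving noise-dominated segments unprocessed.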
Collapse
Affiliation(s)
- Alan Wiinberg
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Lyngby, Denmark
| | - Johannes Zaar
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Lyngby, Denmark
| | - Torsten Dau
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Lyngby, Denmark
| |
Collapse
|
47
|
Relationship Between Peripheral and Psychophysical Measures of Amplitude Modulation Detection in Cochlear Implant Users. Ear Hear 2018; 38:e268-e284. [PMID: 28207576 DOI: 10.1097/aud.0000000000000417] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
OBJECTIVE This study investigates the relationship between electrophysiological and psychophysical measures of amplitude modulation (AM) detection. Prior studies have reported both measures of AM detection recorded separately from cochlear implant (CI) users and acutely deafened animals, but no study has made both measures in the same CI users. Animal studies suggest a progressive loss of high-frequency encoding as one ascends the auditory pathway from the auditory nerve to the cortex. Because the CI speech processor uses the envelope of an ongoing acoustic signal to modulate pulse trains that are subsequently delivered to the intracochlear electrodes, it is of interest to explore auditory nerve responses to modulated stimuli. In addition, psychophysical AM detection abilities have been correlated with speech perception outcomes. Thus, the goal was to explore how the auditory nerve responds to AM stimuli and to relate those physiologic measures to perception. DESIGN Eight patients using Cochlear Ltd. implants participated in this study. Electrically evoked compound action potentials (ECAPs) were recorded using a 4000 pps pulse train that was sinusoidally amplitude modulated at 125, 250, 500, and 1000 Hz rates. Responses were measured for each pulse over at least one modulation cycle for an apical, medial, and basal electrode. Psychophysical modulation detection thresholds (MDTs) were also measured via a three-alternative forced choice, two-down, one-up adaptive procedure using the same modulation frequencies and electrodes. RESULTS ECAPs were recorded from individual pulses in the AM pulse train. ECAP amplitudes varied sinusoidally, reflecting the sinusoidal variation in the stimulus. A modulated response amplitude (MRA) metric was calculated as the difference between the maximum and minimum ECAP amplitudes over the modulation cycles. MRA increased as modulation frequency increased, with no apparent cutoff (up to 1000 Hz). 
In contrast, MDTs increased as the modulation frequency increased. This trend is inconsistent with the physiologic measures. For a fixed modulation frequency, correlations were observed between MDTs and MRAs; this trend was evident at all frequencies except 1000 Hz (although only statistically significant for 250 and 500 Hz AM rates), possibly an indication of central limitations in processing of high modulation frequencies. Finally, peripheral responses were larger and psychophysical thresholds were lower in the apical electrodes relative to basal and medial electrodes, which may reflect better cochlear health and neural survival evidenced by lower preoperative low-frequency audiometric thresholds and steeper growth of neural responses in ECAP amplitude growth functions for apical electrodes. CONCLUSIONS Robust ECAPs were recorded for all modulation frequencies tested. ECAP amplitudes varied sinusoidally, reflecting the periodicity of the modulated stimuli. MRAs increased as the modulation frequency increased, a trend we attribute to neural adaptation. For low modulation frequencies, there are multiple current steps between the peak and valley of the modulation cycle, which means successive stimuli are more similar to one another and neural responses are more likely to adapt. Higher MRAs were correlated with lower psychophysical thresholds at low modulation frequencies but not at 1000 Hz, implying a central limitation to processing of modulated stimuli.
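The stimulus and the MRA metric described above reduce to a few lines. The sketch below uses arbitrary amplitude units, not clinical current levels:

```python
import numpy as np

def am_pulse_train(rate_pps=4000, fm_hz=125.0, depth=0.1, dur=0.1):
    """Pulse times and sinusoidally modulated amplitudes for an AM pulse
    train (4000 pps carrier, 125 Hz modulation by default)."""
    times = np.arange(round(dur * rate_pps)) / rate_pps
    amps = 1.0 + depth * np.sin(2 * np.pi * fm_hz * times)
    return amps, times

def modulated_response_amplitude(ecap_amps):
    """MRA metric from the abstract: maximum minus minimum response
    amplitude across the modulation cycles."""
    return float(np.max(ecap_amps) - np.min(ecap_amps))
```

Applying `modulated_response_amplitude` to per-pulse ECAP amplitudes (rather than to the stimulus itself) gives the physiological measure that the study correlates with psychophysical MDTs at each modulation frequency.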
Collapse
|
48
|
Objective Identification of Simulated Cochlear Implant Settings in Normal-Hearing Listeners Via Auditory Cortical Evoked Potentials. Ear Hear 2018; 38:e215-e226. [PMID: 28125444 DOI: 10.1097/aud.0000000000000403] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
OBJECTIVES Providing cochlear implant (CI) patients the optimal signal processing settings during mapping sessions is critical for facilitating their speech perception. Here, we aimed to evaluate whether auditory cortical event-related potentials (ERPs) could be used to objectively determine optimal CI parameters. DESIGN While recording neuroelectric potentials, we presented a set of acoustically vocoded consonants (aKa, aSHa, and aNa) to normal-hearing listeners (n = 12) that simulated speech tokens processed through four different combinations of CI stimulation rate and number of spectral maxima. Parameter settings were selected to feature relatively fast/slow stimulation rates and high/low number of maxima; 1800 pps/20 maxima, 1800/8, 500/20 and 500/8. RESULTS Speech identification and reaction times did not differ with changes in either the number of maxima or stimulation rate indicating ceiling behavioral performance. Similarly, we found that conventional univariate analysis (analysis of variance) of N1 and P2 amplitude/latency failed to reveal strong modulations across CI-processed speech conditions. In contrast, multivariate discriminant analysis based on a combination of neural measures was used to create "neural confusion matrices" and identified a unique parameter set (1800/8) that maximally differentiated speech tokens at the neural level. This finding was corroborated by information transfer analysis which confirmed these settings optimally transmitted information in listeners' neural and perceptual responses. CONCLUSIONS Translated to actual implant patients, our findings suggest that scalp-recorded ERPs might be useful in determining optimal signal processing settings from among a closed set of parameter options and aid in the objective fitting of CI devices.
Collapse
|
49
|
Hawthorne K. Prosody-driven syntax learning is robust to impoverished pitch and spectral cues. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:2756. [PMID: 29857717 DOI: 10.1121/1.5031130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Across languages, prosodic boundaries tend to align with syntactic boundaries, and both infant and adult language learners capitalize on these correlations to jump-start syntax acquisition. However, it is unclear which prosodic cues (pauses, final-syllable lengthening, and/or pitch resets across boundaries) are necessary for prosodic bootstrapping to occur. It is also unknown how syntax acquisition is impacted when listeners do not have access to the full range of prosodic or spectral information. These questions were addressed using 14-channel noise-vocoded (spectrally degraded) speech. While pre-boundary lengthening and pauses are well-transmitted through noise-vocoded speech, pitch is not; overall intelligibility is also decreased. In two artificial grammar experiments, adult native English speakers showed a similar ability to use English-like prosody to bootstrap unfamiliar syntactic structures from degraded speech and natural, unmanipulated speech. Contrary to previous findings that listeners may require pitch resets and final lengthening to co-occur if no pause cue is present, participants in the degraded speech conditions were able to detect prosodic boundaries from lengthening alone. Results suggest that pitch is not necessary for adult English speakers to perceive prosodic boundaries associated with syntactic structures, and that prosodic bootstrapping is robust to degraded spectral information.
Affiliation(s)
- Kara Hawthorne
- Department of Communication Sciences and Disorders, University of Mississippi, 304 George Hall, University, Mississippi 38677, USA
50
Zhou N, Cadmus M, Dong L, Mathews J. Temporal Modulation Detection Depends on Sharpness of Spatial Tuning. J Assoc Res Otolaryngol 2018; 19:317-330. [PMID: 29696448 DOI: 10.1007/s10162-018-0663-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2017] [Accepted: 03/22/2018] [Indexed: 01/04/2023] Open
Abstract
Prior research has shown that in electrical hearing, cochlear implant (CI) users' speech recognition performance is related in part to their ability to detect temporal modulation (i.e., modulation sensitivity). Previous studies have also shown better speech recognition when selectively stimulating sites with good modulation sensitivity rather than all stimulation sites. Site selection based on channel interaction measures, such as those using imaging or psychophysical estimates of spread of neural excitation, has also been shown to improve speech recognition. This led to the question of whether temporal modulation sensitivity and spatial selectivity of neural excitation are two related variables. In the present study, CI users' modulation sensitivity was compared for sites with relatively broad or narrow neural excitation patterns. This was achieved by measuring temporal modulation detection thresholds (MDTs) at stimulation sites that differed significantly in the sharpness of their psychophysical spatial tuning curves (PTCs) and by measuring MDTs at the same sites in monopolar (MP) and bipolar (BP) stimulation modes. Nine postlingually deafened subjects implanted with a Cochlear Nucleus® device took part in the study. Results showed a significant correlation between the sharpness of PTCs and MDTs, indicating that modulation detection benefits from a more spatially restricted neural activation pattern. There was a significant interaction between stimulation site and mode. That is, using BP stimulation only improved MDTs at stimulation sites with broad PTCs but had no effect or sometimes a detrimental effect on MDTs at stimulation sites with sharp PTCs. This interaction could suggest that a criterion number of nerve fibers is needed to achieve optimal temporal resolution, and, to achieve optimized speech recognition outcomes, individualized selection of site-specific current focusing strategies may be necessary.
These results also suggest that the removal of stimulation sites measured with poor MDTs might improve both temporal and spectral resolution.
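Modulation detection thresholds like those above are typically measured with sinusoidally amplitude-modulated (SAM) stimuli, with threshold expressed as the modulation depth in dB (20·log10 of the modulation index m). A minimal sketch of SAM noise generation; the sampling rate, duration, and depth below are illustrative assumptions, not the study's parameters:

```python
import numpy as np

def sam_noise(fs, dur, fm, m_db, seed=0):
    """Sinusoidally amplitude-modulated Gaussian noise for a
    modulation-detection task. fm is the modulation rate in Hz;
    m_db is the modulation depth, 20*log10(m)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * dur)) / fs
    m = 10 ** (m_db / 20)                        # modulation index, 0..1
    x = (1 + m * np.sin(2 * np.pi * fm * t)) * rng.standard_normal(t.size)
    return x / np.sqrt(np.mean(x ** 2))          # RMS-normalize

# e.g., 100-Hz modulation at -10 dB depth (m ≈ 0.32):
x = sam_noise(fs=16000, dur=0.5, fm=100, m_db=-10)
```

An adaptive procedure would then vary m_db until the listener can just distinguish modulated from unmodulated intervals.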
Affiliation(s)
- Ning Zhou
- Department of Communication Sciences and Disorders, East Carolina University, Greenville, NC, 27858, USA.
- Matthew Cadmus
- Department of Communication Sciences and Disorders, East Carolina University, Greenville, NC, 27858, USA
- Lixue Dong
- Department of Communication Sciences and Disorders, East Carolina University, Greenville, NC, 27858, USA
- Juliana Mathews
- Department of Communication Sciences and Disorders, East Carolina University, Greenville, NC, 27858, USA