1
Gao Z, Oxenham AJ. Adaptation to sentences and melodies when making judgments along a voice-nonvoice continuum. Atten Percept Psychophys 2025;87:1022-1032. [PMID: 40000570] [DOI: 10.3758/s13414-025-03030-9]
Abstract
Adaptation to constant or repetitive sensory signals serves to improve detection of novel events in the environment and to encode incoming information more efficiently. Within the auditory modality, contrastive adaptation effects have been observed within a number of categories, including voice and musical instrument type. A recent study found contrastive perceptual shifts between voice and instrument categories following repetitive presentation of adaptors consisting of either vowels or instrument tones. The current study tested the generalizability of adaptation along a voice-instrument continuum, using more ecologically valid adaptors. Participants were presented with an adaptor followed by an ambiguous voice-instrument target, created by generating a 10-step morphed continuum between pairs of vowel and instrument sounds. Listeners' categorization of the target sounds was shifted contrastively by a spoken sentence or instrumental melody adaptor, regardless of whether the adaptor and the target shared the same speaker gender or similar pitch range (Experiment 1). However, no significant contrastive adaptation was observed when nonspeech vocalizations or nonpitched percussion sounds were used as the adaptors (Experiment 2). The results suggest that adaptation between voice and nonvoice categories does not rely on exact repetition of simple stimuli, nor does it solely reflect the result of a sound being categorized as being human or nonhuman sourced. The outcomes suggest future directions for determining the precise spectro-temporal properties of sounds that induce these voice-instrument contrastive adaptation effects.
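As a rough illustration of how a voice-instrument continuum of this kind could be constructed, the sketch below linearly interpolates STFT magnitude spectra between a vowel and an instrument tone in 10 steps. The file names and interpolation method are assumptions for illustration, not the authors' actual morphing procedure.

```python
# Minimal sketch: a 10-step spectral morph between a vowel and an
# instrument tone via linear interpolation of STFT magnitudes.
# File names and morphing method are illustrative assumptions; the
# study's actual morphing algorithm is not specified in the abstract.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft, istft

sr, vowel = wavfile.read("vowel.wav")        # hypothetical input files
_, tone = wavfile.read("instrument.wav")     # assumed same sample rate
n = min(len(vowel), len(tone))               # equate durations crudely
_, _, V = stft(vowel[:n].astype(float), fs=sr)
_, _, T = stft(tone[:n].astype(float), fs=sr)

for step in range(10):                       # 0 = pure vowel ... 9 = pure tone
    w = step / 9.0
    mag = (1 - w) * np.abs(V) + w * np.abs(T)    # interpolate magnitudes
    phase = np.angle(V)                          # reuse the vowel's phase
    _, morph = istft(mag * np.exp(1j * phase), fs=sr)
    wavfile.write(f"morph_step{step}.wav", sr, morph.astype(np.float32))
```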
Affiliation(s)
- Zi Gao
- Department of Psychology, University of Minnesota-Twin Cities, 75 E River Rd, Minneapolis, MN, 55455, USA.
- Andrew J Oxenham
- Department of Psychology, University of Minnesota-Twin Cities, 75 E River Rd, Minneapolis, MN, 55455, USA

2
Neupane AK, Vanaja CS. Temporal cue based categorization and speech perception in noise among pediatric cochlear implant users. Int J Pediatr Otorhinolaryngol 2024;187:112171. [PMID: 39571417] [DOI: 10.1016/j.ijporl.2024.112171]
Abstract
OBJECTIVES Voice onset time (VOT) has been identified as a potential temporal cue for predicting children's performance in speech-in-noise tasks, yet the relationship between these two factors has not previously been explored among children using cochlear implants (CIs). Hence, the present study aimed to explore the performance of children using CIs on a temporal cue-based syllable categorization test and on speech perception in noise, and to examine the relationship between the two. METHODS A temporal cue-based syllable categorization test was developed by manipulating the /ba/ sound along a 10-step continuum, with VOT varying from -74 ms to 26 ms. The developed test and the revised speech-in-noise test for Marathi-speaking children (at 0 and 5 dB SNR) were administered to thirty children with unilateral cochlear implants and thirty children with normal hearing, aged 5 to 7 years. RESULTS The Mann-Whitney U test showed significant differences between groups on the temporal cue-based categorization test and on the speech-in-noise test at both 0 dB and 5 dB SNR. Kendall's tau-b revealed a moderate correlation between implant age and scores on the temporal cue-based categorization and speech-in-noise tests at 0 dB SNR, with a strong correlation at 5 dB SNR. Additionally, there was a significant moderate relationship between temporal cue-based categorization and speech-in-noise test scores at both 0 dB and 5 dB SNR. CONCLUSION The present study highlights the importance of temporal cues in speech perception and the role of temporal processing for children using cochlear implants. It reinforces the evidence that speech perception skills improve with implant age.
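For concreteness, the continuum endpoints given above imply the following step values if the 10 steps are evenly spaced (an assumption; the abstract does not state the spacing):

```python
# Sketch of the 10-step VOT continuum implied by the abstract:
# endpoints -74 ms (prevoiced /ba/) to +26 ms; even spacing is assumed.
import numpy as np

vot_ms = np.linspace(-74, 26, 10)   # negative = voicing leads the burst
print(np.round(vot_ms, 1))
# [-74.  -62.9 -51.8 -40.7 -29.6 -18.4  -7.3   3.8  14.9  26. ]
```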
Affiliation(s)
- Anuj Kumar Neupane
- Bharati Vidyapeeth (Deemed to Be University), School of Audiology & Speech-Language Pathology, Pune, 411043, India.
- C S Vanaja
- Bharati Vidyapeeth (Deemed to Be University), School of Audiology & Speech-Language Pathology, Pune, 411043, India.

3
Farrar R, Ashjaei S, Arjmandi MK. Speech-evoked cortical activities and speech recognition in adult cochlear implant listeners: a review of functional near-infrared spectroscopy studies. Exp Brain Res 2024;242:2509-2530. [PMID: 39305309] [PMCID: PMC11527908] [DOI: 10.1007/s00221-024-06921-9]
Abstract
Cochlear implants (CIs) are the most successful neural prostheses, enabling individuals with severe to profound hearing loss to access sounds and understand speech. While CIs have demonstrated success, speech perception outcomes vary widely among CI listeners, with significantly reduced performance in noise. This review paper summarizes prior findings on speech-evoked cortical activities in adult CI listeners using functional near-infrared spectroscopy (fNIRS) to understand (a) speech-evoked cortical processing in CI listeners compared to normal-hearing (NH) individuals, (b) the relationship between these activities and behavioral speech recognition scores, (c) the extent to which current fNIRS-measured speech-evoked cortical activities in CI listeners account for their differences in speech perception, and (d) challenges in using fNIRS for CI research. Compared to NH listeners, CI listeners had diminished speech-evoked activation in the middle temporal gyrus (MTG) and in the superior temporal gyrus (STG), except for one study that reported the opposite pattern for the STG. NH listeners exhibited higher inferior frontal gyrus (IFG) activity when listening to CI-simulated speech compared to natural speech. Among CI listeners, higher speech recognition scores correlated with lower speech-evoked activation in the STG and higher activation in the left IFG and left fusiform gyrus, with mixed findings in the MTG. fNIRS shows promise for enhancing our understanding of cortical processing of speech in CI listeners, though findings are mixed. Challenges include test-retest reliability, managing noise, replicating natural conditions, optimizing montage design, and standardizing methods to establish a strong predictive relationship between fNIRS-based cortical activities and speech perception in CI listeners.
Affiliation(s)
- Reed Farrar
- Department of Psychology, University of South Carolina, 1512 Pendleton Street, Columbia, SC, 29208, USA
- Samin Ashjaei
- Department of Communication Sciences and Disorders, University of South Carolina, 1705 College Street, Columbia, SC, 29208, USA
- Meisam K Arjmandi
- Department of Communication Sciences and Disorders, University of South Carolina, 1705 College Street, Columbia, SC, 29208, USA.
- Institute for Mind and Brain, University of South Carolina, Barnwell Street, Columbia, SC, 29208, USA.

4
Ashjaei S, Behroozmand R, Fozdar S, Farrar R, Arjmandi M. Vocal control and speech production in cochlear implant listeners: A review within auditory-motor processing framework. Hear Res 2024;453:109132. [PMID: 39447319] [DOI: 10.1016/j.heares.2024.109132]
Abstract
A comprehensive literature review is conducted to summarize and discuss prior findings on how cochlear implants (CI) affect the users' abilities to produce and control vocal and articulatory movements within the auditory-motor integration framework of speech. Patterns of speech production pre- versus post-implantation, post-implantation adjustments, deviations from the typical ranges of speakers with normal hearing (NH), the effects of switching the CI on and off, as well as the impact of altered auditory feedback on vocal and articulatory speech control are discussed. Overall, findings indicate that CIs enhance the vocal and articulatory control aspects of speech production at both segmental and suprasegmental levels. While many CI users achieve speech quality comparable to NH individuals, some features still deviate in a group of CI users even years post-implantation. More specifically, contracted vowel space, increased vocal jitter and shimmer, longer phoneme and utterance durations, shorter voice onset time, decreased contrast in fricative production, limited prosodic patterns, and reduced intelligibility have been reported in subgroups of CI users compared to NH individuals. Significant individual variations among CI users have been observed in both the pace of speech production adjustments and long-term speech outcomes. Few controlled studies have explored how the implantation age and the duration of CI use influence speech features, leaving substantial gaps in our understanding about the effects of spectral resolution, auditory rehabilitation, and individual auditory-motor processing abilities on vocal and articulatory speech outcomes in CI users. Future studies under the auditory-motor integration framework are warranted to determine how suboptimal CI auditory feedback impacts auditory-motor processing and precise vocal and articulatory control in CI users.
Affiliation(s)
- Samin Ashjaei
- Translational Auditory Neuroscience Lab, Department of Communication Sciences and Disorders, Arnold School of Public Health, University of South Carolina, 1705 College Street, Columbia, SC 29208, USA
- Roozbeh Behroozmand
- Speech Neuroscience Lab, Department of Speech, Language, and Hearing, Callier Center for Communication Disorders, School of Behavioral and Brain Sciences, The University of Texas at Dallas, 2811 North Floyd Road, Richardson, TX 75080, USA
- Shaivee Fozdar
- Translational Auditory Neuroscience Lab, Department of Communication Sciences and Disorders, Arnold School of Public Health, University of South Carolina, 1705 College Street, Columbia, SC 29208, USA
- Reed Farrar
- Translational Auditory Neuroscience Lab, Department of Communication Sciences and Disorders, Arnold School of Public Health, University of South Carolina, 1705 College Street, Columbia, SC 29208, USA
- Meisam Arjmandi
- Translational Auditory Neuroscience Lab, Department of Communication Sciences and Disorders, Arnold School of Public Health, University of South Carolina, 1705 College Street, Columbia, SC 29208, USA; Institute for Mind and Brain, University of South Carolina, Barnwell Street, Columbia, SC 29208, USA.

5
DiNino M. Age and masking effects on acoustic cues for vowel categorization. JASA Express Lett 2024;4:060001. [PMID: 38884558] [DOI: 10.1121/10.0026371]
Abstract
Age-related changes in auditory processing may reduce physiological coding of acoustic cues, contributing to older adults' difficulty perceiving speech in background noise. This study investigated whether older adults differed from young adults in patterns of acoustic cue weighting for categorizing vowels in quiet and in noise. All participants relied primarily on spectral quality to categorize /ɛ/ and /æ/ sounds under both listening conditions. However, relative to young adults, older adults exhibited greater reliance on duration and less reliance on spectral quality. These results suggest that aging alters patterns of perceptual cue weights that may influence speech recognition abilities.
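Cue weights in studies like this are often estimated as coefficients from a logistic regression predicting category responses from standardized cue values, with larger coefficient magnitudes indicating heavier reliance on a cue. The sketch below illustrates that general approach on simulated data; it is not necessarily the paper's exact analysis.

```python
# Common cue-weighting approach (an assumption here, not necessarily the
# paper's analysis): logistic regression of /ae/ responses on standardized
# spectral-quality and duration cues; larger |coefficient| = heavier weight.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
spectral = rng.normal(size=n)            # z-scored spectral-quality cue
duration = rng.normal(size=n)            # z-scored duration cue
# Simulated listener who weights spectral quality more than duration:
p = 1 / (1 + np.exp(-(2.0 * spectral + 0.5 * duration)))
resp = rng.binomial(1, p)                # 1 = "had" (/ae/), 0 = "head" (/eh/)

X = sm.add_constant(np.column_stack([spectral, duration]))
fit = sm.Logit(resp, X).fit(disp=0)
print(fit.params)                        # relative weights: spectral >> duration
```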

6
Hartman J, Saffran J, Litovsky R. Word Learning in Deaf Adults Who Use Cochlear Implants: The Role of Talker Variability and Attention to the Mouth. Ear Hear 2024;45:337-350. [PMID: 37695563] [PMCID: PMC10920394] [DOI: 10.1097/aud.0000000000001432]
Abstract
OBJECTIVES Although cochlear implants (CIs) facilitate spoken language acquisition, many CI listeners experience difficulty learning new words. Studies have shown that highly variable stimulus input and audiovisual cues improve speech perception in CI listeners. However, less is known about whether these two factors improve perception in a word-learning context. Furthermore, few studies have examined how CI listeners direct their gaze to efficiently capture visual information available on a talker's face. The purpose of this study was two-fold: (1) to examine whether talker variability could improve word learning in CI listeners and (2) to examine how CI listeners direct their gaze while viewing a talker speak. DESIGN Eighteen adults with CIs and 10 adults with normal hearing (NH) learned eight novel word-object pairs spoken by a single talker or six different talkers (multiple talkers). The word-learning task comprised nonsense words following the phonotactic rules of English. Learning was probed using a novel talker in a two-alternative forced-choice eye gaze task. Learners' eye movements to the mouth and the target object (accuracy) were tracked over time. RESULTS Both groups performed near ceiling during the test phase, regardless of whether they learned from the same talker or different talkers. However, compared to listeners with NH, CI listeners directed their gaze significantly more to the talker's mouth while learning the words. CONCLUSIONS Unlike NH listeners, who can successfully learn words without focusing on the talker's mouth, CI listeners tended to direct their gaze to the talker's mouth, which may facilitate learning. This finding is consistent with the hypothesis that CI listeners use a visual processing strategy that efficiently captures redundant audiovisual speech cues available at the mouth. Due to ceiling effects, however, it is unclear whether talker variability facilitated word learning for adult CI listeners, an issue that should be addressed in future work using more difficult listening conditions.
Affiliation(s)
- Jasenia Hartman
- Department of Psychology and Neuroscience, Duke University, Durham, NC 27708
- Neuroscience Training Program, University of Wisconsin-Madison, Madison, WI 53706
- Jenny Saffran
- Department of Psychology, University of Wisconsin-Madison, Madison, WI 53706
- Ruth Litovsky
- Neuroscience Training Program, University of Wisconsin-Madison, Madison, WI 53706
- Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, WI 53706

7
Nourski KV, Steinschneider M, Rhone AE, Berger JI, Dappen ER, Kawasaki H, Howard MA III. Intracranial electrophysiology of spectrally degraded speech in the human cortex. Front Hum Neurosci 2024;17:1334742. [PMID: 38318272] [PMCID: PMC10839784] [DOI: 10.3389/fnhum.2023.1334742]
Abstract
Introduction Cochlear implants (CIs) are the treatment of choice for severe to profound hearing loss. Variability in CI outcomes remains despite advances in technology and is attributed in part to differences in cortical processing. Studying these differences in CI users is technically challenging. Spectrally degraded stimuli presented to normal-hearing individuals approximate the input to the central auditory system in CI users. This study used intracranial electroencephalography (iEEG) to investigate cortical processing of spectrally degraded speech. Methods Participants were adult neurosurgical epilepsy patients. Stimuli were the utterances /aba/ and /ada/, spectrally degraded using a noise vocoder (1-4 bands) or presented without vocoding. The stimuli were presented in a two-alternative forced choice task. Cortical activity was recorded using depth and subdural iEEG electrodes. Electrode coverage included auditory core in posteromedial Heschl's gyrus (HGPM), superior temporal gyrus (STG), ventral and dorsal auditory-related areas, and prefrontal and sensorimotor cortex. Analysis focused on high gamma (70-150 Hz) power augmentation and alpha (8-14 Hz) suppression. Results Task performance was at chance with 1-2 spectral bands and near ceiling for clear stimuli. Performance was variable with 3-4 bands, permitting identification of good and poor performers. There was no relationship between task performance and participants' demographic, audiometric, neuropsychological, or clinical profiles. Several response patterns were identified based on magnitude and differences between stimulus conditions. HGPM responded strongly to all stimuli. A preference for clear speech emerged within non-core auditory cortex. Good performers typically had strong responses to all stimuli along the dorsal stream, including posterior STG, supramarginal, and precentral gyri; a minority of sites in the STG and supramarginal gyrus had a preference for vocoded stimuli. In poor performers, responses were typically restricted to clear speech. Alpha suppression was more pronounced in good performers. In contrast, poor performers exhibited greater involvement of the posterior middle temporal gyrus when listening to clear speech. Discussion Responses to noise-vocoded speech provide insights into potential factors underlying CI outcome variability. The results emphasize differences in the balance of neural processing along the dorsal and ventral streams between good and poor performers, identify specific cortical regions that may have diagnostic and prognostic utility, and suggest potential targets for neuromodulation-based CI rehabilitation strategies.
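Noise vocoding of the kind used here follows a standard recipe: split the signal into frequency bands, extract each band's temporal envelope, modulate band-limited noise with that envelope, and sum the bands. The sketch below shows a generic implementation; the band edges, filter order, and 30-Hz envelope cutoff are illustrative assumptions rather than the study's parameters.

```python
# Generic noise-vocoder sketch (band edges, filter order, and envelope
# cutoff are illustrative assumptions, not the study's values).
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocode(x, sr, n_bands=4, lo=100.0, hi=6000.0):
    edges = np.geomspace(lo, hi, n_bands + 1)       # log-spaced band edges
    env_sos = butter(2, 30.0, btype="low", fs=sr, output="sos")
    out = np.zeros_like(x)
    for f1, f2 in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [f1, f2], btype="band", fs=sr, output="sos")
        band = sosfiltfilt(band_sos, x)                    # analysis band
        env = sosfiltfilt(env_sos, np.abs(band))           # temporal envelope
        carrier = sosfiltfilt(band_sos, np.random.randn(len(x)))
        out += np.clip(env, 0, None) * carrier             # modulate noise
    return out
```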
Affiliation(s)
- Kirill V. Nourski
- Department of Neurosurgery, The University of Iowa, Iowa City, IA, United States
- Iowa Neuroscience Institute, The University of Iowa, Iowa City, IA, United States
- Mitchell Steinschneider
- Department of Neurosurgery, The University of Iowa, Iowa City, IA, United States
- Departments of Neurology and Neuroscience, Albert Einstein College of Medicine, Bronx, NY, United States
- Ariane E. Rhone
- Department of Neurosurgery, The University of Iowa, Iowa City, IA, United States
- Joel I. Berger
- Department of Neurosurgery, The University of Iowa, Iowa City, IA, United States
- Emily R. Dappen
- Department of Neurosurgery, The University of Iowa, Iowa City, IA, United States
- Iowa Neuroscience Institute, The University of Iowa, Iowa City, IA, United States
- Hiroto Kawasaki
- Department of Neurosurgery, The University of Iowa, Iowa City, IA, United States
- Matthew A. Howard III
- Department of Neurosurgery, The University of Iowa, Iowa City, IA, United States
- Iowa Neuroscience Institute, The University of Iowa, Iowa City, IA, United States
- Pappajohn Biomedical Institute, The University of Iowa, Iowa City, IA, United States

8
Tao DD, Shi B, Galvin JJ, Liu JS, Fu QJ. Frequency detection, frequency discrimination, and spectro-temporal pattern perception in older and younger typically hearing adults. Heliyon 2023;9:e18922. [PMID: 37583764] [PMCID: PMC10424075] [DOI: 10.1016/j.heliyon.2023.e18922]
Abstract
Elderly adults often experience difficulties in speech understanding, possibly due to age-related deficits in frequency perception. It is unclear whether age-related deficits in frequency perception differ between the apical or basal regions of the cochlea. It is also unclear how aging might differently affect frequency discrimination or detection of a change in frequency within a stimulus. In the present study, pure-tone frequency thresholds were measured in 19 older (61-74 years) and 20 younger (22-28 years) typically hearing adults. Participants were asked to discriminate between reference and probe frequencies or to detect changes in frequency within a probe stimulus. Broadband spectro-temporal pattern perception was also measured using the spectro-temporal modulated ripple test (SMRT). Frequency thresholds were significantly poorer in the basal than in the apical region of the cochlea; the deficit in the basal region was 2 times larger for the older than for the younger group. Frequency thresholds were significantly poorer in the older group, especially in the basal region where frequency detection thresholds were 3.9 times poorer for the older than for the younger group. SMRT thresholds were 1.5 times better for the younger than for the older group. Significant age effects were observed for SMRT thresholds and for frequency thresholds only in the basal region. SMRT thresholds were significantly correlated with frequency thresholds only in the older group. The poorer frequency and spectro-temporal pattern perception may contribute to age-related deficits in speech perception, even when audiometric thresholds are nearly normal.
Affiliation(s)
- Duo-Duo Tao
- Department of Ear, Nose, and Throat, The First Affiliated Hospital of Soochow University, Suzhou, 215006, China
- Bin Shi
- Department of Ear, Nose, and Throat, The First Affiliated Hospital of Soochow University, Suzhou, 215006, China
- John J. Galvin
- House Institute Foundation, Los Angeles, CA, 90057, USA
- University Hospital Center of Tours, Tours, 37000, France
- Ji-Sheng Liu
- Department of Ear, Nose, and Throat, The First Affiliated Hospital of Soochow University, Suzhou, 215006, China
- Qian-Jie Fu
- Department of Head and Neck Surgery, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA

9
Bochner J, Samar V, Prud'hommeaux E, Huenerfauth M. Phoneme Categorization in Prelingually Deaf Adult Cochlear Implant Users. J Speech Lang Hear Res 2022;65:4429-4453. [PMID: 36279201] [DOI: 10.1044/2022_jslhr-22-00038]
Abstract
PURPOSE Phoneme categorization (PC) for voice onset time and second formant transition was studied in adult cochlear implant (CI) users with early-onset deafness and hearing controls. METHOD Identification and discrimination tasks were administered to 30 participants implanted before 4 years of age, 21 participants implanted after 7 years of age, and 21 hearing individuals. RESULTS Distinctive identification and discrimination functions confirmed PC within all groups. Compared to hearing participants, the CI groups generally displayed longer/higher category boundaries, shallower identification function slopes, reduced identification consistency, and reduced discrimination performance. A principal component analysis revealed that identification consistency, discrimination accuracy, and identification function slope, but not boundary location, loaded on a single factor, reflecting general PC performance. Earlier implantation was associated with better PC performance within the early CI group, but not the late CI group. Within the early CI group, earlier implantation age but not PC performance was associated with better speech recognition. Conversely, within the late CI group, better PC performance but not earlier implantation age was associated with better speech recognition. CONCLUSIONS Results suggest that implantation timing within the sensitive period before 4 years of age partly determines the level of PC performance. They also suggest that early implantation may promote development of higher level processes that can compensate for relatively poor PC performance, as can occur in challenging listening conditions.
Affiliation(s)
- Joseph Bochner
- National Technical Institute for the Deaf, Rochester Institute of Technology, NY
- Vincent Samar
- National Technical Institute for the Deaf, Rochester Institute of Technology, NY
- Matt Huenerfauth
- Golisano College of Computing and Information Sciences, Rochester Institute of Technology, NY

10
Winn MB, Wright RA. Reconsidering commonly used stimuli in speech perception experiments. J Acoust Soc Am 2022;152:1394. [PMID: 36182291] [DOI: 10.1121/10.0013415]
Abstract
This paper examines some commonly used stimuli in speech perception experiments and raises questions about their use, or about the interpretations of previous results. The takeaway messages are: 1) the Hillenbrand vowels represent a particular dialect rather than a gold standard, and English vowels contain spectral dynamics that have been largely underappreciated, 2) the /ɑ/ context is very common but not clearly superior as a context for testing consonant perception, 3) /ɑ/ is particularly problematic when testing voice-onset-time perception because it introduces strong confounds in the formant transitions, 4) /dɑ/ is grossly overrepresented in neurophysiological studies and yet is insufficient as a generalized proxy for "speech perception," and 5) digit tests and matrix sentences including the coordinate response measure are systematically insensitive to important patterns in speech perception. Each of these stimulus sets and concepts is described with careful attention to their unique value and also cases where they might be misunderstood or over-interpreted.
Affiliation(s)
- Matthew B Winn
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Richard A Wright
- Department of Linguistics, University of Washington, Seattle, Washington 98195, USA

11
Mills HE, Shorey AE, Theodore RM, Stilp CE. Context effects in perception of vowels differentiated by F1 are not influenced by variability in talkers' mean F1 or F3. J Acoust Soc Am 2022;152:55. [PMID: 35931547] [DOI: 10.1121/10.0011920]
Abstract
Spectral properties of earlier sounds (context) influence recognition of later sounds (target). Acoustic variability in context stimuli can disrupt this process. When mean fundamental frequencies (f0's) of preceding context sentences were highly variable across trials, shifts in target vowel categorization [due to spectral contrast effects (SCEs)] were smaller than when sentence mean f0's were less variable; when sentences were rearranged to exhibit high or low variability in mean first formant frequencies (F1) in a given block, SCE magnitudes were equivalent [Assgari, Theodore, and Stilp (2019) J. Acoust. Soc. Am. 145(3), 1443-1454]. However, since sentences were originally chosen based on variability in mean f0, stimuli underrepresented the extent to which mean F1 could vary. Here, target vowels (/ɪ/-/ɛ/) were categorized following context sentences that varied substantially in mean F1 (experiment 1) or mean F3 (experiment 2) with variability in mean f0 held constant. In experiment 1, SCE magnitudes were equivalent whether context sentences had high or low variability in mean F1; the same pattern was observed in experiment 2 for new sentences with high or low variability in mean F3. Variability in some acoustic properties (mean f0) can be more perceptually consequential than others (mean F1, mean F3), but these results may be task-dependent.
Affiliation(s)
- Hannah E Mills
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Anya E Shorey
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Rachel M Theodore
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06269, USA
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA

12
Jahn KN, Arenberg JG, Horn DL. Spectral Resolution Development in Children With Normal Hearing and With Cochlear Implants: A Review of Behavioral Studies. J Speech Lang Hear Res 2022;65:1646-1658. [PMID: 35201848] [PMCID: PMC9499384] [DOI: 10.1044/2021_jslhr-21-00307]
Abstract
PURPOSE This review article provides a theoretical overview of the development of spectral resolution in children with normal hearing (cNH) and in those who use cochlear implants (CIs), with an emphasis on methodological considerations. The aim was to identify key directions for future research on spectral resolution development in children with CIs. METHOD A comprehensive literature review was conducted to summarize and synthesize previously published behavioral research on spectral resolution development in normal and impaired auditory systems. CONCLUSIONS In cNH, performance on spectral resolution tasks continues to improve through the teenage years and is likely driven by gradual maturation of across-channel intensity resolution. A small but growing body of evidence from children with CIs suggests a more complex relationship between spectral resolution development, patient demographics, and the quality of the CI electrode-neuron interface. Future research should aim to distinguish between the effects of patient-specific variables and the underlying physiology on spectral resolution abilities in children of all ages who are hard of hearing and use auditory prostheses.
Affiliation(s)
- Kelly N. Jahn
- Department of Speech, Language, and Hearing, School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson
- Callier Center for Communication Disorders, The University of Texas at Dallas
- Julie G. Arenberg
- Department of Otolaryngology – Head and Neck Surgery, Harvard Medical School, Boston, MA
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear, Boston
- David L. Horn
- Virginia Merrill Bloedel Hearing Research Center, Department of Otolaryngology – Head and Neck Surgery, University of Washington, Seattle
- Division of Otolaryngology, Seattle Children's Hospital, WA

13
Shader MJ, Kwon BJ, Gordon-Salant S, Goupell MJ. Open-Set Phoneme Recognition Performance With Varied Temporal Cues in Younger and Older Cochlear Implant Users. J Speech Lang Hear Res 2022;65:1196-1211. [PMID: 35133853] [PMCID: PMC9150732] [DOI: 10.1044/2021_jslhr-21-00299]
Abstract
PURPOSE The goal of this study was to investigate the effect of age on phoneme recognition performance in which the stimuli varied in the amount of temporal information available in the signal. Chronological age is increasingly recognized as a factor that can limit the amount of benefit an individual can receive from a cochlear implant (CI). Central auditory temporal processing deficits in older listeners may contribute to the performance gap between younger and older CI users on recognition of phonemes varying in temporal cues. METHOD Phoneme recognition was measured at three stimulation rates (500, 900, and 1800 pulses per second) and two envelope modulation frequencies (50 Hz and unfiltered) in 20 CI participants ranging in age from 27 to 85 years. Speech stimuli were multiple word pairs differing in temporal contrasts and were presented via direct stimulation of the electrode array using an eight-channel continuous interleaved sampling strategy. Phoneme recognition performance was evaluated at each stimulation rate condition using both envelope modulation frequencies. RESULTS Duration of deafness was the strongest subject-level predictor of phoneme recognition, with participants with longer durations of deafness having poorer performance overall. Chronological age did not predict performance for any stimulus condition. Additionally, duration of deafness interacted with envelope filtering. Participants with shorter durations of deafness were able to take advantage of higher frequency envelope modulations, while participants with longer durations of deafness were not. CONCLUSIONS Age did not significantly predict phoneme recognition performance. In contrast, longer durations of deafness were associated with a reduced ability to utilize available temporal information within the signal to improve phoneme recognition performance.
Affiliation(s)
- Maureen J. Shader
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN
- Matthew J. Goupell
- Department of Hearing and Speech Sciences, University of Maryland, College Park

14
Xie Z, Anderson S, Goupell MJ. Stimulus context affects the phonemic categorization of temporally based word contrasts in adult cochlear-implant users. J Acoust Soc Am 2022;151:2149. [PMID: 35364963] [PMCID: PMC8957389] [DOI: 10.1121/10.0009838]
Abstract
Cochlear-implant (CI) users rely heavily on temporal envelope cues for speech understanding. This study examined whether their sensitivity to temporal cues in word segments is affected when the words are preceded by non-informative carrier sentences. Thirteen adult CI users performed phonemic categorization tasks that present primarily temporally based word contrasts: a Buy-Pie contrast with word-initial stop of varying voice-onset time (VOT), and a Dish-Ditch contrast with varying silent intervals preceding the word-final fricative. These words were presented in isolation or were preceded by carrier stimuli including a sentence, a sentence-envelope-modulated noise, or an unmodulated speech-shaped noise. While participants were able to categorize both word contrasts, stimulus context effects were observed primarily for the Buy-Pie contrast, such that participants reported more "Buy" responses for words with longer VOTs in conditions with carrier stimuli than in isolation. The two non-speech carrier stimuli yielded similar or even greater context effects than sentences. The context effects disappeared when target words were delayed from the carrier stimuli by ≥75 ms. These results suggest that stimulus contexts affect auditory temporal processing in CI users, but the context effects appear to be cue-specific. The context effects may be governed by general auditory processes, not those specific to speech processing.
Affiliation(s)
- Zilong Xie
- Department of Hearing and Speech, University of Kansas Medical Center, 3901 Rainbow Boulevard, Kansas City, Kansas 66160, USA
- Samira Anderson
- Department of Hearing and Speech Sciences, University of Maryland, 0100 Samuel J. LeFrak Hall, College Park, Maryland 20742, USA
- Matthew J Goupell
- Department of Hearing and Speech Sciences, University of Maryland, 0100 Samuel J. LeFrak Hall, College Park, Maryland 20742, USA

15
Arjmandi M, Houston D, Wang Y, Dilley L. Estimating the reduced benefit of infant-directed speech in cochlear implant-related speech processing. Neurosci Res 2021;171:49-61. [PMID: 33484749] [PMCID: PMC8289972] [DOI: 10.1016/j.neures.2021.01.007]
Abstract
Caregivers modify their speech when talking to infants, a specific type of speech known as infant-directed speech (IDS). This speaking style facilitates language learning compared to adult-directed speech (ADS) in infants with normal hearing (NH). While infants with NH and those with cochlear implants (CIs) prefer listening to IDS over ADS, it is not yet known how CI processing may affect the acoustic distinctiveness between ADS and IDS, or the intelligibility of each. This study analyzed the speech of seven female adult talkers to model the effects of simulated CI processing on (1) the acoustic distinctiveness between ADS and IDS, (2) estimates of the intelligibility of caregivers' speech in ADS and IDS, and (3) individual differences in caregivers' ADS-to-IDS modification and estimated speech intelligibility. Results suggest that CI processing is substantially detrimental to the acoustic distinctiveness between ADS and IDS, as well as to the intelligibility benefit derived from ADS-to-IDS modifications. Moreover, the observed variability across individual talkers in the acoustic implementation of ADS-to-IDS modification and in estimated speech intelligibility was significantly reduced by CI processing. The findings are discussed in the context of the link between IDS and language learning in infants with CIs.
Affiliation(s)
- Meisam Arjmandi
- Department of Communicative Sciences and Disorders, Michigan State University, 1026 Red Cedar Road, East Lansing, MI 48824, USA
- Derek Houston
- Department of Otolaryngology - Head and Neck Surgery, The Ohio State University, 915 Olentangy River Road, Columbus, OH 43212, USA
- Yuanyuan Wang
- Department of Otolaryngology - Head and Neck Surgery, The Ohio State University, 915 Olentangy River Road, Columbus, OH 43212, USA
- Laura Dilley
- Department of Communicative Sciences and Disorders, Michigan State University, 1026 Red Cedar Road, East Lansing, MI 48824, USA

16
Stilp CE. Parameterizing spectral contrast effects in vowel categorization using noise contexts. J Acoust Soc Am 2021;150:2806. [PMID: 34717452] [DOI: 10.1121/10.0006657]
Abstract
When spectra differ between earlier (context) and later (target) sounds, listeners perceive larger spectral changes than are physically present. When context sounds (e.g., a sentence) possess relatively higher frequencies, the target sound (e.g., a vowel sound) is perceived as possessing relatively lower frequencies, and vice versa. These spectral contrast effects (SCEs) are pervasive in auditory perception, but studies traditionally employed contexts with high spectrotemporal variability that made it difficult to understand exactly when context spectral properties biased perception. Here, contexts were speech-shaped noise divided into four consecutive 500-ms epochs. Contexts were filtered to amplify low-F1 (100-400 Hz) or high-F1 (550-850 Hz) frequencies to encourage target perception of /ɛ/ ("bet") or /ɪ/ ("bit"), respectively, via SCEs. Spectral peaks in the context ranged from its initial epoch(s) to its entire duration (onset paradigm), ranged from its final epoch(s) to its entire duration (offset paradigm), or were present for only one epoch (single paradigm). SCE magnitudes increased as spectral-peak durations increased and/or occurred later in the context (closer to the target). Contrary to predictions, brief early spectral peaks still biased subsequent target categorization. Results are compared to related experiments using speech contexts, and physiological and/or psychoacoustic idiosyncrasies of the noise contexts are considered.
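The context construction can be sketched compactly: noise with a filtered copy of itself added back in the low-F1 (100-400 Hz) or high-F1 (550-850 Hz) band during selected 500-ms epochs. In the sketch below, white noise stands in for speech-shaped noise, and the +20 dB band gain and filter design are assumptions; the paper's exact levels and filters may differ.

```python
# Sketch of one context stimulus: 2-s noise with a low-F1 (100-400 Hz)
# spectral peak added over selected 500-ms epochs. White noise stands in
# for speech-shaped noise; the +20 dB gain is an illustrative assumption.
import numpy as np
from scipy.signal import butter, sosfiltfilt

sr = 22050
noise = np.random.randn(2 * sr)                      # context carrier
sos = butter(4, [100, 400], btype="band", fs=sr, output="sos")
peak = sosfiltfilt(sos, noise) * (10 ** (20 / 20))   # +20 dB band boost

epoch = sr // 2                                      # 500-ms epochs
context = noise.copy()
for k in [3]:                     # e.g., final epoch only ("single" paradigm)
    context[k * epoch:(k + 1) * epoch] += peak[k * epoch:(k + 1) * epoch]
```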
Affiliation(s)
- Christian E Stilp
- Department of Psychological and Brain Sciences, 317 Life Sciences Building, University of Louisville, Louisville, Kentucky 40292, USA

17
Trotter AS, Banks B, Adank P. The Relevance of the Availability of Visual Speech Cues During Adaptation to Noise-Vocoded Speech. J Speech Lang Hear Res 2021;64:2513-2528. [PMID: 34161748] [DOI: 10.1044/2021_jslhr-20-00575]
Abstract
Purpose This study first aimed to establish whether viewing specific parts of the speaker's face (eyes or mouth), compared to viewing the whole face, affected adaptation to distorted noise-vocoded sentences. Second, this study also aimed to replicate results on processing of distorted speech from lab-based experiments in an online setup. Method We monitored recognition accuracy online while participants were listening to noise-vocoded sentences. We first established whether participants were able to perceive and adapt to audiovisual four-band noise-vocoded sentences when the entire moving face was visible (AV Full). Four further groups were then tested: a group in which participants viewed the moving lower part of the speaker's face (AV Mouth), a group in which participants saw only the moving upper part of the face (AV Eyes), a group in which participants could not see the moving lower or upper face (AV Blocked), and a group in which participants saw an image of a still face (AV Still). Results Participants repeated around 40% of the key words correctly and adapted during the experiment, but only when the moving mouth was visible. In contrast, performance was at floor level, and no adaptation took place, in conditions where the moving mouth was occluded. Conclusions The results show the importance of being able to observe relevant visual speech information from the speaker's mouth region, but not the eyes/upper face region, when listening and adapting to distorted sentences online. Second, the results also demonstrate that it is feasible to run speech perception and adaptation studies online, but that not all findings reported for lab studies replicate. Supplemental Material https://doi.org/10.23641/asha.14810523.
Affiliation(s)
- Antony S Trotter
- Speech, Hearing and Phonetic Sciences, University College London, United Kingdom
- Briony Banks
- Department of Psychology, Lancaster University, United Kingdom
- Patti Adank
- Speech, Hearing and Phonetic Sciences, University College London, United Kingdom

18
Abstract
OBJECTIVES When auditory and visual speech information are presented together, listeners obtain an audiovisual (AV) benefit or a speech understanding improvement compared with auditory-only (AO) or visual-only (VO) presentations. Cochlear-implant (CI) listeners, who receive degraded speech input and therefore understand speech using primarily temporal information, seem to readily use visual cues and can achieve a larger AV benefit than normal-hearing (NH) listeners. It is unclear, however, if the AV benefit remains relatively large for CI listeners when trying to understand foreign-accented speech when compared with unaccented speech. Accented speech can introduce changes to temporal auditory cues and visual cues, which could decrease the usefulness of AV information. Furthermore, we sought to determine if the AV benefit was relatively larger in CI compared with NH listeners for both unaccented and accented speech. DESIGN AV benefit was investigated for unaccented and Spanish-accented speech by presenting English sentences in AO, VO, and AV conditions to 15 CI and 15 age- and performance-matched NH listeners. Performance matching between NH and CI listeners was achieved by varying the number of channels of a noise vocoder for the NH listeners. Because of the differences in age and hearing history of the CI listeners, the effects of listener-related variables on speech understanding performance and AV benefit were also examined. RESULTS AV benefit was observed for both unaccented and accented conditions and for both CI and NH listeners. The two groups showed similar performance for the AO and AV conditions, and the normalized AV benefit was relatively smaller for the accented than the unaccented conditions. In the CI listeners, older age was associated with significantly poorer performance with the accented speaker compared with the unaccented speaker. The negative impact of age was somewhat reduced by a significant improvement in performance with access to AV information. CONCLUSIONS When auditory speech information is degraded by CI sound processing, visual cues can be used to improve speech understanding, even in the presence of a Spanish accent. The AV benefit of the CI listeners closely matched that of the NH listeners presented with vocoded speech, which was unexpected given that CI listeners appear to rely more on visual information to communicate. This result is perhaps due to the one-to-one age and performance matching of the listeners. While aging decreased CI listener performance with the accented speaker, access to visual cues boosted performance and could partially overcome the age-related speech understanding deficits for the older CI listeners.
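The "normalized AV benefit" mentioned above is commonly computed as the AV gain over auditory-only performance divided by the available headroom; whether this study used exactly this formula is an assumption here.

```python
# One common formula for normalized AV benefit (an assumption here; the
# study may normalize differently): gain relative to available headroom.
def normalized_av_benefit(av_pct, ao_pct):
    return (av_pct - ao_pct) / (100.0 - ao_pct)

print(normalized_av_benefit(av_pct=80.0, ao_pct=50.0))  # 0.6
```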

19
Feng L, Oxenham AJ. Spectral Contrast Effects Reveal Different Acoustic Cues for Vowel Recognition in Cochlear-Implant Users. Ear Hear 2021;41:990-997. [PMID: 31815819] [PMCID: PMC7874522] [DOI: 10.1097/aud.0000000000000820]
Abstract
OBJECTIVES The identity of a speech sound can be affected by the spectrum of a preceding stimulus in a contrastive manner. Although such aftereffects are often reduced in people with hearing loss and cochlear implants (CIs), one recent study demonstrated larger spectral contrast effects in CI users than in normal-hearing (NH) listeners. The present study aimed to shed light on this puzzling finding. We hypothesized that poorer spectral resolution leads CI users to rely on different acoustic cues not only to identify speech sounds but also to adapt to the context. DESIGN Thirteen postlingually deafened adult CI users and 33 NH participants (listening to either vocoded or unprocessed speech) participated in this study. Psychometric functions were estimated in a vowel categorization task along the /ɪ/ to /ɛ/ (as in "bit" and "bet") continuum following a context sentence, the long-term average spectrum of which was manipulated at the level of either fine-grained local spectral cues or coarser global spectral cues. RESULTS In NH listeners with unprocessed speech, the aftereffect was determined solely by the fine-grained local spectral cues, resulting in a surprising insensitivity to the larger, global spectral cues utilized by CI users. Restricting the spectral resolution available to NH listeners via vocoding resulted in patterns of responses more similar to those found in CI users. However, the size of the contrast aftereffect remained smaller in NH listeners than in CI users. CONCLUSIONS Only the spectral contrasts used by listeners contributed to the spectral contrast effects in vowel identification. These results explain why CI users can experience larger-than-normal context effects under specific conditions. The results also suggest that adaptation to new spectral cues can be very rapid for vowel discrimination, but may follow a longer time course to influence spectral contrast effects.
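Psychometric functions of this kind are typically fit with a logistic whose midpoint marks the category boundary, and the contrast aftereffect is quantified as the boundary shift between context conditions. The sketch below illustrates that logic with simulated response proportions; the study's actual fitting procedure may differ.

```python
# Sketch: fit logistic psychometric functions per context condition and
# measure the category-boundary shift (the contrast aftereffect).
# Response proportions are simulated, not the study's data.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    return 1 / (1 + np.exp(-k * (x - x0)))

steps = np.arange(1, 8)                      # continuum steps, /I/ -> /E/
p_ctx_a = np.array([.02, .05, .20, .55, .85, .95, .99])   # one context
p_ctx_b = np.array([.01, .02, .08, .30, .65, .90, .97])   # other context

(b_a, _), _ = curve_fit(logistic, steps, p_ctx_a, p0=[4, 1])
(b_b, _), _ = curve_fit(logistic, steps, p_ctx_b, p0=[4, 1])
print(f"aftereffect = boundary shift = {b_b - b_a:.2f} steps")
```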
Affiliation(s)
- Lei Feng
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, USA

20
Yun D, Jennings TR, Kidd G, Goupell MJ. Benefits of triple acoustic beamforming during speech-on-speech masking and sound localization for bilateral cochlear-implant users. J Acoust Soc Am 2021;149:3052. [PMID: 34241104] [PMCID: PMC8102069] [DOI: 10.1121/10.0003933]
Abstract
Bilateral cochlear-implant (CI) users struggle to understand speech in noisy environments despite receiving some spatial-hearing benefits. One potential solution is to provide acoustic beamforming. A headphone-based experiment was conducted to compare speech understanding under natural CI listening conditions and for two non-adaptive beamformers, one single beam and one binaural, called "triple beam," which provides an improved signal-to-noise ratio (beamforming benefit) and usable spatial cues by reintroducing interaural level differences. Speech reception thresholds (SRTs) for speech-on-speech masking were measured with target speech presented in front and two maskers in co-located or narrow/wide separations. Numerosity judgments and sound-localization performance were also measured. Natural spatial cues, single-beam, and triple-beam conditions were compared. For CI listeners, there was a negligible change in SRTs when comparing co-located to separated maskers under natural listening conditions. In contrast, there were 4.9- and 16.9-dB improvements in SRTs for the single beam and 3.5- and 12.3-dB improvements for the triple beam (narrow and wide separations, respectively). Similar results were found for normal-hearing listeners presented with vocoded stimuli. The single beam improved speech-on-speech masking performance but yielded poor sound localization. The triple beam improved speech-on-speech masking performance, albeit less than the single beam, and also improved sound localization. Thus, the triple beam was the most versatile across multiple spatial-hearing domains.
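The single-beam condition can be illustrated with the simplest fixed beamformer, delay-and-sum: time-align the microphone signals for the target direction and average them, which attenuates off-axis maskers. The array geometry and frequency-domain steering below are assumptions for illustration, not the processing used in the study; the "triple beam" additionally delivers left- and right-steered beams to the two ears to restore interaural level differences.

```python
# Minimal delay-and-sum beamformer sketch (array geometry and steering
# are illustrative assumptions, not the study's actual processing).
import numpy as np

def delay_and_sum(mics, mic_x, angle_deg, sr, c=343.0):
    """mics: (n_mics, n_samples) array; mic_x: mic positions (m) on a line."""
    theta = np.deg2rad(angle_deg)
    delays = mic_x * np.sin(theta) / c          # per-mic steering delay (s)
    n = mics.shape[1]
    freqs = np.fft.rfftfreq(n, 1 / sr)
    out = np.zeros(n)
    for sig, d in zip(mics, delays):
        # Fractional-delay alignment applied in the frequency domain
        spec = np.fft.rfft(sig) * np.exp(2j * np.pi * freqs * d)
        out += np.fft.irfft(spec, n)
    return out / len(mics)                      # aligned signals average coherently
```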
Affiliation(s)
- David Yun
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Todd R Jennings
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Gerald Kidd
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Matthew J Goupell
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA

21
Individual Variability in Recalibrating to Spectrally Shifted Speech: Implications for Cochlear Implants. Ear Hear 2021;42:1412-1427. [PMID: 33795617] [DOI: 10.1097/aud.0000000000001043]
Abstract
OBJECTIVES Cochlear implant (CI) recipients are at a severe disadvantage compared with normal-hearing listeners in distinguishing consonants that differ by place of articulation because the key relevant spectral differences are degraded by the implant. One component of that degradation is the upward shifting of spectral energy that occurs with a shallow insertion depth of a CI. The present study aimed to systematically measure the effects of spectral shifting on word recognition and phoneme categorization by specifically controlling the amount of shifting and using stimuli whose identification specifically depends on perceiving frequency cues. We hypothesized that listeners would be biased toward perceiving phonemes that contain higher-frequency components because of the upward frequency shift and that intelligibility would decrease as spectral shifting increased. DESIGN Normal-hearing listeners (n = 15) heard sine wave-vocoded speech with simulated upward frequency shifts of 0, 2, 4, and 6 mm of cochlear space to simulate shallow CI insertion depth. Stimuli included monosyllabic words and /b/-/d/ and /∫/-/s/ continua that varied systematically by formant frequency transitions or frication noise spectral peaks, respectively. Recalibration to spectral shifting was operationally defined as shifting perceptual acoustic-phonetic mapping commensurate with the spectral shift; in other words, adjusting frequency expectations for both phonemes upward so that there is still a perceptual distinction, rather than hearing all upward-shifted phonemes as the higher-frequency member of the pair. RESULTS For moderate amounts of spectral shifting, group data suggested a general "halfway" recalibration to spectral shifting, but individual data suggested a notably different conclusion: half of the listeners were able to recalibrate fully, while the other half were utterly unable to categorize shifted speech with any reliability. No participants demonstrated a pattern intermediate to these two extremes. Intelligibility of words decreased with greater amounts of spectral shifting, also showing loose clusters of better- and poorer-performing listeners. Phonetic analysis of word errors revealed that certain cues were more susceptible to being compromised by a frequency shift (place and manner of articulation), while voicing was robust to spectral shifting. CONCLUSIONS Shifting the frequency spectrum of speech has systematic effects that are in line with known properties of speech acoustics, but the ensuing difficulties cannot be predicted based on tonotopic mismatch alone. Difficulties are subject to substantial individual differences in the capacity to adjust acoustic-phonetic mapping. These results help to explain why speech recognition in CI listeners cannot be fully predicted by peripheral factors like electrode placement and spectral resolution; even among listeners with functionally equivalent auditory input, there is an additional factor of simply being able or unable to flexibly adjust acoustic-phonetic mapping. This individual variability could motivate precise treatment approaches guided by an individual's relative reliance on wideband frequency representation (even if it is mismatched) or limited frequency coverage whose tonotopy is preserved.
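Shifts expressed in "mm of cochlear space" map onto frequency through the Greenwood place-frequency function for the human cochlea, which is presumably how manipulations of this kind are implemented; the specific cochlear place used below is arbitrary.

```python
# Greenwood (1990) place-to-frequency map for the human cochlea:
# F(x) = 165.4 * (10**(0.06 * x) - 0.88), with x in mm from the apex.
def greenwood_hz(x_mm):
    return 165.4 * (10 ** (0.06 * x_mm) - 0.88)

x = 15.0                                # an arbitrary cochlear place (mm)
for shift_mm in [0, 2, 4, 6]:           # the study's simulated shifts
    print(shift_mm, round(greenwood_hz(x + shift_mm)))
# 0 -> 1168 Hz, 2 -> 1586 Hz, 4 -> 2138 Hz, 6 -> 2864 Hz: a 6-mm basal
# shift raises the frequency at this place by a factor of roughly 2.5.
```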

22
Winn MB, Moore AN. Perceptual weighting of acoustic cues for accommodating gender-related talker differences heard by listeners with normal hearing and with cochlear implants. J Acoust Soc Am 2020;148:496. [PMID: 32873011] [PMCID: PMC7402726] [DOI: 10.1121/10.0001672]
Abstract
Listeners must accommodate acoustic differences between the vocal tracts and speaking styles of conversation partners, a process called normalization or accommodation. This study explores what acoustic cues are used to make this perceptual adjustment by listeners with normal hearing or with cochlear implants, when the acoustic variability is related to the talker's gender. A continuum between /ʃ/ and /s/ was paired with naturally spoken vocalic contexts that were parametrically manipulated to vary by numerous cues for talker gender, including fundamental frequency (F0), vocal tract length (formant spacing), and direct spectral contrast with the fricative. The goal was to examine the relative contributions of these cues toward the tendency to have a lower-frequency acoustic boundary for fricatives spoken by men (found in numerous previous studies). Normal-hearing listeners relied primarily on formant spacing and much less on F0. The CI listeners were individually variable, with the F0 cue emerging as the strongest cue on average.
Collapse
Affiliation(s)
- Matthew B Winn
- Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota 55455, USA
| | - Ashley N Moore
- Department of Speech & Hearing Sciences, University of Washington, Seattle, Washington 98105, USA
| |
Collapse
|
23
|
DiNino M, Arenberg JG, Duchen ALR, Winn MB. Effects of Age and Cochlear Implantation on Spectrally Cued Speech Categorization. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2020; 63:2425-2440. [PMID: 32552327 PMCID: PMC7838840 DOI: 10.1044/2020_jslhr-19-00127] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/06/2019] [Revised: 08/12/2019] [Accepted: 03/30/2020] [Indexed: 06/11/2023]
Abstract
Purpose Weighting of acoustic cues for perceiving place-of-articulation speech contrasts was measured to determine the separate and interactive effects of age and use of cochlear implants (CIs). It has been found that adults with normal hearing (NH) rely on fine-grained spectral information (e.g., formants), whereas adults with CIs rely on broad spectral shape (e.g., spectral tilt). In question was whether children with NH and with CIs would demonstrate the same patterns as adults, or show differences based on ongoing maturation of hearing and phonetic skills. Method Children and adults with NH and with CIs categorized a /b/-/d/ speech contrast based on two orthogonal spectral cues. Among CI users, phonetic cue weights were compared to vowel identification scores and Spectral-Temporally Modulated Ripple Test thresholds. Results NH children and adults both relied relatively more on the fine-grained formant cue and less on the broad spectral tilt cue compared with participants with CIs. However, early-implanted children with CIs made better use of the formant cue than adult CI users. Formant cue weights correlated with CI participants' vowel recognition and, in children, also related to Spectral-Temporally Modulated Ripple Test thresholds. Adult and child CI users with very poor phonetic perception showed additive use of the two cues, whereas those with better and/or more mature cue usage showed a prioritized trading relationship, akin to NH listeners. Conclusions Age group and hearing modality can influence phonetic cue-weighting patterns. Results suggest that simple nonlexical categorization tests correlate with the more general speech recognition skills of children and adults with CIs.
Collapse
Affiliation(s)
- Mishaela DiNino
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA
| | - Julie G. Arenberg
- Massachusetts Eye and Ear, Harvard Medical School Department of Otolaryngology, Boston
| | | | - Matthew B. Winn
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis
| |
Collapse
|
24
|
Chiu F, Rakusen LL, Mattys SL. Phonetic categorization and discrimination of voice onset time under divided attention. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:EL484. [PMID: 32611187 DOI: 10.1121/10.0001374] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/04/2020] [Accepted: 05/19/2020] [Indexed: 06/11/2023]
Abstract
Event durations are perceived to be shorter under divided attention. "Time shrinkage" is thought to be due to rapid attentional switches between tasks, leading to a loss of input samples and, hence, an underestimation of duration. However, few studies have considered whether this phenomenon applies to durations relevant to time-based phonetic categorization. In this study, participants categorized auditory stimuli varying in voice onset time (VOT) as /ɡ/ or /k/. They did so under focused attention (auditory task alone) or while simultaneously performing a low-level visual task (divided attention). Under divided attention, response imprecision increased, but there was no bias toward hearing /ɡ/, the shorter-VOT sound. It is concluded that sample loss under divided attention does not apply to the perception of phonetic contrasts within the VOT range.
Collapse
Affiliation(s)
- Faith Chiu
- Department of Language and Linguistics, University of Essex, Colchester, United Kingdom
| | | | - Sven L Mattys
- Department of Psychology, University of York, York, United Kingdom
| |
Collapse
|
25
|
Bosker HR, Sjerps MJ, Reinisch E. Spectral contrast effects are modulated by selective attention in "cocktail party" settings. Atten Percept Psychophys 2020; 82:1318-1332. [PMID: 31338824 PMCID: PMC7303055 DOI: 10.3758/s13414-019-01824-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]
Abstract
Speech sounds are perceived relative to the spectral properties of surrounding speech. For instance, target words that are ambiguous between /bɪt/ (with low F1) and /bɛt/ (with high F1) are more likely to be perceived as "bet" after a "low F1" sentence, but as "bit" after a "high F1" sentence. However, it is unclear how these spectral contrast effects (SCEs) operate in multi-talker listening conditions. Recently, Feng and Oxenham (J.Exp.Psychol.-Hum.Percept.Perform. 44(9), 1447-1457, 2018b) reported that selective attention affected SCEs to a small degree, using two simultaneously presented sentences produced by a single talker. The present study assessed the role of selective attention in more naturalistic "cocktail party" settings, with 200 lexically unique sentences, 20 target words, and different talkers. Results indicate that selective attention to one talker in one ear (while ignoring another talker in the other ear) modulates SCEs in such a way that only the spectral properties of the attended talker influence target perception. However, SCEs were much smaller in multi-talker settings (Experiment 2) than in single-talker settings (Experiment 1). Therefore, the influence of SCEs on speech comprehension in more naturalistic settings (i.e., with competing talkers) may be smaller than estimated from studies without competing talkers.
Collapse
Affiliation(s)
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH, Nijmegen, The Netherlands.
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands.
| | - Matthias J Sjerps
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
| | - Eva Reinisch
- Institute of Phonetics and Speech Processing, Ludwig Maximilian University Munich, Munich, Germany
- Institute of General Linguistics, Ludwig Maximilian University Munich, Munich, Germany
| |
Collapse
|
26
|
Hendrickson K, Spinelli J, Walker E. Cognitive processes underlying spoken word recognition during soft speech. Cognition 2020; 198:104196. [PMID: 32004934 DOI: 10.1016/j.cognition.2020.104196] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2019] [Revised: 01/06/2020] [Accepted: 01/18/2020] [Indexed: 11/25/2022]
Abstract
In two eye-tracking experiments using the Visual World Paradigm, we examined how listeners recognize words when faced with speech at lower intensities (40, 50, and 65 dBA). After hearing the target word, participants (n = 32) clicked the corresponding picture from a display of four images - a target (e.g., money), a cohort competitor (e.g., mother), a rhyme competitor (e.g., honey), and an unrelated item (e.g., whistle) - while their eye movements were tracked. For slightly soft speech (50 dBA), listeners demonstrated an increase in cohort activation, whereas for rhyme competitors, activation started later and was sustained longer in processing. For very soft speech (40 dBA), listeners waited until later in processing to activate potential words, as illustrated by a decrease in activation for cohorts, and an increase in activation for rhymes. Further, the extent to which words were considered depended on word length (mono- vs. bi-syllabic words), and speech-extrinsic factors such as the surrounding listening environment. These results advance current theories of spoken word recognition by considering a range of speech levels more typical of everyday listening environments. From an applied perspective, these results motivate models of how individuals who are hard of hearing approach the task of recognizing spoken words.
Collapse
Affiliation(s)
- Kristi Hendrickson
- Department of Communication Sciences & Disorders, University of Iowa, 250 Hawkins Drive, 52242 Iowa City, IA, United States of America; Department of Psychological & Brain Sciences, University of Iowa, 250 Hawkins Drive, 52242 Iowa City, IA, United States of America.
| | - Jessica Spinelli
- Department of Communication Sciences & Disorders, University of Iowa, 250 Hawkins Drive, 52242 Iowa City, IA, United States of America.
| | - Elizabeth Walker
- Department of Communication Sciences & Disorders, University of Iowa, 250 Hawkins Drive, 52242 Iowa City, IA, United States of America.
| |
Collapse
|
27
|
Winn MB. Manipulation of voice onset time in speech stimuli: A tutorial and flexible Praat script. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:852. [PMID: 32113256 DOI: 10.1121/10.0000692] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/13/2019] [Accepted: 01/22/2020] [Indexed: 06/10/2023]
Abstract
Voice onset time (VOT) is an acoustic property of stop consonants that is commonly manipulated in studies of phonetic perception. This paper contains a thorough description of the "progressive cutback and replacement" method of VOT manipulation, and a comparison with other VOT manipulation techniques. Other acoustic properties that covary with VOT, such as fundamental frequency and formant transitions, are also discussed, along with considerations for testing VOT perception and its relationship to various other measures of auditory temporal or spectral processing. An implementation of the progressive cutback and replacement method in the Praat scripting language is presented, which is suitable for modifying natural speech for perceptual experiments involving VOT and/or the related covarying F0 and intensity cues. Justifications are provided for the stimulus design choices and constraints implemented in the script.
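As a rough illustration of what "progressive cutback and replacement" means in practice, the sketch below replaces successively longer stretches of post-burst voicing with aspiration noise from a voiceless token, keeping total duration constant. This is a schematic under simplifying assumptions (burst-aligned tokens of equal length), not the published Praat script, which additionally handles details such as zero-crossing cut points, aspiration amplitude scaling, and covarying F0 cues:

```python
import numpy as np

def progressive_cutback(voiced, aspirated, fs, vot_ms, xfade_ms=5.0):
    """One continuum step: replace the first vot_ms after the burst of the
    voiced token with aspiration from the voiceless token, keeping total
    duration constant. Both 1-D arrays are assumed aligned at the burst."""
    n = int(fs * vot_ms / 1000.0)
    out = voiced.copy()
    if n == 0:
        return out
    out[:n] = aspirated[:n]                      # splice in aspiration
    # Short linear crossfade at the splice point to avoid a click.
    m = min(int(fs * xfade_ms / 1000.0), len(out) - n)
    ramp = np.linspace(0.0, 1.0, m)
    out[n:n + m] = ramp * voiced[n:n + m] + (1.0 - ramp) * aspirated[n:n + m]
    return out

# e.g., a 0-60 ms VOT continuum in 10-ms steps (ba, pa are burst-aligned):
# steps = [progressive_cutback(ba, pa, fs, v) for v in range(0, 70, 10)]
```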
Collapse
Affiliation(s)
- Matthew B Winn
- Department of Speech-Language-Hearing Sciences, University of Minnesota, 164 Pillsbury Drive Southeast, Minneapolis, Minnesota 55455, USA
| |
Collapse
|
28
|
Winn MB. Accommodation of gender-related phonetic differences by listeners with cochlear implants and in a variety of vocoder simulations. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:174. [PMID: 32006986 PMCID: PMC7341679 DOI: 10.1121/10.0000566] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/30/2019] [Revised: 12/06/2019] [Accepted: 12/13/2019] [Indexed: 06/01/2023]
Abstract
Speech perception requires accommodation of a wide range of acoustic variability across talkers. A classic example is the perception of "sh" and "s" fricative sounds, which are categorized according to spectral details of the consonant itself, and also by the context of the voice producing it. Because women's and men's voices occupy different frequency ranges, a listener is required to make a corresponding adjustment of acoustic-phonetic category space for these phonemes when hearing different talkers. This pattern is commonplace in everyday speech communication, and yet might not be captured in accuracy scores for whole words, especially when word lists are spoken by a single talker. Phonetic accommodation for fricatives "s" and "sh" was measured in 20 cochlear implant (CI) users and in a variety of vocoder simulations, including those with noise carriers with and without peak picking, simulated spread of excitation, and pulsatile carriers. CI listeners showed strong phonetic accommodation as a group. Each vocoder produced phonetic accommodation except the 8-channel noise vocoder, despite its historically good match with CI users in word intelligibility. Phonetic accommodation is largely independent of linguistic factors and thus might offer information complementary to speech intelligibility tests which are partially affected by language processing.
Collapse
Affiliation(s)
- Matthew B Winn
- Department of Speech & Hearing Sciences, University of Minnesota, 164 Pillsbury Drive Southeast, Minneapolis, Minnesota 55455, USA
| |
Collapse
|
29
|
Oxenham AJ. Spectral contrast effects and auditory enhancement under normal and impaired hearing. ACOUSTICAL SCIENCE AND TECHNOLOGY 2020; 41:108-112. [PMID: 32362758 PMCID: PMC7194197 DOI: 10.1250/ast.41.108] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
We are generally able to identify sounds and understand speech with ease, despite the large variations in the acoustics of each sound, which occur due to factors such as different talkers, background noise, and room acoustics. This form of perceptual constancy is likely to be mediated in part by the auditory system's ability to adapt to the ongoing environment or context in which sounds are presented. Auditory context effects have been studied under different names, such as spectral contrast effects in speech and auditory enhancement effects in psychoacoustics, but they share some important properties and may be mediated by similar underlying neural mechanisms. This review provides a survey of recent studies from our laboratory that investigate the mechanisms of speech spectral contrast effects and auditory enhancement in people with normal hearing, hearing loss, and cochlear implants. We argue that a better understanding of such context effects in people with normal hearing may allow us to restore some of these important effects for people with hearing loss via signal processing in hearing aids and cochlear implants, thereby potentially improving auditory and speech perception in the complex and variable everyday acoustic backgrounds that surround us.
Collapse
Affiliation(s)
- Andrew J. Oxenham
- Department of Psychology, University of Minnesota – Twin Cities, Elliott Hall N218, 75 East River Road, Minneapolis, Minnesota 55455, USA
| |
Collapse
|
30
|
Auditory Selectivity for Spectral Contrast in Cortical Neurons and Behavior. J Neurosci 2019; 40:1015-1027. [PMID: 31826944 DOI: 10.1523/jneurosci.1200-19.2019] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2019] [Revised: 12/04/2019] [Accepted: 12/06/2019] [Indexed: 12/17/2022] Open
Abstract
Vocal communication relies on the ability of listeners to identify, process, and respond to vocal sounds produced by others in complex environments. To accurately recognize these signals, animals' auditory systems must robustly represent acoustic features that distinguish vocal sounds from other environmental sounds. Vocalizations typically have spectral structure; power regularly fluctuates along the frequency axis, creating spectral contrast. Spectral contrast is closely related to harmonicity, which refers to spectral power peaks occurring at integer multiples of a fundamental frequency. Although both spectral contrast and harmonicity typify natural sounds, they may differ in salience for communication behavior and engage distinct neural mechanisms. Therefore, it is important to understand which of these properties of vocal sounds underlie the neural processing and perception of vocalizations. Here, we test the importance of vocalization-typical spectral features in behavioral recognition and neural processing of vocal sounds, using male zebra finches. We show that behavioral responses to natural and synthesized vocalizations rely on the presence of discrete frequency components, but not on harmonic ratios between frequencies. We identify a specific population of neurons in primary auditory cortex that are sensitive to the spectral resolution of vocal sounds. We find that behavioral and neural response selectivity is explained by sensitivity to spectral contrast rather than harmonicity. This selectivity emerges within the cortex; it is absent in the thalamorecipient region and present in the deep output region. Further, deep-region neurons that are contrast-sensitive show distinct temporal responses and selectivity for modulation density compared with unselective neurons. SIGNIFICANCE STATEMENT Auditory coding and perception are critical for vocal communication. Auditory neurons must encode acoustic features that distinguish vocalizations from other sounds in the environment and generate percepts that direct behavior. The acoustic features that drive neural and behavioral selectivity for vocal sounds are unknown, however. Here, we show that vocal response behavior scales with stimulus spectral contrast, but not with harmonicity, in songbirds. We identify a distinct population of auditory cortex neurons in which response selectivity parallels behavioral selectivity. This neural response selectivity is explained by sensitivity to spectral contrast rather than to harmonicity. Our findings inform the understanding of how the auditory system encodes socially relevant signals via detection of an acoustic feature that is ubiquitous in vocalizations.
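The stimulus property at issue here, spectral contrast, is essentially the peak-to-valley depth of the spectrum along the frequency axis. One simple way to quantify it per frequency band is sketched below; the band edges and quantile are arbitrary illustrative choices, not the study's analysis:

```python
import numpy as np
from scipy.signal import stft

def spectral_contrast(x, fs, bands=((500, 1000), (1000, 2000),
                                    (2000, 4000), (4000, 8000)), q=0.2):
    """Mean peak-to-valley spectral contrast (dB) in each band (Hz):
    mean of the top-q magnitudes minus mean of the bottom-q magnitudes
    per frame, averaged across frames."""
    f, _, Z = stft(x, fs=fs, nperseg=1024)
    mag = np.abs(Z) + 1e-12
    contrast = {}
    for lo, hi in bands:
        rows = np.sort(mag[(f >= lo) & (f < hi)], axis=0)  # per-frame sort
        k = max(1, int(q * rows.shape[0]))
        peak_db = 20.0 * np.log10(rows[-k:].mean(axis=0))
        valley_db = 20.0 * np.log10(rows[:k].mean(axis=0))
        contrast[(lo, hi)] = float((peak_db - valley_db).mean())
    return contrast
```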
Collapse
|
31
|
Gianakas SP, Winn MB. Lexical bias in word recognition by cochlear implant listeners. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:3373. [PMID: 31795696 PMCID: PMC6948217 DOI: 10.1121/1.5132938] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/31/2019] [Revised: 10/04/2019] [Accepted: 10/14/2019] [Indexed: 06/03/2023]
Abstract
When hearing an ambiguous speech sound, listeners show a tendency to perceive it as a phoneme that would complete a real word rather than a nonword. For example, a sound that could be heard as either /b/ or /ɡ/ is perceived as /b/ when followed by "_ack" but perceived as /ɡ/ when followed by "_ap." Because the target sound is acoustically identical across both environments, this effect demonstrates the influence of top-down lexical processing in speech perception. Degradations in the auditory signal were hypothesized to render speech stimuli more ambiguous and therefore promote increased lexical bias. Stimuli included three speech continua that varied by spectral cues of varying speeds: stop formant transitions (fast), fricative spectra (medium), and vowel formants (slow). Stimuli were presented to listeners with cochlear implants (CIs), and also to listeners with normal hearing with clear spectral quality or with varying amounts of spectral degradation using a noise vocoder. Results indicated an increased lexical bias effect with degraded speech and for CI listeners, for whom the effect size was related to segment duration. This method can probe an individual's reliance on top-down processing even at the level of simple lexical/phonetic perception.
Collapse
Affiliation(s)
- Steven P Gianakas
- Department of Speech-Language-Hearing Sciences, University of Minnesota, 164 Pillsbury Drive SE, Minneapolis, Minnesota 55455, USA
| | - Matthew B Winn
- Department of Speech-Language-Hearing Sciences, University of Minnesota, 164 Pillsbury Drive SE, Minneapolis, Minnesota 55455, USA
| |
Collapse
|
32
|
Stilp CE. Auditory enhancement and spectral contrast effects in speech perception. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:1503. [PMID: 31472539 DOI: 10.1121/1.5120181] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/11/2019] [Accepted: 07/11/2019] [Indexed: 06/10/2023]
Abstract
The auditory system is remarkably sensitive to changes in the acoustic environment. This is exemplified by two classic effects of preceding spectral context on perception. In auditory enhancement effects (EEs), the absence and subsequent insertion of a frequency component increases its salience. In spectral contrast effects (SCEs), spectral differences between earlier and later (target) sounds are perceptually magnified, biasing target sound categorization. These effects have been suggested to be related, but have largely been studied separately. Here, EEs and SCEs are demonstrated using the same speech materials. In Experiment 1, listeners categorized vowels (/ɪ/-/ɛ/) or consonants (/d/-/g/) following a sentence processed by a bandpass or bandstop filter (vowel tasks: 100-400 or 550-850 Hz; consonant tasks: 1700-2700 or 2700-3700 Hz). Bandpass filtering produced SCEs and bandstop filtering produced EEs, with effect magnitudes significantly correlated at the individual differences level. In Experiment 2, context sentences were processed by variable-depth notch filters in these frequency regions (-5 to -20 dB). EE magnitudes increased at larger notch depths, growing linearly in consonant categorization. This parallels previous research where SCEs increased linearly for larger spectral peaks in the context sentence. These results link EEs and SCEs, as both shape speech categorization in orderly ways.
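The two context manipulations described here, adding a spectral peak (to elicit SCEs) and notching a band out (to elicit EEs), can be sketched with a single filtering helper. This is a schematic under stated assumptions (simple Butterworth filtering; gains are approximate near the band edges), not the authors' signal chain; the study's bands were 100-400 or 550-850 Hz for the vowel tasks and 1700-2700 or 2700-3700 Hz for the consonant tasks:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def shape_context(x, fs, band, mode="peak", depth_db=20.0, order=4):
    """Emphasize (mode='peak', as in SCE conditions) or attenuate
    (mode='notch', as in EE conditions) a frequency band of a context
    sentence by approximately depth_db. band = (low_hz, high_hz)."""
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    in_band = sosfiltfilt(sos, x)            # the band's content
    if mode == "peak":
        gain = 10.0 ** (depth_db / 20.0)
        return x + (gain - 1.0) * in_band    # boost the band
    gain = 10.0 ** (-depth_db / 20.0)
    return x - (1.0 - gain) * in_band        # notch the band

# e.g., low-F1 emphasis for a vowel task: shape_context(sent, fs, (100, 400))
```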
Collapse
Affiliation(s)
- Christian E Stilp
- 317 Life Sciences Building, University of Louisville, Louisville, Kentucky 40292, USA
| |
Collapse
|
33
|
Perceptual sensitivity to spectral properties of earlier sounds during speech categorization. Atten Percept Psychophys 2019; 80:1300-1310. [PMID: 29492759 DOI: 10.3758/s13414-018-1488-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Speech perception is heavily influenced by surrounding sounds. When spectral properties differ between earlier (context) and later (target) sounds, this can produce spectral contrast effects (SCEs) that bias perception of later sounds. For example, when context sounds have more energy in low-F1 frequency regions, listeners report more high-F1 responses to a target vowel, and vice versa. SCEs have been reported using various approaches for a wide range of stimuli, but most often, large spectral peaks were added to the context to bias speech categorization. This obscures the lower limit of perceptual sensitivity to spectral properties of earlier sounds, i.e., when SCEs begin to bias speech categorization. Listeners categorized vowels (/ɪ/-/ɛ/, Experiment 1) or consonants (/d/-/g/, Experiment 2) following a context sentence with little spectral amplification (+1 to +4 dB) in frequency regions known to produce SCEs. In both experiments, +3 and +4 dB amplification in key frequency regions of the context produced SCEs, but lesser amplification was insufficient to bias performance. This establishes a lower limit of perceptual sensitivity where spectral differences across sounds can bias subsequent speech categorization. These results are consistent with proposed adaptation-based mechanisms that potentially underlie SCEs in auditory perception. SIGNIFICANCE STATEMENT Recent sounds can change what speech sounds we hear later. This can occur when the average frequency composition of earlier sounds differs from that of later sounds, biasing how they are perceived. These "spectral contrast effects" are widely observed when sounds' frequency compositions differ substantially. We reveal the lower limit of these effects, as +3 dB amplification of key frequency regions in earlier sounds was enough to bias categorization of the following vowel or consonant sound. Speech categorization being biased by very small spectral differences across sounds suggests that spectral contrast effects occur frequently in everyday speech perception.
Collapse
|
34
|
|
35
|
Assgari AA, Theodore RM, Stilp CE. Variability in talkers' fundamental frequencies shapes context effects in speech perception. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 145:1443. [PMID: 31067942 DOI: 10.1121/1.5093638] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/16/2018] [Accepted: 02/22/2019] [Indexed: 06/09/2023]
Abstract
The perception of any given sound is influenced by surrounding sounds. When successive sounds differ in their spectral compositions, these differences may be perceptually magnified, resulting in spectral contrast effects (SCEs). For example, listeners are more likely to perceive /ɪ/ (low F1) following sentences with higher F1 frequencies; listeners are also more likely to perceive /ɛ/ (high F1) following sentences with lower F1 frequencies. Previous research showed that SCEs for vowel categorization were attenuated when sentence contexts were spoken by different talkers [Assgari and Stilp. (2015). J. Acoust. Soc. Am. 138(5), 3023-3032], but the locus of this diminished contextual influence was not specified. Here, three experiments examined implications of variable talker acoustics for SCEs in the categorization of /ɪ/ and /ɛ/. The results showed that SCEs were smaller when the mean fundamental frequency (f0) of context sentences was highly variable across talkers compared to when mean f0 was more consistent, even when talker gender was held constant. In contrast, SCE magnitudes were not influenced by variability in mean F1. These findings suggest that talker variability attenuates SCEs due to diminished consistency of f0 as a contextual influence. Connections between these results and talker normalization are considered.
Collapse
Affiliation(s)
- Ashley A Assgari
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
| | - Rachel M Theodore
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06828, USA
| | - Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
| |
Collapse
|
36
|
O'Brien GE, McCloy DR, Kubota EC, Yeatman JD. Reading ability and phoneme categorization. Sci Rep 2018; 8:16842. [PMID: 30442952 PMCID: PMC6237901 DOI: 10.1038/s41598-018-34823-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2018] [Accepted: 10/18/2018] [Indexed: 11/10/2022] Open
Abstract
Dyslexia is associated with abnormal performance on many auditory psychophysics tasks, particularly those involving the categorization of speech sounds. However, it is debated whether those apparent auditory deficits arise from (a) reduced sensitivity to particular acoustic cues, (b) the difficulty of experimental tasks, or (c) unmodeled lapses of attention. Here we investigate the relationship between phoneme categorization and reading ability, with special attention to the nature of the cue encoding the phoneme contrast (static versus dynamic), differences in task paradigm difficulty, and methodological details of psychometric model fitting. We find a robust relationship between reading ability and categorization performance, show that task difficulty cannot fully explain that relationship, and provide evidence that the deficit is not restricted to dynamic cue contrasts, contrary to prior reports. Finally, we demonstrate that improved modeling of behavioral responses suggests that performance does differ between children with dyslexia and typical readers, but that the difference may be smaller than previously reported.
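The modeling point raised here, that unmodeled attentional lapses can masquerade as poor categorization, is typically handled by fitting asymptote (lapse) parameters alongside the boundary and slope of the psychometric function. A minimal sketch with made-up response proportions (not the study's data or fitting code):

```python
import numpy as np
from scipy.optimize import curve_fit

def psychometric(x, mu, s, gamma, lam):
    """Logistic psychometric function with asymptote (lapse) parameters:
    mu = category boundary, s = spread (inverse slope),
    gamma / lam = lower / upper asymptote lapse rates."""
    core = 1.0 / (1.0 + np.exp(-(x - mu) / s))
    return gamma + (1.0 - gamma - lam) * core

# Made-up proportions of one response category along a 7-step continuum.
steps = np.arange(1.0, 8.0)
props = np.array([0.05, 0.08, 0.15, 0.55, 0.88, 0.93, 0.94])
params, _ = curve_fit(psychometric, steps, props,
                      p0=[4.0, 0.5, 0.02, 0.02],
                      bounds=([1.0, 0.01, 0.0, 0.0], [7.0, 5.0, 0.2, 0.2]))
mu, s, gamma, lam = params   # boundary, spread, and the two lapse rates
```

Fitting gamma and lam separately keeps occasional inattentive responses from artificially flattening the estimated slope, which is the confound the abstract describes.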
Collapse
Affiliation(s)
- Gabrielle E O'Brien
- Institute for Learning and Brain Sciences and Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA.
| | - Daniel R McCloy
- Institute for Learning and Brain Sciences and Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
| | - Emily C Kubota
- Institute for Learning and Brain Sciences and Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
| | - Jason D Yeatman
- Institute for Learning and Brain Sciences and Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
| |
Collapse
|
37
|
Abstract
Cochlear implants restore hearing in deaf individuals, but speech perception remains challenging. Poor discrimination of spectral components is thought to account for limitations of speech recognition in cochlear implant users. We investigated how combined variations of spectral components along two orthogonal dimensions can maximize neural discrimination between two vowels, as measured by mismatch negativity. Adult cochlear implant users and matched normal-hearing listeners underwent electroencephalographic event-related potential recordings in an optimum-1 oddball paradigm. A standard /a/ vowel was delivered in an acoustic free field along with stimuli having a deviant fundamental frequency (+3 and +6 semitones), a deviant first formant making it a /i/ vowel, or combined deviant fundamental frequency and first formant (+3 and +6 semitones /i/ vowels). Speech recognition was assessed with a word repetition task. An analysis of variance was performed on both the amplitude and latency of the mismatch negativity elicited by each deviant vowel. The strength of correlations between these mismatch negativity parameters, speech recognition, and participants' age was also assessed. Amplitude of mismatch negativity was weaker in cochlear implant users but was maximized by variations of the vowels' first formant. Latency of mismatch negativity was later in cochlear implant users and was particularly extended by variations of the fundamental frequency. Speech recognition correlated with parameters of the mismatch negativity elicited by the specific variation of the first formant. This nonlinear effect of acoustic parameters on neural discrimination of vowels has implications for implant processor programming and aural rehabilitation.
Collapse
Affiliation(s)
- François Prévost
- Department of Speech Pathology and Audiology, McGill University Health Centre, Montreal, Quebec, Canada
- International Laboratory for Brain, Music & Sound Research, Montreal, Quebec, Canada
| | - Alexandre Lehmann
- International Laboratory for Brain, Music & Sound Research, Montreal, Quebec, Canada
- Department of Otolaryngology-Head and Neck Surgery, McGill University, Montreal, Quebec, Canada
- Centre for Research on Brain, Language & Music, Montreal, Quebec, Canada
| |
Collapse
|
38
|
Feng L, Oxenham AJ. Spectral contrast effects produced by competing speech contexts. J Exp Psychol Hum Percept Perform 2018; 44:1447-1457. [PMID: 29847973 PMCID: PMC6110988 DOI: 10.1037/xhp0000546] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The long-term spectrum of a preceding sentence can alter the perception of a following speech sound in a contrastive manner. This speech context effect contributes to our ability to extract reliable spectral characteristics of the surrounding acoustic environment and to compensate for the voice characteristics of different speakers or spectral colorations in different listening environments to maintain perceptual constancy. The extent to which such effects are mediated by low-level "automatic" processes, or require directed attention, remains unknown. This study investigated spectral context effects by measuring the effects of two competing sentences on the phoneme category boundary between /i/ and /ε/ in a following target word, while directing listeners' attention to one or the other context sentence. Spatial separation of the context sentences was achieved either by presenting them to different ears, or by presenting them to both ears but imposing an interaural time difference (ITD) between the ears. The results confirmed large context effects based on ear of presentation. Smaller effects were observed based on either ITD or attention. The results, combined with predictions from a two-stage model, suggest that ear-specific factors dominate speech context effects but that the effects can be modulated by higher-level features, such as perceived location, and by attention.
Collapse
Affiliation(s)
- Lei Feng
- Department of Psychology, University of Minnesota
| | | |
Collapse
|
39
|
Souza P, Wright R, Gallun F, Reinhart P. Reliability and Repeatability of the Speech Cue Profile. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2018; 61:2126-2137. [PMID: 30073277 PMCID: PMC6198918 DOI: 10.1044/2018_jslhr-h-17-0341] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/09/2017] [Revised: 01/13/2018] [Accepted: 04/08/2018] [Indexed: 05/26/2023]
Abstract
PURPOSE Researchers have long noted speech recognition variability that is not explained by the pure-tone audiogram. Previous work (Souza, Wright, Blackburn, Tatman, & Gallun, 2015) demonstrated that a small number of listeners with sensorineural hearing loss utilized different types of acoustic cues to identify speechlike stimuli, specifically the extent to which the participant relied upon spectral (or temporal) information for identification. Consistent with recent calls for data rigor and reproducibility, the primary aims of this study were to replicate the pattern of cue use in a larger cohort and to verify stability of the cue profiles over time. METHOD Cue-use profiles were measured for adults with sensorineural hearing loss using a syllable identification task consisting of synthetic speechlike stimuli in which spectral and temporal dimensions were manipulated along continua. For the first set, a static spectral shape varied from alveolar to palatal, and a temporal envelope rise time varied from affricate to fricative. For the second set, formant transitions varied from labial to alveolar and a temporal envelope rise time varied from approximant to stop. A discriminant feature analysis was used to determine to what degree spectral and temporal information contributed to stimulus identification. A subset of participants completed a 2nd visit using the same stimuli and procedures. RESULTS When spectral information was static, most participants were more influenced by spectral than by temporal information. When spectral information was dynamic, participants demonstrated a balanced distribution of cue-use patterns, with nearly equal numbers of individuals influenced by spectral or temporal cues. Individual cue profile was repeatable over a period of several months. CONCLUSION In combination with previously published data, these results indicate that listeners with sensorineural hearing loss are influenced by different cues to identify speechlike sounds and that those patterns are stable over time.
Collapse
Affiliation(s)
- Pamela Souza
- Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL
- Knowles Hearing Center, Northwestern University, Evanston, IL
| | - Richard Wright
- Department of Linguistics, University of Washington, Seattle
| | - Frederick Gallun
- National Center for Rehabilitative Auditory Research, Portland VA Medical Center, Oregon
- Otolaryngology–Head and Neck Surgery, Oregon Health and Science University, Portland
| | - Paul Reinhart
- Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL
| |
Collapse
|
40
|
Comparison of the Spectral-Temporally Modulated Ripple Test With the Arizona Biomedical Institute Sentence Test in Cochlear Implant Users. Ear Hear 2018; 38:760-766. [PMID: 28957975 DOI: 10.1097/aud.0000000000000496] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
OBJECTIVES Although speech perception is the gold standard for measuring cochlear implant (CI) users' performance, speech perception tests often require extensive adaptation to obtain accurate results, particularly after large changes in maps. Spectral ripple tests, which measure spectral resolution, are an alternative measure that has been shown to correlate with speech perception. A modified spectral ripple test, the spectral-temporally modulated ripple test (SMRT), has recently been developed, and the objective of this study was to compare speech perception and performance on the SMRT for a heterogeneous population of unilateral CI users, bilateral CI users, and bimodal users. DESIGN Twenty-five CI users (eight using unilateral CIs, nine using bilateral CIs, and eight using a CI and a hearing aid) were tested on the Arizona Biomedical Institute Sentence Test (AzBio) at a +8 dB signal-to-noise ratio, and on the SMRT. All participants were tested with their clinical programs. RESULTS There was a significant correlation between SMRT and AzBio performance. After a practice block, an improvement of one ripple per octave on the SMRT corresponded to an improvement of 12.1% on AzBio. Additionally, there was no significant difference in slope or intercept between any of the CI populations. CONCLUSION The results indicate that performance on the SMRT correlates with speech recognition in noise when measured across unilateral, bilateral, and bimodal CI populations. These results suggest that SMRT scores are strongly associated with speech-recognition-in-noise ability in experienced CI users. Further studies should focus on increasing both the size and diversity of the tested participants, and on determining whether the SMRT technique can be used for early prediction of long-term speech scores, or for evaluating differences among stimulation strategies or parameter settings.
Collapse
|
41
|
Feng L, Oxenham AJ. Effects of spectral resolution on spectral contrast effects in cochlear-implant users. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:EL468. [PMID: 29960500 PMCID: PMC6002271 DOI: 10.1121/1.5042082] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/29/2018] [Revised: 05/02/2018] [Accepted: 05/27/2018] [Indexed: 06/08/2023]
Abstract
The identity of a speech sound can be affected by the long-term spectrum of a preceding stimulus. Poor spectral resolution of cochlear implants (CIs) may affect such context effects. Here, spectral contrast effects on a phoneme category boundary were investigated in CI users and normal-hearing (NH) listeners. Surprisingly, larger contrast effects were observed in CI users than in NH listeners, even when spectral resolution in NH listeners was limited via vocoder processing. The results may reflect a different weighting of spectral cues by CI users, based on poorer spectral resolution, which in turn may enhance some spectral contrast effects.
Collapse
Affiliation(s)
- Lei Feng
- Department of Psychology, University of Minnesota, N218 Elliott Hall, 75 East River Parkway, Minneapolis, Minnesota 55455, USA
| | - Andrew J Oxenham
- Department of Psychology, University of Minnesota, N218 Elliott Hall, 75 East River Parkway, Minneapolis, Minnesota 55455, USA
| |
Collapse
|
42
|
McKay CM, Rickard N, Henshall K. Intensity Discrimination and Speech Recognition of Cochlear Implant Users. J Assoc Res Otolaryngol 2018; 19:589-600. [PMID: 29777327 DOI: 10.1007/s10162-018-0675-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2017] [Accepted: 05/07/2018] [Indexed: 12/23/2022] Open
Abstract
The relation between speech recognition and within-channel or across-channel (i.e., spectral tilt) intensity discrimination was measured in nine CI users (11 ears). Within-channel intensity difference limens (IDLs) were measured at four electrode locations across the electrode array. Spectral tilt difference limens were measured with (XIDL-J) and without (XIDL) level jitter. Only three subjects could perform the XIDL-J task with the amount of jitter required to limit use of within-channel cues. XIDLs (normalized to percentage of the dynamic range, %DR) were correlated with speech recognition (r = 0.67, P = 0.019) and were highly correlated with IDLs. XIDLs were on average nearly three times larger than IDLs and did not vary consistently with the spatial separation of the two component electrodes. The overall pattern of results was consistent with a common underlying subject-dependent limitation in the two difference limen tasks, hypothesized to be perceptual variance (how the perception of a sound differs across presentations), which may also underlie the correlation of XIDLs with speech recognition. Evidence that spectral tilt discrimination is more important for speech recognition than within-channel intensity discrimination was not unequivocal in this study. However, the results tended to support this proposition, with XIDLs more strongly correlated with speech performance than IDLs, and the ratio XIDL/IDL also correlated with speech recognition. If supported by further research, the importance of perceptual variance as a limiting factor in speech understanding for CI users has important implications for efforts to improve outcomes for those with poor speech recognition.
Collapse
Affiliation(s)
- Colette M McKay
- Bionics Institute, 384-388 Albert St, East Melbourne, 3002, Australia
- Department of Medical Bionics, The University of Melbourne, Melbourne, Australia
| | - Natalie Rickard
- Bionics Institute, 384-388 Albert St, East Melbourne, 3002, Australia
| | | |
Collapse
|
43
|
Hawthorne K. Prosody-driven syntax learning is robust to impoverished pitch and spectral cues. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:2756. [PMID: 29857717 DOI: 10.1121/1.5031130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Across languages, prosodic boundaries tend to align with syntactic boundaries, and both infant and adult language learners capitalize on these correlations to jump-start syntax acquisition. However, it is unclear which prosodic cues (pauses, final-syllable lengthening, and/or pitch resets across boundaries) are necessary for prosodic bootstrapping to occur. It is also unknown how syntax acquisition is impacted when listeners do not have access to the full range of prosodic or spectral information. These questions were addressed using 14-channel noise-vocoded (spectrally degraded) speech. While pre-boundary lengthening and pauses are well-transmitted through noise-vocoded speech, pitch is not; overall intelligibility is also decreased. In two artificial grammar experiments, adult native English speakers showed a similar ability to use English-like prosody to bootstrap unfamiliar syntactic structures from degraded speech and natural, unmanipulated speech. Contrary to previous findings that listeners may require pitch resets and final lengthening to co-occur if no pause cue is present, participants in the degraded speech conditions were able to detect prosodic boundaries from lengthening alone. Results suggest that pitch is not necessary for adult English speakers to perceive prosodic boundaries associated with syntactic structures, and that prosodic bootstrapping is robust to degraded spectral information.
Collapse
Affiliation(s)
- Kara Hawthorne
- Department of Communication Sciences and Disorders, University of Mississippi, 304 George Hall, University, Mississippi 38677, USA
| |
Collapse
|
44
|
Kapolowicz MR, Montazeri V, Assmann PF. Perceiving foreign-accented speech with decreased spectral resolution in single- and multiple-talker conditions. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:EL99. [PMID: 29495755 DOI: 10.1121/1.5023594] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
To determine the effect of reduced spectral resolution on the intelligibility of foreign-accented speech, vocoder-processed sentences from native and Mandarin-accented English talkers were presented to listeners in single- and multiple-talker conditions. Reduced spectral resolution had little effect on native speech but lowered performance for foreign-accented speech, with a further decrease in multiple-talker conditions. Following the initial exposure, foreign-accented speech with reduced spectral resolution was less intelligible than unprocessed speech in both single- and multiple-talker conditions. Intelligibility improved with extended exposure, but only for single-talker conditions. Results indicate a perceptual impairment when perceiving foreign-accented speech with reduced spectral resolution.
Collapse
Affiliation(s)
- Michelle R Kapolowicz
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, Texas 75080-3021, USA
| | - Vahid Montazeri
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, Texas 75080-3021, USA
| | - Peter F Assmann
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, Texas 75080-3021, USA
| |
Collapse
|
45
|
Assessment of Spectral and Temporal Resolution in Cochlear Implant Users Using Psychoacoustic Discrimination and Speech Cue Categorization. Ear Hear 2018; 37:e377-e390. [PMID: 27438871 DOI: 10.1097/aud.0000000000000328] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
OBJECTIVES This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). The authors hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. The authors further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. DESIGN Nineteen cochlear implant listeners and 10 listeners with normal hearing participated in a suite of tasks that included spectral ripple discrimination, temporal modulation detection, and syllable categorization, which was split into a spectral cue-based task (targeting the /ba/-/da/ contrast) and a timing cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for cochlear implant listeners. RESULTS Cochlear implant users were generally less successful at utilizing both spectral and temporal cues for categorization compared with listeners with normal hearing. For the cochlear implant listener group, spectral ripple discrimination was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. Temporal modulation detection using 100- and 10-Hz-modulated noise was not correlated either with the cochlear implant subjects' categorization of voice onset time or with word recognition. Word recognition was correlated more closely with categorization of the controlled speech cues than with performance on the psychophysical discrimination tasks. CONCLUSIONS When evaluating people with cochlear implants, controlled speech-based stimuli are feasible to use in tests of auditory cue categorization, to complement traditional measures of auditory discrimination. Stimuli based on specific speech cues correspond to counterpart nonlinguistic measures of discrimination, but potentially show better correspondence with speech perception more generally. The ubiquity of the spectral (formant transition) and temporal (voice onset time) stimulus dimensions across languages highlights the potential to use this testing approach even in cases where English is not the native language.
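The logistic-regression quantification mentioned in this abstract treats each acoustic cue as a trial-level predictor of the binary categorization response, with the fitted coefficients acting as perceptual weights. A sketch on simulated data (all values hypothetical, not the study's analysis):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical trial-level data: each trial crosses a spectral cue
# (formant-transition step) with a temporal cue (VOT step), both scaled
# to [-1, 1]; 'resp' is the listener's binary categorization response.
grid = [(f, v) for f in np.linspace(-1, 1, 5) for v in np.linspace(-1, 1, 5)]
df = pd.DataFrame(grid * 20, columns=["formant", "vot"])
logit_p = 2.5 * df["formant"] + 0.8 * df["vot"]   # simulated listener
df["resp"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_p)))

fit = smf.logit("resp ~ formant + vot", data=df).fit(disp=False)
print(fit.params)   # coefficient magnitudes ~ perceptual cue weights
```

A listener who weights the spectral cue heavily yields a large formant coefficient; a CI listener relying mostly on timing would show the reverse pattern.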
Collapse
|
46
|
Kapnoula EC, Winn MB, Kong EJ, Edwards J, McMurray B. Evaluating the sources and functions of gradiency in phoneme categorization: An individual differences approach. J Exp Psychol Hum Percept Perform 2017; 43:1594-1611. [PMID: 28406683 PMCID: PMC5561468 DOI: 10.1037/xhp0000410] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
During spoken language comprehension listeners transform continuous acoustic cues into categories (e.g., /b/ and /p/). While long-standing research suggests that phonetic categories are activated in a gradient way, there are also clear individual differences in that more gradient categorization has been linked to various communication impairments such as dyslexia and specific language impairments (Joanisse, Manis, Keating, & Seidenberg, 2000; López-Zamora, Luque, Álvarez, & Cobos, 2012; Serniclaes, Van Heghe, Mousty, Carré, & Sprenger-Charolles, 2004; Werker & Tees, 1987). Crucially, most studies have used 2-alternative forced choice (2AFC) tasks to measure the sharpness of between-category boundaries. Here we propose an alternative paradigm that allows us to measure categorization gradiency in a more direct way. Furthermore, we follow an individual differences approach to (a) link this measure of gradiency to multiple cue integration, (b) explore its relationship to a set of other cognitive processes, and (c) evaluate its role in individuals' ability to perceive speech in noise. Our results provide validation for this new method of assessing phoneme categorization gradiency and offer preliminary insights into how different aspects of speech perception may be linked to each other and to more general cognitive processes.
Collapse
Affiliation(s)
- Efthymia C Kapnoula
- Department of Psychological and Brain Sciences, DeLTA Center, University of Iowa
| | - Matthew B Winn
- Department of Speech and Hearing Sciences, University of Washington
| | | | - Jan Edwards
- Department of Communication Sciences and Disorders, Waisman Center, University of Wisconsin-Madison
| | - Bob McMurray
- Department of Psychological and Brain Sciences, DeLTA Center, University of Iowa
| |
Collapse
|
47
|
He S, Teagle HFB, Buchman CA. The Electrically Evoked Compound Action Potential: From Laboratory to Clinic. Front Neurosci 2017; 11:339. [PMID: 28690494 PMCID: PMC5481377 DOI: 10.3389/fnins.2017.00339] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2017] [Accepted: 05/30/2017] [Indexed: 11/13/2022] Open
Abstract
The electrically evoked compound action potential (eCAP) represents the synchronous firing of a population of electrically stimulated auditory nerve fibers. It can be directly recorded on a surgically exposed nerve trunk in animals or from an intra-cochlear electrode of a cochlear implant. In the past two decades, the eCAP has been widely recorded in both animals and clinical patient populations using different testing paradigms. This paper provides an overview of recording methodologies and response characteristics of the eCAP, as well as its potential applications in research and clinical situations. Relevant studies are reviewed and implications for clinicians are discussed.
Collapse
Affiliation(s)
- Shuman He
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE, United States
| | - Holly F. B. Teagle
- Department of Otolaryngology-Head and Neck Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Craig A. Buchman
- Department of Otolaryngology-Head and Neck Surgery, Washington University, St. Louis, MO, United States
| |
Collapse
|
48
|
Stilp CE. Acoustic Context Alters Vowel Categorization in Perception of Noise-Vocoded Speech. J Assoc Res Otolaryngol 2017; 18:465-481. [PMID: 28281035 PMCID: PMC5418160 DOI: 10.1007/s10162-017-0615-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2016] [Accepted: 01/30/2017] [Indexed: 10/20/2022] Open
Abstract
Normal-hearing listeners' speech perception is widely influenced by spectral contrast effects (SCEs), where perception of a given sound is biased away from stable spectral properties of preceding sounds. Despite this influence, it is not clear how these contrast effects affect speech perception for cochlear implant (CI) users whose spectral resolution is notoriously poor. This knowledge is important for understanding how CIs might better encode key spectral properties of the listening environment. Here, SCEs were measured in normal-hearing listeners using noise-vocoded speech to simulate poor spectral resolution. Listeners heard a noise-vocoded sentence where low-F1 (100-400 Hz) or high-F1 (550-850 Hz) frequency regions were amplified to encourage "eh" (/ɛ/) or "ih" (/ɪ/) responses to the following target vowel, respectively. This was done by filtering with +20 dB (experiment 1a) or +5 dB gain (experiment 1b) or filtering using 100 % of the difference between spectral envelopes of /ɛ/ and /ɪ/ endpoint vowels (experiment 2a) or only 25 % of this difference (experiment 2b). SCEs influenced identification of noise-vocoded vowels in each experiment at every level of spectral resolution. In every case but one, SCE magnitudes exceeded those reported for full-spectrum speech, particularly when spectral peaks in the preceding sentence were large (+20 dB gain, 100 % of the spectral envelope difference). Even when spectral resolution was insufficient for accurate vowel recognition, SCEs were still evident. Results are suggestive of SCEs influencing CI users' speech perception as well, encouraging further investigation of CI users' sensitivity to acoustic context.
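Noise vocoding, used here (and in several entries above) to simulate the limited spectral resolution of a CI, follows a standard pipeline: split the signal into bands, extract each band's temporal envelope, and reimpose the envelopes on band-limited noise carriers. A minimal sketch with arbitrary parameter choices, not any particular study's implementation:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocode(x, fs, n_channels=8, f_lo=100.0, f_hi=8000.0, env_hz=50.0):
    """Minimal noise vocoder: split x into log-spaced bands, extract each
    band's envelope (rectify + low-pass), and reimpose the envelopes on
    band-limited noise carriers. Assumes fs > 2 * f_hi."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    env_sos = butter(2, env_hz, btype="low", fs=fs, output="sos")
    noise = np.random.randn(len(x))
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = sosfiltfilt(env_sos, np.abs(sosfiltfilt(band_sos, x)))
        out += np.maximum(env, 0.0) * sosfiltfilt(band_sos, noise)
    return out / (np.max(np.abs(out)) + 1e-12)   # normalize peak level
```

Fewer channels or wider bands degrade spectral detail while largely preserving temporal envelopes, which is why the paradigm is a convenient proxy for CI hearing in normal-hearing listeners.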
Collapse
Affiliation(s)
- Christian E Stilp
- University of Louisville, 317 Life Sciences Building, Louisville, KY, 40292, USA.
| |
Collapse
|
49
|
Jaekel BN, Newman RS, Goupell MJ. Speech Rate Normalization and Phonemic Boundary Perception in Cochlear-Implant Users. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2017; 60:1398-1416. [PMID: 28395319 PMCID: PMC5580678 DOI: 10.1044/2016_jslhr-h-15-0427] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/11/2015] [Revised: 05/04/2016] [Accepted: 10/14/2016] [Indexed: 05/29/2023]
Abstract
PURPOSE Normal-hearing (NH) listeners rate normalize, temporarily remapping phonemic category boundaries to account for a talker's speech rate. It is unknown whether adults who use auditory prostheses called cochlear implants (CIs) can rate normalize, as CIs transmit degraded speech signals to the auditory nerve. Ineffective adjustment to rate information could explain some of the variability in this population's speech perception outcomes. METHOD Phonemes with manipulated voice-onset-time (VOT) durations were embedded in sentences with different speech rates. Twenty-three CI and 29 NH participants performed a phoneme identification task. NH participants heard the same unprocessed stimuli as the CI participants or stimuli degraded by a sine vocoder, simulating aspects of CI processing. RESULTS CI participants showed larger rate normalization effects (6.6 ms) than the NH participants (3.7 ms) and had shallower (less reliable) category boundary slopes. NH participants showed similarly shallow slopes when presented with acoustically degraded vocoded signals, but an equal or smaller rate effect in response to reductions in available spectral and temporal information. CONCLUSION CI participants can rate normalize, despite their degraded speech input, and show a larger rate effect compared with NH participants. CI participants may particularly rely on rate normalization to better maintain perceptual constancy of the speech signal.
Collapse
Affiliation(s)
- Brittany N. Jaekel
- Department of Hearing and Speech Sciences, University of Maryland, College Park
| | - Rochelle S. Newman
- Department of Hearing and Speech Sciences, University of Maryland, College Park
| | - Matthew J. Goupell
- Department of Hearing and Speech Sciences, University of Maryland, College Park
| |
Collapse
|
50
|
Horn DL, Dudley DJ, Dedhia K, Nie K, Drennan WR, Won JH, Rubinstein JT, Werner LA. Effects of age and hearing mechanism on spectral resolution in normal hearing and cochlear-implanted listeners. J Acoust Soc Am 2017; 141:613. [PMID: 28147578 PMCID: PMC5848837 DOI: 10.1121/1.4974203] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/02/2016] [Revised: 12/21/2016] [Accepted: 01/04/2017] [Indexed: 05/26/2023]
Abstract
Spectral resolution limits speech perception with a cochlear implant (CI) in post-lingually deaf adults. However, the development of spectral resolution in pre-lingually deaf, implanted children is not well understood. Acoustic spectral resolution was measured as a function of age (school-age versus adult) in CI and normal-hearing (NH) participants using spectral ripple discrimination (SRD). A 3-alternative forced-choice task was used to obtain SRD thresholds at five ripple depths. Effects of age and hearing mechanism on SRD and on the slope (reflecting frequency resolution) and x-intercept (reflecting across-channel intensity resolution) of the spectral modulation transfer function (SMTF) were examined. Correlations between SRD, SMTF parameters, age, and speech perception in noise were studied. Better SRD was observed in NH than in CI participants at all ripple depths. SRD thresholds and SMTF slope correlated with speech perception in CI users. When adjusted for floor performance, the x-intercept did not correlate with SMTF slope or speech perception. Correlations between age and x-intercept were positive and significant in NH but not CI children, suggesting that across-channel intensity resolution matures during the school-age years in NH children. No evidence was found in the present study for maturation of spectral resolution beyond early school age in pre-lingually deaf CI users.
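The slope and x-intercept parameters above come from a spectral modulation transfer function fit to SRD thresholds measured at several ripple depths. The short Python sketch below shows one plausible version of that fit; the depth values, threshold values, and the use of a simple linear fit are assumptions for illustration, not the authors' analysis.

```python
# Invented SRD thresholds at five ripple depths, with an assumed linear SMTF fit.
import numpy as np

ripple_depth_db = np.array([5.0, 10.0, 13.0, 16.0, 20.0])  # assumed depths (dB)
srd_threshold = np.array([0.6, 1.4, 1.9, 2.4, 3.1])        # ripples/octave (invented)

# Linear SMTF: threshold ripple density as a function of ripple depth.
slope, intercept = np.polyfit(ripple_depth_db, srd_threshold, 1)
x_intercept = -intercept / slope   # depth below which discrimination fails

print(f"SMTF slope: {slope:.3f} ripples/octave per dB")  # frequency resolution
print(f"SMTF x-intercept: {x_intercept:.1f} dB")         # across-channel intensity resolution
```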
Collapse
Affiliation(s)
- David L Horn
- Department of Otolaryngology-Head and Neck Surgery, Virginia Merrill Bloedel Hearing Research Center, University of Washington, Box 357923, Seattle, Washington 98195, USA
| | - Daniel J Dudley
- Department of Otolaryngology-Head and Neck Surgery, Virginia Merrill Bloedel Hearing Research Center, University of Washington, Box 357923, Seattle, Washington 98195, USA
| | - Kavita Dedhia
- Department of Otolaryngology-Head and Neck Surgery, Virginia Merrill Bloedel Hearing Research Center, University of Washington, Box 357923, Seattle, Washington 98195, USA
| | - Kaibao Nie
- School of Science, Technology, Engineering and Mathematics, University of Washington, Bothell, Washington 98011, USA
| | - Ward R Drennan
- Department of Otolaryngology-Head and Neck Surgery, Virginia Merrill Bloedel Hearing Research Center, University of Washington, Box 357923, Seattle, Washington 98195, USA
| | - Jong Ho Won
- Department of Otolaryngology-Head and Neck Surgery, Virginia Merrill Bloedel Hearing Research Center, University of Washington, Box 357923, Seattle, Washington 98195, USA
| | - Jay T Rubinstein
- Department of Otolaryngology-Head and Neck Surgery, Virginia Merrill Bloedel Hearing Research Center, University of Washington, Box 357923, Seattle, Washington 98195, USA
| | - Lynne A Werner
- Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington 98195, USA
| |
Collapse
|