1. Li Y, Li Q, Du Y, Wang L, Li L, Wen J, Zheng Y. Consonant aspiration in Mandarin-speaking children: a developmental perspective from perception and production. Front Pediatr 2025; 12:1465454. PMID: 39882208; PMCID: PMC11776868; DOI: 10.3389/fped.2024.1465454.
Abstract
Introduction This study investigates Mandarin-speaking children's acquisition of aspirated/unaspirated voiceless consonants in perception and production, to track children's developmental profile, explore the factors that may affect acquisition, and examine the possible association between perception and production. Methods Mandarin-speaking children (N = 95) aged 3-5 and adults (N = 20) completed (1) a perception test built on minimal pairs of unaspirated/aspirated consonants, administered in quiet and in noise; and (2) a production test in which participants produced target words whose syllable-initial consonants contrasted in aspiration. Six pairs of unaspirated/aspirated consonants in Mandarin were included. Results (1) Children's perception and production accuracy for aspirated and unaspirated consonants increased with age. Five-year-olds achieved high accuracy (over 90%) in perception in quiet and in production, though performance was not yet adult-like. (2) Noise adversely affected children's perception, with all child groups performing poorly in the noisy condition. In perception, stops were more challenging for children than affricates, whereas in production children performed better on stops. Furthermore, noise had a greater detrimental effect on the perception of aspirated consonants than of unaspirated ones. (3) A weak positive correlation was found between children's perception of consonant aspiration in quiet and their production. Discussion The findings indicate that age, aspiration state, and manner of articulation (MOA) affect children's acquisition of consonant aspiration. Although 5-year-olds have largely acquired the aspirated/unaspirated contrast, perceiving consonant aspiration in noise remains a challenge for children relative to adults.
Affiliation(s)
- Yani Li
- Department of Otolaryngology-Head and Neck Surgery, West China Hospital of Sichuan University, Chengdu, China
- Department of Audiology and Speech Language Pathology, West China Hospital, Sichuan University, Chengdu, China
- Qun Li
- Department of Otolaryngology-Head and Neck Surgery, West China Hospital of Sichuan University, Chengdu, China
- Department of Audiology and Speech Language Pathology, West China Hospital, Sichuan University, Chengdu, China
- Yihang Du
- Department of Otolaryngology-Head and Neck Surgery, West China Hospital of Sichuan University, Chengdu, China
- Department of Audiology and Speech Language Pathology, West China Hospital, Sichuan University, Chengdu, China
- Lili Wang
- Department of Otolaryngology-Head and Neck Surgery, West China Hospital of Sichuan University, Chengdu, China
- Department of Audiology and Speech Language Pathology, West China Hospital, Sichuan University, Chengdu, China
- Lin Li
- Department of Otolaryngology-Head and Neck Surgery, West China Hospital of Sichuan University, Chengdu, China
- Department of Audiology and Speech Language Pathology, West China Hospital, Sichuan University, Chengdu, China
- Jian Wen
- Department of Otolaryngology-Head and Neck Surgery, West China Hospital of Sichuan University, Chengdu, China
- Department of Audiology and Speech Language Pathology, West China Hospital, Sichuan University, Chengdu, China
- Yun Zheng
- Department of Otolaryngology-Head and Neck Surgery, West China Hospital of Sichuan University, Chengdu, China
- Department of Audiology and Speech Language Pathology, West China Hospital, Sichuan University, Chengdu, China
2. DiNino M. Age and masking effects on acoustic cues for vowel categorization. JASA Express Lett 2024; 4:060001. PMID: 38884558; DOI: 10.1121/10.0026371.
Abstract
Age-related changes in auditory processing may reduce physiological coding of acoustic cues, contributing to older adults' difficulty perceiving speech in background noise. This study investigated whether older adults differed from young adults in patterns of acoustic cue weighting for categorizing vowels in quiet and in noise. All participants relied primarily on spectral quality to categorize /ɛ/ and /æ/ sounds under both listening conditions. However, relative to young adults, older adults exhibited greater reliance on duration and less reliance on spectral quality. These results suggest that aging alters patterns of perceptual cue weights that may influence speech recognition abilities.
3. Symons AE, Holt LL, Tierney AT. Informational masking influences segmental and suprasegmental speech categorization. Psychon Bull Rev 2024; 31:686-696. PMID: 37658222; PMCID: PMC11061029; DOI: 10.3758/s13423-023-02364-5.
Abstract
Auditory categorization requires listeners to integrate acoustic information from multiple dimensions. Attentional theories suggest that acoustic dimensions that are informative attract attention and therefore receive greater perceptual weight during categorization. However, the acoustic environment is often noisy, with multiple sound sources competing for listeners' attention. Amid these adverse conditions, attentional theories predict that listeners will distribute attention more evenly across multiple dimensions. Here we test this prediction using an informational masking paradigm. In two experiments, listeners completed suprasegmental (focus) and segmental (voicing) speech categorization tasks in quiet or in the presence of competing speech. In both experiments, the target speech consisted of short words or phrases that varied in the extent to which fundamental frequency (F0) and durational information signalled category identity. To isolate effects of informational masking, target and competing speech were presented in opposite ears. Across both experiments, there was substantial individual variability in the relative weighting of the two dimensions. These individual differences were consistent across listening conditions, suggesting that they reflect stable perceptual strategies. Consistent with attentional theories of auditory categorization, listeners who relied on a single primary dimension in quiet shifted towards integrating across multiple dimensions in the presence of competing speech. These findings demonstrate that listeners make greater use of the redundancy present in speech when attentional resources are limited.
Affiliation(s)
- A E Symons
- Department of Psychological Sciences, Birkbeck, University of London, London, UK
- L L Holt
- Department of Psychology and Neuroscience Institute, Carnegie Mellon University, 500 Forbes Avenue, Pittsburgh, PA, USA
- A T Tierney
- Department of Psychological Sciences, Birkbeck, University of London, London, UK
4. Tamati TN, Jebens A, Başkent D. Lexical effects on talker discrimination in adult cochlear implant users. J Acoust Soc Am 2024; 155:1631-1640. PMID: 38426835; PMCID: PMC10908561; DOI: 10.1121/10.0025011.
Abstract
The lexical and phonological content of an utterance affects the processing of talker-specific details in normal-hearing (NH) listeners. Adult cochlear implant (CI) users have difficulty with talker discrimination, particularly for same-gender talker pairs, which may alter their reliance on lexical information in talker discrimination. The current study examined the effect of lexical content on talker discrimination in 24 adult CI users. In a remote AX talker discrimination task, word pairs, produced either by the same talker (ST) or by different talkers of the same gender (DT-SG) or mixed genders (DT-MG), were either lexically easy (high frequency, low neighborhood density) or lexically hard (low frequency, high neighborhood density). The task was completed in quiet and in multi-talker babble (MTB). Results showed an effect of lexical difficulty on talker discrimination for same-gender talker pairs in both quiet and MTB. CI users showed greater sensitivity in quiet, as well as less response bias in both quiet and MTB, for lexically easy words compared to lexically hard words. These results suggest that CI users make use of lexical content in same-gender talker discrimination, providing evidence for the contribution of linguistic information to the processing of degraded talker information by adult CI users.
Affiliation(s)
- Terrin N Tamati
- Department of Otolaryngology, Vanderbilt University Medical Center, 1215 21st Ave S, Nashville, Tennessee 37232, USA
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Almut Jebens
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Deniz Başkent
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Research School of Behavioral and Cognitive Neurosciences, Graduate School of Medical Sciences, University of Groningen, Groningen, The Netherlands
5. Peng ZE, Easwar V. Development of amplitude modulation, voice onset time, and consonant identification in noise and reverberation. J Acoust Soc Am 2024; 155:1071-1085. PMID: 38341737; DOI: 10.1121/10.0024461.
Abstract
Children's speech understanding is vulnerable to indoor noise and reverberation, e.g., in classrooms. It is unknown how they develop the ability to use temporal acoustic cues, specifically amplitude modulation (AM) and voice onset time (VOT), which are important for perceiving distorted speech. Through three experiments, we investigated the typical development of AM depth detection in vowels (experiment I), categorical perception of VOT (experiment II), and consonant identification (experiment III) in quiet and in speech-shaped noise (SSN) and mild reverberation in 6- to 14-year-old children. Our findings suggest that AM depth detection using a naturally produced vowel at the rate of the fundamental frequency was particularly difficult for children, especially under acoustic distortion. While the salience of the VOT cue was monotonically attenuated with decreasing signal-to-noise ratio in SSN, its utility for consonant discrimination was completely removed even under mild reverberation. The role of reverberant energy decay in distorting critical temporal cues provided further evidence that may explain the error patterns observed in consonant identification. By 11-14 years of age, children approached adult-like performance in consonant discrimination and identification under adverse acoustics, emphasizing the need for good acoustics for younger children as they develop the auditory skills to process distorted speech in everyday listening environments.
Affiliation(s)
- Z Ellen Peng
- Waisman Center, University of Wisconsin-Madison, Madison, Wisconsin 53705, USA
6. Buz E, Dwyer NC, Lai W, Watson DG, Gifford RH. Integration of fundamental frequency and voice-onset-time to voicing categorization: Listeners with normal hearing and bimodal hearing configurations. J Acoust Soc Am 2023; 153:1580. PMID: 37002096; PMCID: PMC9995168; DOI: 10.1121/10.0017429.
Abstract
This study investigates the integration of word-initial fundamental frequency (F0) and voice-onset-time (VOT) in stop voicing categorization for adult listeners with normal hearing (NH) and unilateral cochlear implant (CI) recipients utilizing a bimodal hearing configuration [CI + contralateral hearing aid (HA)]. Categorization was assessed for ten adults with NH and ten adult bimodal listeners, using synthesized consonant stimuli interpolating between /ba/ and /pa/ exemplars with five-step VOT and F0 conditions. All participants demonstrated the expected categorization pattern by reporting /ba/ for shorter VOTs and /pa/ for longer VOTs, with NH listeners showing more use of VOT as a voicing cue than CI listeners in general. When VOT becomes ambiguous between voiced and voiceless stops, NH users make more use of F0 as a cue to voicing than CI listeners, and CI listeners showed greater utilization of initial F0 during voicing identification in their bimodal (CI + HA) condition than in the CI-alone condition. The results demonstrate the adjunctive benefit of acoustic hearing from the non-implanted ear for listening conditions involving spectrotemporally complex stimuli. This finding may lead to the development of a clinically feasible perceptual weighting task that could inform clinicians about bimodal efficacy and the risk-benefit profile associated with bilateral CI recommendation.
Affiliation(s)
- Esteban Buz
- Department of Psychology and Human Development, Vanderbilt University, Nashville, Tennessee 37203, USA
- Nichole C Dwyer
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620, USA
- Wei Lai
- Department of Psychology and Human Development, Vanderbilt University, Nashville, Tennessee 37203, USA
- Duane G Watson
- Department of Psychology and Human Development, Vanderbilt University, Nashville, Tennessee 37203, USA
- René H Gifford
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, Tennessee 37203, USA
7. Jasmin K, Tierney A, Obasih C, Holt L. Short-term perceptual reweighting in suprasegmental categorization. Psychon Bull Rev 2023; 30:373-382. PMID: 35915382; PMCID: PMC9971089; DOI: 10.3758/s13423-022-02146-5.
Abstract
Segmental speech units such as phonemes are described as multidimensional categories whose perception involves contributions from multiple acoustic input dimensions, and the relative perceptual weights of these dimensions respond dynamically to context. For example, when speech is altered to create an "accent" in which two acoustic dimensions are correlated in a manner opposite that of long-term experience, the dimension that carries less perceptual weight is down-weighted to contribute less in category decisions. It remains unclear, however, whether this short-term reweighting extends to perception of suprasegmental features that span multiple phonemes, syllables, or words, in part because it has remained debatable whether suprasegmental features are perceived categorically. Here, we investigated the relative contribution of two acoustic dimensions to word emphasis. Participants categorized instances of a two-word phrase pronounced with typical covariation of fundamental frequency (F0) and duration, and in the context of an artificial "accent" in which F0 and duration (established in prior research on English speech as "primary" and "secondary" dimensions, respectively) covaried atypically. When categorizing "accented" speech, listeners rapidly down-weighted the secondary dimension (duration). This result indicates that listeners continually track short-term regularities across speech input and dynamically adjust the weight of acoustic evidence for suprasegmental decisions. Thus, dimension-based statistical learning appears to be a widespread phenomenon in speech perception extending to both segmental and suprasegmental categorization.
Affiliation(s)
- Kyle Jasmin
- Department of Psychology, Wolfson Building, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, UK
- Lori Holt
- Carnegie Mellon University, Pittsburgh, PA, USA
8. Haj-Tas MA, Mahmoud HN, Natour YS. Oral-diadochokinetic rate for healthy young Jordanian adults. Speech Lang Hear 2022. DOI: 10.1080/2050571x.2022.2156714.
Affiliation(s)
- Maisa A. Haj-Tas
- Department of Hearing and Speech Sciences, The University of Jordan, Amman, Jordan
- Hana N. Mahmoud
- Department of Hearing and Speech Sciences, The University of Jordan, Amman, Jordan
- Yaser S. Natour
- Department of Hearing and Speech Sciences, The University of Jordan, Amman, Jordan
9. Kim Y, Thompson A. An Acoustic-Phonetic Approach to Effects of Face Masks on Speech Intelligibility. J Speech Lang Hear Res 2022; 65:4679-4689. PMID: 36351244; PMCID: PMC9934909; DOI: 10.1044/2022_jslhr-22-00245.
Abstract
PURPOSE This study aimed to examine the effects of wearing a face mask on speech acoustics and intelligibility, using an acoustic-phonetic analysis of speech. In addition, the effects of speakers' behavioral modifications while wearing a mask were examined. METHOD Fourteen female adults read a set of words and sentences under three conditions: (a) conversational, mask-off; (b) conversational, mask-on; and (c) clear, mask-on. Seventy listeners rated speech intelligibility using two methods: orthographic transcription and a visual analog scale (VAS). Acoustic measures for vowels included duration, first (F1) and second (F2) formant frequency, and the F1/F2 intensity ratio. For consonants, spectral moment coefficients and the consonant-vowel (CV) boundary (intensity ratio between consonant and vowel) were measured. RESULTS Face masks had a negative impact on speech intelligibility on both measures. However, intelligibility recovered in the clear speech condition for VAS ratings but not for transcription scores. Analysis of the orthographic transcriptions showed that listeners tended to confuse word-initial consonants (particularly fricatives, affricates, and stops) rather than vowels. Acoustic data indicated a significant effect of condition on the CV intensity ratio only. CONCLUSIONS Our data demonstrate a negative effect of face masks on speech intelligibility, mainly affecting consonants. However, intelligibility can be enhanced by speaking clearly, likely driven by prosodic alterations.
Affiliation(s)
- Yunjung Kim
- School of Communication Science and Disorders, Florida State University, Tallahassee
- Austin Thompson
- School of Communication Science and Disorders, Florida State University, Tallahassee
10. Wu YC, Holt LL. Phonetic category activation predicts the direction and magnitude of perceptual adaptation to accented speech. J Exp Psychol Hum Percept Perform 2022; 48:913-925. PMID: 35849375; PMCID: PMC10236200; DOI: 10.1037/xhp0001037.
Abstract
Unfamiliar accents can systematically shift speech acoustics away from community norms and reduce comprehension. Yet limited exposure improves comprehension. This perceptual adaptation indicates that the mapping from acoustics to speech representations is dynamic rather than fixed, but what drives the adjustments is debated. Supervised learning accounts posit that activation of an internal speech representation via disambiguating information generates predictions about the patterns of speech input typically associated with that representation; when actual input mismatches predictions, the mapping is adjusted. We tested two hypotheses of this account across consonants and vowels as listeners categorized speech conveying an English-like acoustic regularity or an artificial accent. Across conditions, signal manipulations impacted which of two acoustic dimensions best conveyed category identity, and predicted which dimension would exhibit the effects of perceptual adaptation. Moreover, the strength of phonetic category activation, as estimated from categorization responses reliant on the dominant acoustic dimension, predicted the magnitude of adaptation observed across listeners. The results align with supervised learning accounts, suggesting that perceptual adaptation arises from speech category activation, corresponding predictions about the patterns of acoustic input that align with the category, and adjustments in subsequent speech perception when input mismatches these expectations.
11. Lou Q, Wang X, Jiang L, Wang G, Chen Y, Liu Q. Subjective and Objective Evaluation of Speech in Adult Patients with Unrepaired Cleft Palate. J Craniofac Surg 2022; 33:e528-e532. PMID: 35175986; DOI: 10.1097/scs.0000000000008567.
Abstract
OBJECTIVE To explore the speech outcomes of adult patients through subjective perceptual evaluation and objective acoustic analysis, and to compare the pronunciation characteristics of adult speakers with unrepaired cleft palate with those of their non-cleft peers. PARTICIPANTS AND INTERVENTION Subjective evaluation indicators (speech intelligibility, nasality, and consonant missing rate) and objective acoustic parameters (normalized vowel formants, voice onset time, and analysis of the three-dimensional spectrogram and spectrum) were obtained from speech samples produced by two groups of speakers: (a) speakers with unrepaired cleft palate (n = 65, mean age = 25.1 years) and (b) typical speakers (n = 30, mean age = 23.7 years). RESULTS Compared with typical speakers, individuals with unrepaired cleft palate exhibited lower speech intelligibility with higher nasality and a higher consonant missing rate; the missing rate was highest for the six consonant syllables. The acoustic differences were mainly manifested in vowel formants and voice onset time. CONCLUSIONS The results revealed important acoustic differences between adult patients with unrepaired cleft palate and typical speakers. The trend of spectral deviation may have contributed to the difficulty in producing pressure vowels and aspirated consonants in individuals with speech disorders related to cleft palate.
Affiliation(s)
- Qun Lou
- Department of Oral and Maxillofacial Surgery, Shanghai Ninth People's Hospital, College of Stomatology, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Oral Diseases, Shanghai Key Laboratory of Stomatology and Shanghai Research Institute of Stomatology, Shanghai, China
12. Symons AE, Dick F, Tierney AT. Dimension-selective attention and dimensional salience modulate cortical tracking of acoustic dimensions. Neuroimage 2021; 244:118544. PMID: 34492294; DOI: 10.1016/j.neuroimage.2021.118544.
Abstract
Some theories of auditory categorization suggest that auditory dimensions that are strongly diagnostic for particular categories - for instance voice onset time or fundamental frequency in the case of some spoken consonants - attract attention. However, prior cognitive neuroscience research on auditory selective attention has largely focused on attention to simple auditory objects or streams, and so little is known about the neural mechanisms that underpin dimension-selective attention, or how the relative salience of variations along these dimensions might modulate neural signatures of attention. Here we investigate whether dimensional salience and dimension-selective attention modulate the cortical tracking of acoustic dimensions. In two experiments, participants listened to tone sequences varying in pitch and spectral peak frequency; these two dimensions changed at different rates. Inter-trial phase coherence (ITPC) and amplitude of the EEG signal at the frequencies tagged to pitch and spectral changes provided a measure of cortical tracking of these dimensions. In Experiment 1, tone sequences varied in the size of the pitch intervals, while the size of spectral peak intervals remained constant. Cortical tracking of pitch changes was greater for sequences with larger compared to smaller pitch intervals, with no difference in cortical tracking of spectral peak changes. In Experiment 2, participants selectively attended to either pitch or spectral peak. Cortical tracking was stronger in response to the attended compared to unattended dimension for both pitch and spectral peak. These findings suggest that attention can enhance the cortical tracking of specific acoustic dimensions rather than simply enhancing tracking of the auditory object as a whole.
Affiliation(s)
- Ashley E Symons
- Department of Psychological Sciences, Birkbeck College, University of London, UK
- Fred Dick
- Department of Psychological Sciences, Birkbeck College, University of London, UK; Division of Psychology & Language Sciences, University College London, UK
- Adam T Tierney
- Department of Psychological Sciences, Birkbeck College, University of London, UK
13. Viswanathan V, Shinn-Cunningham BG, Heinz MG. Temporal fine structure influences voicing confusions for consonant identification in multi-talker babble. J Acoust Soc Am 2021; 150:2664. PMID: 34717498; PMCID: PMC8514254; DOI: 10.1121/10.0006527.
Abstract
To understand the mechanisms of speech perception in everyday listening environments, it is important to elucidate the relative contributions of different acoustic cues in transmitting phonetic content. Previous studies suggest that the envelope of speech in different frequency bands conveys most speech content, while the temporal fine structure (TFS) can aid in segregating target speech from background noise. However, the role of TFS in conveying phonetic content beyond what envelopes convey for intact speech in complex acoustic scenes is poorly understood. The present study addressed this question using online psychophysical experiments to measure the identification of consonants in multi-talker babble for intelligibility-matched intact and 64-channel envelope-vocoded stimuli. Consonant confusion patterns revealed that listeners had a greater tendency in the vocoded (versus intact) condition to be biased toward reporting that they heard an unvoiced consonant, despite envelope and place cues being largely preserved. This result was replicated when babble instances were varied across independent experiments, suggesting that TFS conveys voicing information beyond what is conveyed by envelopes for intact speech in babble. Given that multi-talker babble is a masker that is ubiquitous in everyday environments, this finding has implications for the design of assistive listening devices such as cochlear implants.
Affiliation(s)
- Vibha Viswanathan
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana 47907, USA
- Michael G. Heinz
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA
14. Souza PE, Ellis G, Marks K, Wright R, Gallun F. Does the Speech Cue Profile Affect Response to Amplitude Envelope Distortion? J Speech Lang Hear Res 2021; 64:2053-2069. PMID: 34019777; PMCID: PMC8740712; DOI: 10.1044/2021_jslhr-20-00481.
Abstract
Purpose A broad area of interest to our group is understanding the consequences of the "cue profile" (a measure of how well a listener can utilize audible temporal and/or spectral cues) for listening scenarios in which a subset of cues is distorted. The study goal was to determine whether listeners whose cue profile indicated that they primarily used temporal cues for recognition would respond differently to speech-envelope distortion than listeners who utilized both spectral and temporal cues. Method Twenty-five adults with sensorineural hearing loss participated in the study. Each listener's cue profile was measured by analyzing identification patterns for a set of synthetic syllables in which envelope rise time and formant transitions were varied. A linear discriminant analysis quantified the relative contributions of spectral and temporal cues to the identification patterns. Low-context sentences in noise were processed with time compression, wide-dynamic-range compression, or a combination of the two to create a range of speech-envelope distortions. An acoustic metric, a modified version of the Spectral Correlation Index, was calculated to quantify envelope distortion. Results A binomial generalized linear mixed-effects model indicated that envelope distortion, the cue profile, the interaction between envelope distortion and the cue profile, and the pure-tone average were significant predictors of sentence recognition. Conclusions Listeners with good perception of spectro-temporal contrasts were more resilient to the detrimental effects of envelope compression than listeners who relied on temporal cues to a greater extent. The cue profile may provide information about individual listeners that can direct the choice of hearing aid parameters, especially those that affect the speech envelope.
15. Bourke JD, Todd J. Acoustics versus linguistics? Context is Part and Parcel to lateralized processing of the parts and parcels of speech. Laterality 2021; 26:725-765. [PMID: 33726624] [DOI: 10.1080/1357650x.2021.1898415]
Abstract
The purpose of this review is to provide an accessible exploration of key considerations of lateralization in speech and non-speech perception using clear and defined language. From these considerations, the primary arguments for each side of the linguistics versus acoustics debate are outlined and explored in context of emerging integrative theories. This theoretical approach entails a perspective that linguistic and acoustic features differentially contribute to leftward bias, depending on the given context. Such contextual factors include stimulus parameters and variables of stimulus presentation (e.g., noise/silence and monaural/binaural) and variances in individuals (sex, handedness, age, and behavioural ability). Discussion of these factors and their interaction is also aimed towards providing an outline of variables that require consideration when developing and reviewing methodology of acoustic and linguistic processing laterality studies. Thus, there are three primary aims in the present paper: (1) to provide the reader with key theoretical perspectives from the acoustics/linguistics debate and a synthesis of the two viewpoints, (2) to highlight key caveats for generalizing findings regarding predominant models of speech laterality, and (3) to provide a practical guide for methodological control using predominant behavioural measures (i.e., gap detection and dichotic listening tasks) and/or neurophysiological measures (i.e., mismatch negativity) of speech laterality.
Affiliation(s)
- Jesse D Bourke
- School of Psychology, University Drive, Callaghan, NSW 2308, Australia
- Juanita Todd
- School of Psychology, University Drive, Callaghan, NSW 2308, Australia
16. Koury GV, da S. Araújo FC, Costa KM, da Silva Filho M. A Digital Filter-Based Method for Diagnosing Speech Comprehension Deficits. Mayo Clin Proc Innov Qual Outcomes 2021; 5:241-252. [PMID: 33997624] [PMCID: PMC8105531] [DOI: 10.1016/j.mayocpiqo.2020.09.007]
Abstract
Objective To improve the diagnostic efficiency of current tests for auditory processing disorders (APDs) by creating new test signals using digital filtering methods. Methods We conducted a prospective study from August 1, 2014, to August 31, 2019, using 3 low speech redundancy tests with novel test signals that we created with specially designed digital filters: the binaural resynthesis test and the low pass and high pass filtered speech tests. We validated and optimized these new tests, then applied them to healthy individuals across different age groups to examine how age affected performance and to children with APD before and after acoustically controlled auditory training (ACAT) to assess clinical improvement after treatment. Results We found a progressive increase in performance accuracy with less restrictive filters (P<.001) and with increasing age for all tests (P<.001). Our results suggest that binaural resynthesis and auditory closure mature at similar rates. We also demonstrate that the new tests can be used for the diagnosis of APD and for the monitoring of ACAT effects. Interestingly, we found that patients having the most severe deficits also benefited the most from ACAT (P<.001). Conclusion We introduce a method that substantially improves current diagnostic tools for APD. In addition, we provide information on auditory processing maturation in normal development and validate that our method can detect APD-related deficits and ACAT-induced improvements in auditory processing.
Key Words
- AC, auditory closure
- ACAT, acoustically controlled auditory training
- APD, auditory processing disorder
- BR, binaural resynthesis
- BRT, binaural resynthesis test
- FST, filtered speech test
- HP, high pass
- HPFST, high pass filtered speech test
- L1, list 1
- L2, list 2
- LP, low pass
- LPFST, low pass filtered speech test
Affiliation(s)
- Gisele V.H. Koury
- Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Correspondence: Address to Gisele V.H. Koury, MSc, Institute of Biological Sciences, Federal University of Pará, Neuroengineering Laboratory, Rua Augusto Correa No. 1, Guamá, Belém, Pará, Brazil 66075-110
- Kauê M. Costa
- National Institute on Drug Abuse Intramural Research Program, National Institutes of Health, Baltimore, MD
17. Jasmin K, Dick F, Holt LL, Tierney A. Tailored perception: Individuals' speech and music perception strategies fit their perceptual abilities. J Exp Psychol Gen 2020; 149:914-934. [PMID: 31589067] [PMCID: PMC7133494] [DOI: 10.1037/xge0000688]
Abstract
Perception involves integration of multiple dimensions that often serve overlapping, redundant functions, for example, pitch, duration, and amplitude in speech. Individuals tend to prioritize these dimensions differently (stable, individualized perceptual strategies), but the reason for this has remained unclear. Here we show that perceptual strategies relate to perceptual abilities. In a speech cue weighting experiment (trial N = 990), we first demonstrate that individuals with a severe deficit for pitch perception (congenital amusics; N = 11) categorize linguistic stimuli similarly to controls (N = 11) when the main distinguishing cue is duration, which they perceive normally. In contrast, in a prosodic task where pitch cues are the main distinguishing factor, we show that amusics place less importance on pitch and instead rely more on duration cues-even when pitch differences in the stimuli are large enough for amusics to discern. In a second experiment testing musical and prosodic phrase interpretation (N = 16 amusics; 15 controls), we found that relying on duration allowed amusics to overcome their pitch deficits to perceive speech and music successfully. We conclude that auditory signals, because of their redundant nature, are robust to impairments for specific dimensions, and that optimal speech and music perception strategies depend not only on invariant acoustic dimensions (the physical signal), but on perceptual dimensions whose precision varies across individuals. Computational models of speech perception (indeed, all types of perception involving redundant cues e.g., vision and touch) should therefore aim to account for the precision of perceptual dimensions and characterize individuals as well as groups.
Affiliation(s)
- Fred Dick
- Department of Psychological Sciences
18. Son G. Korean-speaking children's perceptual development in multidimensional acoustic space. J Child Lang 2020; 47:579-599. [PMID: 31722761] [DOI: 10.1017/s0305000919000692]
Abstract
This study investigated how Korean toddlers' perception of stop categories develops in the acoustic dimensions of VOT and F0. To examine the developmental trajectory of VOT and F0 in toddlers' perceptual space, a perceptual identification test with natural and synthesized sound stimuli was conducted with 58 Korean monolingual children (aged 2-4 years). The results revealed that toddlers' perceptual mapping relies on VOT mainly in the high-pitch environment, resulting in more successful perceptual accuracy for fortis or aspirated stops than for lenis stops. F0 development is correlated with the perceptual distinction of lenis from aspirated stops, but no consistent categorical perception for F0 was found before four years of age. The findings suggest that multi-parametric control in perceptual development guides an acquisition ordering of Korean stop phonemes and that tonal development is significantly related to the acquisition of Korean phonemic contrasts.
Affiliation(s)
- Gayeon Son
- University of Pennsylvania, Department of Linguistics, Philadelphia, PA, USA, and Kwangwoon University, Department of English Language and Literature, Seoul, Korea
19. Winn MB. Manipulation of voice onset time in speech stimuli: A tutorial and flexible Praat script. J Acoust Soc Am 2020; 147:852. [PMID: 32113256] [DOI: 10.1121/10.0000692]
Abstract
Voice onset time (VOT) is an acoustic property of stop consonants that is commonly manipulated in studies of phonetic perception. This paper contains a thorough description of the "progressive cutback and replacement" method of VOT manipulation, and comparison with other VOT manipulation techniques. Other acoustic properties that covary with VOT-such as fundamental frequency and formant transitions-are also discussed, along with considerations for testing VOT perception and its relationship to various other measures of auditory temporal or spectral processing. An implementation of the progressive cutback and replacement method in the Praat scripting language is presented, which is suitable for modifying natural speech for perceptual experiments involving VOT and/or related covarying F0 and intensity cues. Justifications are provided for the stimulus design choices and constraints implemented in the script.
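The "progressive cutback and replacement" idea can be illustrated outside Praat. A minimal Python sketch follows, assuming a voiced token and an aspiration-noise segment of equal length (both are synthetic stand-ins here, not the script's actual stimuli or parameters):

```python
import numpy as np

def vot_continuum(voiced, aspiration, fs, vot_steps_ms):
    """Progressive cutback and replacement (sketch): at each VOT step,
    remove the first `vot` ms of the voiced token and splice in the same
    duration of aspiration noise, keeping total duration constant."""
    out = []
    for vot in vot_steps_ms:
        n = int(fs * vot / 1000)
        out.append(np.concatenate([aspiration[:n], voiced[n:]]))
    return out

fs = 16000
t = np.arange(int(0.2 * fs)) / fs
voiced = 0.5 * np.sin(2 * np.pi * 120 * t)        # stand-in for a voiced /ba/
rng = np.random.default_rng(1)
aspiration = 0.1 * rng.standard_normal(len(t))    # stand-in for /pa/ aspiration

continuum = vot_continuum(voiced, aspiration, fs, [0, 10, 20, 30, 40, 50])
print([len(s) for s in continuum])  # every step keeps the original duration
```

The duration-preserving splice is the key property: only the voiced/aspirated proportion changes along the continuum, which is why covarying cues such as F0 and intensity must be handled separately, as the tutorial discusses.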
Affiliation(s)
- Matthew B Winn
- Department of Speech-Language-Hearing Sciences, University of Minnesota, 164 Pillsbury Drive Southeast, Minneapolis, Minnesota 55455, USA
20. Schertz J, Clare EJ. Phonetic cue weighting in perception and production. Wiley Interdiscip Rev Cogn Sci 2019; 11:e1521. [DOI: 10.1002/wcs.1521]
Affiliation(s)
- Jessamyn Schertz
- University of Toronto, Department of Language Studies, Mississauga, Ontario, Canada
- Emily J. Clare
- University of Toronto, Department of Linguistics, Toronto, Ontario, Canada
21. Schertz J, Chow CTY, Kamal NSN. The influence of tone language experience and speech style on the use of intonation in language discrimination. J Acoust Soc Am 2019; 146:EL58. [PMID: 31370592] [DOI: 10.1121/1.5117167]
Abstract
This work tests whether listeners' use of suprasegmental information in speech perception is modulated by language background and speech style. Native Mandarin (tone language) and Malay (non-tone language) listeners completed an AX language discrimination task with four levels of signal degradation and two speech styles. Listeners in both groups showed more benefit from pitch information in read than in spontaneous speech. Mandarin listeners showed a greater benefit than Malay listeners from the inclusion of f0 information in a segmentally degraded signal, suggesting that experience with lexical tone may extend to increased attention and/or sensitivity to phrase-level pitch contours.
Affiliation(s)
- Jessamyn Schertz
- Department of Language Studies, University of Toronto Mississauga, Mississauga, Ontario L5L 1C6
- Crystal Tze Ying Chow
- Department of Language Studies, University of Toronto Mississauga, Mississauga, Ontario L5L 1C6
- Nur Sakinah Nor Kamal
- Department of Language Studies, University of Toronto Mississauga, Mississauga, Ontario L5L 1C6
22. Deroche MLD, Lu HP, Lin YS, Chatterjee M, Peng SC. Processing of Acoustic Information in Lexical Tone Production and Perception by Pediatric Cochlear Implant Recipients. Front Neurosci 2019; 13:639. [PMID: 31281237] [PMCID: PMC6596315] [DOI: 10.3389/fnins.2019.00639]
Abstract
Purpose: This study examined the utilization of multiple types of acoustic information in lexical tone production and perception by pediatric cochlear implant (CI) recipients who are native speakers of Mandarin Chinese. Methods: Lexical tones were recorded from CI recipients and their peers with normal hearing (NH). Each participant was asked to produce a disyllabic word, yan jing, in which the first syllable was pronounced as Tone 3 (a low dipping tone) while the second syllable was pronounced as Tone 1 (a high level tone, meaning "eyes") or as Tone 4 (a high falling tone, meaning "eyeglasses"). In addition, a parametric manipulation in fundamental frequency (F0) and duration of Tones 1 and 4 used in a lexical tone recognition task in Peng et al. (2017) was adopted to evaluate the perceptual reliance on each dimension. Results: Mixed-effect analyses of duration, intensity, and F0 cues revealed that NH children focused exclusively on marking distinct F0 contours, while CI participants shortened Tone 4 or prolonged Tone 1 to enhance their contrast. In line with these production strategies, NH children relied primarily on F0 cues to identify the two tones, whereas CI children showed greater reliance on duration cues. Moreover, CI participants who placed greater perceptual weight on duration cues also tended to exhibit smaller changes in their F0 production. Conclusion: Pediatric CI recipients appear to contrast the secondary acoustic dimension (duration) in addition to F0 contours for both lexical tone production and perception. These findings suggest that perception and production strategies of lexical tones are well coupled in this pediatric CI population.
Affiliation(s)
| | | | - Yung-Song Lin
- Chi-Mei Medical Center, Tainan, Taiwan.,Taipei Medical University, Taipei, Taiwan
| | | | - Shu-Chen Peng
- United States Food and Drug Administration, Silver Spring, MD, United States
| |
Collapse
23. Toscano JC, Lansing CR. Age-Related Changes in Temporal and Spectral Cue Weights in Speech. Lang Speech 2019; 62:61-79. [PMID: 29103359] [DOI: 10.1177/0023830917737112]
Abstract
Listeners weight acoustic cues in speech according to their reliability, but few studies have examined how cue weights change across the lifespan. Previous work has suggested that older adults have deficits in auditory temporal discrimination, which could affect the reliability of temporal phonetic cues, such as voice onset time (VOT), and in turn, impact speech perception in real-world listening environments. We addressed this by examining younger and older adults' use of VOT and onset F0 (a secondary phonetic cue) for voicing judgments (e.g., /b/ vs. /p/), using both synthetic and naturally produced speech. We found age-related differences in listeners' use of the two voicing cues, such that older adults relied more heavily on onset F0 than younger adults, even though this cue is less reliable in American English. These results suggest that phonetic cue weights continue to change across the lifespan.
24. Kapnoula EC, Winn MB, Kong EJ, Edwards J, McMurray B. Evaluating the sources and functions of gradiency in phoneme categorization: An individual differences approach. J Exp Psychol Hum Percept Perform 2017; 43:1594-1611. [PMID: 28406683] [PMCID: PMC5561468] [DOI: 10.1037/xhp0000410]
Abstract
During spoken language comprehension listeners transform continuous acoustic cues into categories (e.g., /b/ and /p/). While long-standing research suggests that phonetic categories are activated in a gradient way, there are also clear individual differences in that more gradient categorization has been linked to various communication impairments such as dyslexia and specific language impairments (Joanisse, Manis, Keating, & Seidenberg, 2000; López-Zamora, Luque, Álvarez, & Cobos, 2012; Serniclaes, Van Heghe, Mousty, Carré, & Sprenger-Charolles, 2004; Werker & Tees, 1987). Crucially, most studies have used 2-alternative forced choice (2AFC) tasks to measure the sharpness of between-category boundaries. Here we propose an alternative paradigm that allows us to measure categorization gradiency in a more direct way. Furthermore, we follow an individual differences approach to (a) link this measure of gradiency to multiple cue integration, (b) explore its relationship to a set of other cognitive processes, and (c) evaluate its role in individuals' ability to perceive speech in noise. Our results provide validation for this new method of assessing phoneme categorization gradiency and offer preliminary insights into how different aspects of speech perception may be linked to each other and to more general cognitive processes.
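The conventional 2AFC measure the authors critique estimates gradiency from the slope of a logistic identification function fitted along a stimulus continuum. A sketch with hypothetical response proportions for a "sharp" and a more "gradient" listener (the continuum, boundary, and slope values are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    """Identification function: P("voiceless" response) along a VOT continuum."""
    return 1 / (1 + np.exp(-k * (x - x0)))

vot = np.linspace(0, 50, 11)            # hypothetical continuum steps (ms)
sharp = logistic(vot, 25, 0.8)          # simulated categorical listener
gradient = logistic(vot, 25, 0.15)      # simulated more gradient listener

for name, data in [("sharp", sharp), ("gradient", gradient)]:
    (x0, k), _ = curve_fit(logistic, vot, data, p0=[20, 0.2])
    print(f"{name}: boundary = {x0:.1f} ms, slope = {k:.2f}")
```

A steeper fitted slope is read as a sharper category boundary; the paper's point is that this indirect slope measure conflates gradiency with response noise, motivating their more direct paradigm.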
Affiliation(s)
- Efthymia C Kapnoula
- Department of Psychological and Brain Sciences, DeLTA Center, University of Iowa
- Matthew B Winn
- Department of Speech and Hearing Sciences, University of Washington
- Jan Edwards
- Department of Communication Sciences and Disorders, Waisman Center, University of Wisconsin-Madison
- Bob McMurray
- Department of Psychological and Brain Sciences, DeLTA Center, University of Iowa
25. Peng SC, Lu HP, Lu N, Lin YS, Deroche MLD, Chatterjee M. Processing of Acoustic Cues in Lexical-Tone Identification by Pediatric Cochlear-Implant Recipients. J Speech Lang Hear Res 2017; 60:1223-1235. [PMID: 28388709] [PMCID: PMC5755546] [DOI: 10.1044/2016_jslhr-s-16-0048]
Abstract
PURPOSE The objective was to investigate acoustic cue processing in lexical-tone recognition by pediatric cochlear-implant (CI) recipients who are native Mandarin speakers. METHOD Lexical-tone recognition was assessed in pediatric CI recipients and listeners with normal hearing (NH) in 2 tasks. In Task 1, participants identified naturally uttered words that were contrastive in lexical tones. For Task 2, a disyllabic word (yanjing) was manipulated orthogonally, varying in fundamental-frequency (F0) contours and duration patterns. Participants identified each token with the second syllable jing pronounced with Tone 1 (a high level tone) as eyes or with Tone 4 (a high falling tone) as eyeglasses. RESULTS CI participants' recognition accuracy was significantly lower than NH listeners' in Task 1. In Task 2, CI participants' reliance on F0 contours was significantly less than that of NH listeners; their reliance on duration patterns, however, was significantly higher than that of NH listeners. Both CI and NH listeners' performance in Task 1 was significantly correlated with their reliance on F0 contours in Task 2. CONCLUSION For pediatric CI recipients, lexical-tone recognition using naturally uttered words is primarily related to their reliance on F0 contours, although duration patterns may be used as an additional cue.
Affiliation(s)
- Shu-Chen Peng
- Center for Devices and Radiological Health, United States Food and Drug Administration, Silver Spring, MD
- Nelson Lu
- Center for Devices and Radiological Health, United States Food and Drug Administration, Silver Spring, MD
- Yung-Song Lin
- Chi-Mei Medical Center, Tainan, Taiwan; Taipei Medical University, Taiwan
26. Jaekel BN, Newman RS, Goupell MJ. Speech Rate Normalization and Phonemic Boundary Perception in Cochlear-Implant Users. J Speech Lang Hear Res 2017; 60:1398-1416. [PMID: 28395319] [PMCID: PMC5580678] [DOI: 10.1044/2016_jslhr-h-15-0427]
Abstract
PURPOSE Normal-hearing (NH) listeners rate normalize, temporarily remapping phonemic category boundaries to account for a talker's speech rate. It is unknown if adults who use auditory prostheses called cochlear implants (CI) can rate normalize, as CIs transmit degraded speech signals to the auditory nerve. Ineffective adjustment to rate information could explain some of the variability in this population's speech perception outcomes. METHOD Phonemes with manipulated voice-onset-time (VOT) durations were embedded in sentences with different speech rates. Twenty-three CI and 29 NH participants performed a phoneme identification task. NH participants heard the same unprocessed stimuli as the CI participants or stimuli degraded by a sine vocoder, simulating aspects of CI processing. RESULTS CI participants showed larger rate normalization effects (6.6 ms) than the NH participants (3.7 ms) and had shallower (less reliable) category boundary slopes. NH participants showed similarly shallow slopes when presented acoustically degraded vocoded signals, but an equal or smaller rate effect in response to reductions in available spectral and temporal information. CONCLUSION CI participants can rate normalize, despite their degraded speech input, and show a larger rate effect compared to NH participants. CI participants may particularly rely on rate normalization to better maintain perceptual constancy of the speech signal.
Affiliation(s)
- Brittany N. Jaekel
- Department of Hearing and Speech Sciences, University of Maryland, College Park
- Rochelle S. Newman
- Department of Hearing and Speech Sciences, University of Maryland, College Park
- Matthew J. Goupell
- Department of Hearing and Speech Sciences, University of Maryland, College Park
27. Fogerty D, Bologna WJ, Ahlstrom JB, Dubno JR. Simultaneous and forward masking of vowels and stop consonants: Effects of age, hearing loss, and spectral shaping. J Acoust Soc Am 2017; 141:1133. [PMID: 28253707] [PMCID: PMC5848836] [DOI: 10.1121/1.4976082]
Abstract
Fluctuating noise, common in everyday environments, has the potential to mask acoustic cues important for speech recognition. This study examined the extent to which acoustic cues for perception of vowels and stop consonants differ in their susceptibility to simultaneous and forward masking. Younger normal-hearing, older normal-hearing, and older hearing-impaired adults identified initial and final consonants or vowels in noise-masked syllables that had been spectrally shaped. The amount of shaping was determined by subjects' audiometric thresholds. A second group of younger adults with normal hearing was tested with spectral shaping determined by the mean audiogram of the hearing-impaired group. Stimulus timing ensured that the final 10, 40, or 100 ms of the syllable occurred after the masker offset. Results demonstrated that participants benefited from short temporal delays between the noise and speech for vowel identification, but required longer delays for stop consonant identification. Older adults with normal and impaired hearing, with sufficient audibility, required longer delays to obtain performance equivalent to that of the younger adults. Overall, these results demonstrate that in forward masking conditions, younger listeners can successfully identify vowels during short temporal intervals (i.e., one unmasked pitch period), with longer durations required for consonants and for older adults.
Affiliation(s)
- Daniel Fogerty
- Department of Communication Sciences and Disorders, University of South Carolina, 1224 Sumter Street, Suite 300, Columbia, South Carolina 29208, USA
- William J Bologna
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, South Carolina 29425-5500, USA
- Jayne B Ahlstrom
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, South Carolina 29425-5500, USA
- Judy R Dubno
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, South Carolina 29425-5500, USA
28. Kong YY, Winn MB, Poellmann K, Donaldson GS. Discriminability and Perceptual Saliency of Temporal and Spectral Cues for Final Fricative Consonant Voicing in Simulated Cochlear-Implant and Bimodal Hearing. Trends Hear 2016; 20:2331216516652145. [PMID: 27317666] [PMCID: PMC5562340] [DOI: 10.1177/2331216516652145]
Abstract
Multiple redundant acoustic cues can contribute to the perception of a single phonemic contrast. This study investigated the effect of spectral degradation on the discriminability and perceptual saliency of acoustic cues for identification of word-final fricative voicing in "loss" versus "laws", and possible changes that occurred when low-frequency acoustic cues were restored. Three acoustic cues that contribute to the word-final /s/-/z/ contrast (first formant frequency [F1] offset, vowel-consonant duration ratio, and consonant voicing duration) were systematically varied in synthesized words. A discrimination task measured listeners' ability to discriminate differences among stimuli within a single cue dimension. A categorization task examined the extent to which listeners make use of a given cue to label a syllable as "loss" versus "laws" when multiple cues are available. Normal-hearing listeners were presented with stimuli that were either unprocessed, processed with an eight-channel noise-band vocoder to approximate spectral degradation in cochlear implants, or low-pass filtered. Listeners were tested in four listening conditions: unprocessed, vocoder, low-pass, and a combined vocoder + low-pass condition that simulated bimodal hearing. Results showed a negative impact of spectral degradation on F1 cue discrimination and a trading relation between spectral and temporal cues in which listeners relied more heavily on the temporal cues for "loss-laws" identification when spectral cues were degraded. Furthermore, the addition of low-frequency fine-structure cues in simulated bimodal hearing increased the perceptual saliency of the F1 cue for "loss-laws" identification compared with vocoded speech. Findings suggest an interplay between the quality of sensory input and cue importance.
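The noise-band vocoding used here (and in several of the other studies listed) to approximate CI spectral degradation can be sketched as band-splitting, envelope extraction, and noise-carrier modulation. A minimal Python illustration; the filter order, band edges, and test signal are assumptions, not the study's actual processing parameters:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels=8, lo=100, hi=7000):
    """Noise-band vocoder (sketch): split the signal into log-spaced bands,
    extract each band's Hilbert envelope, and use it to modulate
    band-limited noise; sum the modulated bands."""
    edges = np.geomspace(lo, hi, n_channels + 1)
    noise = np.random.default_rng(0).standard_normal(len(x))
    out = np.zeros(len(x))
    for f1, f2 in zip(edges[:-1], edges[1:]):
        sos = butter(4, [f1, f2], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))   # band envelope
        out += env * sosfiltfilt(sos, noise)         # envelope-modulated noise
    return out

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 500 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
y = noise_vocode(x, fs)
print(y.shape)
```

The output preserves per-band temporal envelopes while discarding temporal fine structure, which is why F1-based spectral cues degrade under vocoding while duration cues survive.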
Affiliation(s)
- Ying-Yee Kong
- Department of Communication Sciences and Disorders, Northeastern University, Boston, MA, USA
- Matthew B Winn
- Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
- Katja Poellmann
- Department of Communication Sciences and Disorders, Northeastern University, Boston, MA, USA
- Gail S Donaldson
- Department of Communication Sciences & Disorders, University of South Florida, Tampa, FL, USA
29. Deroche MLD, Kulkarni AM, Christensen JA, Limb CJ, Chatterjee M. Deficits in the Sensitivity to Pitch Sweeps by School-Aged Children Wearing Cochlear Implants. Front Neurosci 2016; 10:73. [PMID: 26973451] [PMCID: PMC4776214] [DOI: 10.3389/fnins.2016.00073]
Abstract
Sensitivity to static changes in pitch has been shown to be poorer in school-aged children wearing cochlear implants (CIs) than children with normal hearing (NH), but it is unclear whether this is also the case for dynamic changes in pitch. Yet, dynamically changing pitch has considerable ecological relevance in terms of natural speech, particularly aspects such as intonation, emotion, or lexical tone information. Twenty one children with NH and 23 children wearing a CI participated in this study, along with 18 NH adults and 6 CI adults for comparison. Listeners with CIs used their clinically assigned settings with envelope-based coding strategies. Percent correct was measured in one- or three-interval two-alternative forced choice tasks, for the direction or discrimination of harmonic complexes based on a linearly rising or falling fundamental frequency. Sweep rates were adjusted per subject, in a logarithmic scale, so as to cover the full extent of the psychometric function. Data for up- and down-sweeps were fitted separately, using a maximum-likelihood technique. Fits were similar for up- and down-sweeps in the discrimination task, but diverged in the direction task because psychometric functions for down-sweeps were very shallow. Hits and false alarms were then converted into d′ and beta values, from which a threshold was extracted at a d′ of 0.77. Thresholds were very consistent between the two tasks and considerably higher (worse) for CI listeners than for their NH peers. Thresholds were also higher for children than adults. Factors such as age at implantation, age at profound hearing loss, and duration of CI experience did not play any major role in this sensitivity. Thresholds of dynamic pitch sensitivity (in either task) also correlated with thresholds for static pitch sensitivity and with performance in tasks related to speech prosody.
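The conversion from hits and false alarms to d′, and the extraction of a threshold at d′ = 0.77, can be sketched as follows; the sweep rates and response rates below are hypothetical, not the study's data:

```python
import numpy as np
from scipy.stats import norm

def dprime(hit_rate, fa_rate, n):
    """d' = z(H) - z(FA), with a 1/(2n) correction for rates of 0 or 1."""
    clip = lambda p: np.clip(p, 1 / (2 * n), 1 - 1 / (2 * n))
    return norm.ppf(clip(hit_rate)) - norm.ppf(clip(fa_rate))

# Hypothetical direction-task data: (hit, false-alarm) rate per sweep rate
rates = np.array([2.0, 4.0, 8.0, 16.0, 32.0])   # sweep rates, arbitrary units
d = np.array([dprime(h, f, 40) for h, f in
              [(0.55, 0.45), (0.65, 0.40), (0.75, 0.30), (0.90, 0.15), (0.97, 0.05)]])

# Threshold: sweep rate at d' = 0.77, interpolated on a log-rate axis
thresh = np.exp(np.interp(0.77, d, np.log(rates)))
print(f"threshold = {thresh:.1f} at d' = 0.77")
```

Interpolating on a logarithmic rate axis mirrors the log-scaled adjustment of sweep rates described in the abstract; the fixed d′ = 0.77 criterion corresponds to the study's stated threshold definition.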
Affiliation(s)
- Mickael L D Deroche
- Centre for Research on Brain, Language and Music, McGill University, Montreal, QC, Canada
- Aditya M Kulkarni
- Auditory Prostheses and Perception Laboratory, Boys Town National Research Hospital, Omaha, NE, USA
- Julie A Christensen
- Auditory Prostheses and Perception Laboratory, Boys Town National Research Hospital, Omaha, NE, USA
- Charles J Limb
- Department of Otolaryngology - Head and Neck Surgery, University of California San Francisco School of Medicine, San Francisco, CA, USA
- Monita Chatterjee
- Auditory Prostheses and Perception Laboratory, Boys Town National Research Hospital, Omaha, NE, USA
30
Chatterjee M, Zion DJ, Deroche ML, Burianek BA, Limb CJ, Goren AP, Kulkarni AM, Christensen JA. Voice emotion recognition by cochlear-implanted children and their normally-hearing peers. Hear Res 2015; 322:151-62. [PMID: 25448167 PMCID: PMC4615700 DOI: 10.1016/j.heares.2014.10.003] [Citation(s) in RCA: 105] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/17/2014] [Revised: 08/27/2014] [Accepted: 10/06/2014] [Indexed: 10/24/2022]
Abstract
Despite their remarkable success in bringing spoken language to hearing-impaired listeners, the signal transmitted through cochlear implants (CIs) remains impoverished in spectro-temporal fine structure. As a consequence, pitch-dominant information, such as voice emotion, is diminished. For young children, the ability to correctly identify the mood/intent of the speaker (which may not always be visible in their facial expression) is an important aspect of social and linguistic development. Previous work in the field has shown that children with cochlear implants (cCI) have significant deficits in voice emotion recognition relative to their normally hearing peers (cNH). Here, we report on voice emotion recognition by a cohort of 36 school-aged cCI. Additionally, we provide, for the first time, a comparison of their performance to that of cNH and NH adults (aNH) listening to CI simulations of the same stimuli. We also provide comparisons to the performance of adult listeners with CIs (aCI), most of whom learned language primarily through normal acoustic hearing. Results indicate that, despite strong variability, on average, cCI perform similarly to their adult counterparts; that both groups' mean performance is similar to aNH performance with 8-channel noise-vocoded speech; and that cNH achieve excellent scores in voice emotion recognition with full-spectrum speech but, on average, show significantly poorer scores than aNH with 8-channel noise-vocoded speech. A strong developmental effect was observed in the cNH with noise-vocoded speech in this task. These results point to the considerable benefit obtained by cochlear-implanted children from their devices, but also underscore the need for further research and development in this important and neglected area. This article is part of a Special Issue entitled .
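As a rough illustration of the 8-channel noise-vocoding manipulation mentioned in this abstract, the sketch below envelope-modulates band-limited noise channel by channel. Filter order, channel edges, and envelope extraction are assumptions for illustration, not the study's exact processing chain:

```python
# Minimal n-channel noise-vocoder sketch (illustrative parameters only).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels=8, lo=100.0, hi=7000.0):
    # Logarithmically spaced channel edges between lo and hi Hz
    edges = np.geomspace(lo, hi, n_channels + 1)
    out = np.zeros_like(x, dtype=float)
    for f1, f2 in zip(edges[:-1], edges[1:]):
        sos = butter(4, [f1, f2], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))             # temporal envelope
        carrier = np.random.randn(len(x))       # broadband noise carrier
        out += sosfiltfilt(sos, carrier * env)  # modulate, re-filter to band
    return out / (np.max(np.abs(out)) + 1e-12)  # normalize peak to 1
```

Because only the per-channel envelopes survive, the fine spectro-temporal structure that carries voice pitch is largely discarded, which is the degradation the aNH listeners heard.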
Affiliation(s)
- Monita Chatterjee
- Auditory Prostheses & Perception Lab., Boys Town National Research Hospital, 555 N 30th St, Omaha, NE 68131, USA
- Danielle J Zion
- Department of Hearing & Speech Sciences, University of Maryland, 0100 LeFrak Hall, College Park, MD 20742, USA
- Mickael L Deroche
- Department of Otolaryngology, Johns Hopkins University School of Medicine, 818 Ross Research Building, 720 Rutland Avenue, Baltimore, MD, USA
- Brooke A Burianek
- Auditory Prostheses & Perception Lab., Boys Town National Research Hospital, 555 N 30th St, Omaha, NE 68131, USA
- Charles J Limb
- Department of Otolaryngology, Johns Hopkins University School of Medicine, 818 Ross Research Building, 720 Rutland Avenue, Baltimore, MD, USA
- Alison P Goren
- Auditory Prostheses & Perception Lab., Boys Town National Research Hospital, 555 N 30th St, Omaha, NE 68131, USA; Department of Hearing & Speech Sciences, University of Maryland, 0100 LeFrak Hall, College Park, MD 20742, USA
- Aditya M Kulkarni
- Auditory Prostheses & Perception Lab., Boys Town National Research Hospital, 555 N 30th St, Omaha, NE 68131, USA
- Julie A Christensen
- Auditory Prostheses & Perception Lab., Boys Town National Research Hospital, 555 N 30th St, Omaha, NE 68131, USA
31
Winn MB, Litovsky RY. Using speech sounds to test functional spectral resolution in listeners with cochlear implants. J Acoust Soc Am 2015; 137:1430-1442. [PMID: 25786954 PMCID: PMC4368591 DOI: 10.1121/1.4908308] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/07/2014] [Revised: 11/26/2014] [Accepted: 01/20/2015] [Indexed: 05/31/2023]
Abstract
In this study, spectral properties of speech sounds were used to test functional spectral resolution in people who use cochlear implants (CIs). Specifically, perception of the /ba/-/da/ contrast was tested using two spectral cues: Formant transitions (a fine-resolution cue) and spectral tilt (a coarse-resolution cue). Higher weighting of the formant cues was used as an index of better spectral cue perception. Participants included 19 CI listeners and 10 listeners with normal hearing (NH), for whom spectral resolution was explicitly controlled using a noise vocoder with variable carrier filter widths to simulate electrical current spread. Perceptual weighting of the two cues was modeled with mixed-effects logistic regression, and was found to systematically vary with spectral resolution. The use of formant cues was greatest for NH listeners for unprocessed speech, and declined in the two vocoded conditions. Compared to NH listeners, CI listeners relied less on formant transitions, and more on spectral tilt. Cue-weighting results showed moderately good correspondence with word recognition scores. The current approach to testing functional spectral resolution uses auditory cues that are known to be important for speech categorization, and can thus potentially serve as the basis upon which CI processing strategies and innovations are tested.
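The cue-weighting analysis in this abstract can be sketched with an ordinary logistic regression (simplified from the mixed-effects model the study fitted) on synthetic /ba/-/da/ responses. The cue values, generating weights, and gradient-descent fit below are all illustrative assumptions:

```python
# Sketch of perceptual cue weighting via logistic regression. Synthetic
# listener: /da/ responses driven mostly by the formant-transition cue.
import numpy as np

rng = np.random.default_rng(0)
n = 400
formant = rng.uniform(-1, 1, n)   # fine-resolution cue (standardized)
tilt = rng.uniform(-1, 1, n)      # coarse-resolution cue (standardized)
logit = 3.0 * formant + 0.5 * tilt
resp = rng.random(n) < 1 / (1 + np.exp(-logit))   # 1 = "da" response

# Fit the weights by gradient ascent on the logistic log-likelihood
X = np.column_stack([formant, tilt])
w = np.zeros(2)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ w))
    w += 0.1 * X.T @ (resp - p) / n

formant_weight, tilt_weight = w
# A larger weight on the formant cue indexes finer spectral resolution;
# CI listeners would show the balance shifted toward the tilt cue.
```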
Affiliation(s)
- Matthew B Winn
- Waisman Center and Department of Surgery, University of Wisconsin-Madison, 1500 Highland Avenue, Madison, Wisconsin 53705
- Ruth Y Litovsky
- Waisman Center, Department of Communication Sciences and Disorders and Department of Surgery, University of Wisconsin-Madison, 1500 Highland Avenue, Madison, Wisconsin 53705
32
Chatterjee M, Kulkarni AM. Sensitivity to pulse phase duration in cochlear implant listeners: effects of stimulation mode. J Acoust Soc Am 2014; 136:829-40. [PMID: 25096116 PMCID: PMC4144184 DOI: 10.1121/1.4884773] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/31/2013] [Revised: 06/09/2014] [Accepted: 06/11/2014] [Indexed: 05/23/2023]
Abstract
The objective of this study was to investigate charge integration at threshold by cochlear implant listeners, using pulse train stimuli in different stimulation modes (monopolar, bipolar, tripolar). The results partially confirmed and extended the findings of previous studies conducted in animal models showing that charge integration depends on the stimulation mode. The primary overall finding was that threshold vs pulse phase duration functions had steeper slopes in monopolar mode and shallower slopes in more spatially restricted modes. While the result was clear-cut in eight users of the Cochlear Corporation™ device, the findings with the six users of the Advanced Bionics™ device who participated were less consistent. It is likely that different stimulation modes excite different neuronal populations and/or sites of excitation on the same neuron (e.g., peripheral process vs central axon). These differences may influence not only charge integration but possibly also temporal dynamics at suprathreshold levels and with more speech-relevant stimuli. Given the present interest in focused stimulation modes, these results have implications for cochlear implant speech processor design and for the protocols used to map acoustic amplitude to electric stimulation parameters.
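The slope comparison described in this abstract (threshold vs pulse phase duration across stimulation modes) can be sketched as a straight-line fit on a log duration axis; all numbers below are invented for illustration, not the study's data:

```python
# Sketch: comparing charge-integration slopes across stimulation modes by
# fitting threshold against log10(pulse phase duration). Values invented.
import numpy as np

phase_dur_us = np.array([25, 50, 100, 200, 400])  # pulse phase duration, µs
# Hypothetical thresholds (dB re 1 mA): steeper decline in monopolar mode,
# shallower in the more spatially restricted tripolar mode
thr = {
    "monopolar": np.array([-6.0, -9.2, -12.1, -15.0, -18.2]),
    "tripolar":  np.array([-2.0, -3.1, -4.3, -5.2, -6.4]),
}
# First polyfit coefficient = slope in dB per decade of phase duration
slopes = {mode: np.polyfit(np.log10(phase_dur_us), t, 1)[0]
          for mode, t in thr.items()}
```

In this toy example the monopolar slope comes out more negative (steeper) than the tripolar slope, mirroring the direction of the reported effect.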
Affiliation(s)
- Monita Chatterjee
- Boys Town National Research Hospital, 555 N 30th Street, Omaha, Nebraska 68131
- Aditya M Kulkarni
- Boys Town National Research Hospital, 555 N 30th Street, Omaha, Nebraska 68131