1. Informational masking influences segmental and suprasegmental speech categorization. Psychon Bull Rev 2024; 31:686-696. PMID: 37658222; PMCID: PMC11061029; DOI: 10.3758/s13423-023-02364-5.
Abstract
Auditory categorization requires listeners to integrate acoustic information from multiple dimensions. Attentional theories suggest that acoustic dimensions that are informative attract attention and therefore receive greater perceptual weight during categorization. However, the acoustic environment is often noisy, with multiple sound sources competing for listeners' attention. Amid these adverse conditions, attentional theories predict that listeners will distribute attention more evenly across multiple dimensions. Here we test this prediction using an informational masking paradigm. In two experiments, listeners completed suprasegmental (focus) and segmental (voicing) speech categorization tasks in quiet or in the presence of competing speech. In both experiments, the target speech consisted of short words or phrases that varied in the extent to which fundamental frequency (F0) and durational information signalled category identity. To isolate effects of informational masking, target and competing speech were presented in opposite ears. Across both experiments, there was substantial individual variability in the relative weighting of the two dimensions. These individual differences were consistent across listening conditions, suggesting that they reflect stable perceptual strategies. Consistent with attentional theories of auditory categorization, listeners who relied on a single primary dimension in quiet shifted towards integrating across multiple dimensions in the presence of competing speech. These findings demonstrate that listeners make greater use of the redundancy present in speech when attentional resources are limited.

2. Lexical Stress Identification in Cochlear Implant-Simulated Speech by Non-Native Listeners. Lang Speech 2024:238309231222207. PMID: 38282517; DOI: 10.1177/00238309231222207.
Abstract
This study investigates whether a presumed difference in the perceptibility of cues to lexical stress in spectro-temporally degraded simulated cochlear implant (CI) speech affects how listeners weight these cues during a lexical stress identification task, specifically in their non-native language. Previous research suggests that in English, listeners predominantly rely on a reduction in vowel quality as a cue to lexical stress. In Dutch, changes in the fundamental frequency (F0) contour seem to have a greater functional weight than the vowel quality contrast. Generally, non-native listeners use the cue-weighting strategies from their native language in the non-native language. Moreover, a few studies have suggested that these cues to lexical stress are differently perceptible in spectro-temporally degraded electric hearing, as CI users appear to make more effective use of changes in vowel quality than of changes in the F0 contour as cues to linguistic phenomena. In this study, native Dutch learners of English identified stressed syllables in CI-simulated and non-CI-simulated Dutch and English words that contained changes in the F0 contour and vowel quality as cues to lexical stress. The results indicate that neither the cue-weighting strategies in the native language nor those in the non-native language are influenced by the perceptibility of cues in the spectro-temporally degraded speech signal. These results run counter to our expectations based on previous research and support the idea that cue weighting is a flexible and transferable process.

3. Phrase parsing in a second language as indexed by the closure positive shift: The impact of language experience and acoustic cue salience. Eur J Neurosci 2023; 58:3838-3858. PMID: 37667595; DOI: 10.1111/ejn.16134.
Abstract
Despite the importance of prosodic processing in utterance parsing, a majority of studies investigating boundary localization in a second language focus on word segmentation. The goal of the present study was to investigate the parsing of phrase boundaries in first and second languages from different prosodic typologies (stress-timed vs. syllable-timed). Fifty English-French bilingual adults who varied in native language (French or English) and second language proficiency listened to English and French utterances with different prosodic structures while event-related brain potentials were recorded. The utterances were built around target words presented either in phrase-final position (bearing phrase-final lengthening) or in penultimate position. Each participant listened to both English and French stimuli, providing data in their native language (used as reference) and their second language. Target words in phrase-final position elicited closure positive shifts across listeners in both languages, regardless of the language-specific acoustic cues associated with phrase-final lengthening (shorter phrase-final lengthening in English compared to French). Interestingly, directional effects were observed, where learning to parse English as a second language in a native-like manner seemed to require a higher proficiency level than learning to parse French as a second language. This pattern of results supports the idea that L2 listeners need to learn to recognize L2-specific phrase-final lengthening regardless of the apparent similarity across languages and that some language combinations might present greater challenges than others.

4. Spoken Word Recognition across Language Boundary: ERP Evidence of Prosodic Transfer Driven by Pitch. Brain Sci 2023; 13:202. PMID: 36831746; PMCID: PMC9953763; DOI: 10.3390/brainsci13020202.
Abstract
Extensive research has explored the perception of English lexical stress by Chinese EFL learners and has tried to unveil the mechanism underlying prosodic transfer from a native tonal language to a non-native stress language. However, the role of pitch, the cue shared by lexical stress and lexical tone, during this transfer remains controversial when the segmental cue (i.e., the reduced vowel) is absent. Using event-related potential (ERP) measurements, the current study further investigated the role of pitch in the prosodic transfer from L1 lexical tone to L2 lexical stress and the underlying neural responses. Two groups of adult Chinese EFL learners were compared, as Mandarin and Cantonese are both tonal languages but differ in tonal complexity. The results showed that Cantonese speakers relied more than Mandarin speakers on pitch cues, not only in their processing of English lexical stress but also in word recognition. Our findings are consistent with cue-weighting accounts and attest to the influence of native tonal language experience on second language acquisition. They also have pedagogical implications, suggesting that pitch could be an important cue in second language teaching.

5. Relative Difficulty in the Acquisition of the Phonetic Parameters of Obstruent Coda Voicing: Evidence from Mandarin-Speaking Learners of French. Lang Speech 2022:238309221114143. PMID: 36062625; PMCID: PMC10394971; DOI: 10.1177/00238309221114143.
Abstract
A recurring finding of research on the L2 acquisition of coda obstruent voicing is that, in terms of the phonetic parameters that serve to realize the voicing contrast, learners are overwhelmingly more accurate with duration than the voicing of the obstruent itself. The current work expands our understanding of this asymmetry in two ways. First, as previous studies have focused almost exclusively on learners of English, we investigate here whether L2 learners' superior production of duration is also found among learners of other target languages via a study of Mandarin-speaking learners' production of French stop and fricative codas. Results from 18 Mandarin-speaking learners of French, primarily of beginner and intermediate proficiency who completed a sentence reading task, parallel those of previous studies with greater accuracy observed for vowel duration than the laryngeal voicing of the obstruent. Second, we explore potential sources of this asymmetry, in particular, the roles of L1 experience as well as of universal factors, namely, the relative perceptual salience of duration versus voicing, and the articulatory difficulty of voicing obstruents.

6. Going Beyond Rote Auditory Learning: Neural Patterns of Generalized Auditory Learning. J Cogn Neurosci 2022; 34:425-444. PMID: 34942645; PMCID: PMC8832160; DOI: 10.1162/jocn_a_01805.
Abstract
The ability to generalize across specific experiences is vital for the recognition of new patterns, especially in speech perception considering acoustic-phonetic pattern variability. Indeed, behavioral research has demonstrated that listeners are able, via a process of generalized learning, to leverage their experiences of past words said by a difficult-to-understand talker to improve their understanding of new words said by that talker. Here, we examine differences in neural responses to generalized versus rote learning in auditory cortical processing by training listeners to understand a novel synthetic talker. Using a pretest-posttest design with EEG, participants were trained using either (1) a large inventory of words where no words were repeated across the experiment (generalized learning) or (2) a small inventory of words where words were repeated (rote learning). Analysis of long-latency auditory evoked potentials at pretest and posttest revealed that rote and generalized learning both produced rapid changes in auditory processing, yet the nature of these changes differed. Generalized learning was marked by an amplitude reduction in the N1-P2 complex and by the presence of a late negativity wave in the auditory evoked potential following training; rote learning was marked only by temporally later scalp topography differences. The early N1-P2 change, found only for generalized learning, is consistent with an active processing account of speech perception, which proposes that the ability to rapidly adjust to the specific vocal characteristics of a new talker (for which rote learning is rare) relies on attentional mechanisms to selectively modify early auditory processing sensitivity.

7.
Abstract
Early changes in infants’ ability to perceive native and nonnative speech sound contrasts are typically attributed to their developing knowledge of phonetic categories. We critically examine this hypothesis and argue that there is little direct evidence of category knowledge in infancy. We then propose an alternative account in which infants’ perception changes because they are learning a perceptual space that is appropriate to represent speech, without yet carving up that space into phonetic categories. If correct, this new account has substantial implications for understanding early language development.

8. Dimension-selective attention and dimensional salience modulate cortical tracking of acoustic dimensions. Neuroimage 2021; 244:118544. PMID: 34492294; DOI: 10.1016/j.neuroimage.2021.118544.
Abstract
Some theories of auditory categorization suggest that auditory dimensions that are strongly diagnostic for particular categories - for instance voice onset time or fundamental frequency in the case of some spoken consonants - attract attention. However, prior cognitive neuroscience research on auditory selective attention has largely focused on attention to simple auditory objects or streams, and so little is known about the neural mechanisms that underpin dimension-selective attention, or how the relative salience of variations along these dimensions might modulate neural signatures of attention. Here we investigate whether dimensional salience and dimension-selective attention modulate the cortical tracking of acoustic dimensions. In two experiments, participants listened to tone sequences varying in pitch and spectral peak frequency; these two dimensions changed at different rates. Inter-trial phase coherence (ITPC) and amplitude of the EEG signal at the frequencies tagged to pitch and spectral changes provided a measure of cortical tracking of these dimensions. In Experiment 1, tone sequences varied in the size of the pitch intervals, while the size of spectral peak intervals remained constant. Cortical tracking of pitch changes was greater for sequences with larger compared to smaller pitch intervals, with no difference in cortical tracking of spectral peak changes. In Experiment 2, participants selectively attended to either pitch or spectral peak. Cortical tracking was stronger in response to the attended compared to unattended dimension for both pitch and spectral peak. These findings suggest that attention can enhance the cortical tracking of specific acoustic dimensions rather than simply enhancing tracking of the auditory object as a whole.
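
For readers unfamiliar with the frequency-tagging measure used in this study, the sketch below shows one standard way to compute inter-trial phase coherence (ITPC) at a tagged rate: take the phase of each trial's spectrum at the FFT bin closest to the tagged frequency and measure how tightly those phases cluster. The function name, sampling rate, and synthetic data are illustrative assumptions, not the study's analysis pipeline.

```python
# Sketch: inter-trial phase coherence (ITPC) at a frequency-tagged rate.
# Assumes `trials` is an (n_trials, n_samples) array from one EEG channel,
# sampled at `fs` Hz, and `f_tag` is the rate at which the dimension changed.
import numpy as np

def itpc_at_frequency(trials, fs, f_tag):
    n_trials, n_samples = trials.shape
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(trials, axis=1)          # complex spectrum per trial
    bin_idx = np.argmin(np.abs(freqs - f_tag))     # bin closest to the tagged rate
    phases = np.angle(spectra[:, bin_idx])         # per-trial phase at that bin
    # ITPC = length of the mean unit phase vector: 0 = random phase, 1 = perfect alignment
    return np.abs(np.mean(np.exp(1j * phases)))

# Synthetic check: 50 trials, 2 s at 250 Hz, with a phase-locked 3 Hz component
rng = np.random.default_rng(0)
t = np.arange(0, 2.0, 1.0 / 250)
trials = np.sin(2 * np.pi * 3 * t) + rng.normal(0.0, 1.0, (50, t.size))
print(itpc_at_frequency(trials, fs=250, f_tag=3.0))   # close to 1 for the tagged rate
```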

9. Vocabulary Size Is a Key Factor in Predicting Second Language Lexical Encoding Accuracy. Front Psychol 2021; 12:688356. PMID: 34367013; PMCID: PMC8339215; DOI: 10.3389/fpsyg.2021.688356.
Abstract
This study investigates the relationship between the accuracy of second language lexical representations and perception, phonological short-term memory, inhibitory control, attention control, and second language vocabulary size. English-speaking learners of Spanish were tested on their lexical encoding of the Spanish /ɾ-r/, /ɾ-d/, /r-d/, and /f-p/ contrasts through a lexical decision task. Perception ability was measured with an oddity task, phonological short-term memory with a serial non-word recognition task, attention control with a flanker task, inhibitory control with a retrieval-induced inhibition task, and vocabulary size with the X_Lex vocabulary test. Results revealed that differences in perception performance, inhibitory control, and attention control were not related to differences in lexical encoding accuracy. Phonological short-term memory was a significant factor, but only for the /r-ɾ/ contrast. This suggests that when representations contain sounds that are differentiated along a dimension not used in the native language, learners with higher phonological short-term memory have an advantage because they are better able to hold the relevant phonetic details in memory long enough to be transferred to long-term representations. Second language vocabulary size predicted lexical encoding across three of the four contrasts, such that a larger vocabulary predicted greater accuracy. This is likely because the acquisition of more phonologically similar words forces learners’ phonological systems to create more detailed representations in order for such words to be differentiated. Overall, this study suggests that vocabulary size in the second language is the most important factor in the accuracy of lexical representations.

10. Voice Emotion Recognition by Mandarin-Speaking Children with Cochlear Implants. Ear Hear 2021; 43:165-180. PMID: 34288631; DOI: 10.1097/aud.0000000000001085.
Abstract
Objectives: Emotional expressions are very important in social interactions. Children with cochlear implants can have voice emotion recognition deficits due to device limitations. Mandarin-speaking children with cochlear implants may face greater challenges than those speaking nontonal languages; pitch information is not well preserved in cochlear implants, and such children could benefit from child-directed speech, which carries more exaggerated, distinctive acoustic cues for different emotions. This study investigated voice emotion recognition, using both adult-directed and child-directed materials, in Mandarin-speaking children with cochlear implants compared with normal hearing peers. The authors hypothesized that both the children with cochlear implants and those with normal hearing would perform better with child-directed materials than with adult-directed materials.

Design: Thirty children (7.17-17 years of age) with cochlear implants and 27 children with normal hearing (6.92-17.08 years of age) were recruited in this study. Participants completed a nonverbal reasoning test, speech recognition tests, and a voice emotion recognition task. Children with cochlear implants over the age of 10 years also completed the Chinese version of the Nijmegen Cochlear Implant Questionnaire to evaluate health-related quality of life. The voice emotion recognition task was a five-alternative, forced-choice paradigm containing sentences spoken with five emotions (happy, angry, sad, scared, and neutral) in a child-directed or adult-directed manner.

Results: Acoustic analyses showed substantial variations across emotions in all materials, mainly on measures of mean fundamental frequency and fundamental frequency range. Mandarin-speaking children with cochlear implants displayed significantly poorer performance than normal hearing peers in voice emotion perception tasks, regardless of whether performance was measured in accuracy scores, Hu value, or reaction time. Children with cochlear implants and children with normal hearing were mainly affected by the mean fundamental frequency in speech emotion recognition tasks. Chronological age had a significant effect on speech emotion recognition in children with normal hearing; however, there was no significant correlation between chronological age and accuracy scores in speech emotion recognition in children with implants. Significant effects of specific emotion and test materials (better performance with child-directed materials) were observed in both groups of children. Among the children with cochlear implants, age at implantation, percentage scores on the nonverbal intelligence quotient test, and sentence recognition threshold in quiet could predict recognition performance in both accuracy scores and Hu values. Time wearing the cochlear implant could predict reaction time in emotion perception tasks among children with cochlear implants. No correlation was observed between the accuracy score in voice emotion perception and the self-reported scores of health-related quality of life; however, the latter were significantly correlated with speech recognition skills among Mandarin-speaking children with cochlear implants.

Conclusions: Mandarin-speaking children with cochlear implants can have significant deficits in voice emotion recognition tasks compared with their normally hearing peers and can benefit from the exaggerated prosody of child-directed speech. The effects of age at cochlear implantation, speech and language development, and cognition could play an important role in voice emotion perception by Mandarin-speaking children with cochlear implants.

11. Contrasting Similar Words Facilitates Second Language Vocabulary Learning in Children by Sharpening Lexical Representations. Front Psychol 2021; 12:688160. PMID: 34295290; PMCID: PMC8290082; DOI: 10.3389/fpsyg.2021.688160.
Abstract
This study considers one of the cognitive mechanisms underlying the development of second language (L2) vocabulary in children: The differentiation and sharpening of lexical representations. We propose that sharpening is triggered by an implicit comparison of similar representations, a process we call contrasting. We investigate whether integrating contrasting in a learning method in which children contrast orthographically and semantically similar L2 words facilitates learning of those words by sharpening their new lexical representations. In our study, 48 Dutch-speaking children learned unfamiliar orthographically and semantically similar English words in a multiple-choice learning task. One half of the group learned the similar words by contrasting them, while the other half did not contrast them. Their word knowledge was measured immediately after learning as well as 1 week later. Contrasting was found to facilitate learning by leading to more precise lexical representations. However, only highly skilled readers benefitted from contrasting. Our findings offer novel insights into the development of L2 lexical representations from fuzzy to more precise, and have potential implications for education.

12. Dutch listeners' perception of English lexical stress: A cue-weighting approach. J Acoust Soc Am 2021; 149:3703. PMID: 34241448; DOI: 10.1121/10.0005086.
Abstract
We investigate whether acoustic cue weightings are transferred from the native language to the second language [research question 1 (RQ1)], how cue weightings change with increasing second-language proficiency (RQ2), and whether individual cues are used independently or together in the second language (RQ3). Vowel reduction is a strong cue to lexical stress in English but not Dutch. Native English listeners and Dutch second-language learners of English completed a cue-weighting stress perception experiment. Participants heard sentence-final pitch-accented auditory stimuli and identified them as DEsert (initial stress) or desSERT (final stress). The stimuli were manipulated in seven steps from initial to final stress, manipulating two dimensions at a time: vowel quality and pitch, vowel quality and duration, and pitch and duration (other dimensions neutralized). Dutch listeners relied less on vowel quality and more on pitch than English listeners, with Dutch listeners' sensitivity to vowel quality increasing with English proficiency but their sensitivity to pitch not varying with proficiency; Dutch listeners evidenced similar or weaker reliance on duration than did English listeners, and their sensitivity to duration increased with proficiency; and Dutch listeners' uses of pitch and duration were positively related. These results provide general support for a cue-based transfer approach to the perception of lexical stress.
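
In two-alternative identification tasks like this one, cue weights are commonly estimated as logistic regression coefficients, with each manipulated dimension entered as a predictor of the listener's response. The sketch below illustrates that general approach on simulated data; the variable names, step coding, and the simulated listener are hypothetical, and the paper's own modelling choices may differ.

```python
# Sketch: estimating perceptual cue weights from 2AFC identification responses
# with logistic regression (one coefficient per manipulated dimension).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 400
pitch_step = rng.integers(1, 8, n)      # 7-step continuum, initial- to final-stress pitch
duration_step = rng.integers(1, 8, n)   # 7-step continuum for duration
# Simulated listener who weights pitch roughly twice as heavily as duration
logit = 1.2 * (pitch_step - 4) + 0.6 * (duration_step - 4)
chose_final_stress = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

X = np.column_stack([pitch_step, duration_step])
model = LogisticRegression().fit(X, chose_final_stress)
# Larger |coefficient| = greater reliance on that cue for this listener
print(dict(zip(["pitch", "duration"], model.coef_[0])))
```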

13.
Abstract
OBJECTIVE: Acoustic distortions to the speech signal impair spoken language recognition, but healthy listeners exhibit adaptive plasticity consistent with rapid adjustments in how the distorted speech input maps to speech representations, perhaps through engagement of supervised error-driven learning. This puts adaptive plasticity in speech perception in an interesting position with regard to developmental dyslexia inasmuch as dyslexia impacts speech processing and may involve dysfunction in neurobiological systems hypothesized to be involved in adaptive plasticity.

METHOD: Here, we examined typical young adult listeners (N = 17), and those with dyslexia (N = 16), as they reported the identity of native-language monosyllabic spoken words to which signal processing had been applied to create a systematic acoustic distortion. During training, all participants experienced incremental signal distortion increases to mildly distorted speech along with orthographic and auditory feedback indicating word identity following response across a brief, 250-trial training block. During pretest and posttest phases, no feedback was provided to participants.

RESULTS: Word recognition across severely distorted speech was poor at pretest and equivalent across groups. Training led to improved word recognition for the most severely distorted speech at posttest, with evidence that adaptive plasticity generalized to support recognition of new tokens not previously experienced under distortion. However, training-related recognition gains for listeners with dyslexia were significantly less robust than for control listeners.

CONCLUSIONS: Less efficient adaptive plasticity to speech distortions may impact the ability of individuals with dyslexia to deal with variability arising from sources like acoustic noise and foreign-accented speech.

14.
Abstract
Listeners exposed to accented speech must adjust how they map between acoustic features and lexical representations such as phonetic categories. A robust form of this adaptive perceptual learning is learning to perceive synthetic speech where the connections between acoustic features and phonetic categories must be updated. Both implicit learning through mere exposure and explicit learning through directed feedback have previously been shown to produce this type of adaptive learning. The present study crosses implicit exposure and explicit feedback with the presence or absence of a written identification task. We show that simple exposure produces some learning, but explicit feedback produces substantially stronger learning, whereas requiring written identification did not measurably affect learning. These results suggest that explicit feedback guides learning of new mappings between acoustic patterns and known phonetic categories. We discuss mechanisms that may support learning via implicit exposure.

15. Influence of different acoustic cues in L1 lexical tone on the perception of L2 lexical stress using principal component analysis: an ERP study. Exp Brain Res 2020; 238:1489-1498. PMID: 32435921; DOI: 10.1007/s00221-020-05823-w.
Abstract
Previous studies have widely explored prosodic transfer from L1 to L2 during speech perception across stress languages. However, few if any studies have investigated transfer from an L1 tonal language to an L2 stress language or the relative roles of the different acoustic cues underlying that transfer. The current study therefore compared the perception of English lexical stress between Mandarin and Cantonese speakers who learn English as a foreign language. Event-related potential measurements and principal component analysis were conducted for the two groups to explore the roles of different acoustic cues in the perception of English speech. The results demonstrated that, compared with the Mandarin group, the Cantonese speakers relied more on pitch information, and this reliance held even when all three cues varied simultaneously. It was therefore concluded that prosodic transfer from L1 lexical tone to L2 lexical stress occurred at the acoustic level, and that the native linguistic background shaped the manner in which speakers perceived L2 speech.

16. Tailored perception: Individuals' speech and music perception strategies fit their perceptual abilities. J Exp Psychol Gen 2020; 149:914-934. PMID: 31589067; PMCID: PMC7133494; DOI: 10.1037/xge0000688.
Abstract
Perception involves integration of multiple dimensions that often serve overlapping, redundant functions, for example, pitch, duration, and amplitude in speech. Individuals tend to prioritize these dimensions differently (stable, individualized perceptual strategies), but the reason for this has remained unclear. Here we show that perceptual strategies relate to perceptual abilities. In a speech cue weighting experiment (trial N = 990), we first demonstrate that individuals with a severe deficit for pitch perception (congenital amusics; N = 11) categorize linguistic stimuli similarly to controls (N = 11) when the main distinguishing cue is duration, which they perceive normally. In contrast, in a prosodic task where pitch cues are the main distinguishing factor, we show that amusics place less importance on pitch and instead rely more on duration cues-even when pitch differences in the stimuli are large enough for amusics to discern. In a second experiment testing musical and prosodic phrase interpretation (N = 16 amusics; 15 controls), we found that relying on duration allowed amusics to overcome their pitch deficits to perceive speech and music successfully. We conclude that auditory signals, because of their redundant nature, are robust to impairments for specific dimensions, and that optimal speech and music perception strategies depend not only on invariant acoustic dimensions (the physical signal), but on perceptual dimensions whose precision varies across individuals. Computational models of speech perception (indeed, all types of perception involving redundant cues e.g., vision and touch) should therefore aim to account for the precision of perceptual dimensions and characterize individuals as well as groups.
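
One common way to formalize the closing point, that models should account for the precision of perceptual dimensions and not just the physical signal, is reliability-weighted cue combination, in which a listener's weight on each dimension scales with the inverse of their perceptual variance for that dimension. This is a generic formulation offered for illustration, not the authors' own model:

```latex
\hat{x} = \sum_i w_i \, x_i,
\qquad
w_i = \frac{1/\sigma_i^2}{\sum_j 1/\sigma_j^2}
```

Here $x_i$ is the value the listener registers on dimension $i$ (e.g., pitch or duration) and $\sigma_i^2$ is that listener's perceptual variance for it; a listener with poor pitch precision (large $\sigma_{\mathrm{pitch}}^2$) would optimally shift weight toward duration, as the amusic participants did.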

17. Sleep Promotes Phonological Learning in Children Across Language and Autism Spectra. J Speech Lang Hear Res 2019; 62:4235-4255. PMID: 31770054; DOI: 10.1044/2019_jslhr-s-19-0098.
Abstract
Purpose: Establishing stable and flexible phonological representations is a key component of language development and one which is thought to vary across children with neurodevelopmental disorders affecting language acquisition. Sleep is understood to support the learning and generalization of new phonological mappings in adults, but this remains to be examined in children. This study therefore explored the time course of phonological learning in childhood and how it varies by structural language and autism symptomatology.

Method: Seventy-seven 7- to 13-year-old children, 30 with high autism symptomatology, were included in the study; structural language ability varied across the sample. Children learned new phonological mappings based on synthesized speech tokens in the morning; performance was then charted via repetition (without feedback) over 24 hr and followed up 4 weeks later. On the night following learning, children's sleep was monitored with polysomnography.

Results: A period of sleep but not wake was associated with improvement on the phonological learning task in childhood. Sleep was associated with improved performance for both trained items and novel items. Structural language ability predicted overall task performance, though language ability did not predict degree of change from one session to the next. By contrast, autism symptomatology did not explain task performance. With respect to sleep architecture, rapid eye movement features were associated with greater phonological generalization.

Conclusions: Children's sleep was associated with improvement in performance on both trained and novel items. Phonological generalization was associated with brain activity during rapid eye movement sleep. This study furthers our understanding of individual differences in the acquisition of new phonological mappings and the role of sleep in this process over childhood. Supplemental material: https://doi.org/10.23641/asha.11126732

18.
Abstract
Human category learning appears to be supported by dual learning systems. Previous research indicates the engagement of distinct neural systems in learning categories that require selective attention to dimensions versus those that require integration across dimensions. This evidence has largely come from studies of learning across perceptually separable visual dimensions, but recent research has applied dual system models to understanding auditory and speech categorization. Since differential engagement of the dual learning systems is closely related to selective attention to input dimensions, it may be important that acoustic dimensions are quite often perceptually integral and difficult to attend to selectively. We investigated this issue across artificial auditory categories defined by center frequency and modulation frequency acoustic dimensions. Learners demonstrated a bias to integrate across the dimensions, rather than to selectively attend, and the bias specifically reflected a positive correlation between the dimensions. Further, we found that the acoustic dimensions did not equivalently contribute to categorization decisions. These results demonstrate the need to reconsider the assumption that the orthogonal input dimensions used in designing an experiment are indeed orthogonal in perceptual space as there are important implications for category learning.

19.
Abstract
This study investigated the role of different cognitive abilities-inhibitory control, attention control, phonological short-term memory (PSTM), and acoustic short-term memory (AM)-in second language (L2) vowel learning. The participants were 40 Azerbaijani learners of Standard Southern British English. Their perception of L2 vowels was tested through a perceptual discrimination task before and after five sessions of high-variability phonetic training. Inhibitory control was significantly correlated with gains from training in the discrimination of L2 vowel pairs. However, there were no significant correlations between attention control, AM, PSTM, and gains from training. These findings suggest the potential role of inhibitory control in L2 phonological learning. We suggest that inhibitory control facilitates the processing of L2 sounds by allowing learners to ignore the interfering information from L1 during training, leading to better L2 segmental learning.

20. Specificity and generalization in perceptual adaptation to accented speech. J Acoust Soc Am 2019; 145:3382. PMID: 31255164; PMCID: PMC6557708; DOI: 10.1121/1.5110302.
Abstract
The present study investigated the degree to which perceptual adaptation to foreign-accented speech is specific to the regularities in pronunciation associated with a particular accent. Across experiments, the conditions under which generalization of learning did or did not occur were evaluated. In Experiment 1, listeners were trained on word-length utterances in Korean-accented English and tested with words produced by the same or a different set of Korean-accented speakers. Listeners performed better than untrained controls when tested with novel words from the same or different speakers. In Experiment 2, listeners were trained with Spanish-, Korean-, or mixed-accented speech and transcribed novel words produced by unfamiliar Korean- or Spanish-accented speakers at test. The findings revealed relative specificity of learning. Listeners trained and tested on the same variety of accented speech showed better transcription at test than those trained with a different accent or untrained controls. Performance after mixed-accent training was intermediate. Patterns of errors and analysis of acoustic properties for accented vowels suggested perceptual improvement for regularities arising from each accent, with learning dependent on the relative similarity of linguistic form within and across accents.

21. The Role of Temporal Acoustic Exaggeration in High Variability Phonetic Training: A Behavioral and ERP Study. Front Psychol 2019; 10:1178. PMID: 31178795; PMCID: PMC6543854; DOI: 10.3389/fpsyg.2019.01178.
Abstract
High variability phonetic training (HVPT) has been found to be effective in helping adult learners acquire non-native phonetic contrasts. The present study investigated the role of temporal acoustic exaggeration by comparing the canonical HVPT paradigm, which involves no acoustic exaggeration, with a modified adaptive HVPT paradigm that integrated key temporal exaggerations found in infant-directed speech (IDS). Sixty native Chinese adults participated in training on the English /i/-/ɪ/ vowel contrast and were randomly assigned to three subject groups. Twenty were trained with the typical HVPT paradigm (the HVPT group), twenty were trained under the modified adaptive approach with acoustic exaggeration (the HVPT-E group), and twenty were in the control group. Behavioral tasks for the pre- and post-tests used natural word identification, synthetic stimuli identification, and synthetic stimuli discrimination. Mismatch negativity (MMN) responses from the HVPT-E group were also obtained to assess the training effects in within- and across-category discrimination without requiring focused attention. Like previous studies, significant generalization effects to new talkers were found in both the HVPT group and the HVPT-E group. The HVPT-E group, by contrast, showed greater improvement, as reflected in larger gains in natural word identification performance. Furthermore, the HVPT-E group exhibited more native-like categorical perception based on spectral cues after training, together with corresponding training-induced changes in the MMN responses to within- and across-category differences. These data provide initial evidence supporting the important role of temporal acoustic exaggeration with adaptive training in facilitating phonetic learning and promoting brain plasticity at the perceptual and pre-attentive neural levels.

22. Learning mechanisms in cue reweighting. Cognition 2019; 189:76-88. PMID: 30928780; DOI: 10.1016/j.cognition.2019.03.011.
Abstract
Feedback has been shown to be effective in shifting attention across perceptual cues to a phonological contrast in speech perception (Francis, Baldwin & Nusbaum, 2000). However, the learning mechanisms behind this process remain obscure. We compare the predictions of supervised error-driven learning (Rescorla & Wagner, 1972) and reinforcement learning (Sutton & Barto, 1998) using computational simulations. Supervised learning predicts downweighting of an informative cue when the learner receives evidence that it is no longer informative. In contrast, reinforcement learning suggests that a reduction in cue weight requires positive evidence for the informativeness of an alternative cue. Experimental evidence supports the latter prediction, implicating reinforcement learning as the mechanism behind the effect of feedback on cue weighting in speech perception. Native English listeners were exposed to either bimodal or unimodal VOT distributions spanning the unaspirated/aspirated boundary (bear/pear). VOT is the primary cue to initial stop voicing in English. However, lexical feedback in training indicated that VOT was no longer predictive of voicing. Reduction in the weight of VOT was observed only when participants could use an alternative cue, F0, to predict voicing. Frequency distributions had no effect on learning. Overall, the results suggest that attention shifting in learning the phonetic cues to phonological categories is accomplished using simple reinforcement learning principles that also guide the choice of actions in other domains.
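
To make the contrast between the two accounts concrete, the sketch below implements a minimal Rescorla-Wagner (supervised, error-driven) update over two cue weights and shows its signature prediction: the VOT weight decays as soon as feedback stops tracking VOT, even when no alternative cue is informative. Cue coding, learning rate, and the training regimes are simplified illustrations, not the paper's simulations.

```python
# Sketch: Rescorla-Wagner (error-driven) updating of [VOT, F0] cue weights.
import numpy as np

def train(w0, n_trials, feedback, alpha=0.05, seed=0):
    """Update weights trial by trial; return mean weights over late training."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    history = []
    for _ in range(n_trials):
        cues = rng.choice([-1.0, 1.0], size=2)   # [VOT, F0] values on this trial
        label = feedback(cues, rng)              # category feedback (+1 / -1)
        error = label - w @ cues                 # supervised prediction error
        w = w + alpha * error * cues
        history.append(w.copy())
    return np.mean(history[-200:], axis=0)

w0 = [0.8, 0.1]   # pre-training weights: VOT highly informative, F0 weakly informative
# (a) Feedback tracks neither cue: no positive evidence for an alternative cue
w_random = train(w0, 1000, lambda cues, rng: rng.choice([-1.0, 1.0]))
# (b) Feedback now tracks F0: an informative alternative cue is available
w_f0 = train(w0, 1000, lambda cues, rng: cues[1])

print("no informative cue:   ", np.round(w_random, 2))   # RW: VOT weight decays anyway
print("F0 predicts category: ", np.round(w_f0, 2))       # RW: VOT decays, F0 rises
# The paper reports VOT down-weighting only in situations like (b), the pattern
# predicted by reinforcement learning rather than by error-driven updating alone.
```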

23. Differences in cue weights for speech perception are correlated for individuals within and across contrasts. J Acoust Soc Am 2018; 144:EL172. PMID: 30424660; DOI: 10.1121/1.5052025.
Abstract
Speech perception requires multiple acoustic cues. Cue weighting may differ across individuals but be systematic within individuals. The current study compared individuals' cue weights within and across contrasts. Forty-two listeners performed a two-alternative forced choice task for four out of five sets of minimal pairs, each varying orthogonally in two dimensions. Individuals' cue weights within a contrast were positively correlated for bet-bat, Luce-lose, and sock-shock, but not for bog-dog and dear-tear. Importantly, individuals' cue weights were also positively correlated across contrasts. This indicates that some individuals are better able to extract and use phonetic information across different dimensions.

24. Maintaining information about speech input during accent adaptation. PLoS One 2018; 13:e0199358. PMID: 30086140; PMCID: PMC6080756; DOI: 10.1371/journal.pone.0199358.
Abstract
Speech understanding can be thought of as inferring progressively more abstract representations from a rapidly unfolding signal. One common view of this process holds that lower-level information is discarded as soon as higher-level units have been inferred. However, there is evidence that subcategorical information about speech percepts is not immediately discarded, but is maintained past word boundaries and integrated with subsequent input. Previous evidence for such subcategorical information maintenance has come from paradigms that lack many of the demands typical to everyday language use. We ask whether information maintenance is also possible under more typical constraints, and in particular whether it can facilitate accent adaptation. In a web-based paradigm, participants listened to isolated foreign-accented words in one of three conditions: subtitles were displayed concurrently with the speech, after speech offset, or not displayed at all. The delays between speech offset and subtitle presentation were manipulated. In a subsequent test phase, participants then transcribed novel words in the same accent without the aid of subtitles. We find that subtitles facilitate accent adaptation, even when displayed with a 6 second delay. Listeners thus maintained subcategorical information for sufficiently long to allow it to benefit adaptation. We close by discussing what type of information listeners maintain-subcategorical phonetic information, or just uncertainty about speech categories.

25. Don't speak too fast! Processing of fast rate speech in children with specific language impairment. PLoS One 2018; 13:e0191808. PMID: 29373610; PMCID: PMC5786310; DOI: 10.1371/journal.pone.0191808.
Abstract
Background: Perception of speech rhythm requires the auditory system to track temporal envelope fluctuations, which carry syllabic and stress information. Reduced sensitivity to rhythmic acoustic cues has been evidenced in children with Specific Language Impairment (SLI), impeding syllabic parsing and speech decoding. Our study investigated whether these children experience specific difficulties processing fast rate speech as compared with typically developing (TD) children.

Method: Sixteen French children with SLI (8–13 years old) with mainly expressive phonological disorders and with preserved comprehension and 16 age-matched TD children performed a judgment task on sentences produced 1) at normal rate, 2) at fast rate or 3) time-compressed. Sensitivity index (d′) to semantically incongruent sentence-final words was measured.

Results: Overall children with SLI perform significantly worse than TD children. Importantly, as revealed by the significant Group × Speech Rate interaction, children with SLI find it more challenging than TD children to process both naturally or artificially accelerated speech. The two groups do not significantly differ in normal rate speech processing.

Conclusion: In agreement with rhythm-processing deficits in atypical language development, our results suggest that children with SLI face difficulties adjusting to rapid speech rate. These findings are interpreted in light of temporal sampling and prosodic phrasing frameworks and of oscillatory mechanisms underlying speech perception.
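
For readers less familiar with the sensitivity index reported here, d′ is typically computed as the difference between the z-transformed hit rate and false-alarm rate. The sketch below uses a common log-linear correction and made-up trial counts; it is illustrative, not the paper's analysis script.

```python
# Sketch: sensitivity index d' for detecting semantically incongruent final words.
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    # Log-linear correction keeps rates of 0 or 1 from producing infinite z-scores
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical child: 20 incongruent trials (16 detected), 20 congruent trials (4 false alarms)
print(d_prime(hits=16, misses=4, false_alarms=4, correct_rejections=16))
```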

26. Learning a Talker or Learning an Accent: Acoustic Similarity Constrains Generalization of Foreign Accent Adaptation to New Talkers. J Mem Lang 2017; 97:30-46. PMID: 28890602; PMCID: PMC5589144; DOI: 10.1016/j.jml.2017.07.005.
Abstract
Past research has revealed that native listeners use top-down information to adjust the mapping from speech sounds to phonetic categories. Such phonetic adjustments help listeners adapt to foreign-accented speech. However, the mechanism by which talker-specific adaptation generalizes to other talkers is poorly understood. Here we asked what conditions induce crosstalker generalization in talker accent adaptation. Native-English listeners were exposed to Mandarin-accented words, produced by a single talker or multiple talkers. Following exposure, adaptation to the accent was tested by recognition of novel words in a task that assesses online lexical access. Crucially, test words were novel words and were produced by a novel Mandarin-accented talker. Results indicated that regardless of exposure condition (single or multiple talker exposure), generalization was greatest when the talkers were acoustically similar to one another, suggesting that listeners were not developing an accent-wide schema for Mandarin talkers, but rather attuning to the specific acoustic-phonetic properties of the talkers. Implications for general mechanisms of talker generalization in speech adaptation are discussed.

27. Phonological experience modulates voice discrimination: Evidence from functional brain networks analysis. Brain Lang 2017; 173:67-75. PMID: 28662482; DOI: 10.1016/j.bandl.2017.06.001.
Abstract
Numerous behavioral studies have found a modulation effect of phonological experience on voice discrimination. However, the neural substrates underpinning this phenomenon are poorly understood. Here we manipulated language familiarity to test the hypothesis that phonological experience affects voice discrimination via mediating the engagement of multiple perceptual and cognitive resources. The results showed that during voice discrimination, the activation of several prefrontal regions was modulated by language familiarity. More importantly, the same effect was observed concerning the functional connectivity from the fronto-parietal network to the voice-identity network (VIN), and from the default mode network to the VIN. Our findings indicate that phonological experience could bias the recruitment of cognitive control and information retrieval/comparison processes during voice discrimination. Therefore, the study unravels the neural substrates subserving the modulation effect of phonological experience on voice discrimination, and provides new insights into studying voice discrimination from the perspective of network interactions.

28. More than a boundary shift: Perceptual adaptation to foreign-accented speech reshapes the internal structure of phonetic categories. J Exp Psychol Hum Percept Perform 2016; 43:206-217. PMID: 27819457; DOI: 10.1037/xhp0000285.
Abstract
The literature on perceptual learning for speech shows that listeners use lexical information to disambiguate phonetically ambiguous speech sounds and that they maintain this new mapping for later recognition of ambiguous sounds for a given talker. Evidence for this kind of perceptual reorganization has focused on phonetic category boundary shifts. Here, we asked whether listeners adjust both category boundaries and internal category structure in rapid adaptation to foreign accents. We investigated the perceptual learning of Mandarin-accented productions of word-final voiced stops in English. After exposure to a Mandarin speaker's productions, native-English listeners' adaptation to the talker was tested in 3 ways: a cross-modal priming task to assess spoken word recognition (Experiment 1), a category identification task to assess shifts in the phonetic boundary (Experiment 2), and a goodness rating task to assess internal category structure (Experiment 3). Following exposure, both category boundary and internal category structure were adjusted; moreover, these prelexical changes facilitated subsequent word recognition. Together, the results demonstrate that listeners' sensitivity to acoustic-phonetic detail in the accented input promoted a dynamic, comprehensive reorganization of their perceptual response as a consequence of exposure to the accented input. We suggest that an examination of internal category structure is important for a complete account of the mechanisms of perceptual learning.

29. Dimension-Based Statistical Learning Affects Both Speech Perception and Production. Cogn Sci 2016; 41 Suppl 4:885-912. PMID: 27666146; DOI: 10.1111/cogs.12413.
Abstract
Multiple acoustic dimensions signal speech categories. However, dimensions vary in their informativeness; some are more diagnostic of category membership than others. Speech categorization reflects these dimensional regularities such that diagnostic dimensions carry more "perceptual weight" and more effectively signal category membership to native listeners. Yet perceptual weights are malleable. When short-term experience deviates from long-term language norms, such as in a foreign accent, the perceptual weight of acoustic dimensions in signaling speech category membership rapidly adjusts. The present study investigated whether rapid adjustments in listeners' perceptual weights in response to speech that deviates from the norms also affects listeners' own speech productions. In a word recognition task, the correlation between two acoustic dimensions signaling consonant categories, fundamental frequency (F0) and voice onset time (VOT), matched the correlation typical of English, and then shifted to an "artificial accent" that reversed the relationship, and then shifted back. Brief, incidental exposure to the artificial accent caused participants to down-weight perceptual reliance on F0, consistent with previous research. Throughout the task, participants were intermittently prompted with pictures to produce these same words. In the block in which listeners heard the artificial accent with a reversed F0 × VOT correlation, F0 was a less robust cue to voicing in listeners' own speech productions. The statistical regularities of short-term speech input affect both speech perception and production, as evidenced via shifts in how acoustic dimensions are weighted.
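
The sketch below illustrates, with made-up values, what it means for an exposure block to carry the canonical versus the reversed F0 × VOT correlation; the distributional parameters and category coding are hypothetical and chosen only to show the sign flip, not to reproduce the study's stimuli.

```python
# Sketch: trial distributions with a canonical vs. reversed F0 x VOT relationship.
import numpy as np

rng = np.random.default_rng(3)

def sample_block(n_trials, reversed_accent=False):
    voiced = rng.random(n_trials) < 0.5                    # intended category per trial
    vot = np.where(voiced, rng.normal(5, 3, n_trials),     # short VOT for voiced (ms)
                   rng.normal(40, 8, n_trials))            # long VOT for voiceless (ms)
    # Canonical English: voiceless (long-VOT) words carry higher onset F0;
    # the "artificial accent" flips which category gets the high F0.
    high_f0 = voiced if reversed_accent else ~voiced
    f0 = np.where(high_f0, rng.normal(240, 10, n_trials),  # high F0 (Hz)
                  rng.normal(200, 10, n_trials))           # low F0 (Hz)
    return vot, f0

vot, f0 = sample_block(200, reversed_accent=False)
print("canonical block r =", round(np.corrcoef(vot, f0)[0, 1], 2))   # positive
vot, f0 = sample_block(200, reversed_accent=True)
print("reversed block r = ", round(np.corrcoef(vot, f0)[0, 1], 2))   # negative
```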

30. Adaptive plasticity in speech perception: Effects of external information and internal predictions. J Exp Psychol Hum Percept Perform 2016; 42:1048-1059. PMID: 26854531; DOI: 10.1037/xhp0000196.
Abstract
When listeners encounter speech under adverse listening conditions, adaptive adjustments in perception can improve comprehension over time. In some cases, these adaptive changes require the presence of external information that disambiguates the distorted speech signals, whereas in other cases mere exposure is sufficient. Both external (e.g., written feedback) and internal (e.g., prior word knowledge) sources of information can be used to generate predictions about the correct mapping of a distorted speech signal. We hypothesize that these predictions provide a basis for determining the discrepancy between the expected and actual speech signal that can be used to guide adaptive changes in perception. This study provides the first empirical investigation that manipulates external and internal factors through (a) the availability of explicit external disambiguating information via the presence or absence of postresponse orthographic information paired with a repetition of the degraded stimulus, and (b) the accuracy of internally generated predictions; an acoustic distortion is introduced either abruptly or incrementally. The results demonstrate that the impact of external information on adaptive plasticity is contingent upon whether the intelligibility of the stimuli permits accurate internally generated predictions during exposure. External information sources enhance adaptive plasticity only when input signals are severely degraded and cannot reliably access internal predictions. This is consistent with a computational framework for adaptive plasticity in which error-driven supervised learning relies on the ability to compute sensory prediction error signals from both internal and external sources of information.
|
31
|
Normal-Hearing Listeners' and Cochlear Implant Users' Perception of Pitch Cues in Emotional Speech. Iperception 2015; 6:0301006615599139. [PMID: 27648210 PMCID: PMC5016815 DOI: 10.1177/0301006615599139] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022] Open
Abstract
In cochlear implants (CIs), acoustic speech cues, especially for pitch, are delivered in a degraded form. This study's aim is to assess whether due to degraded pitch cues, normal-hearing listeners and CI users employ different perceptual strategies to recognize vocal emotions, and, if so, how these differ. Voice actors were recorded pronouncing a nonce word in four different emotions: anger, sadness, joy, and relief. These recordings' pitch cues were phonetically analyzed. The recordings were used to test 20 normal-hearing listeners' and 20 CI users' emotion recognition. In congruence with previous studies, high-arousal emotions had a higher mean pitch, wider pitch range, and more dominant pitches than low-arousal emotions. Regarding pitch, speakers did not differentiate emotions based on valence but on arousal. Normal-hearing listeners outperformed CI users in emotion recognition, even when presented with CI simulated stimuli. However, only normal-hearing listeners recognized one particular actor's emotions worse than the other actors'. The groups behaved differently when presented with similar input, showing that they had to employ differing strategies. Considering the respective speaker's deviating pronunciation, it appears that for normal-hearing listeners, mean pitch is a more salient cue than pitch range, whereas CI users are biased toward pitch range cues.
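A brief sketch of how per-recording pitch statistics such as mean F0 and pitch range might be extracted for an analysis like this one, using librosa's pYIN tracker. The file names, F0 bounds, and emotion labels are illustrative assumptions, not the study's materials or procedure.

```python
# Illustrative sketch (assumed file names and parameter values): extract
# mean pitch and pitch range per emotion recording with librosa's pYIN.
import numpy as np
import librosa

def pitch_stats(path):
    """Return mean F0 (Hz) and F0 range (semitones) over voiced frames."""
    y, sr = librosa.load(path, sr=None)
    f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=500, sr=sr)
    f0 = f0[voiced & ~np.isnan(f0)]
    mean_f0 = float(np.mean(f0))
    range_st = 12 * np.log2(np.max(f0) / np.min(f0))  # range in semitones
    return mean_f0, range_st

# Hypothetical recordings of a nonce word in the four emotions:
for emotion in ["anger", "sadness", "joy", "relief"]:
    mean_f0, range_st = pitch_stats(f"{emotion}_token.wav")
    print(f"{emotion}: mean F0 = {mean_f0:.1f} Hz, range = {range_st:.1f} st")
```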
|
32
|
Abstract
Human speech perception rapidly adapts to maintain comprehension under adverse listening conditions. For example, with exposure listeners can adapt to heavily accented speech produced by a non-native speaker. Outside the domain of speech perception, adaptive changes in sensory and motor processing have been attributed to cerebellar functions. The present functional magnetic resonance imaging study investigates whether adaptation in speech perception also involves the cerebellum. Acoustic stimuli were distorted using a vocoding plus spectral-shift manipulation and presented in a word recognition task. Regions in the cerebellum that showed differences before versus after adaptation were identified, and the relationship between activity during adaptation and subsequent behavioral improvements was examined. These analyses implicated the right Crus I region of the cerebellum in adaptive changes in speech perception. A functional correlation analysis with the right Crus I as a seed region probed for cerebral cortical regions with covarying hemodynamic responses during the adaptation period. The results provided evidence of a functional network between the cerebellum and language-related regions in the temporal and parietal lobes of the cerebral cortex. Consistent with known cerebellar contributions to sensorimotor adaptation, cerebro-cerebellar interactions may support supervised learning mechanisms that rely on sensory prediction error signals in speech perception.
|
33
|
Individual differences in L2 acquisition of English phonology: The relation between cognitive abilities and phonological processing. LEARNING AND INDIVIDUAL DIFFERENCES 2015. [DOI: 10.1016/j.lindif.2015.04.005] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
34
|
Individual sensitivity to spectral and temporal cues in listeners with hearing impairment. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2015; 58:520-34. [PMID: 25629388 PMCID: PMC4462137 DOI: 10.1044/2015_jslhr-h-14-0138] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/22/2014] [Revised: 10/14/2014] [Accepted: 12/18/2014] [Indexed: 05/26/2023]
Abstract
PURPOSE The present study was designed to evaluate use of spectral and temporal cues under conditions in which both types of cues were available. METHOD Participants included adults with normal hearing and hearing loss. We focused on 3 categories of speech cues: static spectral (spectral shape), dynamic spectral (formant change), and temporal (amplitude envelope). Spectral and/or temporal dimensions of synthetic speech were systematically manipulated along a continuum, and recognition was measured using the manipulated stimuli. Level was controlled to ensure cue audibility. Discriminant function analysis was used to determine to what degree spectral and temporal information contributed to the identification of each stimulus. RESULTS Listeners with normal hearing were influenced to a greater extent by spectral cues for all stimuli. Listeners with hearing impairment generally utilized spectral cues when the information was static (spectral shape) but used temporal cues when the information was dynamic (formant transition). The relative use of spectral and temporal dimensions varied among individuals, especially among listeners with hearing loss. CONCLUSION Information about spectral and temporal cue use may aid in identifying listeners who rely to a greater extent on particular acoustic cues and applying that information toward therapeutic interventions.
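A minimal sketch of how a discriminant function analysis can quantify the relative contribution of spectral and temporal information to identification responses, as described in the method above. The simulated cue values and weighting are illustrative assumptions, not the study's measurements.

```python
# Illustrative sketch with simulated data (not the study's measurements):
# discriminant function analysis of how much spectral vs. temporal
# information contributes to stimulus identification.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
n = 200

# Two z-scored cue dimensions per trial: spectral (e.g., formant change)
# and temporal (e.g., amplitude envelope).
spectral = rng.normal(size=n)
temporal = rng.normal(size=n)

# Hypothetical identification responses driven mostly by the spectral cue,
# as reported for the listeners with normal hearing.
labels = (1.5 * spectral + 0.5 * temporal + rng.normal(scale=0.5, size=n)) > 0

lda = LinearDiscriminantAnalysis().fit(np.column_stack([spectral, temporal]), labels)
w = np.abs(lda.coef_[0])
print("relative contribution (spectral, temporal):", w / w.sum())
```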
|
35
|
Building phonetic categories: an argument for the role of sleep. Front Psychol 2014; 5:1192. [PMID: 25477828 PMCID: PMC4234907 DOI: 10.3389/fpsyg.2014.01192] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2014] [Accepted: 10/02/2014] [Indexed: 12/30/2022] Open
Abstract
The current review provides specific predictions for the role of sleep-mediated memory consolidation in the formation of new speech sound representations. Specifically, this discussion will highlight selected literature on the different ideas concerning category representation in speech, followed by a broad overview of memory consolidation and how it relates to human behavior, as relevant to speech/perceptual learning. In combining behavioral and physiological accounts from animal models with insights from the human consolidation literature on auditory skill/word learning, we are in the early stages of understanding how the transfer of experiential information between brain structures during sleep manifests in changes to online perception. Arriving at the conclusion that this process is crucial in perceptual learning and the formation of novel categories, further speculation yields the adjacent claim that the habitual disruption in this process leads to impoverished quality in the representation of speech sounds.
|
36
|
Abstract
The end-result of perceptual reorganization in infancy is currently viewed as a reconfigured perceptual space, "warped" around native-language phonetic categories, which then acts as a direct perceptual filter on any non-native sounds: naïve-listener discrimination of non-native sounds is determined by their mapping onto native-language phonetic categories that are acoustically/articulatorily most similar. We report results that suggest another factor in non-native speech perception: some perceptual sensitivities cannot be attributed to listeners' warped perceptual space alone, but rather to enhanced general sensitivity along phonetic dimensions that the listeners' native language employs to distinguish between categories. Specifically, we show that the knowledge of a language with short and long vowel categories leads to enhanced discrimination of non-native consonant length contrasts. We argue that these results support a view of perceptual reorganization as the consequence of learners' hierarchical inductive inferences about the structure of the language's sound system: infants not only acquire the specific phonetic category inventory, but also draw higher-order generalizations over the set of those categories, such as the overall informativity of phonetic dimensions for sound categorization. Non-native sound perception is then also determined by sensitivities that emerge from these generalizations, rather than only by mappings of non-native sounds onto native-language phonetic categories.
|
37
|
Cue-weighting in the perception of intervocalic stop voicing in European Portuguese. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 136:1334. [PMID: 25190406 DOI: 10.1121/1.4890639] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
This paper describes the perception of intervocalic stop voicing in European Portuguese (EP) stimuli without a stop burst, when varying three acoustic cues: Vowel duration, stop duration, and voicing maintenance during stop closure. Perceptual stimuli were generated using biomechanical modeling. First, a discrimination experiment was conducted to determine the listeners' perceptual sensitivity to the voicing maintenance cue. Second, an identification experiment was conducted to examine the effect and interaction of vowel duration, stop duration, and voicing maintenance during stop closure on the voiced/voiceless identification responses of EP listeners. The results of the discrimination test show that voicing maintenance differences have a significant effect as soon as they exceed a certain threshold. In the identification experiment, evidence was found that only the two factors vowel duration and voicing maintenance significantly influence the listeners' decisions, but not stop duration. The ratio between stop duration and vowel duration plays a major role in distinguishing stop voicing, but only for highly devoiced stimuli. It is shown that in stimuli without a stop burst, both voicing maintenance, as a major but not required cue, and vowel duration are important acoustic cues for stop voicing distinctions in EP.
|
38
|
How may the basal ganglia contribute to auditory categorization and speech perception? Front Neurosci 2014; 8:230. [PMID: 25136291 PMCID: PMC4117994 DOI: 10.3389/fnins.2014.00230] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2014] [Accepted: 07/13/2014] [Indexed: 02/01/2023] Open
Abstract
Listeners must accomplish two complementary perceptual feats in extracting a message from speech. They must discriminate linguistically-relevant acoustic variability and generalize across irrelevant variability. Said another way, they must categorize speech. Since the mapping of acoustic variability is language-specific, these categories must be learned from experience. Thus, understanding how, in general, the auditory system acquires and represents categories can inform us about the toolbox of mechanisms available to speech perception. This perspective invites consideration of findings from cognitive neuroscience literatures outside of the speech domain as a means of constraining models of speech perception. Although neurobiological models of speech perception have mainly focused on cerebral cortex, research outside the speech domain is consistent with the possibility of significant subcortical contributions in category learning. Here, we review the functional role of one such structure, the basal ganglia. We examine research from animal electrophysiology, human neuroimaging, and behavior to consider characteristics of basal ganglia processing that may be advantageous for speech category learning. We also present emerging evidence for a direct role for basal ganglia in learning auditory categories in a complex, naturalistic task intended to model the incidental manner in which speech categories are acquired. To conclude, we highlight new research questions that arise in incorporating the broader neuroscience research literature in modeling speech perception, and suggest how understanding contributions of the basal ganglia can inform attempts to optimize training protocols for learning non-native speech categories in adulthood.
|
39
|
Abstract
One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing, such as masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augmentation or therapy.
|
40
|
Speech perception under adverse conditions: insights from behavioral, computational, and neuroscience research. Front Syst Neurosci 2014; 7:126. [PMID: 24427119 PMCID: PMC3879477 DOI: 10.3389/fnsys.2013.00126] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2013] [Accepted: 12/16/2013] [Indexed: 01/06/2023] Open
Abstract
Adult speech perception reflects the long-term regularities of the native language, but it is also flexible such that it accommodates and adapts to adverse listening conditions and short-term deviations from native-language norms. The purpose of this article is to examine how the broader neuroscience literature can inform and advance research efforts in understanding the neural basis of flexibility and adaptive plasticity in speech perception. Specifically, we highlight the potential role of learning algorithms that rely on prediction error signals and discuss specific neural structures that are likely to contribute to such learning. To this end, we review behavioral studies, computational accounts, and neuroimaging findings related to adaptive plasticity in speech perception. Already, a few studies have alluded to a potential role of these mechanisms in adaptive plasticity in speech perception. Furthermore, we consider research topics in neuroscience that offer insight into how perception can be adaptively tuned to short-term deviations while balancing the need to maintain stability in the perception of learned long-term regularities. Consideration of the application and limitations of these algorithms in characterizing flexible speech perception under adverse conditions promises to inform theoretical models of speech.
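As a worked illustration of the class of learning algorithm discussed here, the toy sketch below applies a supervised delta rule: on each trial the cue weights are nudged by the prediction error between the predicted and the disambiguated category. The cue labels, learning rate, and simulated input are assumptions for demonstration, not the article's model.

```python
# Toy sketch of error-driven (delta-rule) learning: cue weights are nudged by
# the prediction error between the predicted and the disambiguated category.
# Purely illustrative of the algorithm class discussed, not the article's model.
import numpy as np

rng = np.random.default_rng(2)
w = np.array([0.5, 0.5])   # starting weights for two cues (e.g., VOT, F0)
lr = 0.05                  # learning rate

for trial in range(500):
    cues = rng.normal(size=2)                  # z-scored cue values
    category = float(cues[0] > 0)              # only cue 1 predicts the category
    prediction = 1 / (1 + np.exp(-w @ cues))   # predicted P(category = 1)
    error = category - prediction              # prediction error signal
    w += lr * error * cues                     # supervised delta-rule update

print("weights after exposure (cue 1, cue 2):", np.round(w, 2))
```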
|
41
|
Specificity of dimension-based statistical learning in word recognition. J Exp Psychol Hum Percept Perform 2013; 40:1009-21. [PMID: 24364708 DOI: 10.1037/a0035269] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Speech perception flexibly adapts to short-term regularities of ambient speech input. Recent research demonstrates that the function of an acoustic dimension for speech categorization at a given time is relative to its relationship to the evolving distribution of dimensional regularity across time, and not simply to a fixed value along the dimension. Two experiments examine the nature of this dimension-based statistical learning in online word recognition, testing generalization of learning across phonetic categories. While engaged in a word recognition task guided by perceptually unambiguous voice-onset time (VOT) acoustics signaling stop voicing in either bilabial rhymes, beer and pier, or alveolar rhymes, deer and tear, listeners were exposed incidentally to an artificial "accent" deviating from English norms in its correlation of the pitch onset of the following vowel (F0) with VOT (Experiment 1). Exposure to the change in the correlation of F0 with VOT led listeners to down-weight reliance on F0 in voicing categorization, indicating dimension-based statistical learning. This learning was observed only for the "accented" contrast varying in its F0/VOT relationship during exposure; learning did not generalize to the other place of articulation. Another group of listeners experienced competing F0/VOT correlations across place of articulation such that the global correlation for voicing was stable, but locally correlations across voicing pairs were opposing (e.g., "accented" beer and pier, "canonical" deer and tear, Experiment 2). Listeners showed dimension-based learning only for the accented pair, not the canonical pair, indicating that they are able to track separate acoustic statistics across place of articulation, that is, for /b-p/ and /d-t/. This suggests that dimension-based learning does not operate obligatorily at the phonological level of stop voicing.
|
42
|
Thalamic and parietal brain morphology predicts auditory category learning. Neuropsychologia 2013; 53:75-83. [PMID: 24035788 DOI: 10.1016/j.neuropsychologia.2013.09.012] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2013] [Revised: 09/02/2013] [Accepted: 09/04/2013] [Indexed: 01/13/2023]
Abstract
Auditory categorization is a vital skill involving the attribution of meaning to acoustic events, engaging domain-specific (i.e., auditory) as well as domain-general (e.g., executive) brain networks. A listener's ability to categorize novel acoustic stimuli should therefore depend on both, with the domain-general network being particularly relevant for adaptively changing listening strategies and directing attention to relevant acoustic cues. Here we assessed adaptive listening behavior, using complex acoustic stimuli with an initially salient (but later degraded) spectral cue and a secondary, duration cue that remained nondegraded. We employed voxel-based morphometry (VBM) to identify cortical and subcortical brain structures whose individual neuroanatomy predicted task performance and the ability to optimally switch to making use of temporal cues after spectral degradation. Behavioral listening strategies were assessed by logistic regression and revealed mainly strategy switches in the expected direction, with considerable individual differences. Gray-matter probability in the left inferior parietal lobule (BA 40) and left precentral gyrus was predictive of "optimal" strategy switch, while gray-matter probability in thalamic areas, comprising the medial geniculate body, co-varied with overall performance. Taken together, our findings suggest that successful auditory categorization relies on domain-specific neural circuits in the ascending auditory pathway, while adaptive listening behavior depends more on brain structure in parietal cortex, enabling the (re)direction of attention to salient stimulus properties.
|
43
|
Recognizing speech in a novel accent: the motor theory of speech perception reframed. BIOLOGICAL CYBERNETICS 2013; 107:421-447. [PMID: 23754133 DOI: 10.1007/s00422-013-0557-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/04/2011] [Accepted: 04/10/2013] [Indexed: 06/02/2023]
Abstract
The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. This model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
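A minimal sketch of the model's core tenet as described above: a word hypothesis tells the listener which native phoneme the accented sound was intended to realize, and the sound-to-phoneme mapping probabilities are updated accordingly. The tokens, counts, and update rule below are made-up assumptions, not the authors' implementation.

```python
# Minimal sketch (made-up tokens, not the authors' implementation): update
# sound-to-phoneme probabilities from word hypotheses about accented speech.
from collections import defaultdict

counts = defaultdict(lambda: defaultdict(float))  # counts[heard_sound][intended_phoneme]

def update(heard_sound, intended_phoneme, weight=1.0):
    """Credit the phoneme implied by the current word hypothesis."""
    counts[heard_sound][intended_phoneme] += weight

def p_phoneme_given_sound(heard_sound):
    row = counts[heard_sound]
    total = sum(row.values())
    return {ph: c / total for ph, c in row.items()} if total else {}

# An accented [t]-like sound keeps occurring where the word hypothesis
# implies /d/ (e.g., something like "tog" in a context that means "dog"):
for _ in range(8):
    update(heard_sound="t-like", intended_phoneme="/d/")
update(heard_sound="t-like", intended_phoneme="/t/", weight=2.0)

print(p_phoneme_given_sound("t-like"))  # probability mass shifts toward /d/
```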
|
44
|
Abstract
The intelligibility of periodically interrupted speech improves once the silent gaps are filled with noise bursts. This improvement has been attributed to phonemic restoration, a top-down repair mechanism that helps intelligibility of degraded speech in daily life. Two hypotheses were investigated using perceptual learning of interrupted speech. If different cognitive processes played a role in restoring interrupted speech with and without filler noise, the two forms of speech would be learned at different rates and with different perceived mental effort. If the restoration benefit were an artificial outcome of using the ecologically invalid stimulus of speech with silent gaps, this benefit would diminish with training. Two groups of normal-hearing listeners were trained, one with interrupted sentences with filler noise and the other without. Feedback was provided with the auditory playback of the unprocessed and processed sentences, as well as the visual display of the sentence text. Training increased overall performance significantly; however, the restoration benefit did not diminish. The increase in intelligibility and the decrease in perceived mental effort were relatively similar between the groups, implying similar cognitive mechanisms for the restoration of the two types of interruptions. Training effects were both generalizable, as each group also improved on the form of speech it was not trained with, and retainable. Due to null results and the relatively small number of participants (10 per group), further research is needed to draw conclusions more confidently. Nevertheless, training with interrupted speech seems to be effective, stimulating participants to more actively and efficiently use top-down restoration. This finding further implies the potential of this training approach as a rehabilitative tool for hearing-impaired/elderly populations.
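A short stimulus sketch of the two interruption conditions contrasted above: a mono signal gated periodically on and off, with the off-periods either left silent or filled with level-matched noise. The gating rate, duty cycle, and stand-in signal are illustrative assumptions, not the study's stimulus parameters.

```python
# Illustrative stimulus sketch (parameter values are assumptions, not the
# study's): periodically interrupt a mono signal, leaving the gaps either
# silent or filled with RMS-matched noise.
import numpy as np

def interrupt(signal, sr, rate_hz=1.5, duty=0.5, fill_noise=False):
    """Gate `signal` on/off at `rate_hz`; optionally fill off-periods with
    RMS-matched white noise (the 'filler noise' condition)."""
    t = np.arange(len(signal)) / sr
    on = (t * rate_hz) % 1.0 < duty                  # square-wave gating
    out = np.where(on, signal, 0.0)
    if fill_noise:
        rms = np.sqrt(np.mean(signal ** 2))
        noise = np.random.default_rng(0).normal(scale=rms, size=len(signal))
        out = np.where(on, signal, noise)
    return out

sr = 16000
demo = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # stand-in for a sentence
silent_gaps = interrupt(demo, sr, fill_noise=False)
noise_filled = interrupt(demo, sr, fill_noise=True)
```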
|
45
|
Individual differences in cue weights are stable across time: the case of Japanese stop lengths. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 132:3950-3964. [PMID: 23231125 PMCID: PMC3528741 DOI: 10.1121/1.4765076] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/08/2011] [Revised: 10/09/2012] [Accepted: 10/16/2012] [Indexed: 05/30/2023]
Abstract
Speech categories are defined by multiple acoustic dimensions, and listeners give differential weighting to dimensions in phonetic categorization. The informativeness (predictive strength) of dimensions for categorization is considered an important factor in determining perceptual weighting. However, it is unknown how the perceptual system weighs acoustic dimensions with similar informativeness. This study investigates perceptual weighting of two acoustic dimensions with similar informativeness, exploiting the absolute and relative durations that are nearly equivalent in signaling Japanese singleton and geminate stop categories. In the perception experiments, listeners showed strong individual differences in their perceptual weighting of absolute and relative durations. Furthermore, these individual patterns were stable over repeated testing across periods as long as 2 months and were resistant to perturbation through short-term manipulation of speech input. Listeners' own speech productions were not predictive of how they weighted relative and absolute duration. Despite the theoretical advantage of relative (as opposed to absolute) duration cues across contexts, relative cues are not utilized by all listeners. Moreover, examination of individual differences in cue weighting is a useful tool in exposing the complex relationship between perceptual cue weighting and language regularities.
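A small sketch of the two duration cues contrasted above: absolute closure duration versus closure duration relative to the preceding vowel, and how similarly informative they can be for the singleton/geminate contrast. The simulated durations below are illustrative assumptions, not the study's recordings.

```python
# Illustrative sketch (simulated durations, not the study's recordings):
# compare the informativeness of absolute vs. relative duration cues for
# the Japanese singleton/geminate stop contrast.
import numpy as np

rng = np.random.default_rng(4)
n = 300
is_geminate = rng.random(n) < 0.5

vowel_dur = rng.normal(90, 15, size=n)             # ms, preceding vowel
closure = np.where(is_geminate,
                   rng.normal(180, 25, size=n),    # geminate closures
                   rng.normal(80, 20, size=n))     # singleton closures

absolute_cue = closure                             # absolute closure duration
relative_cue = closure / vowel_dur                 # duration relative to the vowel

for name, cue in [("absolute", absolute_cue), ("relative", relative_cue)]:
    r = np.corrcoef(cue, is_geminate.astype(float))[0, 1]
    print(f"{name} duration vs. category, point-biserial r = {r:.2f}")
```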
|
46
|
Effects of category learning on neural sensitivity to non-native phonetic categories. J Cogn Neurosci 2012; 24:1695-708. [PMID: 22621261 DOI: 10.1162/jocn_a_00243] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Categorical perception, an increased sensitivity to between- compared with within-category contrasts, is a stable property of native speech perception that emerges as language matures. Although recent research suggests that categorical responses to speech sounds can be found in left prefrontal as well as temporo-parietal areas, it is unclear how the neural system develops heightened sensitivity to between-category contrasts. In the current study, two groups of adult participants were trained to categorize speech sounds taken from a dental/retroflex/velar continuum according to two different boundary locations. Behavioral results suggest that for successful learners, categorization training led to increased discrimination accuracy for between-category contrasts with no concomitant increase for within-category contrasts. Neural responses to the learned category schemes were measured using a short-interval habituation design during fMRI scanning. Whereas both inferior frontal and temporal regions showed sensitivity to phonetic contrasts sampled from the continuum, only the bilateral middle frontal gyri exhibited a pattern consistent with encoding of the learned category scheme. Taken together, these results support a view in which top-down information about category membership may reshape perceptual sensitivities via attention or executive mechanisms in the frontal lobes.
|
47
|
Abstract
We investigated comprehension of and adaptation to speech in an unfamiliar accent in older adults. Participants performed a speeded sentence verification task for accented sentences: one group upon auditory-only presentation, and the other group upon audiovisual presentation. Our questions were whether audiovisual presentation would facilitate adaptation to the novel accent, and which cognitive and linguistic measures would predict adaptation. Participants were therefore tested on a range of background tests: hearing acuity, auditory verbal short-term memory, working memory, attention-switching control, selective attention, and vocabulary knowledge. Both auditory-only and audiovisual groups showed improved accuracy and decreasing response times over the course of the experiment, effectively showing accent adaptation. Even though the total amount of improvement was similar for the auditory-only and audiovisual groups, initial rate of adaptation was faster in the audiovisual group. Hearing sensitivity and short-term and working memory measures were associated with efficient processing of the novel accent. Analysis of the relationship between accent comprehension and the background tests revealed furthermore that selective attention and vocabulary size predicted the amount of adaptation over the course of the experiment. These results suggest that vocabulary knowledge and attentional abilities facilitate the attention-shifting strategies proposed to be required for perceptual learning.
|
48
|
Perceptual learning of dysarthric speech: a review of experimental studies. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2012; 55:290-305. [PMID: 22199185 PMCID: PMC3738172 DOI: 10.1044/1092-4388(2011/10-0349)] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
PURPOSE This review article provides a theoretical overview of the characteristics of perceptual learning, reviews perceptual learning studies that pertain to dysarthric populations, and identifies directions for future research that consider the application of perceptual learning to the management of dysarthria. METHOD A critical review of the literature was conducted that summarized and synthesized previously published research in the area of perceptual learning with atypical speech. Literature related to perceptual learning of neurologically degraded speech was emphasized with the aim of identifying key directions for future research with this population. CONCLUSIONS Familiarization with unfamiliar or ambiguous speech signals can facilitate perceptual learning of that same speech signal. There is a small but growing body of evidence that perceptual learning also occurs for listeners familiarized with dysarthric speech. Perceptual learning of the dysarthric signal is both theoretically and clinically significant. In order to establish the efficacy of exploiting perceptual learning paradigms for rehabilitative gain in dysarthria management, research is required to build on existing empirical evidence and develop a theoretical framework for learning to better recognize neurologically degraded speech.
|
49
|
The use of acoustic cues for phonetic identification: effects of spectral degradation and electric hearing. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 131:1465-1479. [PMID: 22352517 PMCID: PMC3292615 DOI: 10.1121/1.3672705] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/07/2010] [Revised: 10/10/2011] [Accepted: 12/05/2011] [Indexed: 05/30/2023]
Abstract
Although some cochlear implant (CI) listeners can show good word recognition accuracy, it is not clear how they perceive and use the various acoustic cues that contribute to phonetic perceptions. In this study, the use of acoustic cues was assessed for normal-hearing (NH) listeners in optimal and spectrally degraded conditions, and also for CI listeners. Two experiments tested the tense/lax vowel contrast (varying in formant structure, vowel-inherent spectral change, and vowel duration) and the word-final fricative voicing contrast (varying in F1 transition, vowel duration, consonant duration, and consonant voicing). Identification results were modeled using mixed-effects logistic regression. These experiments suggested that under spectrally-degraded conditions, NH listeners decrease their use of formant cues and increase their use of durational cues. Compared to NH listeners, CI listeners showed decreased use of spectral cues like formant structure and formant change and consonant voicing, and showed greater use of durational cues (especially for the fricative contrast). The results suggest that although NH and CI listeners may show similar accuracy on basic tests of word, phoneme or feature recognition, they may be using different perceptual strategies in the process.
|
50
|
Word recognition reflects dimension-based statistical learning. J Exp Psychol Hum Percept Perform 2011; 37:1939-56. [PMID: 22004192 DOI: 10.1037/a0025641] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Speech processing requires sensitivity to long-term regularities of the native language yet demands listeners to flexibly adapt to perturbations that arise from talker idiosyncrasies such as nonnative accent. The present experiments investigate whether listeners exhibit dimension-based statistical learning of correlations between acoustic dimensions defining perceptual space for a given speech segment. While engaged in a word recognition task guided by perceptually unambiguous voice-onset time (VOT) acoustics signaling beer, pier, deer, or tear, listeners were exposed incidentally to an artificial "accent" deviating from English norms in its correlation of the pitch onset of the following vowel (F0) to VOT. Results across four experiments are indicative of rapid, dimension-based statistical learning; reliance on the F0 dimension in word recognition was rapidly down-weighted in response to the perturbation of the correlation between F0 and VOT dimensions. However, listeners did not simply mirror the short-term input statistics. Instead, response patterns were consistent with a lingering influence of sensitivity to the long-term regularities of English. This suggests that the very acoustic dimensions defining perceptual space are not fixed and, rather, are dynamically and rapidly adjusted to the idiosyncrasies of local experience, such as might arise from a nonnative accent, dialect, or dysarthria. The current findings extend demonstrations of "object-based" statistical learning across speech segments to include incidental, online statistical learning of regularities residing within a speech segment.
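To make the exposure manipulation concrete, the sketch below simulates the input statistics a listener would encounter: a canonical block in which F0 onset covaries positively with VOT, followed by an "accent" block in which that correlation is reversed. The correlation values and block sizes are illustrative assumptions, not the study's stimulus parameters.

```python
# Illustrative sketch of the exposure statistics (values are not the study's
# stimulus parameters): a canonical block with a positive F0-VOT correlation
# and an "accent" block in which the correlation is reversed.
import numpy as np

rng = np.random.default_rng(3)

def sample_block(n, corr):
    """Draw n (VOT, F0) pairs with the requested correlation (z-scored units)."""
    cov = np.array([[1.0, corr], [corr, 1.0]])
    return rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)

canonical = sample_block(200, corr=+0.7)   # English-like: higher F0 with longer VOT
accent = sample_block(200, corr=-0.7)      # artificial accent: relation reversed

for name, block in [("canonical", canonical), ("accent", accent)]:
    r = np.corrcoef(block[:, 0], block[:, 1])[0, 1]
    print(f"{name} block F0 x VOT correlation: {r:+.2f}")
```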
|