1
|
Murphy TK, Nozari N, Holt LL. Transfer of statistical learning from passive speech perception to speech production. Psychon Bull Rev 2023:10.3758/s13423-023-02399-8. [PMID: 37884779 DOI: 10.3758/s13423-023-02399-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/29/2023] [Indexed: 10/28/2023]
Abstract
Communicating with a speaker with a different accent can affect one's own speech. Despite the strength of evidence for perception-production transfer in speech, the nature of transfer has remained elusive, with variable results regarding the acoustic properties that transfer between speakers and the characteristics of the speakers who exhibit transfer. The current study investigates perception-production transfer through the lens of statistical learning across passive exposure to speech. Participants experienced a short sequence of acoustically variable minimal pair (beer/pier) utterances conveying either an accent or typical American English acoustics, categorized a perceptually ambiguous test stimulus, and then repeated the test stimulus aloud. In the canonical condition, /b/-/p/ fundamental frequency (F0) and voice onset time (VOT) covaried according to typical English patterns. In the reverse condition, the F0xVOT relationship reversed to create an "accent" with speech input regularities atypical of American English. Replicating prior studies, F0 played less of a role in perceptual speech categorization in reverse compared with canonical statistical contexts. Critically, this down-weighting transferred to production, with systematic down-weighting of F0 in listeners' own speech productions in reverse compared with canonical contexts that was robust across male and female participants. Thus, the mapping of acoustics to speech categories is rapidly adjusted by short-term statistical learning across passive listening and these adjustments transfer to influence listeners' own speech productions.
Collapse
Affiliation(s)
- Timothy K Murphy
- Department of Psychology, Carnegie Mellon University, Baker Hall, Floor 3, Frew St, Pittsburgh, PA, 15213, USA.
- Center for the Neural Basis of Cognition, Pittsburgh, PA, 15213, USA.
| | - Nazbanou Nozari
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, 47405, USA
| | - Lori L Holt
- Department of Psychology, University of Texas at Austin, Austin, TX, 78712, USA
| |
Collapse
|
2
|
Hodson AJ, Shinn-Cunningham BG, Holt LL. Statistical learning across passive listening adjusts perceptual weights of speech input dimensions. Cognition 2023; 238:105473. [PMID: 37210878 DOI: 10.1016/j.cognition.2023.105473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2022] [Revised: 03/20/2023] [Accepted: 04/24/2023] [Indexed: 05/23/2023]
Abstract
Statistical learning across passive exposure has been theoretically situated with unsupervised learning. However, when input statistics accumulate over established representations - like speech syllables, for example - there is the possibility that prediction derived from activation of rich, existing representations may support error-driven learning. Here, across five experiments, we present evidence for error-driven learning across passive speech listening. Young adults passively listened to a string of eight beer - pier speech tokens with distributional regularities following either a canonical American-English acoustic dimension correlation or a correlation reversed to create an accent. A sequence-final test stimulus assayed the perceptual weight - the effectiveness - of the secondary dimension in signaling category membership as a function of preceding sequence regularities. Perceptual weight flexibly adjusted according to the passively experienced regularities even when the preceding regularities shifted on a trial-by-trial basis. The findings align with a theoretical view that activation of established internal representations can support learning across statistical regularities via error-driven learning. At the broadest level, this suggests that not all statistical learning need be unsupervised. Moreover, these findings help to account for how cognitive systems may accommodate competing demands for flexibility and stability: instead of overwriting existing representations when short-term input distributions depart from the norms, the mapping from input to category representations may be dynamically - and rapidly - adjusted via error-driven learning from predictions derived from internal representations.
Collapse
Affiliation(s)
- Alana J Hodson
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA.
| | | | - Lori L Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
| |
Collapse
|
3
|
Jasmin K, Tierney A, Obasih C, Holt L. Short-term perceptual reweighting in suprasegmental categorization. Psychon Bull Rev 2023; 30:373-382. [PMID: 35915382 PMCID: PMC9971089 DOI: 10.3758/s13423-022-02146-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/05/2022] [Indexed: 11/08/2022]
Abstract
Segmental speech units such as phonemes are described as multidimensional categories whose perception involves contributions from multiple acoustic input dimensions, and the relative perceptual weights of these dimensions respond dynamically to context. For example, when speech is altered to create an "accent" in which two acoustic dimensions are correlated in a manner opposite that of long-term experience, the dimension that carries less perceptual weight is down-weighted to contribute less in category decisions. It remains unclear, however, whether this short-term reweighting extends to perception of suprasegmental features that span multiple phonemes, syllables, or words, in part because it has remained debatable whether suprasegmental features are perceived categorically. Here, we investigated the relative contribution of two acoustic dimensions to word emphasis. Participants categorized instances of a two-word phrase pronounced with typical covariation of fundamental frequency (F0) and duration, and in the context of an artificial "accent" in which F0 and duration (established in prior research on English speech as "primary" and "secondary" dimensions, respectively) covaried atypically. When categorizing "accented" speech, listeners rapidly down-weighted the secondary dimension (duration). This result indicates that listeners continually track short-term regularities across speech input and dynamically adjust the weight of acoustic evidence for suprasegmental decisions. Thus, dimension-based statistical learning appears to be a widespread phenomenon in speech perception extending to both segmental and suprasegmental categorization.
Collapse
Affiliation(s)
- Kyle Jasmin
- Department of Psychology, Wolfson Building, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, UK.
| | | | | | - Lori Holt
- Carnegie Mellon University, Pittsburgh, PA, USA
| |
Collapse
|
4
|
Wu YC, Holt LL. Phonetic category activation predicts the direction and magnitude of perceptual adaptation to accented speech. J Exp Psychol Hum Percept Perform 2022; 48:913-925. [PMID: 35849375 PMCID: PMC10236200 DOI: 10.1037/xhp0001037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Unfamiliar accents can systematically shift speech acoustics away from community norms and reduce comprehension. Yet, limited exposure improves comprehension. This perceptual adaptation indicates that the mapping from acoustics to speech representations is dynamic, rather than fixed. But, what drives adjustments is debated. Supervised learning accounts posit that activation of an internal speech representation via disambiguating information generates predictions about patterns of speech input typically associated with the representation. When actual input mismatches predictions, the mapping is adjusted. We tested two hypotheses of this account across consonants and vowels as listeners categorized speech conveying an English-like acoustic regularity or an artificial accent. Across conditions, signal manipulations impacted which of two acoustic dimensions best conveyed category identity, and predicted which dimension would exhibit the effects of perceptual adaptation. Moreover, the strength of phonetic category activation, as estimated by categorization responses reliant on the dominant acoustic dimension, predicted the magnitude of adaptation observed across listeners. The results align with predictions of supervised learning accounts, suggesting that perceptual adaptation arises from speech category activation, corresponding predictions about the patterns of acoustic input that align with the category, and adjustments in subsequent speech perception when input mismatches these expectations. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
Collapse
|
5
|
Yu ACL. Perceptual Cue Weighting Is Influenced by the Listener's Gender and Subjective Evaluations of the Speaker: The Case of English Stop Voicing. Front Psychol 2022; 13:840291. [PMID: 35529558 PMCID: PMC9067435 DOI: 10.3389/fpsyg.2022.840291] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2021] [Accepted: 02/28/2022] [Indexed: 11/13/2022] Open
Abstract
Speech categories are defined by multiple acoustic dimensions and their boundaries are generally fuzzy and ambiguous in part because listeners often give differential weighting to these cue dimensions during phonetic categorization. This study explored how a listener's perception of a speaker's socio-indexical and personality characteristics influences the listener's perceptual cue weighting. In a matched-guise study, three groups of listeners classified a series of gender-neutral /b/-/p/ continua that vary in VOT and F0 at the onset of the following vowel. Listeners were assigned to one of three prompt conditions (i.e., a visually male talker, a visually female talker, or audio-only) and rated the talker in terms of vocal (and facial, in the visual prompt conditions) gender prototypicality, attractiveness, friendliness, confidence, trustworthiness, and gayness. Male listeners and listeners who saw a male face showed less reliance on VOT compared to listeners in the other conditions. Listeners' visual evaluation of the talker also affected their weighting of VOT and onset F0 cues, although the effects of facial impressions differ depending on the gender of the listener. The results demonstrate that individual differences in perceptual cue weighting are modulated by the listener's gender and his/her subjective evaluation of the talker. These findings lend support for exemplar-based models of speech perception and production where socio-indexical features are encoded as a part of the episodic traces in the listeners' mental lexicon. This study also shed light on the relationship between individual variation in cue weighting and community-level sound change by demonstrating that VOT and onset F0 co-variation in North American English has acquired a certain degree of socio-indexical significance.
Collapse
Affiliation(s)
- Alan C L Yu
- Chicago Phonology Laboratory, Department of Linguistics, University of Chicago, Chicago, IL, United States
| |
Collapse
|
6
|
Feldman NH, Goldwater S, Dupoux E, Schatz T. Do Infants Really Learn Phonetic Categories? OPEN MIND 2022; 5:113-131. [PMID: 35024527 PMCID: PMC8746127 DOI: 10.1162/opmi_a_00046] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2020] [Accepted: 08/06/2021] [Indexed: 11/16/2022] Open
Abstract
Early changes in infants’ ability to perceive native and nonnative speech sound contrasts are typically attributed to their developing knowledge of phonetic categories. We critically examine this hypothesis and argue that there is little direct evidence of category knowledge in infancy. We then propose an alternative account in which infants’ perception changes because they are learning a perceptual space that is appropriate to represent speech, without yet carving up that space into phonetic categories. If correct, this new account has substantial implications for understanding early language development.
Collapse
Affiliation(s)
- Naomi H Feldman
- Department of Linguistics and UMIACS, University of Maryland, College Park, MD, USA
| | | | - Emmanuel Dupoux
- Cognitive Machine Learning (ENS - EHESS - PSL Research University - CNRS - INRIA), Paris, France
| | - Thomas Schatz
- Department of Linguistics and UMIACS, University of Maryland, College Park, MD, USA
| |
Collapse
|
7
|
Zhang X, Wu YC, Holt LL. The Learning Signal in Perceptual Tuning of Speech: Bottom Up Versus Top-Down Information. Cogn Sci 2021; 45:e12947. [PMID: 33682208 DOI: 10.1111/cogs.12947] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2017] [Revised: 01/04/2021] [Accepted: 01/05/2021] [Indexed: 12/01/2022]
Abstract
Cognitive systems face a tension between stability and plasticity. The maintenance of long-term representations that reflect the global regularities of the environment is often at odds with pressure to flexibly adjust to short-term input regularities that may deviate from the norm. This tension is abundantly clear in speech communication when talkers with accents or dialects produce input that deviates from a listener's language community norms. Prior research demonstrates that when bottom-up acoustic information or top-down word knowledge is available to disambiguate speech input, there is short-term adaptive plasticity such that subsequent speech perception is shifted even in the absence of the disambiguating information. Although such effects are well-documented, it is not yet known whether bottom-up and top-down resolution of ambiguity may operate through common processes, or how these information sources may interact in guiding the adaptive plasticity of speech perception. The present study investigates the joint contributions of bottom-up information from the acoustic signal and top-down information from lexical knowledge in the adaptive plasticity of speech categorization according to short-term input regularities. The results implicate speech category activation, whether from top-down or bottom-up sources, in driving rapid adjustment of listeners' reliance on acoustic dimensions in speech categorization. Broadly, this pattern of perception is consistent with dynamic mapping of input to category representations that is flexibly tuned according to interactive processing accommodating both lexical knowledge and idiosyncrasies of the acoustic input.
Collapse
Affiliation(s)
- Xujin Zhang
- Department of Psychology, Carnegie Mellon University
| | | | - Lori L Holt
- Department of Psychology, Carnegie Mellon University
| |
Collapse
|
8
|
Abstract
Recent research demonstrates that the relationship between an acoustic dimension and speech categories is not static. Rather, it is influenced by the evolving distribution of dimensional regularity experienced across time, and specific to experienced individual sounds. Three studies examine the nature of this perceptual, dimension-based statistical learning of artificially accented [b] and [p] speech categories in online word recognition by testing generalization of learning across contexts, and testing the effect of a larger word list across which learning is induced. The results indicate that whereas learning of accented [b] and [p] generalizes across contexts, generalization to contexts not experienced in the accent is weaker even for the same speech categories [b] and [p] spoken by the same speaker. The results support a rich model of speech representation that is sensitive to context-dependent variation in the way the acoustic dimensions are related to speech categories.
Collapse
|
9
|
Schertz J, Clare EJ. Phonetic cue weighting in perception and production. WILEY INTERDISCIPLINARY REVIEWS. COGNITIVE SCIENCE 2019; 11:e1521. [DOI: 10.1002/wcs.1521] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/06/2019] [Revised: 07/12/2019] [Accepted: 08/25/2019] [Indexed: 11/07/2022]
Affiliation(s)
- Jessamyn Schertz
- University of Toronto Department of Language Studies Mississauga Ontario Canada
| | - Emily J. Clare
- University of Toronto Department of Linguistics Toronto Ontario Canada
| |
Collapse
|
10
|
Zhang X, Holt LL. Simultaneous tracking of coevolving distributional regularities in speech. J Exp Psychol Hum Percept Perform 2018; 44:1760-1779. [PMID: 30272462 PMCID: PMC6205888 DOI: 10.1037/xhp0000569] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Speech processing depends upon mapping variable acoustic speech input in a manner that reflects the long-term regularities of the native language. Yet, these mappings are flexible such that introduction of short-term distributional regularities in speech input, like those arising from foreign accents or talker idiosyncrasies, leads to rapid adjustments in the effectiveness of acoustic dimensions in signaling phonetic categories. The present experiments investigate whether the system is able to track simultaneous short-term distributional statistics present in speech input or if, instead, the global regularity jointly defined by these distributions dominates. Three experiments establish that adult listeners are able to track distinct simultaneously evolving regularities across time, given information to support the "binning" of acoustic instances. Both voice quality and visual information to indicate talker supported tracking of coevolving distributional regularities, even when the regularities are opposing and even when the acoustic speech tokens contributing to the distinct distributions are identical. This indicates that reweighting of perceptual dimensions in response to short-term regularities in speech input is not simply an accumulation of acoustic instances. Rather, the system is able to track multiple context-sensitive regularities simultaneously, with rapid context-dependent adaptive adjustments in how acoustic speech input maps to phonetic categories. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Collapse
Affiliation(s)
- Xujin Zhang
- Department of Psychology, Carnegie Mellon University
| | - Lori L Holt
- Department of Psychology, Carnegie Mellon University
| |
Collapse
|