1
Zebe-Sheng F, Watter C, Schmid S. The increasing importance of voice onset time in the perception and production of Zurich German plosives: An ongoing sound change. The Journal of the Acoustical Society of America 2025; 157:1261-1275. PMID: 39964804. DOI: 10.1121/10.0034842.
Abstract
Recent evidence suggests an ongoing sound change in Zurich German, where the primary cue distinguishing lenis from fortis plosives is commonly considered to be closure duration, and both plosive types are traditionally unaspirated and phonetically voiceless. There has been a shift toward more lexical items being aspirated by younger speakers, who have also been shown to produce generally longer voice onset times (VOTs) than older speakers. The current study investigates word-medial and word-initial plosives in speech perception and production. Using the apparent-time paradigm, two experiments were conducted with 48 speakers of Zurich German belonging to two age groups. Results confirm that younger speakers produce more aspiration in word-initial fortis plosives than older speakers but disconfirm previous findings of a reduction in the closure duration of fortis plosives. Results from the perception experiment reveal that, word-initially, VOT seems to be increasing in importance and closure duration is not always sufficient to distinguish lenis from fortis plosives. Results further highlight the importance of lexical differences, according to which production and perception are either aligned or misaligned. Overall, the current study provides evidence for a sound change affecting word-initial fortis plosives in Zurich German in both speech perception and production.
Affiliation(s)
- Franka Zebe-Sheng
- Department of Computational Linguistics, University of Zurich, Rämistrasse 71, 8006 Zurich, Switzerland
- Camille Watter
- Department of Computational Linguistics, University of Zurich, Rämistrasse 71, 8006 Zurich, Switzerland
- Stephan Schmid
- Department of Computational Linguistics, University of Zurich, Rämistrasse 71, 8006 Zurich, Switzerland
2
Tal S, Grossman E, Arnon I. Infant-directed speech becomes less redundant as infants grow: Implications for language learning. Cognition 2024; 249:105817. PMID: 38810427. DOI: 10.1016/j.cognition.2024.105817.
Abstract
Do speakers use less redundant language with more proficient interlocutors? Both the communicative efficiency framework and the language development literature predict that speech directed to younger infants should be more redundant than speech directed to older infants. Here, we test this by quantifying redundancy in infant-directed speech (IDS) using entropy rate, an information-theoretic measure reflecting the average degree of repetitiveness. While IDS is often described as repetitive, entropy rate provides a novel holistic measure of redundancy in this speech genre. Using two developmental corpora, we compare the entropy rates of samples taken at different ages. We find that parents use less redundant speech when talking to older children, illustrating an effect of perceived interlocutor proficiency on redundancy. The developmental decrease in redundancy reflects a decrease in lexical repetition, but also a decrease in repetitions of multi-word sequences, highlighting the importance of larger sequences in early language learning.
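Entropy rate can be estimated in several ways; as a rough illustration of the idea (not the paper's own estimator), the sketch below computes a bigram-based conditional entropy in Python. The toy utterances, tokenization, and estimator are all simplifying assumptions — repetitive speech yields a lower value than varied speech:

```python
from collections import Counter
from math import log2

def bigram_entropy_rate(tokens):
    """Estimate entropy rate (bits/word) as the conditional entropy
    H(w_t | w_{t-1}) under bigram statistics. Lower values indicate
    more redundant (repetitive) speech."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens[:-1])
    n = sum(bigrams.values())
    h = 0.0
    for (prev, _), count in bigrams.items():
        p_joint = count / n              # P(w_{t-1}, w_t)
        p_cond = count / unigrams[prev]  # P(w_t | w_{t-1})
        h -= p_joint * log2(p_cond)
    return h

repetitive = "look at the doggy look at the doggy look at the doggy".split()
varied = "now we read a book about a dog chasing its red ball".split()
assert bigram_entropy_rate(repetitive) < bigram_entropy_rate(varied)
```

Real estimates require much larger samples and smoothing; this toy version only shows why fully predictable (repetitive) sequences approach an entropy rate of zero.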
Affiliation(s)
- Shira Tal
- Department of Linguistics and English Language, University of Edinburgh, United Kingdom; Department of Cognitive Sciences, The Hebrew University of Jerusalem, Israel.
- Eitan Grossman
- Department of Linguistics, The Hebrew University of Jerusalem, Israel
- Inbal Arnon
- Department of Psychology, The Hebrew University of Jerusalem, Israel
3
Zellou G, Kim L, Gendrot C. Comparing human and machine's use of coarticulatory vowel nasalization for linguistic classification. The Journal of the Acoustical Society of America 2024; 156:489-502. PMID: 39013039. DOI: 10.1121/10.0027932.
Abstract
Anticipatory coarticulation is a highly informative cue to upcoming linguistic information: listeners can identify that a word is ben and not bed from hearing the vowel alone. The present study compares the performance of human listeners and of a self-supervised pre-trained speech model (wav2vec 2.0) in using nasal coarticulation to classify vowels. Stimuli consisted of nasalized (from CVN words) and non-nasalized (from CVC words) American English vowels produced by 60 humans and generated in 36 text-to-speech (TTS) voices. In aggregate, wav2vec 2.0 performance is similar to human listener performance. Broken down by vowel type, both wav2vec 2.0 and listeners perform better on non-nasalized vowels produced naturally by humans, but wav2vec 2.0 classifies nasalized vowels more accurately than non-nasalized vowels for TTS voices. Speaker-level patterns reveal that listeners' use of coarticulation is highly variable across talkers; wav2vec 2.0 likewise shows cross-talker variability in performance. Analyses also reveal differences between listeners and wav2vec 2.0 in the use of multiple acoustic cues for classifying nasalized vowels. Findings have implications for understanding how coarticulatory variation is used in speech perception and can provide insight into how neural systems learn to attend to the unique acoustic features of coarticulation.
Affiliation(s)
- Georgia Zellou
- Phonetics Lab, Linguistics Department, University of California-Davis, Davis, California 95616, USA
- Lila Kim
- Laboratoire de Phonétique et Phonologie, Université Sorbonne Nouvelle, UMR 7018 CNRS, Paris, France
- Cédric Gendrot
- Laboratoire de Phonétique et Phonologie, Université Sorbonne Nouvelle, UMR 7018 CNRS, Paris, France
4
Vonessen J, Aoki NB, Cohn M, Zellou G. Comparing perception of L1 and L2 English by human listeners and machines: Effect of interlocutor adaptations. The Journal of the Acoustical Society of America 2024; 155:3060-3070. PMID: 38717210. DOI: 10.1121/10.0025930.
Abstract
Speakers tailor their speech to different types of interlocutors. For example, speech directed to voice technology has different acoustic-phonetic characteristics than speech directed to a human. The present study investigates the perceptual consequences of human- and device-directed registers in English. We compare two groups of speakers: participants whose first language is English (L1) and bilingual L1 Mandarin-L2 English talkers. Participants produced short sentences in several conditions: an initial production and a repeat production after a human or device guise indicated either understanding or misunderstanding. In experiment 1, a separate group of L1 English listeners heard these sentences and transcribed the target words. In experiment 2, the same productions were transcribed by an automatic speech recognition (ASR) system. Results show that transcription accuracy was highest for L1 talkers for both human and ASR transcribers. Furthermore, there were no overall differences in transcription accuracy between human- and device-directed speech. Finally, while human listeners showed an intelligibility benefit for coda repair productions, the ASR transcriber did not benefit from these enhancements. Findings are discussed in terms of models of register adaptation, phonetic variation, and human-computer interaction.
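Transcription accuracy in studies of this kind is often scored per target word or, inversely, as word error rate (WER). The sketch below is a standard Levenshtein-based WER, shown only to make the metric concrete — it is not the paper's own scoring procedure:

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

assert word_error_rate("the dog barked", "the dog barked") == 0.0
# one substitution out of three reference words -> WER of 1/3
assert abs(word_error_rate("pass the salt", "pass the fault") - 1 / 3) < 1e-9
```

The same function applies whether the hypothesis comes from a human transcriber or an ASR system, which is what makes the two directly comparable.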
Affiliation(s)
- Jules Vonessen
- Department of Linguistics, University of California, Davis, Davis, California 95616, USA
- Nicholas B Aoki
- Department of Linguistics, University of California, Davis, Davis, California 95616, USA
- Michelle Cohn
- Department of Linguistics, University of California, Davis, Davis, California 95616, USA
- Georgia Zellou
- Department of Linguistics, University of California, Davis, Davis, California 95616, USA
5
Beach SD, Niziolek CA. Inhibitory modulation of speech trajectories: Evidence from a vowel-modified Stroop task. Cogn Neuropsychol 2024; 41:51-69. PMID: 38778635. PMCID: PMC11269046. DOI: 10.1080/02643294.2024.2315831.
Abstract
How does cognitive inhibition influence speaking? The Stroop effect is a classic demonstration of the interference between reading and color naming. We used a novel variant of the Stroop task to measure whether this interference impacts not only the response speed, but also the acoustic properties of speech. Speakers named the color of words in three categories: congruent (e.g., red written in red), color-incongruent (e.g., green written in red), and vowel-incongruent - those with partial phonological overlap with their color (e.g., rid written in red, grain in green, and blow in blue). Our primary aim was to identify any effect of the distractor vowel on the acoustics of the target vowel. Participants were no slower to respond on vowel-incongruent trials, but formant trajectories tended to show a bias away from the distractor vowel, consistent with a phenomenon of acoustic inhibition that increases contrast between confusable alternatives.
Affiliation(s)
- Sara D Beach
- Waisman Center, University of Wisconsin-Madison, Madison, WI, USA
- Caroline A Niziolek
- Waisman Center, University of Wisconsin-Madison, Madison, WI, USA
- Department of Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, WI, USA
6
Li A, Roberts G. Co-Occurrence, Extension, and Social Salience: The Emergence of Indexicality in an Artificial Language. Cogn Sci 2023; 47:e13290. PMID: 37183582. DOI: 10.1111/cogs.13290.
Abstract
We investigated the emergence of sociolinguistic indexicality using an artificial-language-learning paradigm. Sociolinguistic indexicality involves the association of linguistic variants with nonlinguistic social or contextual features. Any linguistic variant can acquire "constellations" of such indexical meanings, though they also exhibit an ordering, with first-order indices associated with particular speaker groups and higher-order indices targeting stereotypical attributes of those speakers. Much natural-language research has been conducted on this phenomenon, but little experimental work has focused on how indexicality emerges. Here, we present three miniature artificial-language experiments designed to break ground on this question. Results show ready formation of first-order indexicality based on co-occurrence alone, with higher-order indexicality emerging as a result of extension to new speaker groups, modulated by the perceived practical importance of the indexed social feature.
Affiliation(s)
- Aini Li
- Department of Linguistics, University of Pennsylvania
7
Zellou G, Lahrouchi M, Bensoukas K. Clear speech in Tashlhiyt Berber: The perception of typologically uncommon word-initial contrasts by native and naive listeners. The Journal of the Acoustical Society of America 2022; 152:3429. PMID: 36586870. DOI: 10.1121/10.0016579.
Abstract
Tashlhiyt Berber is known for having typologically unusual word-initial phonological contrasts, specifically, word-initial singleton-geminate minimal pairs (e.g., sin vs ssin) and sequences of consonants that violate the sonority sequencing principle (e.g., non-rising sonority sequences: fsin). The current study investigates the role of a listener-oriented speaking style on the perceptual enhancement of these rarer phonological contrasts. It examines the perception of word-initial singleton, geminate, and complex onsets in Tashlhiyt Berber across clear and casual speaking styles by native and naive listeners. While clear speech boosts the discriminability of pairs containing singleton-initial words for both listener groups, only native listeners performed better in discriminating between initial singleton-geminate contrasts in clear speech. Clear speech did not improve perception for lexical contrasts containing a non-rising-sonority consonant cluster for either listener group. These results are discussed in terms of how clear speech can inform phonological typology and the role of phonetic enhancement in language-universal vs language-specific speech perception.
Affiliation(s)
- Georgia Zellou
- Department of Linguistics, University of California, Davis, California 95616, USA
- Mohamed Lahrouchi
- Centre National de la Recherche Scientifique & Université Paris 8, Paris, France
- Karim Bensoukas
- Faculty of Letters and Human Sciences, Mohammed V University in Rabat, Rabat, Morocco
8
Bi Y, Chen Y. The effects of lexical frequency and homophone neighborhood density on incomplete tonal neutralization. Front Psychol 2022; 13:867353. PMID: 36506959. PMCID: PMC9730877. DOI: 10.3389/fpsyg.2022.867353.
Abstract
We investigated the effects of lexical frequency and homophone neighborhood density on the acoustic realization of two neutralizing falling tones in Dalian Mandarin Chinese. Monosyllabic morphemes containing the target tones (Tone 1 and Tone 4) were produced by 60 native speakers from two generations (middle-aged vs. young). The duration of tone-bearing syllable rhymes, as well as the F0 curves and velocity profiles of the lexical tones were quantitatively analyzed via linear mixed-effects modeling and functional data analysis. Results showed no durational difference between T1 and T4. However, the F0 contours of the two falling tones were incompletely neutralized for both young and middle-aged speakers. Lexical frequency showed little effect on the incomplete tonal neutralization; there were significant differences in the turning point of the two falling tones in syllables with both high and low lexical frequency. However, homophone neighborhood density showed an effect on the incomplete neutralization between the two falling tones, reflected in significant differences in the slope and turning point of the F0 velocity profiles between the two tones carried by syllables with low density but not with high density. Moreover, homophone neighborhood density also affected the duration, the turning point of F0 curves, and velocity profiles of the T1- and T4-syllables. These results are discussed with consideration of social phonetic variations, the theory of Hypo- and Hyper-articulation (H&H), the Neighborhood Activation Model, and communication-based information-theoretic accounts. Collectively, these results broaden our understanding of the effects that lexical properties have on the acoustic details of lexical tone production and tonal sound changes.
Affiliation(s)
- Yifei Bi
- College of Foreign Languages, University of Shanghai for Science and Technology, Shanghai, China
- Yiya Chen
- Leiden University Centre for Linguistics, Leiden, Netherlands
- Leiden Institute for Brain and Cognition, Leiden, Netherlands
9
Ventura R, Plotkin JB, Roberts G. Drift as a Driver of Language Change: An Artificial Language Experiment. Cogn Sci 2022; 46:e13197. PMID: 36083286. PMCID: PMC9787808. DOI: 10.1111/cogs.13197.
Abstract
Over half a century ago, George Zipf observed that more frequent words tend to be older. Corpus studies since then have confirmed this pattern, with more frequent words being replaced and regularized less often than less frequent words. Two main hypotheses have been proposed to explain this: that frequent words change less because selection against innovation is stronger at higher frequencies, or that they change less because stochastic drift is stronger at lower frequencies. Here, we report the first experimental test of these hypotheses. Participants were tasked with learning a miniature language consisting of two nouns and two plural markers. Nouns occurred at different frequencies and were subjected to treatments that varied drift and selection. Using a model that accounts for participant heterogeneity, we measured the rate of noun regularization, the strength of selection, and the strength of drift in participant responses. Results suggest that drift alone is sufficient to generate the elevated rate of regularization we observed in low-frequency nouns, adding to a growing body of evidence that drift may be a major driver of language change.
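The intuition that stochastic drift acts more strongly at low frequencies can be illustrated with a neutral Wright-Fisher simulation, where a word's usage frequency plays the role of population size. This sketch is only an illustration of the hypothesis, not the authors' model (which also estimates selection strength and participant heterogeneity):

```python
import random

def drift_until_fixation(n_uses, p0=0.5, max_gens=10000):
    """Neutral Wright-Fisher drift: each generation, a variant's frequency
    is resampled from n_uses independent tokens (no selection). Returns
    the number of generations until one variant fixes (regularizes)."""
    p = p0
    for gen in range(1, max_gens + 1):
        k = sum(random.random() < p for _ in range(n_uses))
        p = k / n_uses
        if p in (0.0, 1.0):
            return gen
    return max_gens

random.seed(1)
reps = 200
rare = sum(drift_until_fixation(8) for _ in range(reps)) / reps     # low-frequency word
common = sum(drift_until_fixation(64) for _ in range(reps)) / reps  # high-frequency word
assert rare < common  # drift fixes variants faster at low frequencies
```

Under neutral drift, expected fixation time scales with the number of tokens per generation, so low-frequency items regularize sooner even with no selection pressure at all — the pattern the experiment probes.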
Affiliation(s)
- Rafael Ventura
- Social and Cultural Evolution Working Group, University of Pennsylvania
- Joshua B. Plotkin
- Social and Cultural Evolution Working Group, University of Pennsylvania
- Department of Biology, University of Pennsylvania
- Center for Mathematical Biology, University of Pennsylvania
- Gareth Roberts
- Social and Cultural Evolution Working Group, University of Pennsylvania
- Department of Linguistics, University of Pennsylvania
10
Piazza G, Martin CD, Kalashnikova M. The Acoustic Features and Didactic Function of Foreigner-Directed Speech: A Scoping Review. Journal of Speech, Language, and Hearing Research 2022; 65:2896-2918. PMID: 35914012. DOI: 10.1044/2022_jslhr-21-00609.
Abstract
PURPOSE: This scoping review considers the acoustic features of a clear speech register directed to nonnative listeners known as foreigner-directed speech (FDS). We identify vowel hyperarticulation and low speech rate as the most representative acoustic features of FDS; other features, including wide pitch range and high intensity, are still under debate. We also discuss factors that may influence the outcomes and characteristics of FDS. We start by examining accommodation theories, outlining the reasons why FDS is likely to serve a didactic function by helping listeners acquire a second language (L2). We examine how this speech register adapts to listeners' identities and linguistic needs, suggesting that FDS also takes listeners' L2 proficiency into account. To confirm the didactic function of FDS, we compare it to other clear speech registers, specifically infant-directed speech and Lombard speech. CONCLUSIONS: Our review reveals that research has not yet established whether FDS succeeds as a didactic tool that supports L2 acquisition. Moreover, a complex set of factors determines specific realizations of FDS, which need further exploration. We conclude by summarizing open questions and indicating directions and recommendations for future research.
Affiliation(s)
- Giorgio Piazza
- Basque Center on Cognition, Brain and Language (BCBL), Donostia-San Sebastián, Spain
- Department of Social Sciences and Law, Universidad del País Vasco/Euskal Herriko Unibertsitatea, Donostia-San Sebastián, Spain
- Clara D Martin
- Basque Center on Cognition, Brain and Language (BCBL), Donostia-San Sebastián, Spain
- Ikerbasque, Basque Foundation for Science, Bilbao, Spain
- Marina Kalashnikova
- Basque Center on Cognition, Brain and Language (BCBL), Donostia-San Sebastián, Spain
- Ikerbasque, Basque Foundation for Science, Bilbao, Spain
11
Gutz SE, Rowe HP, Tilton-Bolowsky VE, Green JR. Speaking with a KN95 face mask: a within-subjects study on speaker adaptation and strategies to improve intelligibility. Cogn Res Princ Implic 2022; 7:73. PMID: 35907167. PMCID: PMC9339031. DOI: 10.1186/s41235-022-00423-4.
Abstract
Mask-wearing during the COVID-19 pandemic has prompted a growing interest in the functional impact of masks on speech and communication. Prior work has shown that masks dampen sound, impede visual communication cues, and reduce intelligibility. However, more work is needed to understand how speakers change their speech while wearing a mask and to identify strategies to overcome the impact of wearing a mask. Data were collected from 19 healthy adults during a single in-person session. We investigated the effects of wearing a KN95 mask on speech intelligibility, as judged by two speech-language pathologists, examined speech kinematics and acoustics associated with mask-wearing, and explored KN95 acoustic filtering. We then considered the efficacy of three speaking strategies to improve speech intelligibility: Loud, Clear, and Slow speech. To inform speaker strategy recommendations, we related findings to self-reported speaker effort. Results indicated that healthy speakers could compensate for the presence of a mask and achieve normal speech intelligibility. Additionally, we showed that speaking loudly or clearly-and, to a lesser extent, slowly-improved speech intelligibility. However, using these strategies may require increased physical and cognitive effort and should be used only when necessary. These results can inform recommendations for speakers wearing masks, particularly those with communication disorders (e.g., dysarthria) who may struggle to adapt to a mask but can respond to explicit instructions. Such recommendations may further help non-native speakers and those communicating in a noisy environment or with listeners with hearing loss.
Affiliation(s)
- Sarah E. Gutz
- Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, MA USA
- Hannah P. Rowe
- Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Building 79/96, 2nd floor, 13th Street, Boston, MA 02129 USA
- Victoria E. Tilton-Bolowsky
- Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Building 79/96, 2nd floor, 13th Street, Boston, MA 02129 USA
- Jordan R. Green
- Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, MA USA
- Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Building 79/96, 2nd floor, 13th Street, Boston, MA 02129 USA
12
Redundancy can benefit learning: Evidence from word order and case marking. Cognition 2022; 224:105055. DOI: 10.1016/j.cognition.2022.105055.
13
Zhang H, Wiener S, Holt LL. Adjustment of cue weighting in speech by speakers and listeners: Evidence from amplitude and duration modifications of Mandarin Chinese tone. The Journal of the Acoustical Society of America 2022; 151:992. PMID: 35232077. PMCID: PMC8846952. DOI: 10.1121/10.0009378.
Abstract
Speech contrasts are signaled by multiple acoustic dimensions, but these dimensions are not equally diagnostic. Moreover, the relative diagnosticity, or weight, of acoustic dimensions in speech can shift in different communicative contexts for both speech perception and speech production. However, the literature remains unclear on whether, and if so how, talkers adjust their speech to emphasize different acoustic dimensions as communicative demands change. Here, we examine the interplay of flexible cue weights in speech production and perception for amplitude and duration, two secondary non-spectral acoustic dimensions of phonated Mandarin Chinese lexical tone, across natural and whispered speech; whispering eliminates the fundamental frequency contour, the primary acoustic dimension. Phonated and whispered Mandarin productions from native talkers revealed enhancement of both duration and amplitude cues in whispered compared to phonated speech. When nonspeech amplitude-modulated noises modeled these patterns of enhancement, identification of the noises as Mandarin lexical tone categories was more accurate than identification of noises modeling the amplitude and duration cues of phonated speech. Thus, speakers exaggerate secondary cues in whispered speech and listeners make use of this information. Yet, enhancement is not symmetric across the four Mandarin lexical tones, indicating possible constraints on its realization.
Affiliation(s)
- Hui Zhang
- Speech-Language-Hearing Center, School of Foreign Languages, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
- Seth Wiener
- Department of Modern Languages, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA
- Lori L Holt
- Department of Psychology and Neuroscience Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA
14
Scarborough R, Zellou G. Out of sight, out of mind: The influence of communicative load and phonological neighborhood density on phonetic variation in real listener-directed speech. The Journal of the Acoustical Society of America 2022; 151:577. PMID: 35105023. DOI: 10.1121/10.0009233.
Abstract
Some models of speech production propose that speech variation reflects an adaptive trade-off between the needs of the listener and constraints on the speaker. The current study considers communicative load as both a situational and a lexical variable that influences phonetic variation in speech to real interlocutors. Specifically, it investigates whether the presence or absence of a target word in the sight of a real listener influences speakers' patterns of variation during a communicative task. To test how lexical difficulty also modulates intelligibility, target words varied in phonological neighborhood density (ND), a measure of lexical difficulty. Acoustic analyses reveal that speakers produced longer vowels in words that were not visually present for the listener to see, compared to when the listener could see those words. This suggests that speakers assess in real time the presence or absence of supportive visual information when gauging listener comprehension difficulty. Furthermore, the presence or absence of the word interacted with ND to predict both vowel duration and hyperarticulation patterns. These findings indicate that lexical measures of a word's difficulty and speakers' online assessment of lexical intelligibility (based on a word's visual presence or not) interactively influence phonetic modifications during communication with a real listener.
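Phonological neighborhood density is conventionally the count of real words differing from a target by a single phoneme substitution, addition, or deletion. A minimal sketch, using a hypothetical toy lexicon of made-up phonemic transcriptions (not the study's materials):

```python
def one_edit_apart(a, b):
    """True if phoneme sequences a and b differ by exactly one
    substitution, insertion, or deletion."""
    if abs(len(a) - len(b)) > 1 or a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    # does deleting one segment of the longer form yield the shorter?
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

def neighborhood_density(word, lexicon):
    """ND: number of lexicon entries one phoneme edit from the word."""
    return sum(one_edit_apart(word, w) for w in lexicon if w != word)

# hypothetical toy lexicon of phonemic strings
lexicon = ["kat", "bat", "hat", "kot", "kab", "at", "kats", "dog"]
assert neighborhood_density("kat", lexicon) == 6  # all but "dog" (and itself)
```

Dense neighborhoods (many confusable competitors) are the "difficult" words whose vowels tend to be hyperarticulated; real ND counts are computed over a full phonemically transcribed lexicon, often frequency-weighted.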
Affiliation(s)
- Rebecca Scarborough
- Linguistics Department, University of Colorado Boulder, Boulder, Colorado 80309, USA
- Georgia Zellou
- Linguistics Department, University of California at Davis, Davis, California 95616, USA
15
Keerstock S, Smiljanic R. Reading aloud in clear speech reduces sentence recognition memory and recall for native and non-native talkers. The Journal of the Acoustical Society of America 2021; 150:3387. PMID: 34852619. DOI: 10.1121/10.0006732.
Abstract
Speaking style variation plays a role in how listeners remember speech: compared to conversational sentences, clearly spoken sentences were better recalled and identified as previously heard by native and non-native listeners. The present study investigated whether speaking style variation also plays a role in how talkers remember speech that they produce. Although distinctive forms of production (e.g., singing, speaking loudly) can enhance memory, the cognitive and articulatory effort required to plan and produce listener-oriented, hyper-articulated clear speech could detrimentally affect encoding and subsequent retrieval. Native and non-native English talkers' memories for sentences that they read aloud in clear and conversational speaking styles were assessed through a sentence recognition memory task (experiment 1; N = 90) and a recall task (experiment 2; N = 75). The results showed enhanced recognition memory and recall for sentences read aloud conversationally rather than clearly for both talker groups. In line with the "effortfulness" hypothesis, producing clear speech may increase processing load, diverting resources from memory encoding. Implications for the relationship between speech perception and production are discussed.
Affiliation(s)
- Sandie Keerstock
- Department of Psychological Sciences, University of Missouri, 124 Psychology Building, 200 South 7th Street, Columbia, Missouri 65211, USA
- Rajka Smiljanic
- Department of Linguistics, University of Texas at Austin, 305 East 23rd Street STOP B5100, Austin, Texas 78712, USA
16
Kapatsinski V. Hierarchical Inference in Sound Change: Words, Sounds, and Frequency of Use. Front Psychol 2021; 12:652664. PMID: 34456784. PMCID: PMC8387583. DOI: 10.3389/fpsyg.2021.652664.
Abstract
This paper examines the role of hierarchical inference in sound change. Through hierarchical inference, a language learner can distribute credit for a pronunciation between the intended phone and the larger units in which it is embedded, such as triphones, morphemes, words and larger syntactic constructions and collocations. In this way, hierarchical inference resolves the longstanding debate about the unit of sound change: it is not necessary for change to affect only sounds, or only words. Instead, both can be assigned their proper amount of credit for a particular pronunciation of a phone. Hierarchical inference is shown to generate novel predictions for the emergence of stable variation. Under standard assumptions about linguistic generalization, it also generates a counterintuitive prediction of a U-shaped frequency effect in an advanced articulatorily-motivated sound change. Once the change has progressed far enough for the phone to become associated with the reduced pronunciation, novel words will be more reduced than existing words that, for any reason, have become associated with the unreduced variant. Avoiding this prediction requires learners to not consider novel words to be representative of the experienced lexicon. Instead, learners should generalize to novel words from other words that are likely to exhibit similar behavior: rare words, and the words that occur in similar contexts. Directions for future work are outlined.
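The credit-distribution idea lends itself to a small numerical sketch. The following is a minimal partial-pooling illustration (ours, not Kapatsinski's actual model; the words and counts are hypothetical): each word's reduction rate is shrunk toward the phone-level rate, with frequent words retaining more of their own observed behavior.

```python
def hierarchical_estimates(word_counts, word_reduced, prior_strength=10.0):
    """Toy partial pooling: credit for reduced pronunciations is shared
    between the phone level and individual words. `prior_strength` sets
    how strongly rare words are pulled toward the phone-level rate."""
    phone_rate = sum(word_reduced.values()) / sum(word_counts.values())
    estimates = {}
    for word, n in word_counts.items():
        observed = word_reduced[word] / n
        weight = n / (n + prior_strength)   # frequent words pool less
        estimates[word] = weight * observed + (1 - weight) * phone_rate
    return phone_rate, estimates

# hypothetical token counts and counts of reduced productions
counts = {"probably": 100, "proclaim": 5}
reduced = {"probably": 90, "proclaim": 1}
phone_rate, estimates = hierarchical_estimates(counts, reduced)
```

In this toy setup the rare word's estimate lands between its observed rate and the phone-level rate, while the frequent word keeps an estimate close to its own observed rate: each level receives its "proper amount of credit".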
17
Marklund U, Marklund E, Gustavsson L. Relationship Between Parent Vowel Hyperarticulation in Infant-Directed Speech and Infant Phonetic Complexity on the Level of Conversational Turns. Front Psychol 2021; 12:688242. [PMID: 34421739 PMCID: PMC8371631 DOI: 10.3389/fpsyg.2021.688242] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2021] [Accepted: 06/11/2021] [Indexed: 11/13/2022] Open
Abstract
When speaking to infants, parents typically use infant-directed speech, a speech register that in several aspects differs from that directed to adults. Vowel hyperarticulation, that is, extreme articulation of vowels, is one characteristic sometimes found in infant-directed speech, and it has been suggested that there exists a relationship between how much vowel hyperarticulation parents use when speaking to their infant and infant language development. In this study, the relationship between parent vowel hyperarticulation and phonetic complexity of infant vocalizations is investigated. Previous research has shown that on the level of subject means, a positive correlational relationship exists. However, the previous findings do not provide information about the directionality of that relationship. In this study the relationship is investigated on a conversational turn level, which makes it possible to draw conclusions on whether the behavior of the infant is impacting the parent, the behavior of the parent is impacting the infant, or both. Parent vowel hyperarticulation was quantified using the vhh-index, a measure that allows vowel hyperarticulation to be estimated for individual vowel tokens. Phonetic complexity of infant vocalizations was calculated using the Word Complexity Measure for Swedish. Findings were unexpected in that a negative relationship was found between parent vowel hyperarticulation and phonetic complexity of the immediately following infant vocalization. Directionality was suggested by the fact that no such relationship was found between infant phonetic complexity and vowel hyperarticulation of the immediately following parent utterance. A potential explanation for these results is that high degrees of vowel hyperarticulation either provide, or co-occur with, large amounts of phonetic and/or linguistic information, which may occupy processing resources to an extent that affects production of the next vocalization.
Affiliation(s)
- Ulrika Marklund
- Division of Sensory Organs and Communication, Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden; Department of Neurology, Speech and Language Clinic, Danderyd Hospital, Stockholm, Sweden; Division of Speech and Language Pathology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
- Ellen Marklund
- Phonetics Laboratory, Stockholm Babylab, Department of Linguistics, Stockholm University, Stockholm, Sweden
- Lisa Gustavsson
- Division of Speech and Language Pathology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden; Phonetics Laboratory, Stockholm Babylab, Department of Linguistics, Stockholm University, Stockholm, Sweden
18
Long M, Moore I, Mollica F, Rubio-Fernandez P. Contrast perception as a visual heuristic in the formulation of referential expressions. Cognition 2021; 217:104879. [PMID: 34418775 DOI: 10.1016/j.cognition.2021.104879] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2020] [Revised: 07/23/2021] [Accepted: 08/11/2021] [Indexed: 11/17/2022]
Abstract
We hypothesize that contrast perception works as a visual heuristic, such that when speakers perceive a significant degree of contrast in a visual context, they tend to produce the corresponding adjective to describe a referent. The contrast perception heuristic supports efficient audience design, allowing speakers to produce referential expressions with minimum expenditure of cognitive resources, while facilitating the listener's visual search for the referent. We tested the perceptual contrast hypothesis in three language-production experiments. Experiment 1 revealed that speakers overspecified color adjectives in polychrome displays, whereas in monochrome displays they overspecified other properties that were contrastive. Further support for the contrast perception hypothesis comes from a re-analysis of previous work, which confirmed that color contrast elicits color overspecification when detected in a given display, but not when detected across monochrome trials. Experiment 2 revealed that even atypical colors (which are often overspecified) are only mentioned if there is color contrast. In Experiment 3, participants named a target color faster in monochrome than in polychrome displays, suggesting that the effect of color contrast is not analogous to ease of production. We conclude that the tendency to overspecify color in polychrome displays is not a bottom-up effect driven by the visual salience of color as a property, but possibly a learned communicative strategy. We discuss the implications of our account for pragmatic theories of referential communication and models of audience design, challenging the view that overspecification is a form of egocentric behavior.
Affiliation(s)
- Isabelle Moore
- Psychology Department, University of Virginia, United States of America
- Francis Mollica
- Informatics Department, University of Edinburgh, United Kingdom
- Paula Rubio-Fernandez
- Philosophy Department, University of Oslo, Norway; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, United States of America
19
Henriksen N, Coetzee AW, García-Amaya L, Fischer M. Exploring language dominance through code-switching: intervocalic voiced stop lenition in Afrikaans-Spanish bilinguals. PHONETICA 2021; 78:201-240. [PMID: 34162023 DOI: 10.1515/phon-2021-2005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
The present study examines the relationship between the two grammars of bilingual speakers, the linguistic ecologies in which the L1 and L2 become active, and how these topics can be explored in a bilingual community undergoing L1 attrition. Our experiment focused on the production of intervocalic phonemic voiced stops for L1-Afrikaans/L2-Spanish bilinguals in Patagonia, Argentina. While these phonemes undergo systematic intervocalic lenition in Spanish (e.g., /b d ɡ/ > [β ð ɣ]), they do not in Afrikaans (e.g., /b d/ > [b d]). The bilingual participants in our study produced target Afrikaans and Spanish words in unilingual and code-switched speaking contexts. The results show that: (i) the participants produce separate phonetic categories in Spanish and Afrikaans; (ii) code-switching affects the production of the target sounds asymmetrically, such that L1 Afrikaans influences the production of L2 Spanish sounds but not vice versa; and (iii) this L1-to-L2 influence remains robust despite the instability of the L1 itself. Altogether, our findings speak to the persistence of a bilingual's L1 phonological grammar despite cross-generational L1 attrition.
Affiliation(s)
- Andries W Coetzee
- University of Michigan, Ann Arbor, MI, USA
- North-West University, Potchefstroom, South Africa
- University of Johannesburg, Johannesburg, South Africa
20
Tang K, Shaw JA. Prosody leaks into the memories of words. Cognition 2021; 210:104601. [PMID: 33508575 DOI: 10.1016/j.cognition.2021.104601] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2020] [Revised: 01/11/2021] [Accepted: 01/11/2021] [Indexed: 11/30/2022]
Abstract
The average predictability (aka informativity) of a word in context has been shown to condition word duration (Seyfarth, 2014). All else being equal, words that tend to occur in more predictable environments are shorter than words that tend to occur in less predictable environments. One account of the informativity effect on duration is that the acoustic details of probabilistic reduction are stored as part of a word's mental representation. Other research has argued that predictability effects are tied to prosodic structure in integral ways. With the aim of assessing a potential prosodic basis for informativity effects in speech production, this study extends past work in two directions; it investigated informativity effects in another large language, Mandarin Chinese, and broadened the study beyond word duration to additional acoustic dimensions, pitch and intensity, known to index prosodic prominence. The acoustic information of content words was extracted from a large telephone conversation speech corpus with over 400,000 tokens and 6000 word types spoken by 1655 individuals and analyzed for the effect of informativity using frequency statistics estimated from a 431 million word subtitle corpus. Results indicated that words with low informativity have shorter durations, replicating the effect found in English. In addition, informativity had significant effects on maximum pitch and intensity, two phonetic dimensions related to prosodic prominence. Extending this interpretation, these results suggest that predictability is closely linked to prosodic prominence, and that the lexical representation of a word includes phonetic details associated with its average prosodic prominence in discourse. In other words, the lexicon absorbs prosodic influences on speech production.
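As a concrete sketch, informativity in the Seyfarth (2014) sense can be approximated as a word's average surprisal over the contexts in which it occurs. The simplified bigram version below, with a toy corpus of our own, is illustrative only:

```python
import math
from collections import Counter

def informativity(corpus_bigrams, target):
    """Average surprisal, -log2 P(word | previous word), of `target`,
    weighted by how often each context precedes it (bigram version)."""
    pair_counts = Counter(corpus_bigrams)
    prev_totals = Counter(prev for prev, _ in corpus_bigrams)
    target_total = sum(c for (_, w), c in pair_counts.items() if w == target)
    avg = 0.0
    for (prev, w), c in pair_counts.items():
        if w != target:
            continue
        p = c / prev_totals[prev]           # P(target | this context)
        avg += (c / target_total) * -math.log2(p)
    return avg

# toy corpus: "cat" is fairly predictable after "the", fully after "a"
bigrams = [("the", "cat"), ("the", "cat"), ("a", "cat"), ("the", "dog")]
```

Words with low informativity under this measure (predictable in their usual contexts) are the ones the study finds to be shorter and less prosodically prominent.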
Affiliation(s)
- Kevin Tang
- Department of Linguistics, University of Florida, Gainesville, FL 32611-5454, USA
- Jason A Shaw
- Department of Linguistics, Yale University, New Haven, CT 06520, USA
21
Zhang H, Carlson MT, Diaz MT. Investigating the effects of phonological neighbors on word retrieval and phonetic variation in word naming and picture naming paradigms. LANGUAGE, COGNITION AND NEUROSCIENCE 2019; 35:980-991. [PMID: 33043066 PMCID: PMC7540183 DOI: 10.1080/23273798.2019.1686529] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/04/2019] [Accepted: 10/19/2019] [Indexed: 06/11/2023]
Abstract
Phonological neighbors have been shown to affect word processing. Prior work has shown that when a word with an initial voiceless stop has a contrasting initial voiced stop neighbor, Voice Onset Times (VOTs) are longer. Higher phonological neighborhood density (PND) has also been shown to shorten word retrieval latencies and to be associated with longer VOTs. However, these effects have rarely been investigated with picture naming, which is thought to be a more semantically driven task. The current study examined the effects of phonological neighbors on word retrieval times and phonetic variation, and how these effects differed in word naming and picture naming paradigms. Results showed that higher PND was associated with longer VOTs in both paradigms. Furthermore, the effect of initial stop neighbors on VOTs was only significant in word naming. These results highlight the influence of phonological neighbors on word production in different paradigms, support interactive models of word production, and suggest that hyper-articulation in speech does not solely depend on communicative context.
22
Grigoroglou M, Papafragou A. Children's (and Adults') Production Adjustments to Generic and Particular Listener Needs. Cogn Sci 2019; 43:e12790. [DOI: 10.1111/cogs.12790] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2018] [Revised: 08/21/2019] [Accepted: 08/26/2019] [Indexed: 12/01/2022]
Affiliation(s)
- Myrto Grigoroglou
- Department of Linguistics and Cognitive Science, University of Delaware
- Anna Papafragou
- Department of Psychological and Brain Sciences, University of Delaware
23
Dilley L, Gamache J, Wang Y, Houston DM, Bergeson TR. Statistical distributions of consonant variants in infant-directed speech: evidence that /t/ may be exceptional. JOURNAL OF PHONETICS 2019; 75:73-87. [PMID: 32884162 PMCID: PMC7467459 DOI: 10.1016/j.wocn.2019.05.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Statistical distributions of phonetic variants in spoken language influence speech perception for both language learners and mature users. We theorized that patterns of phonetic variant processing of consonants demonstrated by adults might stem in part from patterns of early exposure to statistics of phonetic variants in infant-directed (ID) speech. In particular, we hypothesized that ID speech might involve greater proportions of canonical /t/ pronunciations compared to adult-directed (AD) speech in at least some phonological contexts. This possibility was tested using a corpus of spontaneous speech of mothers speaking to other adults, or to their typically-developing infant. Tokens of word-final alveolar stops - including /t/, /d/, and the nasal stop /n/ - were examined in assimilable contexts (i.e., those followed by a word-initial labial and/or velar); these were classified as canonical, assimilated, deleted, or glottalized. Results confirmed that there were significantly more canonical pronunciations in assimilable contexts in ID compared with AD speech, an effect which was driven by the phoneme /t/. These findings suggest that at least in phonological contexts involving possible assimilation, children are exposed to more canonical /t/ variant pronunciations than adults are. This raises the possibility that perceptual processing of canonical /t/ may be partly attributable to exposure to canonical /t/ variants in ID speech. Results support the need for further research into how statistics of variant pronunciations in early language input may shape speech processing across the lifespan.
Affiliation(s)
- Laura Dilley
- Department of Communicative Sciences and Disorders, Michigan State University
- Jessica Gamache
- Department of Linguistics and Germanic, Slavic, Asian and African Languages, Michigan State University
- Yuanyuan Wang
- Department of Otolaryngology, The Ohio State University
- Tonya R. Bergeson
- Department of Otolaryngology – Head & Neck Surgery, Indiana University School of Medicine
- Department of Communication Sciences and Disorders, Butler University
24
Kurumada C, Grimm S. Predictability of meaning in grammatical encoding: Optional plural marking. Cognition 2019; 191:103953. [PMID: 31234113 DOI: 10.1016/j.cognition.2019.04.022] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2018] [Revised: 04/22/2019] [Accepted: 04/23/2019] [Indexed: 10/26/2022]
Abstract
The markedness principle plays a central role in linguistic theory: marked grammatical categories (like plural) tend to receive more linguistic encoding (e.g., morphological marking), while unmarked categories (like singular) tend to receive less linguistic encoding. What precisely makes a grammatical category or meaning marked, however, remains unclear. One prominent proposal attributes markedness to the frequency or predictability of meanings: infrequent or less predictable meanings are more likely to receive extra linguistic encoding than frequent or more predictable meanings. Existing support for the predictability account is limited to correlational evidence, leaving open whether meaning predictability can cause markedness patterns. We present two miniature language learning experiments that directly assess effects of predictability on morphological plural marking. We find that learners preferentially produce plural marking on nouns that are less probable to occur with plural meaning, despite the fact that no such pattern was present in learners' input. This suggests that meaning predictability can cause markedness patterns like those that are cross-linguistically observed.
Affiliation(s)
- Chigusa Kurumada
- Department of Brain and Cognitive Sciences, University of Rochester, United States
- Scott Grimm
- Department of Linguistics, University of Rochester, United States
25
Abstract
Language comprehension requires successfully navigating linguistic variability. One hypothesis for how listeners manage variability is that they rapidly update their expectations of likely linguistic events in new contexts. This process, called adaptation, allows listeners to better predict the upcoming linguistic input. In previous work, Fine, Jaeger, Farmer, and Qian (PLoS ONE, 8, e77661, 2013) found evidence for syntactic adaptation. Subjects repeatedly encountered sentences in which a verb was temporarily ambiguous between main verb (MV) and reduced relative clause (RC) interpretations. They found that subjects who had higher levels of exposure to the unexpected RC interpretation of the sentences had an easier time reading the RC sentences but a more difficult time reading the MV sentences. They concluded that syntactic adaptation occurs rapidly in unexpected structures and also results in difficulty with processing the previously expected alternative structures. This article presents two experiments. Experiment 1 was designed as a follow-up to Fine et al.'s study and failed to find evidence of adaptation. A power analysis of Fine et al.'s raw data revealed that a similar study would need double the items and four times the subjects to reach 95% power. In Experiment 2 we designed a close replication of Fine et al.'s experiment using these sample size guidelines. No evidence of rapid syntactic adaptation was found in this experiment. The failure to find evidence of adaptation in both experiments calls into question the robustness of the effect.
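Sample-size guidelines of the kind described can be derived by simulation. The sketch below is a generic illustration with made-up effect sizes, not the authors' actual power analysis:

```python
import random
import statistics

def simulated_power(effect=0.2, sd=1.0, n_subjects=40, n_sims=1000):
    """Estimate power by simulation: draw each subject's condition
    difference from Normal(effect, sd), test whether the mean difference
    differs from zero, and count rejections. The normal critical value
    1.96 approximates the t distribution for simplicity."""
    hits = 0
    for _ in range(n_sims):
        diffs = [random.gauss(effect, sd) for _ in range(n_subjects)]
        se = statistics.stdev(diffs) / n_subjects ** 0.5
        if abs(statistics.mean(diffs) / se) > 1.96:
            hits += 1
    return hits / n_sims
```

Running such simulations at the effect size observed in a prior study shows how quickly power falls off with fewer subjects, which is what motivates replication designs with multiplied subject and item counts.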
26
Gambi C, Pickering MJ. Sensorimotor communication and language: Comment on "The body talks: Sensorimotor communication and its brain and kinematic signatures" by G. Pezzulo et al. Phys Life Rev 2019; 28:34-35. [PMID: 30738761 DOI: 10.1016/j.plrev.2019.01.015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2019] [Accepted: 01/28/2019] [Indexed: 11/30/2022]
Affiliation(s)
- Chiara Gambi
- School of Psychology, 70 Park Place, Cardiff University, CF10 3AT Cardiff, UK
27
Todd S, Pierrehumbert JB, Hay J. Word frequency effects in sound change as a consequence of perceptual asymmetries: An exemplar-based model. Cognition 2019; 185:1-20. [PMID: 30641466 DOI: 10.1016/j.cognition.2019.01.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2017] [Revised: 12/31/2018] [Accepted: 01/03/2019] [Indexed: 10/27/2022]
Abstract
Empirically-observed word frequency effects in regular sound change present a puzzle: how can high-frequency words change faster than low-frequency words in some cases, slower in other cases, and at the same rate in yet other cases? We argue that this puzzle can be answered by giving substantial weight to the role of the listener. We present an exemplar-based computational model of regular sound change in which the listener plays a large role, and we demonstrate that it generates sound changes with properties and word frequency effects seen in corpora. In particular, we consider the experimentally-supported assumption that high-frequency words may be more robustly recognized than low-frequency words in the face of acoustic ambiguity. We show that this assumption allows high-frequency words to change at the same rate as low-frequency words when a phoneme category moves without encroaching on the acoustic space of another, faster than low-frequency words when it moves toward another, and slower than low-frequency words when it moves away from another. We discuss how these predicted word frequency effects apply to different types of sound changes that have been observed in the literature. Importantly, these frequency effects follow from assumptions regarding processes in perception, not production. Frequency-based asymmetries in perception predict different frequency effects for different kinds of sound change.
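A heavily reduced sketch of such a listener-driven exemplar loop (our illustration; Todd et al.'s model is far richer): tokens are produced from stored exemplars plus an articulatory bias, and are stored back only when recognized, with recognition robustness scaled by word frequency.

```python
import random

def simulate_change(freq_weight, bias=0.05, rounds=500, seed=1):
    """Exemplar cloud for one word; each round a token is produced from
    a random stored exemplar plus a phonetic bias, and is stored back
    only if 'recognized'. Higher freq_weight makes ambiguous (more
    extreme) tokens easier to recognize, so the category shifts faster."""
    rng = random.Random(seed)
    cloud = [0.0] * 20
    for _ in range(rounds):
        token = rng.choice(cloud) + bias + rng.gauss(0, 0.02)
        p_recognize = min(1.0, freq_weight / (1.0 + abs(token)))
        if rng.random() < p_recognize:
            cloud.pop(0)          # forget the oldest exemplar
            cloud.append(token)   # store the recognized token
    return sum(cloud) / len(cloud)
```

In this toy setup a high-frequency word (large freq_weight) accumulates shifted tokens faster than a low-frequency one, mirroring the paper's point that word frequency effects can follow from perception-side storage rather than from production.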
Affiliation(s)
- Simon Todd
- Department of Linguistics, Stanford University, Margaret Jacks Hall, Building 460, Stanford, CA 94305-2150, United States
- Janet B Pierrehumbert
- Oxford e-Research Centre, University of Oxford, 7 Keble Road, Oxford OX1 3QG, United Kingdom; New Zealand Institute of Language, Brain and Behaviour, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
- Jennifer Hay
- New Zealand Institute of Language, Brain and Behaviour, University of Canterbury, Private Bag 4800, Christchurch, New Zealand; Department of Linguistics, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
28
Buz E, Buchwald A, Fuchs T, Keshet J. Assessing automatic VOT annotation using unimpaired and impaired speech. INTERNATIONAL JOURNAL OF SPEECH-LANGUAGE PATHOLOGY 2018; 20:624-634. [PMID: 31274358 DOI: 10.1080/17549507.2018.1490817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/24/2017] [Revised: 05/18/2018] [Accepted: 06/15/2018] [Indexed: 06/09/2023]
Abstract
Investigating speech processes often involves analysing data gathered by phonetically annotating speech recordings. Yet, the manual annotation of speech can be resource intensive, requiring substantial time and labour to complete. Recent advances in automatic annotation methods offer a way to reduce these annotation costs by replacing manual annotation. For researchers and clinicians, the viability of automatic methods depends on whether one can draw similar conclusions about speech processes from automatically annotated speech as one would from manually annotated speech. Here, we evaluate how well one automatic annotation tool, AutoVOT, can approximate manual annotation. We do so by comparing analyses of automatically and manually annotated speech in two studies. We find that, with some caveats, we are able to draw the same conclusions about speech processes under both annotation methods. The findings suggest that automatic methods may be a viable way to reduce phonetic annotation costs in the right circumstances. We end with some guidelines on if and how well AutoVOT may be able to replace manual annotation in other data sets.
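Evaluating whether automatic annotations can substitute for manual ones amounts to quantifying agreement between paired measurements. A sketch with hypothetical VOT values in milliseconds (our own toy numbers, not data from the study, and not AutoVOT's actual interface):

```python
def annotation_agreement(manual_ms, auto_ms):
    """Mean absolute difference and Pearson correlation between manual
    and automatic measurements of the same tokens."""
    n = len(manual_ms)
    mad = sum(abs(a - m) for m, a in zip(manual_ms, auto_ms)) / n
    mean_m = sum(manual_ms) / n
    mean_a = sum(auto_ms) / n
    cov = sum((m - mean_m) * (a - mean_a) for m, a in zip(manual_ms, auto_ms))
    norm_m = sum((m - mean_m) ** 2 for m in manual_ms) ** 0.5
    norm_a = sum((a - mean_a) ** 2 for a in auto_ms) ** 0.5
    return mad, cov / (norm_m * norm_a)

# hypothetical paired VOT annotations (ms) for four tokens
manual = [40, 60, 80, 55]
auto = [42, 58, 83, 54]
mad, r = annotation_agreement(manual, auto)
```

A small mean absolute difference and a high correlation would support drawing the same conclusions from either annotation method, which is essentially the criterion the study applies.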
Affiliation(s)
- Esteban Buz
- Department of Psychology and Program in Linguistics, Princeton University, Princeton, New Jersey, USA
- Adam Buchwald
- Communicative Sciences and Disorders, New York University, New York, USA
- Tzeviya Fuchs
- Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
- Joseph Keshet
- Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
29
Mielke J, Nielsen K. Voice Onset Time in English voiceless stops is affected by following postvocalic liquids and voiceless onsets. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 144:2166. [PMID: 30404471 DOI: 10.1121/1.5059493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/21/2018] [Accepted: 09/20/2018] [Indexed: 06/08/2023]
Abstract
Voice Onset Time is an important characteristic of stop consonants that plays a large role in perceptual discrimination in many languages, and is widely used in phonetic research. The current paper aims to account for Voice Onset Time variation in English that has defied previously understood phonetic and lexical factors, particularly involving stops that are followed in the word by liquids and voiceless obstruents. 122 Canadian English speakers produced 120 /p/- and /k/-initial words (n = 17 533), and word-initial Voice Onset Time was analyzed. It was found that Voice Onset Time is shorter when the following syllable starts with a voiceless obstruent, and that this effect is mediated by speech rate. Voice Onset Time is also longer before postvocalic liquids, even when they are intervocalic. Voice Onset Time generally decreases through the course of the task, and speakers tend to drift during the course of a word reading task, and this is best accounted for by the residual Voice Onset Time of recently spoken words.
Affiliation(s)
- Jeff Mielke
- Department of English, North Carolina State University, Campus Box 8105, Raleigh, North Carolina 27695-8105, USA
- Kuniko Nielsen
- Linguistics Department, Oakland University, 433 Meadow Brook Road, Rochester, Michigan 48309-4452, USA
30
Abstract
Audience design refers to the situation in which speakers fashion their utterances so as to cater to the needs of their addressees. In this article, a range of audience design effects are reviewed, organized by a novel cognitive framework for understanding audience design effects. Within this framework, feedforward (or one-shot) production is responsible for feedforward audience design effects, or effects based on already known properties of the addressee (e.g., child versus adult status) or the message (e.g., that it includes meanings that might be confusable). Then, a forward modeling approach is described, whereby speakers independently generate communicatively relevant features to predict potential communicative effects. This can explain recurrent processing audience design effects, or effects based on features of the produced utterance itself or on idiosyncratic features of the addressee or communicative situation. Predictions from the framework are delineated.
Affiliation(s)
- Victor S Ferreira
- Department of Psychology and Center for Research in Language, University of California, San Diego, La Jolla, California 92093, USA;
31
Vaughn C, Baese-Berk M, Idemaru K. Re-Examining Phonetic Variability in Native and Non-Native Speech. PHONETICA 2018; 76:327-358. [PMID: 30086539 DOI: 10.1159/000487269] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/29/2017] [Accepted: 01/20/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND/AIMS Non-native speech is frequently characterized as being more variable than native speech. However, the few studies that have directly investigated phonetic variability in the speech of second language learners have considered a limited subset of native/non-native language pairings and few linguistic features. METHODS The present study examines group-level within-speaker variability and central tendencies in acoustic properties of vowels and stops produced by learners of Japanese from two native language backgrounds, English and Mandarin, as well as native Japanese speakers. RESULTS Results show that non-native speakers do not always exhibit more phonetic variability than native speakers, but rather that patterns of variability are specific to individual linguistic features and their instantiations in L1 and L2. CONCLUSION Adopting this more nuanced approach to variability offers important enhancements to several areas of linguistic theory.
32
Variation in the speech signal as a window into the cognitive architecture of language production. Psychon Bull Rev 2018; 25:1973-2004. [PMID: 29383571 DOI: 10.3758/s13423-017-1423-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The pronunciation of words is highly variable. This variation provides crucial information about the cognitive architecture of the language production system. This review summarizes key empirical findings about variation phenomena, integrating corpus, acoustic, articulatory, and chronometric data from phonetic and psycholinguistic studies. It examines how these data constrain our current understanding of word production processes and highlights major challenges and open issues that should be addressed in future research.
33
Alsius A, Mitsuya T, Latif N, Munhall KG. Linguistic initiation signals increase auditory feedback error correction. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 142:838. [PMID: 28863596 DOI: 10.1121/1.4997193] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Previous research has shown that speakers can adapt their speech in a flexible manner as a function of a variety of contextual and task factors. While it is known that speech tasks may play a role in speech motor behavior, it remains to be explored if the manner in which the speaking action is initiated can modify low-level, automatic control of vocal motor action. In this study, the nature (linguistic vs non-linguistic) and modality (auditory vs visual) of the go signal (i.e., the prompts) was manipulated in an otherwise identical vocal production task. Participants were instructed to produce the word "head" when prompted, and the auditory feedback they were receiving was altered by systematically changing the first formants of the vowel /ε/ in real time using a custom signal processing system. Linguistic prompts induced greater corrective behaviors to the acoustic perturbations than non-linguistic prompts. This suggests that the accepted variance for the intended speech sound decreases when external linguistic templates are provided to the speaker. Overall, this result shows that the automatic correction of vocal errors is influenced by flexible, context-dependent mechanisms.
Affiliation(s)
- Agnès Alsius
- Psychology Department, Queen's University, Kingston, Ontario, Canada
- Takashi Mitsuya
- Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington 98105, USA
- Nida Latif
- Psychology Department, Queen's University, Kingston, Ontario, Canada
- Kevin G Munhall
- Psychology Department, Queen's University, Kingston, Ontario, Canada
34
Fink A, Oppenheim GM, Goldrick M. Interactions between Lexical Access and Articulation. LANGUAGE, COGNITION AND NEUROSCIENCE 2017; 33:12-24. [PMID: 29399594 PMCID: PMC5793891 DOI: 10.1080/23273798.2017.1348529] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/23/2016] [Accepted: 06/23/2017] [Indexed: 05/31/2023]
Abstract
This study investigates the interaction of lexical access and articulation in spoken word production, examining two dimensions along which theories vary. First, does articulatory variation reflect a fixed plan, or do lexical access-articulatory interactions continue after response initiation? Second, to what extent are interactive mechanisms hard-wired properties of the production system, as opposed to flexible? In two picture-naming experiments, we used semantic neighbor manipulations to induce lexical and conceptual co-activation. Our results provide evidence for multiple sources of interaction, both before and after response initiation. While interactive effects can vary across participants, we do not find strong evidence of variation of effects within individuals, suggesting that these interactions are relatively fixed features of each individual's production system.
Affiliation(s)
- Angela Fink
- Northwestern University, Department of Linguistics, 2016 Sheridan Rd., Evanston, IL 60626
- Gary M Oppenheim
- Bangor University, School of Psychology, Adeilad Brigantia, Penrallt Road, Bangor, Gwynedd LL57 2AS, UK
- Rice University, Department of Psychology, Houston, TX 77251
- University of California San Diego, Center for Research in Language, 9500 Gilman Dr, La Jolla, CA 92037
- Matthew Goldrick
- Northwestern University, Department of Linguistics, 2016 Sheridan Rd., Evanston, IL 60626
35
Tuomainen O, Hazan V, Romeo R. Do talkers produce less dispersed phoneme categories in a clear speaking style? THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 140:EL320. [PMID: 27794323 DOI: 10.1121/1.4964815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
This study investigated whether adaptations made in a clear speaking style result in more discriminable phonetic categories than in a casual style. Multiple iterations of keywords with word-initial /s/-/ʃ/ were obtained from 40 adults in casual and clear speech via picture description. Cross-category distance between spectral centroids increased in clear speech, but with no change in within-category dispersion and no effect on discriminability. However, talkers produced fewer tokens with centroids in the ambiguous region for the /s/-/ʃ/ distinction. These results suggest that, whereas interlocutor feedback regarding communicative success may promote greater segmental adaptations, it is not necessary for some adaptation to occur.
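As an illustration of the metrics this abstract refers to, here is a minimal Python sketch computing cross-category centroid distance and within-category dispersion for /s/ vs /ʃ/. The centroid values (in Hz) are made up for the sketch, and the ratio-based discriminability index is an assumption for illustration, not a measure taken from the study.

```python
import statistics

# Hypothetical spectral-centroid values (Hz) for /s/ and /ʃ/ tokens;
# these numbers are invented for illustration, not data from the study.
s_centroids = [7800.0, 8100.0, 7950.0, 8200.0, 7700.0]
sh_centroids = [4300.0, 4500.0, 4100.0, 4450.0, 4250.0]

# Cross-category distance: separation between the /s/ and /ʃ/ category means.
distance = abs(statistics.mean(s_centroids) - statistics.mean(sh_centroids))

# Within-category dispersion: sample standard deviation of token centroids
# around each category mean, pooled across the two categories.
dispersion = (statistics.stdev(s_centroids) + statistics.stdev(sh_centroids)) / 2

# A simple (assumed) discriminability index: distance relative to dispersion;
# larger values mean less overlap between the two categories.
discriminability = distance / dispersion

print(round(distance, 1), round(dispersion, 1), round(discriminability, 2))
# → 3630.0 183.3 19.8
```

Under this index, a clear-speech increase in cross-category distance with unchanged dispersion would raise discriminability, which is why the abstract's finding of no discriminability effect despite greater distance is noteworthy.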
Affiliation(s)
- Outi Tuomainen
- Department of Speech, Hearing and Phonetic Sciences, University College London (UCL), Chandler House, 2 Wakefield Street, London WC1N 1PF, United Kingdom
- Valerie Hazan
- Department of Speech, Hearing and Phonetic Sciences, University College London (UCL), Chandler House, 2 Wakefield Street, London WC1N 1PF, United Kingdom
- Rachel Romeo
- Speech & Hearing Bioscience and Technology, Division of Medical Sciences, Harvard University, TMEC 435, 260 Longwood Avenue, Boston, Massachusetts 02115, USA
36
Seyfarth S, Buz E, Jaeger TF. Dynamic hyperarticulation of coda voicing contrasts. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 139:EL31-EL37. [PMID: 26936581 PMCID: PMC5392061 DOI: 10.1121/1.4942544] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/21/2015] [Revised: 12/29/2015] [Accepted: 02/10/2016] [Indexed: 06/01/2023]
Abstract
This study investigates the capacity for targeted hyperarticulation of contextually relevant contrasts. Participants communicated target words with final /s/ or /z/ when a voicing minimal pair (e.g., target dose, minimal pair doze) either was or was not available as an alternative in the context. The results indicate that talkers enhance the durational cues associated with the word-final voicing contrast based on whether the context requires it, and that this can involve both elongation and shortening, depending on what enhances the contextually relevant contrast. This suggests that talkers are capable of targeted, context-sensitive temporal enhancements.
Affiliation(s)
- Scott Seyfarth
- Department of Linguistics, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0108, USA
- Esteban Buz
- Department of Brain and Cognitive Sciences, University of Rochester, Meliora Hall, RC 270268, Rochester, New York 14627, USA
- T Florian Jaeger
- Department of Brain and Cognitive Sciences, University of Rochester, Meliora Hall, RC 270268, Rochester, New York 14627, USA