1
Computational Modeling of the Segmentation of Sentence Stimuli From an Infant Word-Finding Study. Cogn Sci 2024; 48:e13427. [PMID: 38528789] [DOI: 10.1111/cogs.13427]
Abstract
Computational models of infant word-finding typically operate over transcriptions of infant-directed speech corpora. It is now possible to test models of word segmentation on speech materials, rather than on transcriptions of speech. We propose that such modeling efforts be conducted over the speech of the experimental stimuli used in studies measuring infants' capacity for learning from spoken sentences. Correspondence with infant outcomes in such experiments is an appropriate benchmark for models of infants. We demonstrate such an analysis by applying the DP-Parse model of Algayres and colleagues to auditory stimuli used in infant psycholinguistic experiments by Pelucchi and colleagues. The DP-Parse model takes speech as input and creates multiple overlapping embeddings from each utterance. Prospective words are identified as clusters of similar embedded segments. This allows segmentation of each utterance into possible words, using a dynamic programming method that maximizes the frequency of constituent segments. We show that DP-Parse mimics American English learners' performance in extracting words from Italian sentences, favoring the segmentation of words with high syllabic transitional probability. This kind of computational analysis over actual stimuli from infant experiments may be helpful in tuning future models to match human performance.
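To make the dynamic-programming step concrete, here is a minimal sketch of segmentation by maximizing the summed log-probability of constituent segments. It is not DP-Parse itself: it operates over plain strings rather than clustered speech embeddings, and the segment_counts table and the smoothing constant are illustrative assumptions.

```python
from functools import lru_cache
from math import log

# Hypothetical frequency table of candidate segments (DP-Parse derives
# analogous counts from clusters of similar speech embeddings; plain
# strings stand in for those clusters here).
segment_counts = {"la": 40, "lati": 25, "ti": 30, "nopa": 22, "no": 18, "pa": 20}
total = sum(segment_counts.values())

def log_prob(piece):
    # Unigram log-probability, with light smoothing for unseen pieces.
    return log(segment_counts.get(piece, 0.1) / total)

def segment(utterance, max_len=4):
    """Segmentation of `utterance` that maximizes the summed
    log-probability of its segments, via dynamic programming."""
    n = len(utterance)

    @lru_cache(maxsize=None)
    def best(i):
        # Best (score, segmentation) for the suffix utterance[i:].
        if i == n:
            return 0.0, ()
        options = []
        for j in range(i + 1, min(n, i + max_len) + 1):
            tail_score, tail = best(j)
            piece = utterance[i:j]
            options.append((log_prob(piece) + tail_score, (piece,) + tail))
        return max(options)

    return list(best(0)[1])

print(segment("latinopa"))  # -> ['lati', 'nopa']
```

Because each segment contributes a log-probability below zero, over-segmentation is penalized automatically, which is why the two high-frequency four-character pieces win over a string of shorter ones.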
2
Segmenting Speech by Mouth: The Role of Oral Prosodic Cues for Visual Speech Segmentation. Language and Speech 2023; 66:819-832. [PMID: 36448317] [DOI: 10.1177/00238309221137607]
Abstract
Adults are able to use visual prosodic cues in the speaker's face to segment speech. Furthermore, eye-tracking data suggest that learners shift their gaze to the mouth during visual speech segmentation. Although these findings suggest that the mouth may be viewed more than the eyes or nose during visual speech segmentation, no study has examined the direct functional importance of individual facial features; thus, it is unclear which visual prosodic cues are important for word segmentation. In this study, we examined the impact of first removing (Experiment 1) and then isolating (Experiment 2) individual facial features on visual speech segmentation. Segmentation performance was above chance in all conditions except when the visual display was restricted to the eye region (the eyes-only condition in Experiment 2). This suggests that participants were able to segment speech when they could visually access the mouth but not when the mouth was completely removed from the visual display, providing evidence that the visual prosodic cues conveyed by the mouth are sufficient, and likely necessary, for visual speech segmentation.
3
Jordanian EFL Students' Perception of Noncontrastive Allophonic Cues in English Speech Segmentation. Journal of Psycholinguistic Research 2023; 52:1455-1469. [PMID: 37074538] [DOI: 10.1007/s10936-023-09944-5]
Abstract
The prominent role of allophonic cues in English speech segmentation has been widely recognized by phonologists and psycholinguists. However, very little inquiry has been devoted to analysing the perception of these noncontrastive allophonic cues by Arab EFL learners. Accordingly, the present study examines the exploitation of allophonic cues, mainly aspiration, glottalization, and approximant devoicing, at English word junctures by 40 Jordanian PhD students. Moreover, it aims to determine which allophonic cues are perceived more accurately during the segmentation process and whether there is any evidence for Universal Grammar markedness. The experiment was conducted with a forced-choice identification task adopted from Altenberg (Second Lang Res 21:325-358, 2005) and Rojczyk et al. (Res Lang 1:15-29, 2016). An ANOVA revealed a statistically significant difference between the three types of allophonic cues, viz. aspiration, glottalization, and approximant devoicing: the participants performed better on stimuli marked by glottalization than on those marked by aspiration or approximant devoicing. This result provides further evidence for the universality of glottalization as a boundary cue in English speech segmentation. Overall, the Jordanian PhD students failed to perceive the allophonic cues accurately and to exploit them to detect word boundaries. The present inquiry offers several recommendations for syllabus designers and for second/foreign language teachers and learners.
4
A generic optimization and learning framework for Parkinson disease via speech and handwritten records. Journal of Ambient Intelligence and Humanized Computing 2022; 14:1-21. [PMID: 36042792] [PMCID: PMC9411848] [DOI: 10.1007/s12652-022-04342-6]
Abstract
Parkinson's disease (PD) is a slowly progressing neurodegenerative disorder whose symptoms are often identified only at late stages. Early diagnosis and treatment of PD can help relieve symptoms and delay progression; however, this is very challenging because the symptoms of PD resemble those of other diseases. The current study proposes a generic framework for the diagnosis of PD using handwriting images and/or speech signals. For the handwriting images, 8 pre-trained convolutional neural network (CNN) architectures, adapted via transfer learning and tuned with the Aquila Optimizer, were trained on the NewHandPD dataset to diagnose PD. For the speech signals, features from the MDVR-KCL dataset were extracted numerically using 16 feature-extraction algorithms and fed to 4 machine learning algorithms tuned by grid search, and extracted graphically using 5 visualization techniques and fed to the 8 pre-trained CNN structures. The authors propose a new technique for extracting features from the voice dataset based on segmentation with variable speech-signal segment durations, i.e., using different segment lengths in the segmentation phase. Using the proposed technique, 5 datasets with 281 numerical features are generated. Results from the different experiments were collected and recorded. For the NewHandPD dataset, the best reported metric is 99.75%, using the VGG19 structure. For the MDVR-KCL dataset, the best reported metrics are 99.94%, using the KNN and SVM machine learning algorithms with the combined numerical features, and 100%, using the combined mel-spectrogram graphical features with the VGG19 structure. These results surpass other state-of-the-art approaches.
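As a rough illustration of segmenting one recording at several segment durations before feature extraction, here is a sketch using librosa. The duration grid, the 13 MFCCs, and the mean/std summary are assumptions for illustration, not the paper's exact 281-feature recipe.

```python
import numpy as np
import librosa

def variable_duration_features(path, durations=(0.5, 1.0, 1.5), sr=16000):
    """Cut a recording into non-overlapping chunks at several durations
    (in seconds) and summarize each chunk with MFCC statistics."""
    y, sr = librosa.load(path, sr=sr)
    rows = []
    for dur in durations:
        step = int(dur * sr)
        for start in range(0, len(y) - step + 1, step):
            chunk = y[start:start + step]
            mfcc = librosa.feature.mfcc(y=chunk, sr=sr, n_mfcc=13)
            # One feature row per chunk: per-coefficient mean and std,
            # plus the segment duration that produced it.
            rows.append(np.concatenate([mfcc.mean(axis=1),
                                        mfcc.std(axis=1), [dur]]))
    return np.vstack(rows)
```

Each duration setting yields its own view of the same recording, which is one plausible reading of how distinct datasets can be generated from a single speech corpus.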
5
Listen-and-repeat training in the learning of non-native consonant duration contrasts: influence of consonant type as reflected by MMN and behavioral methods. Journal of Psycholinguistic Research 2022; 51:885-901. [PMID: 35312934] [PMCID: PMC9338006] [DOI: 10.1007/s10936-022-09868-6]
Abstract
Phonological duration differences in quantity languages can be problematic for second language learners whose native language does not use duration contrastively. Recent studies have found improvement in the processing of non-native vowel duration contrasts with the use of listen-and-repeat training, and the current study explores the efficacy of similar methodology for consonant duration contrasts. Eighteen adult participants underwent two days of listen-and-repeat training with pseudoword stimuli containing either a sibilant or a stop consonant contrast. The results were examined with psychophysiological event-related potentials (mismatch negativity and P3), behavioral discrimination tests, and a production task. The results revealed no training-related effects in the event-related potentials or the production task, but behavioral discrimination performance improved. Furthermore, differences emerged between the processing of the two consonant types. The findings suggest that stop consonants are processed more slowly than sibilants, and they are discussed with regard to possible segmentation difficulties.
6
When the "Tabula" is Anything but "Rasa:" What Determines Performance in the Auditory Statistical Learning Task? Cogn Sci 2022; 46:e13102. [PMID: 35122322 PMCID: PMC9285054 DOI: 10.1111/cogs.13102] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2020] [Revised: 01/09/2022] [Accepted: 01/11/2022] [Indexed: 11/28/2022]
Abstract
How does prior linguistic knowledge modulate learning in verbal auditory statistical learning (SL) tasks? Here, we address this question by assessing to what extent the frequency of syllabic co-occurrences in the learners' native language determines SL performance. We computed the frequency of co-occurrences of syllables in spoken Spanish through a transliterated corpus, and used this measure to construct two artificial familiarization streams. One stream was constructed by embedding pseudowords with high co-occurrence frequency in Spanish (the "Spanish-like" condition), the other by embedding pseudowords with low co-occurrence frequency (the "Spanish-unlike" condition). Native Spanish-speaking participants listened to one of the two streams and were tested in an old/new identification task to examine their ability to discriminate the embedded pseudowords from foils. Our results show that performance in the verbal auditory SL (ASL) task was significantly influenced by the frequency of syllabic co-occurrences in Spanish: when the embedded pseudowords were more "Spanish-like," participants were better able to identify them as part of the stream. These findings demonstrate that learners' task performance in verbal ASL tasks changes as a function of the artificial language's similarity to their native language, and highlight how prior linguistic knowledge biases the learning of regularities.
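A minimal sketch of the corpus statistic the study relies on, counting how often adjacent syllable pairs co-occur. The tiny syllabified corpus below is a toy stand-in for the transliterated Spanish corpus.

```python
from collections import Counter
from itertools import chain

# Toy stand-in for a syllabified corpus: each utterance is a list of
# syllables (a real transliterated Spanish corpus would replace this).
corpus = [["ca", "sa", "blan", "ca"],
          ["me", "sa", "blan", "ca"],
          ["ca", "mi", "no"]]

# Count every adjacent syllable pair across all utterances.
bigrams = Counter(chain.from_iterable(zip(u, u[1:]) for u in corpus))

def cooccurrence_frequency(syll_a, syll_b):
    """Raw corpus frequency of the syllable pair (a, b)."""
    return bigrams[(syll_a, syll_b)]

print(cooccurrence_frequency("blan", "ca"))  # -> 2
```

Pseudowords built from high-count pairs would correspond to the "Spanish-like" stream, and pseudowords built from low-count pairs to the "Spanish-unlike" stream.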
7
Oscillatory activity and EEG phase synchrony of concurrent word segmentation and meaning-mapping in 9-year-old children. Dev Cogn Neurosci 2021; 51:101010. [PMID: 34461393] [PMCID: PMC8403737] [DOI: 10.1016/j.dcn.2021.101010]
Abstract
When learning a new language, one must segment words from continuous speech and associate them with meanings. These complex processes can be boosted by attentional mechanisms triggered by multi-sensory information. Previous electrophysiological studies suggest that brain oscillations are sensitive to different hierarchical complexity levels of the input, making them a plausible neural substrate for speech parsing. Here, we investigated the functional role of brain oscillations during concurrent speech segmentation and meaning acquisition in sixty 9-year-old children. We collected EEG data during an audio-visual statistical learning task in which children were exposed to a learning condition with consistent word-picture associations and a random condition with inconsistent word-picture associations, before being tested on their ability to recall words and word-picture associations. We capitalized on the brain's tendency to align its activity to the rate of an external rhythmic stimulus to explore modulations of neural synchronization, and of phase synchronization between electrodes, during multi-sensory word learning. Results showed enhanced power at both the word and syllable rates, and increased EEG phase synchronization between frontal and occipital regions, in the learning condition compared to the random condition. These findings suggest that multi-sensory cueing and attentional mechanisms play an essential role in children's successful word learning.
8
Brain electrical dynamics in speech segmentation depends upon prior experience with the language. Brain and Language 2021; 219:104967. [PMID: 34022679] [DOI: 10.1016/j.bandl.2021.104967]
Abstract
It remains unclear whether the process of speech tracking, which facilitates speech segmentation, reflects top-down mechanisms related to prior linguistic models, stimulus-driven mechanisms, or possibly both. To address this, we recorded electroencephalography (EEG) responses from native and non-native speakers of English who had different prior experience with the English language but heard acoustically identical stimuli. Despite a significant difference in the ability to segment and perceive speech, our EEG results showed that theta-band tracking of the speech envelope did not depend significantly on prior experience with the language. However, tracking in the theta band did show changes across repetitions of the same sentence, suggesting a priming effect. Furthermore, native and non-native speakers showed different phase dynamics at word boundaries, suggesting differences in segmentation mechanisms. Finally, we found that the correlation between higher-frequency dynamics reflecting phoneme-level processing and perceptual segmentation of words might depend on prior experience with the spoken language.
9
The Language-specific Use of Fundamental Frequency Rise in Segmentation of an Artificial Language: Evidence from Listeners of Taiwanese Southern Min. Language and Speech 2021; 64:437-466. [PMID: 34110259] [DOI: 10.1177/0023830919886604]
Abstract
Experience with native-language prosody encourages language-specific strategies for speech segmentation. Conflicting findings from previous research suggest that these strategies may not be abstracted away from the acoustic manifestation of prosodic features in the native speech. Using the artificial language learning paradigm, the current study explores this possibility with listeners of a lexical tone language, Taiwanese Southern Min (TSM). In TSM, the only rising lexical tone occurs almost exclusively on the final syllable of the language's tone sandhi domain and is phonetically associated with final lengthening. Based on these observations, Experiment I examined what constitutes a sufficient finality cue for TSM listeners to use in segmentation: (a) a final fundamental frequency (F0) rise only; or (b) a final F0 rise conjoined with final lengthening. The results showed that segmentation was inhibited by the former cue but facilitated by the latter. Experiment II showed that the facilitation cannot be attributed entirely to final lengthening, as a null effect was found when final lengthening was the sole prosodic cue to segmentation. It is thus assumed that acoustic details as fine-grained as the lengthening of the rising tone are involved in the modulation of the segmentation strategy whereby TSM listeners perceive an F0 rise as signaling finality. The inhibitory effect of the final F0 rise alone found in Experiment I motivated Experiment III, which revealed that an initial F0 rise in the absence of lengthening cues improved TSM listeners' segmentation. It is speculated that such use of the initial F0 rise might reflect a cross-linguistic segmentation solution.
10
When statistics collide: The use of transitional and phonotactic probability cues to word boundaries. Mem Cognit 2021; 49:1300-1310. [PMID: 33751490] [DOI: 10.3758/s13421-021-01163-4]
Abstract
Statistical regularities in linguistic input, such as transitional probability and phonotactic probability, have been shown to promote speech segmentation. It remains unclear, however, whether or how the combination of transitional probabilities and subtle phonotactic probabilities influences segmentation. The present study provides a fine-grained investigation of the effects of such combined statistics. Adults (N = 81) were tested in one of two conditions. In the Anchor condition, they heard a continuous stream of words with small differences in phonotactic probabilities. In the Uniform condition, all words had comparable phonotactic probabilities. In both conditions, transitional probability was higher within words than within part-words. Only participants from the Anchor condition preferred words at test, indicating that the combination of transitional probabilities and subtle phonotactic probabilities may facilitate speech segmentation. We discuss the methodological implications of our findings, which demonstrate that even small phonotactic variations should be accounted for when investigating statistical speech segmentation.
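For concreteness, forward transitional probability is TP(b|a) = freq(ab) / freq(a), computed over the familiarization stream. A minimal sketch, with a toy syllable stream standing in for the experimental streams:

```python
from collections import Counter

# Toy familiarization stream as a syllable sequence (the real streams
# concatenate trisyllabic words in random order).
stream = "pa bi ku ti bu do pa bi ku go la tu ti bu do pa bi ku".split()

unigrams = Counter(stream)
bigrams = Counter(zip(stream, stream[1:]))

def transitional_probability(a, b):
    """Forward TP: P(b | a) = freq(ab) / freq(a)."""
    return bigrams[(a, b)] / unigrams[a]

print(transitional_probability("pa", "bi"))  # 1.0  (within a word)
print(transitional_probability("ku", "ti"))  # 0.33 (across a word boundary)
```

Words yield high within-word TPs and part-words straddle a boundary with a low TP, which is the contrast the Anchor and Uniform conditions hold constant while varying phonotactic probability.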
11
Evidence of ordinal position encoding of sequences extracted from continuous speech. Cognition 2021; 213:104646. [PMID: 33707004] [DOI: 10.1016/j.cognition.2021.104646]
Abstract
Infants' capacity to extract statistical regularities from sequential information is impressive and well documented. However, the mechanism underlying statistical learning remains mostly unknown, and its role in language acquisition is still under debate. To shed light on these issues, we address the question of what information human subjects extract and encode after familiarisation with a continuous sequence of stimuli, and how this depends on the type of segmentation cues and on the stimulus modality. Specifically, we investigate whether adults and 5-month-old infants learn the syllables' co-occurrence in the stream or generate a representation of the Words that includes the syllables' ordinal position. We test whether subtle pauses signalling word boundaries change the encoding and, in adults, whether it varies across modalities. In six behavioural experiments, we show that: (i) adults and infants learn the streams' statistical structure; (ii) ordinal encoding emerges in the auditory modality, and pauses enhance it; however, (iii) ordinal encoding seems to depend on the learning stage and not on pauses marking the Words' edges; interestingly, (iv) for visual presentation of orthographic syllables, we find no evidence of ordinal encoding in adults. Our results support the emergence, in the auditory modality, of a Word representation whose constituents are associated with an ordinal position, at play already early in life, bringing new insights into speech processing and language acquisition. Additionally, we successfully use pupillometry for the first time in an infant segmentation task.
12
Non-adjacent dependency learning in infancy, and its link to language development. Cogn Psychol 2020; 120:101291. [PMID: 32197131] [DOI: 10.1016/j.cogpsych.2020.101291]
Abstract
To acquire language, infants must learn how to identify words and linguistic structure in speech. Statistical learning has been suggested to assist both of these tasks. However, infants' capacity to use statistics to discover words and structure together remains unclear. Further, it is not yet known how infants' statistical learning ability relates to their language development. We trained 17-month-old infants on an artificial language comprising non-adjacent dependencies, and examined their looking times on tasks assessing sensitivity to words and structure using an eye-tracked head-turn-preference paradigm. We measured infants' vocabulary size using a Communicative Development Inventory (CDI) concurrently and at 19, 21, 24, 25, 27, and 30 months to relate performance to language development. Infants could segment the words from speech, as demonstrated by a significant difference in looking times to words versus part-words. Infants' segmentation performance was significantly related to their vocabulary size (receptive and expressive) both concurrently and over time (receptive until 24 months, expressive until 30 months), but was not related to the rate of vocabulary growth. The data also suggest infants may have developed sensitivity to generalised structure, indicating that similar statistical learning mechanisms may contribute to the discovery of words and structure in speech, but this was not related to vocabulary size.
13
Speech fine structure contains critical temporal cues to support speech segmentation. Neuroimage 2019; 202:116152. [PMID: 31484039] [DOI: 10.1016/j.neuroimage.2019.116152]
Abstract
Segmenting the continuous speech stream into units for further perceptual and linguistic analyses is fundamental to speech recognition. The speech amplitude envelope (SE) has long been considered a fundamental temporal cue for segmenting speech. Does the temporal fine structure (TFS), a significant part of speech signals often considered to contain primarily spectral information, contribute to speech segmentation? Using magnetoencephalography, we show that the TFS entrains cortical responses between 3 and 6 Hz and demonstrate, using mutual information analysis, that (i) the temporal information in the TFS can be reconstructed from a measure of frame-to-frame spectral change and correlates with the SE and (ii) that spectral resolution is key to the extraction of such temporal information. Furthermore, we show behavioural evidence that, when the SE is temporally distorted, the TFS provides cues for speech segmentation and aids speech recognition significantly. Our findings show that it is insufficient to investigate solely the SE to understand temporal speech segmentation, as the SE and the TFS derived from a band-filtering method convey comparable, if not inseparable, temporal information. We argue for a more synthetic view of speech segmentation - the auditory system groups speech signals coherently in both temporal and spectral domains.
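A minimal numpy/scipy sketch of the two signal decompositions the abstract contrasts: the band-limited amplitude envelope versus the temporal fine structure obtained from the analytic signal, plus a frame-to-frame spectral-change (spectral flux) measure. The band edges, frame sizes, and flux definition are illustrative assumptions, not the paper's analysis pipeline.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

def envelope_and_tfs(x, sr, band=(300.0, 3000.0)):
    """Split one frequency band into amplitude envelope (SE) and
    temporal fine structure (TFS) via the analytic signal."""
    sos = butter(4, band, btype="bandpass", fs=sr, output="sos")
    analytic = hilbert(sosfiltfilt(sos, x))
    envelope = np.abs(analytic)       # slow amplitude modulation
    tfs = np.cos(np.angle(analytic))  # rapid carrier fluctuation
    return envelope, tfs

def spectral_flux(x, sr, frame=0.025, hop=0.010):
    """Frame-to-frame spectral change, the kind of measure from which
    the abstract reports TFS timing information can be reconstructed."""
    n, h = int(frame * sr), int(hop * sr)
    frames = np.stack([x[i:i + n] for i in range(0, len(x) - n, h)])
    spectra = np.abs(np.fft.rfft(frames * np.hanning(n), axis=1))
    return np.sqrt((np.diff(spectra, axis=0) ** 2).sum(axis=1))
```

The point of the contrast is that the envelope discards the carrier while the TFS discards amplitude, yet both can carry temporal information about segment boundaries.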
14
Abstract
Background: Stroke may cause sentence comprehension disorders. Speech segmentation, i.e., the ability to detect word boundaries while listening to continuous speech, is an initial step that allows the successful identification of words and the accurate understanding of meaning within sentences. It has received little attention in people with post-stroke aphasia (PWA).
Objectives: Our goal was to study speech segmentation in PWA and examine the potential benefit of seeing the speaker's articulatory gestures while segmenting sentences.
Methods: Fourteen PWA and twelve healthy controls participated in this pilot study. Performance was measured with a word-monitoring task. In the auditory-only modality, participants were presented with auditory-only stimuli, while in the audiovisual modality, visual speech cues (i.e., the speaker's articulatory gestures) accompanied the auditory input. The proportion of correct responses was calculated for each participant and each modality. Visual enhancement was then calculated to estimate the potential benefit of seeing the speaker's articulatory gestures.
Results: In both the auditory-only and audiovisual modalities, PWA performed significantly less well than controls, who performed at 100% correct in both modalities. The performance of PWA was correlated with their phonological ability. Six PWA used the visual cues. Group-level analysis of the PWA did not show any reliable difference between the auditory-only and audiovisual modalities (median visual enhancement = 7% [Q1-Q3: -5 to 39]).
Conclusion: Our findings show that a speech segmentation disorder may exist in PWA. This points to the importance of assessing and training speech segmentation after stroke. Further studies should investigate the characteristics of PWA who use visual speech cues during sentence processing.
15
Statistical learning of speech regularities can occur outside the focus of attention. Cortex 2019; 115:56-71. [PMID: 30771622] [DOI: 10.1016/j.cortex.2019.01.013]
Abstract
Statistical learning, the process of extracting regularities from the environment, plays an essential role in many aspects of cognition, including speech segmentation and language acquisition. A key component of statistical learning in a linguistic context is the perceptual binding of adjacent individual units (e.g., syllables) into integrated composites (e.g., multisyllabic words). A second, conceptually dissociable component of statistical learning is the memory storage of these integrated representations. Here we examine whether these two dissociable components of statistical learning are differentially impacted by top-down, voluntary attentional resources. Learners' attention was either focused towards or diverted from a speech stream made up of repeating nonsense words. Building on our previous findings, we quantified the online perceptual binding of individual syllables into component words using an EEG-based neural entrainment measure. Following exposure, statistical learning was assessed using offline tests sensitive to both perceptual binding and memory storage. Neural measures verified that our manipulation of selective attention successfully reduced limited-capacity resources to the speech stream. Diverting attention away from the speech stream did not alter neural entrainment to the component words or post-exposure familiarity ratings, but did impact performance on an indirect reaction-time-based memory test. We conclude that theoretically dissociable components of statistical learning are differentially impacted by attention and top-down processing resources. A reduction in attention to the speech stream may impede memory storage of the component words. In contrast, the moment-by-moment perceptual binding of speech regularities can occur even while learners' attention is focused on a demanding concurrent task, and we found no evidence that selective attention modulates this process. These results suggest that learners can acquire basic statistical properties of language without directly focusing on the speech input, potentially opening up previously overlooked opportunities for language learning, particularly in adult learners.
16
Syntactic Cues Take Precedence Over Distributional Cues in Native and Non-Native Speech Segmentation. Language and Speech 2018; 61:615-631. [PMID: 30249146] [DOI: 10.1177/0023830918801392]
Abstract
This study investigates whether syntactic cues take precedence over distributional cues in native and non-native speech segmentation by examining native and non-native speech segmentation in potential French-liaison contexts. Native French listeners and English-speaking second-language learners of French completed a visual-world eye-tracking experiment. Half the stimuli contained the pivotal consonant /t/, a frequent word onset but infrequent liaison consonant, and half contained /z/, a frequent liaison consonant but rare word onset. In the adjective-noun condition (permitting liaison), participants heard a consonant-initial target (e.g., le petit tatoué; le fameux zélé) that was temporarily ambiguous at the segmental level with a vowel-initial competitor (e.g., le petit [t]athée; le fameux [z]élu); in the noun-adjective condition (not permitting liaison), they heard a consonant-initial target (e.g., le client tatoué; le Français zélé) that was not temporarily ambiguous with a vowel-initial competitor (e.g., le client [*t]athée; le Français [*z]élu). Growth-curve analyses revealed that syntactic context modulated both groups' fixations (noun-adjective > adjective-noun), and pivotal consonant modulated both groups' fixations (/t/ > /z/) only in the adjective-noun condition, with the effect of the consonant decreasing in more proficient French learners. These results suggest that syntactic cues override distributional cues in the segmentation of French words in potential liaison contexts.
17
Abstract
Phonemes play a central role in traditional theories as units of speech perception and access codes to lexical representations. Phonemes have two essential properties: they are 'segment-sized' (the size of a consonant or vowel) and abstract (a single phoneme may have different acoustic realisations). Nevertheless, there is a long history of challenging the phoneme hypothesis, with some theorists arguing for differently sized phonological units (e.g. features or syllables) and others rejecting abstract codes in favour of representations that encode detailed acoustic properties of the stimulus. The phoneme hypothesis is the minority view today. We defend the phoneme hypothesis in two complementary ways. First, we show that rejection of phonemes is based on a flawed interpretation of empirical findings. For example, it is commonly argued that the failure to find acoustic invariances for phonemes rules out phonemes. However, the lack of invariance is only a problem on the assumption that speech perception is a bottom-up process. If learned sublexical codes are modified by top-down constraints (which they are), then this argument loses all force. Second, we provide strong positive evidence for phonemes on the basis of linguistic data. Almost all findings that are taken (incorrectly) as evidence against phonemes are based on psycholinguistic studies of single words. However, phonemes were first introduced in linguistics, and the best evidence for phonemes comes from linguistic analyses of complex word forms and sentences. In short, the rejection of phonemes is based on a false analysis and a too-narrow consideration of the relevant data.
18
Pre-linguistic segmentation of speech into syllable-like units. Cognition 2017; 171:130-150. [PMID: 29156241] [DOI: 10.1016/j.cognition.2017.11.003]
Abstract
Syllables are often considered to be central to infant and adult speech perception. Many theories and behavioral studies on early language acquisition are also based on syllable-level representations of spoken language. There is little clarity, however, on what sort of pre-linguistic "syllable" would actually be accessible to an infant with no phonological or lexical knowledge. Anchored by the notion that syllables are organized around particularly sonorous (audible) speech sounds, the present study investigates the feasibility of speech segmentation into syllable-like chunks without any a priori linguistic knowledge. We first operationalize sonority as a measurable property of the acoustic input, and then use sonority variation across time, or speech rhythm, as the basis for segmentation. The entire process from acoustic input to chunks of syllable-like acoustic segments is implemented as a computational model inspired by the oscillatory entrainment of the brain to speech rhythm. We analyze the output of the segmentation process in three different languages, showing that the sonority fluctuation in speech is highly informative of syllable and word boundaries in all three cases without any language-specific tuning of the model. These findings support the widely held assumption that syllable-like structure is accessible to infants even when they are only beginning to learn the properties of their native language.
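To make "sonority as a measurable property" concrete, here is a crude sketch that treats a smoothed, rectified amplitude contour as the sonority proxy and places syllable boundaries at its local minima. The published model instead entrains an oscillator to the signal, so the low-pass cutoff and the minimum-spacing parameter here are assumptions.

```python
import numpy as np
from scipy.signal import argrelmin, butter, sosfiltfilt

def syllable_boundaries(x, sr, cutoff=8.0):
    """Segment speech at minima of a smoothed energy contour, a crude
    proxy for the sonority fluctuation the model tracks."""
    # Full-wave rectify, then low-pass below the syllabic rate (~8 Hz)
    # to keep only the slow sonority-like fluctuation.
    sos = butter(2, cutoff, btype="lowpass", fs=sr, output="sos")
    sonority = sosfiltfilt(sos, np.abs(x))
    # Local minima of the contour mark candidate syllable boundaries;
    # `order` enforces a minimum spacing of 50 ms between minima.
    minima = argrelmin(sonority, order=int(0.05 * sr))[0]
    return minima / sr  # boundary times in seconds
```

Nothing in this sketch is language-specific, which mirrors the abstract's claim that sonority fluctuation is informative about syllable and word boundaries without language-specific tuning.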
19
Data from Russian Help to Determine in Which Languages the Possible Word Constraint Applies. Journal of Psycholinguistic Research 2017; 46:629-640. [PMID: 27853918] [DOI: 10.1007/s10936-016-9458-7]
Abstract
The Possible Word Constraint, or PWC, is a speech segmentation principle that prohibits postulating word boundaries if a remaining segment contains only consonants. The PWC was initially formulated for English, where all words contain a vowel, and was claimed to hold universally after being confirmed for various other languages. However, it is crucial to examine languages that allow words without vowels. Two such languages have been tested: data from Slovak were compatible with the PWC, while data from Tarifiyt Berber did not support it. We hypothesize that fixed word stress could have influenced the results in Slovak, and report two word-spotting experiments on Russian, which has similar one-consonant words but flexible word stress. The results contradict the PWC, so we suggest that it does not operate in languages where words without vowels are possible, while the results from Slovak might be explained by its prosodic properties.
20
Neurophysiological evidence for the interplay of speech segmentation and word-referent mapping during novel word learning. Neuropsychologia 2016; 98:56-67. [PMID: 27732869] [DOI: 10.1016/j.neuropsychologia.2016.10.006]
Abstract
Learning a new language requires the identification of word units from continuous speech (the speech segmentation problem) and mapping them onto conceptual representations (the word-to-world mapping problem). Recent behavioral studies have revealed that the statistical properties found within and across modalities can serve as cues for both processes. However, segmentation and mapping have been largely studied separately, and it thus remains unclear whether both processes can be accomplished at the same time and whether they share common neurophysiological features. To address this question, we recorded the EEG of 20 adult participants during both an audio-only speech segmentation task and an audiovisual word-to-picture association task. The participants were tested for both the implicit detection of online mismatches (structural auditory and visual semantic violations) and the explicit recognition of words and word-to-picture associations. The ERP results from the learning phase revealed a delayed learning-related fronto-central negativity (FN400) in the audiovisual condition compared to the audio-only condition. Interestingly, while online structural auditory violations elicited clear MMN/N200 components in the audio-only condition, visual-semantic violations induced meaning-related N400 modulations in the audiovisual condition. The present results support the idea that speech segmentation and meaning mapping can take place in parallel and act in synergy to enhance novel word learning.
21
Assessing segmentation processes by click detection: online measure of statistical learning, or simple interference? Behav Res Methods 2016; 47:1393-1403. [PMID: 25515838] [DOI: 10.3758/s13428-014-0548-x]
Abstract
Statistical learning can be used to extract the words from continuous speech. Gómez, Bion, and Mehler (Language and Cognitive Processes, 26, 212-223, 2011) proposed an online measure of statistical learning: They superimposed auditory clicks on a continuous artificial speech stream made up of a random succession of trisyllabic nonwords. Participants were instructed to detect these clicks, which could be located either within or between words. The results showed that, over the length of exposure, reaction times (RTs) increased more for within-word than for between-word clicks. This result has been accounted for by means of statistical learning of the between-word boundaries. However, even though statistical learning occurs without an intention to learn, it nevertheless requires attentional resources. Therefore, this process could be affected by a concurrent task such as click detection. In the present study, we evaluated the extent to which the click detection task indeed reflects successful statistical learning. Our results suggest that the emergence of RT differences between within- and between-word click detection is neither systematic nor related to the successful segmentation of the artificial language. Therefore, instead of being an online measure of learning, the click detection task seems to interfere with the extraction of statistical regularities.
22
Speech segmentation by statistical learning is supported by domain-general processes within working memory. Q J Exp Psychol (Hove) 2016; 69:2390-2401. [PMID: 27167308] [DOI: 10.1080/17470218.2015.1112825]
Abstract
The purpose of this study was to examine the extent to which working memory resources are recruited during statistical learning (SL). Participants were asked to identify novel words in an artificial speech stream where the transitional probabilities between syllables provided the only segmentation cue. Experiments 1 and 2 demonstrated that segmentation performance improved when the speech rate was slowed down, suggesting that SL is supported by some form of active processing or maintenance mechanism that operates more effectively under slower presentation rates. In Experiment 3 we investigated the nature of this mechanism by asking participants to perform a two-back task while listening to the speech stream. Half of the participants performed a two-back rhyme task designed to engage phonological processing, whereas the other half performed a comparable two-back task on un-nameable visual shapes. It was hypothesized that if SL is dependent only upon domain-specific processes (i.e., phonological rehearsal), the rhyme task should impair speech segmentation performance more than the shape task. However, the two loads were equally disruptive to learning, as they both eradicated the benefit provided by the slow rate. These results suggest that SL is supported by working-memory processes that rely on domain-general resources.
23
Spoken Word Recognition of Chinese Words in Continuous Speech. Journal of Psycholinguistic Research 2015; 44:775-787. [PMID: 25252732] [DOI: 10.1007/s10936-014-9318-2]
Abstract
The present study examined the role that the positional probability of sounds within syllables plays in the recognition of spoken words in continuous Cantonese speech. Because some sounds occur more frequently than others at the beginning or ending position of Cantonese syllables, this kind of probabilistic information may cue the locations of syllable boundaries in speech. Two word-spotting experiments were conducted to investigate the role of positional probability in the spoken word recognition process for Cantonese speech. It was found that listeners indeed made use of the positional probability of a syllable's onset, but not of a syllable's ending sound, in the spoken word recognition process. Together with other relevant studies of different languages, we propose that probabilistic phonotactics are one useful source of information in spoken word recognition and speech segmentation.
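A toy sketch of the positional statistic in question, estimating how often a sound occupies syllable-initial versus syllable-final position from a frequency-weighted syllable inventory. The romanized inventory and the single-character "segments" are simplifying assumptions for illustration.

```python
from collections import Counter

# Toy syllable inventory with token counts (a real frequency-weighted
# lexicon of Cantonese syllables would replace this; note that treating
# one character as one segment is a simplification, e.g. "ng").
syllables = {"sik": 50, "kei": 30, "ngo": 40, "gon": 25, "ok": 10}

onset_counts, coda_counts = Counter(), Counter()
total = sum(syllables.values())
for syll, count in syllables.items():
    onset_counts[syll[0]] += count   # first sound of the syllable
    coda_counts[syll[-1]] += count   # last sound of the syllable

def positional_probability(sound, position):
    """Token-weighted probability of a sound in a syllable position."""
    table = onset_counts if position == "onset" else coda_counts
    return table[sound] / total

print(positional_probability("k", "onset"))  # /k/ as a syllable onset
print(positional_probability("k", "coda"))   # /k/ as a syllable ending
```

A listener exploiting such statistics would treat a high-onset-probability sound as likely to begin a new syllable, and hence as a candidate word boundary, which is the asymmetry the word-spotting experiments probed.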
24
Simultaneous segmentation and generalisation of non-adjacent dependencies from continuous speech. Cognition 2015; 147:70-74. [PMID: 26638049] [DOI: 10.1016/j.cognition.2015.11.010]
Abstract
Language learning requires mastering multiple tasks, including segmenting speech to identify words, and learning the syntactic role of these words within sentences. A key question in language acquisition research is the extent to which these tasks are simultaneous or successive, and consequently whether they may be driven by distinct or similar computations. We explored a classic artificial language learning paradigm in which the language structure is defined in terms of non-adjacent dependencies. We show that participants are able to use the same statistical information at the same time to segment continuous speech, both to identify words and to generalise over the structure, when the generalisations were over novel speech that the participants had not previously experienced. We suggest that, in the absence of evidence to the contrary, the most economical explanation for the effects is that speech segmentation and grammatical generalisation depend on similar statistical processing mechanisms.