1
Blache P. A neuro-cognitive model of comprehension based on prediction and unification. Front Hum Neurosci 2024;18:1356541. PMID: 38655372; PMCID: PMC11035797; DOI: 10.3389/fnhum.2024.1356541.
Abstract
Most architectures and models of language processing have been built on a restricted view of language, limited to sentence processing. These approaches fail to capture one primordial characteristic: efficiency. Many facilitation effects are known to be at play in natural situations such as conversation (shallow processing, no real access to the lexicon, etc.) without any impact on comprehension. In this study, we present a new model that integrates these facilitation effects for accessing meaning into the classical compositional architecture. The model relies on two mechanisms, prediction and unification, and provides a unified architecture for describing language processing in its natural environment.
Affiliation(s)
- Philippe Blache
- Laboratoire Parole et Langage (LPL-CNRS), Aix-en-Provence, France
- Institute of Language, Communication and the Brain (ILCB), Marseille, France
2
Pankratz E, Kirby S, Culbertson J. Evaluating the Relative Importance of Wordhood Cues Using Statistical Learning. Cogn Sci 2024;48:e13429. PMID: 38497523; DOI: 10.1111/cogs.13429.
Abstract
Identifying wordlike units in language is typically done by applying a battery of criteria, though how to weight these criteria with respect to one another is currently unknown. We address this question by investigating whether certain criteria are also used as cues for learning an artificial language; if they are, then perhaps they can be relied on more as trustworthy top-down diagnostics. The two criteria for grammatical wordhood that we consider are a unit's free mobility and its internal immutability. These criteria also map to two cognitive mechanisms that could underlie successful statistical learning: learners might orient themselves around the low transitional probabilities at unit boundaries, or they might seek chunks with high internal transitional probabilities. We find that each criterion has its own facilitatory effect, and learning is best where they both align. This supports the battery-of-criteria approach to diagnosing wordhood, and also suggests that the mechanism behind statistical learning may not be a question of either/or; perhaps the two mechanisms do not compete, but mutually reinforce one another.
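As a concrete illustration of the transitional-probability cues at issue, here is a minimal sketch (the toy lexicon is our own, not the authors' stimuli): word-internal syllable pairs have high forward transitional probabilities, while pairs that straddle a word boundary have low ones.

```python
from collections import Counter
import random

def transitional_probabilities(syllables):
    """Forward TP: P(next syllable | current syllable) for adjacent pairs."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# A toy lexicon of three trisyllabic words, concatenated into a continuous stream.
random.seed(0)
words = ["tu-pi-ro", "go-la-bu", "da-ke-mi"]
stream = [s for w in random.choices(words, k=500) for s in w.split("-")]

tps = transitional_probabilities(stream)
print(tps[("tu", "pi")])   # word-internal transition: 1.0
print(tps[("ro", "go")])   # straddles a word boundary: low (about 1/3)
```

Because each syllable here occurs in exactly one word, within-word TPs are exactly 1.0 and boundary TPs hover around 1/3, which is the dip a statistical learner could exploit.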
Affiliation(s)
- Elizabeth Pankratz
- Centre for Language Evolution, Department of Linguistics and English Language, University of Edinburgh
- Simon Kirby
- Centre for Language Evolution, Department of Linguistics and English Language, University of Edinburgh
- Jennifer Culbertson
- Centre for Language Evolution, Department of Linguistics and English Language, University of Edinburgh
3
Endress AD. Hebbian learning can explain rhythmic neural entrainment to statistical regularities. Dev Sci 2024:e13487. PMID: 38372153; DOI: 10.1111/desc.13487.
Abstract
In many domains, learners extract recurring units from continuous sequences. For example, in unknown languages, fluent speech is perceived as a continuous signal. Learners need to extract the underlying words from this continuous signal and then memorize them. One prominent candidate mechanism is statistical learning, whereby learners track how predictive syllables (or other items) are of one another. Syllables within the same word predict each other better than syllables straddling word boundaries. But does statistical learning lead to memories of the underlying words, or just to pairwise associations among syllables? Electrophysiological results provide the strongest evidence for the memory view. Electrophysiological responses can be time-locked to statistical word boundaries (e.g., N400s) and show rhythmic activity with a periodicity of word durations. Here, I reproduce such results with a simple Hebbian network. When exposed to statistically structured syllable sequences (and when the underlying words are not excessively long), the network activation is rhythmic with the periodicity of a word duration and activation maxima on word-final syllables. This is because word-final syllables receive more excitation from earlier syllables with which they are associated than less predictable syllables that occur earlier in words. The network is also sensitive to information whose electrophysiological correlates were used to support the encoding of ordinal positions within words. Hebbian learning can thus explain rhythmic neural activity in statistical learning tasks without any memory representations of words. Learners might thus need to rely on cues beyond statistical associations to learn the words of their native language.
RESEARCH HIGHLIGHTS:
- Statistical learning may be utilized to identify recurring units in continuous sequences (e.g., words in fluent speech) but may not generate explicit memory for words.
- Exposure to statistically structured sequences leads to rhythmic activity with a period of the duration of the underlying units (e.g., words).
- I show that a memory-less Hebbian network model can reproduce this rhythmic neural activity as well as putative encodings of ordinal positions observed in earlier research.
- Direct tests are needed to establish whether statistical learning leads to declarative memories for words.
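A deliberately minimal Hebbian sketch of this account (the toy syllables, decay, and learning rate are our own choices, not the paper's model) shows the activation peak on word-final syllables: after learning from decaying activation traces, word-final units receive extra support from their still-active within-word predecessors.

```python
import numpy as np

words = [("A", "B", "C"), ("D", "E", "F")]      # two toy trisyllabic words
syls = [s for w in words for s in w]
idx = {s: i for i, s in enumerate(syls)}
n = len(syls)
rng = np.random.default_rng(1)
stream = [s for _ in range(300) for s in words[rng.integers(len(words))]]

# Phase 1: Hebbian learning from decaying activation traces.
decay, lr = 0.5, 0.05
W = np.zeros((n, n))
trace = np.zeros(n)
for s in stream:
    j = idx[s]
    trace *= decay               # forgetting of earlier activations
    W[:, j] += lr * trace        # associate residual activity with the current syllable
    trace[j] += 1.0
W /= W.max()                     # normalize so the test dynamics stay bounded

# Phase 2: frozen weights; record activation by within-word position.
act = np.zeros(n)
pos_act = {0: [], 1: [], 2: []}
for _ in range(200):
    for p, s in enumerate(words[rng.integers(len(words))]):
        act *= decay
        j = idx[s]
        act[j] = 1.0 + W[:, j] @ act   # external input plus support from predecessors
        pos_act[p].append(act[j])

means = [float(np.mean(pos_act[p])) for p in range(3)]
print(means)   # activation rises across the word, peaking word-finally
```

Because word-final syllables have two strongly associated within-word predecessors while word-initial syllables get only weaker cross-boundary support, activation is rhythmic at the word period with maxima at word offsets, mirroring the reported entrainment pattern.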
Affiliation(s)
- Ansgar D Endress
- Department of Psychology, City, University of London, London, UK
4
Santolin C, Crespo-Bojorque P, Sebastian-Galles N, Toro JM. Sensitivity to the sonority sequencing principle in rats (Rattus norvegicus). Sci Rep 2023;13:17036. PMID: 37813950; PMCID: PMC10562444; DOI: 10.1038/s41598-023-44081-y.
Abstract
Albeit diverse, human languages exhibit universal structures. A salient example is the syllable, an important structure of language acquisition. The structure of syllables is determined by the Sonority Sequencing Principle (SSP), a linguistic constraint according to which phoneme intensity must increase at onset, reaching a peak at nucleus (vowel), and decline at offset. Such structure generates an intensity pattern with an arch shape. In humans, sensitivity to restrictions imposed by the SSP on syllables appears at birth, raising questions about its emergence. We investigated the biological mechanisms at the foundations of the SSP, testing a nonhuman, non-vocal-learner species with the same language materials used with humans. Rats discriminated well-structured syllables (e.g., pras) from ill-structured ones (e.g., lbug) after being familiarized with syllabic structures conforming to the SSP. In contrast, we did not observe evidence that rats familiarized with syllables that violate such constraint discriminated at test. This research provides the first evidence of sensitivity to the SSP in a nonhuman species, which likely stems from evolutionary-ancient cross-species biological predispositions for natural acoustic patterns. Humans' early sensitivity to the SSP possibly emerges from general auditory processing that favors sounds depicting an arch-shaped envelope, common amongst animal vocalizations. Ancient sensory mechanisms, responsible for processing vocalizations in the wild, would constitute an entry-gate for human language acquisition.
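Operationally, the SSP amounts to checking that sonority rises up to the vowel and falls after it. A toy checker (the numeric sonority scale here is our own simplification, not the study's materials):

```python
# A simplified sonority scale: stops < fricatives < nasals < liquids < vowels.
SONORITY = {**{c: 1 for c in "ptkbdg"}, **{c: 2 for c in "fvsz"},
            **{c: 3 for c in "mn"}, **{c: 4 for c in "lr"},
            **{c: 5 for c in "aeiou"}}

def obeys_ssp(syllable):
    """True if sonority rises monotonically to the peak and falls afterwards."""
    son = [SONORITY[c] for c in syllable]
    peak = son.index(max(son))
    rising = all(a < b for a, b in zip(son[:peak], son[1:peak + 1]))
    falling = all(a > b for a, b in zip(son[peak:], son[peak + 1:]))
    return rising and falling

print(obeys_ssp("pras"))  # well-structured: sonority 1 < 4 < 5 > 2
print(obeys_ssp("lbug"))  # ill-structured: sonority falls (4 > 1) before the vowel
```

This arch-shaped (rise-peak-fall) intensity profile is exactly the pattern the rats in the study were familiarized with or tested against.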
Affiliation(s)
- Chiara Santolin
- Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain.
- Juan Manuel Toro
- Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Catalan Institution for Research and Advanced Studies, Barcelona, Spain
5
Endress AD, Johnson SP. Hebbian, correlational learning provides a memory-less mechanism for Statistical Learning irrespective of implementational choices: Reply to Tovar and Westermann (2022). Cognition 2023;230:105290. PMID: 36240613; DOI: 10.1016/j.cognition.2022.105290.
Abstract
Statistical learning relies on detecting the frequency of co-occurrences of items and has been proposed to be crucial for a variety of learning problems, notably to learn and memorize words from fluent speech. Endress and Johnson (2021) (hereafter EJ) recently showed that such results can be explained based on simple memory-less correlational learning mechanisms such as Hebbian Learning. Tovar and Westermann (2022) (hereafter TW) reproduced these results with a different Hebbian model. We show that the main differences between the models are whether temporal decay acts on both the connection weights and the activations (in TW) or only on the activations (in EJ), and whether interference affects weights (in TW) or activations (in EJ). Given that weights and activations are linked through the Hebbian learning rule, the networks behave similarly. However, in contrast to TW, we do not believe that neurophysiological data are relevant to adjudicate between abstract psychological models with little biological detail. Taken together, both models show that different memory-less correlational learning mechanisms provide a parsimonious account of Statistical Learning results. They are consistent with evidence that Statistical Learning might not allow learners to learn and retain words, and Statistical Learning might support predictive processing instead.
Affiliation(s)
- Scott P Johnson
- Department of Psychology, University of California, Los Angeles, United States of America
6
Cognitive mechanisms of statistical learning and segmentation of continuous sensory input. Mem Cognit 2021;50:979-996. PMID: 34964955; PMCID: PMC9209387; DOI: 10.3758/s13421-021-01264-0.
Abstract
Two classes of cognitive mechanisms have been proposed to explain segmentation of continuous sensory input into discrete recurrent constituents: clustering and boundary-finding mechanisms. Clustering mechanisms are based on identifying frequently co-occurring elements and merging them together as parts that form a single constituent. Bracketing (or boundary-finding) mechanisms work by identifying rarely co-occurring elements that correspond to the boundaries between discrete constituents. In a series of behavioral experiments, I tested which mechanisms are at play in the visual modality both during segmentation of a continuous syllabic sequence into discrete word-like constituents and during recognition of segmented constituents. Additionally, I explored conscious awareness of the products of statistical learning—whole constituents versus merged clusters of smaller subunits. My results suggest that both online segmentation and offline recognition of extracted constituents rely on detecting frequently co-occurring elements, a process likely based on associative memory. However, people are more aware of having learnt whole tokens than of recurrent composite clusters.
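A minimal sketch of the bracketing (boundary-finding) mechanism described above, under our own toy assumptions: posit a boundary wherever the forward transitional probability dips below that of its neighboring transitions.

```python
from collections import Counter
import random

def forward_tps(seq):
    pairs = Counter(zip(seq, seq[1:]))
    firsts = Counter(seq[:-1])
    return {p: c / firsts[p[0]] for p, c in pairs.items()}

def bracketing_segment(seq, tps):
    """Insert a boundary at each local TP minimum (a TP dip)."""
    words, cur = [], [seq[0]]
    for i in range(1, len(seq)):
        left = tps[(seq[i - 1], seq[i])]
        prev = tps.get((seq[i - 2], seq[i - 1]), 1.0) if i >= 2 else 1.0
        nxt = tps.get((seq[i], seq[i + 1]), 1.0) if i + 1 < len(seq) else 1.0
        if left < prev and left < nxt:   # TP dip -> word boundary
            words.append(tuple(cur))
            cur = []
        cur.append(seq[i])
    words.append(tuple(cur))
    return words

random.seed(3)
lexicon = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku")]
stream = [s for w in random.choices(lexicon, k=200) for s in w]
segmented = bracketing_segment(stream, forward_tps(stream))
print(Counter(segmented).most_common(3))  # the three lexicon words dominate
```

A clustering mechanism would instead merge the high-TP pairs into chunks; on a stream like this one the two strategies converge, which is why behavioral work such as the study above is needed to tease them apart.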
7
Matzinger T, Fitch WT. Voice modulatory cues to structure across languages and species. Philos Trans R Soc Lond B Biol Sci 2021;376:20200393. PMID: 34719253; PMCID: PMC8558770; DOI: 10.1098/rstb.2020.0393.
Abstract
Voice modulatory cues such as variations in fundamental frequency, duration and pauses are key factors for structuring vocal signals in human speech and vocal communication in other tetrapods. Voice modulation physiology is highly similar in humans and other tetrapods due to shared ancestry and shared functional pressures for efficient communication. This has led to similarly structured vocalizations across humans and other tetrapods. Nonetheless, in their details, structural characteristics may vary across species and languages. Because data concerning voice modulation in non-human tetrapod vocal production and especially perception are relatively scarce compared to human vocal production and perception, this review focuses on voice modulatory cues used for speech segmentation across human languages, highlighting comparative data where available. Cues that are used similarly across many languages may help indicate which cues may result from physiological or basic cognitive constraints, and which cues may be employed more flexibly and are shaped by cultural evolution. This suggests promising candidates for future investigation of cues to structure in non-human tetrapod vocalizations. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part I)'.
Affiliation(s)
- Theresa Matzinger
- Department of Behavioral and Cognitive Biology, University of Vienna, 1030 Vienna, Austria
- Department of English, University of Vienna, 1090 Vienna, Austria
- W. Tecumseh Fitch
- Department of Behavioral and Cognitive Biology, University of Vienna, 1030 Vienna, Austria
- Department of English, University of Vienna, 1090 Vienna, Austria
8
Does morphological complexity affect word segmentation? Evidence from computational modeling. Cognition 2021;220:104960. PMID: 34920298; DOI: 10.1016/j.cognition.2021.104960.
Abstract
How can infants detect where words or morphemes start and end in the continuous stream of speech? Previous computational studies have investigated this question mainly for English, where morpheme and word boundaries are often isomorphic. Yet in many languages, words are often multimorphemic, such that word and morpheme boundaries do not align. Our study employed corpora of two languages that differ in the complexity of inflectional morphology, Chintang (Sino-Tibetan) and Japanese (in Experiment 1), as well as corpora of artificial languages ranging in morphological complexity, as measured by the ratio and distribution of morphemes per word (in Experiments 2 and 3). We used two baselines and three conceptually diverse word segmentation algorithms, two of which rely purely on sublexical information using distributional cues, and one that builds a lexicon. The algorithms' performance was evaluated on both word- and morpheme-level representations of the corpora. Segmentation results were better for the morphologically simpler languages than for the morphologically more complex languages, in line with the hypothesis that languages with greater inflectional complexity could be more difficult to segment into words. We further show that the effect of morphological complexity is relatively small, compared to that of algorithm and evaluation level. We therefore recommend that infant researchers look for signatures of the different segmentation algorithms and strategies, before looking for differences in infant segmentation landmarks across languages varying in complexity.
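Segmentation algorithms of this kind are conventionally scored with boundary precision and recall against a gold segmentation. A minimal sketch of that evaluation (the function names are ours, not the study's code):

```python
def boundary_set(words):
    """Positions of internal boundaries in the concatenated stream."""
    pos, out = 0, set()
    for w in words[:-1]:
        pos += len(w)
        out.add(pos)
    return out

def boundary_f1(gold_words, predicted_words):
    gold, pred = boundary_set(gold_words), boundary_set(predicted_words)
    if not gold or not pred:
        return 0.0
    tp = len(gold & pred)
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall) if tp else 0.0

gold = ["the", "dog", "barked"]
pred = ["the", "dogbarked"]          # an under-segmentation error
print(boundary_f1(gold, pred))       # precision 1.0, recall 0.5 -> F1 = 2/3
```

Evaluating the same predicted segmentation against word-level vs. morpheme-level gold standards, as the study does, simply means swapping the gold word list.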
9
Mateu V, Sundara M. Spanish input accelerates bilingual infants' segmentation of English words. Cognition 2021;218:104936. PMID: 34678682; DOI: 10.1016/j.cognition.2021.104936.
Abstract
We asked whether increased exposure to iambs, two-syllable words with stress on the second syllable (e.g., guitar), by way of another language - Spanish - facilitates English learning infants' segmentation of iambs. Spanish has twice as many iambic words (40%) compared to English (20%). Using the Headturn Preference Procedure we tested bilingual Spanish and English learning 8-month-olds' ability to segment English iambs. Monolingual English learning infants succeed at this task only by 11 months. We showed that at 8 months, bilingual Spanish and English learning infants successfully segmented English iambs, and not simply the stressed syllable, unlike their monolingual English learning peers. At the same age, bilingual infants failed to segment Spanish iambs, just like their monolingual Spanish peers. These results cannot be explained by bilingual infants' reliance on transitional probability cues to segment words in both their native languages because statistical cues were comparable in the two languages. Instead, based on their accelerated development, we argue for autonomous but interdependent development of the two languages of bilingual infants.
Affiliation(s)
- Victoria Mateu
- Department of Spanish & Portuguese, University of California Los Angeles, United States of America
- Megha Sundara
- Department of Linguistics, University of California Los Angeles, United States of America.
10
Zheng Y, Zhao Z, Yang X, Li X. The impact of musical expertise on anticipatory semantic processing during online speech comprehension: An electroencephalography study. Brain Lang 2021;221:105006. PMID: 34392023; DOI: 10.1016/j.bandl.2021.105006.
Abstract
Musical experience has been found to aid speech perception. This electroencephalography study further examined whether and how musical expertise affects high-level predictive semantic processing in speech comprehension. Musicians and non-musicians listened to semantically strongly/weakly constraining sentences, with each sentence being primed by a congruent/incongruent sentence-prosody. At the target nouns, an N400 reduction effect (strongly vs. weakly constraining) was observed in both groups, with the onset-latency of this effect being delayed for incongruent (vs. congruent) priming. At the transitive verbs preceding these target nouns, musicians' event-related-potential amplitude (in incongruent-priming) and beta-band oscillatory power (in congruent- and incongruent-priming) showed a semantic-constraint effect, and were correlated with the predictability of incoming nouns; non-musicians only demonstrated an event-related-potential semantic-constraint effect, which was correlated with the predictability of current verbs. These results indicate that musical expertise enhances the tendency toward semantic prediction in speech comprehension, and that this effect may not be merely an aftereffect of facilitated acoustic/phonological processing.
Affiliation(s)
- Yuanyi Zheng
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing 100149, China
- Zitong Zhao
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing 100149, China
- Xiaohong Yang
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing 100149, China
- Xiaoqing Li
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing 100149, China
11
Sepulveda RE, Davidow JH, Altenberg EP, Šunić Z. Reliability of judgments of stuttering-related variables: The effect of language familiarity. J Fluency Disord 2021;69:105851. PMID: 34033989; DOI: 10.1016/j.jfludis.2021.105851.
Abstract
Previous studies demonstrate mixed results and some methodological limitations regarding judges' ability to reliably assess stuttering-related variables in an unfamiliar language. The present study examined intra- and inter-rater reliability for percent syllables stuttered (%SS), stuttering severity (SEV), syllables per minute (SPM), and speech naturalness (NAT) when English-speaking judges viewed speech samples in English and in a language with which they had no or minimal familiarity (Spanish). Over two time periods, 21 judges viewed eight videos of four bilingual persons who stutter. Data were analyzed for relative and absolute intra- and inter-rater reliability as well as for an effect of language on time period differences. Intra- and inter-rater relative reliability were good or excellent for all measures in both languages, with the exception of inter-rater relative reliability for NAT in both languages and %SS in Spanish. Intra-rater absolute reliability was acceptable in both languages for NAT and SEV and unacceptable in both for SPM and %SS. Inter-rater absolute reliability in both languages was unacceptable for all measures, even with judges with the same training. There was a clinically significant effect of language on %SS scores, but, despite a statistically significant effect of language for SPM and SEV, the differences were not clinically significant. Results indicate that reliability across and within languages varies by measure and is impacted by intra- vs. inter-rater reliability, relative vs. absolute reliability, and language familiarity. Modifications in training may be able to address some of the limitations found, particularly with regard to SPM and NAT.
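For reference, the two rate measures above, %SS and SPM, are simple ratios (these are the standard definitions; the example numbers are invented):

```python
def percent_syllables_stuttered(stuttered, total):
    """%SS: stuttered syllables as a percentage of all syllables produced."""
    return 100.0 * stuttered / total

def syllables_per_minute(total_syllables, seconds):
    """SPM: speech rate in syllables per minute."""
    return total_syllables / (seconds / 60.0)

print(percent_syllables_stuttered(12, 300))   # 12 stuttered of 300 -> 4.0 %SS
print(syllables_per_minute(300, 90))          # 300 syllables in 90 s -> 200.0 SPM
```

Judgments of severity (SEV) and naturalness (NAT), by contrast, are perceptual rating scales, which is part of why their reliability patterns differ from the count-based measures.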
Affiliation(s)
- Jason H Davidow
- Department of Speech-Language-Hearing Sciences, Hofstra University, United States.
- Evelyn P Altenberg
- Department of Speech-Language-Hearing Sciences, Hofstra University, United States
- Zoran Šunić
- Department of Mathematics, Hofstra University, United States
12
Ferry A, Guellai B. Labels and object categorization in six- and nine-month-olds: tracking labels across varying carrier phrases. Infant Behav Dev 2021;64:101606. PMID: 34333262; DOI: 10.1016/j.infbeh.2021.101606.
Abstract
Language shapes object categorization in infants. This starts as a general enhanced attentional effect of language, which narrows to a specific link between labels and categories by twelve months. The current experiments examined this narrowing effect by investigating when infants track a consistent label across varied input. Six-month-old infants (N = 48) were familiarized to category exemplars, each presented with the exact same labeling phrase or the same label in different phrases. Evidence of object categorization at test was only found with the same phrase, suggesting that infants were not tracking the label's consistency, but rather that of the entire input. Nine-month-olds (N = 24) did show evidence of categorization across the varied phrases, suggesting that they were tracking the consistent label across the varied input.
Affiliation(s)
- Alissa Ferry
- Language, Cognition, and Development Laboratory, Scuola Internazionale di Studi Avanzati, Trieste, Italy; Division of Human Communication, Development and Hearing, University of Manchester, Manchester, UK.
- Bahia Guellai
- Language, Cognition, and Development Laboratory, Scuola Internazionale di Studi Avanzati, Trieste, Italy; Laboratoire Ethologie Cognition, Développement (LECD), Université Paris Nanterre, France
13
14
Prosody facilitates learning the word order in a new language. Cognition 2021;213:104686. PMID: 33863550; DOI: 10.1016/j.cognition.2021.104686.
Abstract
One of the prominent ideas developed by Jacques Mehler and his colleagues was that perceptual tuning, present from birth on, enables infants, and language learners in general, to extract regularities from speech input. Here we discuss language learners' ability to extract basic word order (VO or OV) structure from prosodic regularities in a language. The two are closely related: in phonological phrases of VO languages, the most prominent word is the rightmost one, and in OV languages, it is the leftmost one. In speech, this prominence is realized as extended duration, or as elevated pitch, sometimes combined with changes in intensity. When learning the first (L1) or the second language (L2), exposure to relevant rhythmic structure elicits implicit learning about syntactic structure, including the basic word order. However, it remains unclear whether triggering the learning process requires a certain level of familiarity with the relevant rhythm. It is moreover unknown whether prosodic information can help L2 learners to extract and learn the vocabulary of a new language. We tested Spanish- and Italian-speaking adults' ability to learn words from an artificial language with either non-native OV or native VO word order. The results show that learners used prosodic information to identify the most prominent words in short utterances when the artificial language was similar to the native language, with duration-based prominence in prosody and a VO word order. In contrast, when the artificial language had non-native prominence marked by pitch alternations and an OV word order, prominent words were learned only after a three-day exposure to the relevant rhythmic structure. Thus, for adult L2 learners, only repeated exposure to the relevant prosody elicited learning of new words from an unknown language with non-native prosodic marking, indicating that, with familiarity, prosodic cues can facilitate learning in L2.
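The prominence-to-word-order mapping described above can be sketched as a toy rule (our own simplified encoding, not the study's procedure): within a phonological phrase, rightmost prominence suggests VO and leftmost prominence suggests OV.

```python
def infer_word_order(phrase_durations):
    """Guess basic word order from where duration-based prominence falls
    in one phonological phrase (a list of word durations, e.g. in ms)."""
    prominent = phrase_durations.index(max(phrase_durations))
    if prominent == len(phrase_durations) - 1:
        return "VO"   # rightmost word most prominent
    if prominent == 0:
        return "OV"   # leftmost word most prominent
    return "unclear"

print(infer_word_order([180, 190, 260]))  # duration peak on the right -> VO
print(infer_word_order([260, 190, 180]))  # duration peak on the left -> OV
```

Real learners must of course accumulate this evidence over many noisy phrases, and in the pitch-marked OV condition of the study the relevant prominence cue was pitch rather than duration.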
15
Matzinger T, Ritt N, Fitch WT. The Influence of Different Prosodic Cues on Word Segmentation. Front Psychol 2021;12:622042. PMID: 33796045; PMCID: PMC8007974; DOI: 10.3389/fpsyg.2021.622042.
Abstract
A prerequisite for spoken language learning is segmenting continuous speech into words. Amongst many possible cues to identify word boundaries, listeners can use both transitional probabilities between syllables and various prosodic cues. However, the relative importance of these cues remains unclear, and previous experiments have not directly compared the effects of contrasting multiple prosodic cues. We used artificial language learning experiments, in which native German-speaking participants extracted meaningless trisyllabic "words" from a continuous speech stream, to evaluate these factors. We compared a baseline condition (statistical cues only) to five test conditions, in which word-final syllables were either (a) followed by a pause, (b) lengthened, (c) shortened, (d) changed to a lower pitch, or (e) changed to a higher pitch. To evaluate robustness and generality we used three tasks varying in difficulty. Overall, pauses and final lengthening were perceived as converging with the statistical cues and facilitated speech segmentation, with pauses helping most. Final-syllable shortening hindered baseline speech segmentation, indicating that when cues conflict, prosodic cues can override statistical cues. Surprisingly, pitch cues had little effect, suggesting that duration may be more relevant for speech segmentation than pitch in our study context. We discuss our findings with regard to the contribution to speech segmentation of language-universal boundary cues vs. language-specific stress patterns.
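One way to picture the converging vs. conflicting cue conditions is as a weighted combination of a statistical cue and a prosodic cue, with the prosodic weight larger so that prosody can override statistics. The weights below are arbitrary illustrations of that idea, not values fitted to the study's data.

```python
def boundary_score(tp, prosodic_cue, w_stat=1.0, w_pros=1.5):
    """Evidence for a word boundary after a syllable: a low transitional
    probability (1 - tp) plus a signed prosodic cue (e.g., +1 for final
    lengthening or a pause, -1 for final shortening, 0 for no cue)."""
    return w_stat * (1.0 - tp) + w_pros * prosodic_cue

# Converging cues: a TP dip plus final lengthening -> strong boundary evidence.
converging = boundary_score(tp=0.33, prosodic_cue=1.0)
# Conflicting cues: a TP dip but final shortening -> the boundary is masked.
conflicting = boundary_score(tp=0.33, prosodic_cue=-1.0)
# Word-internal position: high TP, no prosodic cue.
within_word = boundary_score(tp=1.0, prosodic_cue=0.0)
print(converging, conflicting, within_word)
```

With the prosodic weight exceeding the statistical one, the conflicting condition scores below even a word-internal position, which is the override pattern the shortening condition produced behaviorally.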
Affiliation(s)
- Theresa Matzinger
- Department of English, University of Vienna, Vienna, Austria
- Department of Behavioral and Cognitive Biology, University of Vienna, Vienna, Austria
- Nikolaus Ritt
- Department of English, University of Vienna, Vienna, Austria
- W. Tecumseh Fitch
- Department of Behavioral and Cognitive Biology, University of Vienna, Vienna, Austria
- Cognitive Science Hub, University of Vienna, Vienna, Austria
16
Ou SC, Guo ZC. The differential effects of vowel and onset consonant lengthening on speech segmentation: Evidence from Taiwanese Southern Min. J Acoust Soc Am 2021;149:1866. PMID: 33765826; DOI: 10.1121/10.0003751.
Abstract
A review of previous speech segmentation research suggests the prediction that listeners of Taiwanese Southern Min (TSM), a lexical tone language, would exploit vowel lengthening and syllable-onset consonant lengthening to locate word ends and beginnings, respectively. Yet, correlations between segment duration and tone identity in tone languages along with some TSM-specific phonological phenomena may work against such use. Two artificial language learning experiments examined TSM listeners' use of the lengthening cues. The listeners heard the words of an artificial language (e.g., /ba.nu.me/) repeated continuously and identified them in a subsequent two-alternative forced-choice test. Experiment I revealed that their segmentation benefits from and only from word-initial onset lengthening or word-final vowel lengthening, supporting the prediction. Experiment II further demonstrated that these two cues in combination synergistically support segmentation at least when compared to word-initial onset lengthening alone, consistent with previous findings regarding complementary cues. These results furnish additional evidence that vowel and onset consonant lengthening affect segmentation in different ways, possibly reflecting a functional division between vowels and consonants that is supported by some prosody-computing mechanism. Additionally, vowel lengthening seems to affect segmentation to a greater extent than onset consonant lengthening. Possible explanations for this and further issues are discussed.
Collapse
Affiliation(s)
- Shu-Chen Ou
- Department of Foreign Languages and Literature, National Sun Yat-sen University, Kaohsiung, 80424, Taiwan
| | - Zhe-Chen Guo
- Department of Linguistics, University of Texas at Austin, Austin, Texas 78712, USA
| |
Collapse
|
17
|
Endress AD, Johnson SP. When forgetting fosters learning: A neural network model for statistical learning. Cognition 2021; 213:104621. [PMID: 33608130 DOI: 10.1016/j.cognition.2021.104621] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2020] [Revised: 12/19/2020] [Accepted: 01/28/2021] [Indexed: 11/28/2022]
Abstract
Learning often requires splitting continuous signals into recurring units, such as the discrete words constituting fluent speech; these units then need to be encoded in memory. A prominent candidate mechanism involves statistical learning of co-occurrence statistics like transitional probabilities (TPs), reflecting the idea that items from the same unit (e.g., syllables within a word) predict each other better than items from different units. TP computations are surprisingly flexible and sophisticated. Humans are sensitive to forward and backward TPs, compute TPs between adjacent items and longer-distance items, and even recognize TPs in novel units. We explain these hallmarks of statistical learning with a simple model with tunable, Hebbian excitatory connections and inhibitory interactions controlling the overall activation. With weak forgetting, activations are long-lasting, yielding associations among all items; with strong forgetting, no associations ensue as activations do not outlast stimuli; with intermediate forgetting, the network reproduces the hallmarks above. Forgetting thus is a key determinant of these sophisticated learning abilities. Further, in line with earlier dissociations between statistical learning and memory encoding, our model reproduces the hallmarks of statistical learning in the absence of a memory store in which items could be placed.
Collapse
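The forgetting dynamic described in the abstract above can be illustrated with a toy sketch (an illustration under stated assumptions, not the authors' implementation): each incoming item fully activates a unit, residual activations decay at a tunable rate, and co-active units are linked Hebbian-style. The function name `hebbian_associations` and the three-syllable toy lexicon are invented for this example.

```python
import random

def hebbian_associations(stream, decay):
    """Toy Hebbian learner. Each incoming item fully activates its unit;
    existing activations decay by `decay` per step (0 = instant forgetting,
    1 = no forgetting). The link between the new item and every still-active
    unit is strengthened in proportion to that unit's residual activation."""
    act = {}   # residual activation per item
    w = {}     # symmetric association weights, keyed by unordered pair
    for item in stream:
        for k in act:
            act[k] *= decay                        # forgetting step
        for k, a in act.items():
            if k != item and a > 0:
                key = frozenset((k, item))
                w[key] = w.get(key, 0.0) + a       # Hebbian strengthening
        act[item] = 1.0
    return w

# Hypothetical three-word lexicon; a continuous stream of 200 random words.
random.seed(0)
lexicon = [("ba", "nu", "me"), ("tu", "ko", "ra"), ("pi", "go", "la")]
stream = [s for _ in range(200) for s in random.choice(lexicon)]

w = hebbian_associations(stream, decay=0.5)        # intermediate forgetting
within = w[frozenset(("ba", "nu"))]                # adjacent, within-word pair
between = max(w.get(frozenset(("me", other[0])), 0.0)
              for other in lexicon if other != ("ba", "nu", "me"))
```

With decay near 1, every pair ends up associated; with decay 0, no weights form at all; intermediate decay yields stronger within-word than between-word links, plus weaker longer-distance links (e.g., "ba" to "me"), qualitatively matching the pattern the abstract attributes to intermediate forgetting.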
|
18
|
Toro JM, Crespo-Bojorque P. Arc-shaped pitch contours facilitate item recognition in non-human animals. Cognition 2021; 213:104614. [PMID: 33558018 DOI: 10.1016/j.cognition.2021.104614] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2020] [Revised: 01/11/2021] [Accepted: 01/26/2021] [Indexed: 10/22/2022]
Abstract
Acoustic changes linked to natural prosody are a key source of information about the organization of language. Both human infants and adults readily take advantage of such changes to discover and memorize linguistic patterns. Do they so because our brain is efficiently wired to specifically process linguistic stimuli? Or are we co-opting for language acquisition purposes more general principles that might be inherited from our animal ancestors? Here, we address this question by exploring if other species profit from prosody to better process acoustic sequences. More specifically, we test whether arc-shaped pitch contours defining natural prosody might facilitate item recognition and memorization in rats. In two experiments, we presented to the rats nonsense words with flat, natural, inverted and random prosodic contours. We observed that the animals correctly recognized the familiarization words only when arc-shaped pitch contours were implemented over them. Our results suggest that other species might also benefit from prosody for the memorization of items in a sequence. Such capacity seems to be rooted in general principles of how biological sounds are produced and processed.
Collapse
Affiliation(s)
- Juan M Toro
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluis Companys, 23, 08019 Barcelona, Spain; Universitat Pompeu Fabra, C. Ramon Trias Fargas, 25-27, 08005 Barcelona, Spain.
| | | |
Collapse
|
19
|
Hahn LE, Benders T, Snijders TM, Fikkert P. Six-month-old infants recognize phrases in song and speech. INFANCY 2020; 25:699-718. [PMID: 32794372 DOI: 10.1111/infa.12357] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2019] [Revised: 06/09/2020] [Accepted: 07/02/2020] [Indexed: 11/29/2022]
Abstract
Infants exploit acoustic boundaries to perceptually organize phrases in speech. This prosodic parsing ability is well-attested and is a cornerstone to the development of speech perception and grammar. However, infants also receive linguistic input in child songs. This study provides evidence that infants parse songs into meaningful phrasal units and replicates previous research for speech. Six-month-old Dutch infants (n = 80) were tested in the song or speech modality in the head-turn preference procedure. First, infants were familiarized to two versions of the same word sequence: One version represented a well-formed unit, and the other contained a phrase boundary halfway through. At test, infants were presented two passages, each containing one version of the familiarized sequence. The results for speech replicated the previously observed preference for the passage containing the well-formed sequence, but only in a more fine-grained analysis. The preference for well-formed phrases was also observed in the song modality, indicating that infants recognize phrase structure in song. There were acoustic differences between stimuli of the current and previous studies, suggesting that infants are flexible in their processing of boundary cues while also providing a possible explanation for differences in effect sizes.
Collapse
Affiliation(s)
- Laura E Hahn
- Centre for Language Studies, Radboud University, Nijmegen, The Netherlands; International Max Planck Research School for Language Sciences, Nijmegen, The Netherlands
| | - Titia Benders
- Department of Linguistics, Macquarie University, Sydney, NSW, Australia
| | - Tineke M Snijders
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, The Netherlands
| | - Paula Fikkert
- Centre for Language Studies, Radboud University, Nijmegen, The Netherlands
| |
Collapse
|
20
|
Endress AD, Slone LK, Johnson SP. Statistical learning and memory. Cognition 2020; 204:104346. [PMID: 32615468 DOI: 10.1016/j.cognition.2020.104346] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2019] [Revised: 05/12/2020] [Accepted: 05/30/2020] [Indexed: 01/01/2023]
Abstract
Learners often need to identify and remember recurring units in continuous sequences, but the underlying mechanisms are debated. A particularly prominent candidate mechanism relies on distributional statistics such as Transitional Probabilities (TPs). However, it is unclear what the outputs of statistical segmentation mechanisms are, and if learners store these outputs as discrete chunks in memory. We critically review the evidence for the possibility that statistically coherent items are stored in memory and outline difficulties in interpreting past research. We use Slone and Johnson's (2018) experiments as a case study to show that it is difficult to delineate the different mechanisms learners might use to solve a learning problem. Slone and Johnson (2018) reported that 8-month-old infants learned coherent chunks of shapes in visual sequences. Here, we describe an alternate interpretation of their findings based on a multiple-cue integration perspective. First, when multiple cues to statistical structure were available, infants' looking behavior seemed to track with the strength of the strongest one (backward TPs), suggesting that infants process multiple cues simultaneously and select the strongest one. Second, like adults, infants are exquisitely sensitive to chunks, but may require multiple cues to extract them. In Slone and Johnson's (2018) experiments, these cues were provided by immediate chunk repetitions during familiarization. Accordingly, infants showed strongest evidence of chunking following familiarization sequences in which immediate repetitions were more frequent. These interpretations provide a strong argument for infants' processing of multiple cues and the potential importance of multiple cues for chunk recognition in infancy.
Collapse
Affiliation(s)
- Ansgar D Endress
- Department of Psychology, City, University of London, United Kingdom.
| | - Lauren K Slone
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, United States; Department of Psychology, Hope College, Holland, United States
| | - Scott P Johnson
- Department of Psychology, University of California, Los Angeles, United States
| |
Collapse
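The transitional-probability mechanism invoked throughout these studies can be made concrete with a short sketch (illustrative only; the function names and the three-word toy lexicon are invented, not taken from any of the cited papers): forward TPs are estimated from bigram counts over a continuous syllable stream, and a word boundary is posited wherever the TP dips below a threshold.

```python
import random
from collections import Counter

def forward_tps(stream):
    """Forward TP(x -> y) = count(xy) / count(x), estimated from the stream."""
    pair = Counter(zip(stream, stream[1:]))
    first = Counter(stream[:-1])
    return {bigram: c / first[bigram[0]] for bigram, c in pair.items()}

def segment(stream, tps, threshold=0.9):
    """Cut the stream wherever the forward TP falls below `threshold`
    (low TP between syllables signals a likely word boundary)."""
    words, cur = [], [stream[0]]
    for prev, nxt in zip(stream, stream[1:]):
        if tps[(prev, nxt)] < threshold:
            words.append(tuple(cur))
            cur = []
        cur.append(nxt)
    words.append(tuple(cur))
    return words

# Hypothetical lexicon; within-word TPs are 1.0, between-word TPs are ~1/3.
random.seed(1)
lexicon = [("ba", "nu", "me"), ("tu", "ko", "ra"), ("pi", "go", "la")]
stream = [s for _ in range(100) for s in random.choice(lexicon)]

tps = forward_tps(stream)
found = set(segment(stream, tps))
```

On this toy stream the recovered units are exactly the three lexicon words, since syllables within a word always predict each other while word-final syllables are followed unpredictably by the next word's onset.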
|
21
|
de la Cruz-Pavía I, Werker JF, Vatikiotis-Bateson E, Gervain J. Finding Phrases: The Interplay of Word Frequency, Phrasal Prosody and Co-speech Visual Information in Chunking Speech by Monolingual and Bilingual Adults. LANGUAGE AND SPEECH 2020; 63:264-291. [PMID: 31002280 PMCID: PMC7254630 DOI: 10.1177/0023830919842353] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
The audiovisual speech signal contains multimodal cues to phrase boundaries. In three artificial language learning studies with 12 groups of adult participants we investigated whether English monolinguals and bilingual speakers of English and a language with opposite basic word order (i.e., in which objects precede verbs) can use word frequency, phrasal prosody and co-speech (facial) visual information, namely head nods, to parse unknown languages into phrase-like units. We showed that monolinguals and bilinguals used the auditory and visual sources of information to chunk "phrases" from the input. These results suggest that speech segmentation is a bimodal process, though the influence of co-speech facial gestures is rather limited and linked to the presence of auditory prosody. Importantly, a pragmatic factor, namely the language of the context, seems to determine the bilinguals' segmentation, overriding the auditory and visual cues and revealing a factor that begs further exploration.
Collapse
Affiliation(s)
- Irene de la Cruz-Pavía
- Irene de la Cruz-Pavía, Integrative Neuroscience and Cognition Center (INCC—UMR 8002), Université Paris Descartes-CNRS, 45 rue des Saints-Pères, Paris, 75006, France.
| | - Janet F. Werker
- Department of Psychology, University of British Columbia, Canada
| | | | - Judit Gervain
- Integrative Neuroscience and Cognition Center (INCC—UMR 8002), Université Paris Descartes (Sorbonne Paris Cité), France; Integrative Neuroscience and Cognition Center (INCC—UMR 8002), CNRS, France
| |
Collapse
|
22
|
Fló A, Brusini P, Macagno F, Nespor M, Mehler J, Ferry AL. Newborns are sensitive to multiple cues for word segmentation in continuous speech. Dev Sci 2019; 22:e12802. [PMID: 30681763 DOI: 10.1111/desc.12802] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2018] [Revised: 01/19/2019] [Accepted: 01/21/2019] [Indexed: 11/30/2022]
Abstract
Before infants can learn words, they must identify those words in continuous speech. Yet, the speech signal lacks obvious boundary markers, which poses a potential problem for language acquisition (Swingley, Philos Trans R Soc Lond. Series B, Biol Sci 364(1536), 3617-3632, 2009). By the middle of the first year, infants seem to have solved this problem (Bergelson & Swingley, Proc Natl Acad Sci 109(9), 3253-3258, 2012; Jusczyk & Aslin, Cogn Psychol 29, 1-23, 1995), but it is unknown if segmentation abilities are present from birth, or if they only emerge after sufficient language exposure and/or brain maturation. Here, in two independent experiments, we looked at two cues known to be crucial for the segmentation of human speech: the computation of statistical co-occurrences between syllables and the use of the language's prosody. After a brief familiarization of about 3 min with continuous speech, using functional near-infrared spectroscopy, neonates showed differential brain responses on a recognition test to words that violated either the statistical (Experiment 1) or prosodic (Experiment 2) boundaries of the familiarization, compared to words that conformed to those boundaries. Importantly, word recognition in Experiment 2 occurred even in the absence of prosodic information at test, meaning that newborns encoded the phonological content independently of its prosody. These data indicate that humans are born with operational language processing and memory capacities and can use at least two types of cues to segment otherwise continuous speech, a key first step in language acquisition.
Collapse
Affiliation(s)
- Ana Fló
- Language, Cognition, and Development Laboratory, Scuola Internazionale di Studi Avanzati, Trieste, Italy; Cognitive Neuroimaging Unit, Commissariat à l'Energie Atomique (CEA), Institut National de la Santé et de la Recherche Médicale (INSERM) U992, NeuroSpin Center, Gif-sur-Yvette, France
| | - Perrine Brusini
- Language, Cognition, and Development Laboratory, Scuola Internazionale di Studi Avanzati, Trieste, Italy; Institute of Psychology, Health and Society, University of Liverpool, Liverpool, UK
| | - Francesco Macagno
- Neonatology Unit, Azienda Ospedaliera Santa Maria della Misericordia, Udine, Italy
| | - Marina Nespor
- Language, Cognition, and Development Laboratory, Scuola Internazionale di Studi Avanzati, Trieste, Italy
| | - Jacques Mehler
- Language, Cognition, and Development Laboratory, Scuola Internazionale di Studi Avanzati, Trieste, Italy
| | - Alissa L Ferry
- Language, Cognition, and Development Laboratory, Scuola Internazionale di Studi Avanzati, Trieste, Italy; Division of Human Communication, Hearing, and Development, University of Manchester, Manchester, UK
| |
Collapse
|
23
|
Hawthorne K. Prosody-driven syntax learning is robust to impoverished pitch and spectral cues. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:2756. [PMID: 29857717 DOI: 10.1121/1.5031130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Across languages, prosodic boundaries tend to align with syntactic boundaries, and both infant and adult language learners capitalize on these correlations to jump-start syntax acquisition. However, it is unclear which prosodic cues (pauses, final-syllable lengthening, and/or pitch resets across boundaries) are necessary for prosodic bootstrapping to occur. It is also unknown how syntax acquisition is impacted when listeners do not have access to the full range of prosodic or spectral information. These questions were addressed using 14-channel noise-vocoded (spectrally degraded) speech. While pre-boundary lengthening and pauses are well-transmitted through noise-vocoded speech, pitch is not; overall intelligibility is also decreased. In two artificial grammar experiments, adult native English speakers showed a similar ability to use English-like prosody to bootstrap unfamiliar syntactic structures from degraded speech and natural, unmanipulated speech. Contrary to previous findings that listeners may require pitch resets and final lengthening to co-occur if no pause cue is present, participants in the degraded speech conditions were able to detect prosodic boundaries from lengthening alone. Results suggest that pitch is not necessary for adult English speakers to perceive prosodic boundaries associated with syntactic structures, and that prosodic bootstrapping is robust to degraded spectral information.
Collapse
Affiliation(s)
- Kara Hawthorne
- Department of Communication Sciences and Disorders, University of Mississippi, 304 George Hall, University, Mississippi 38677, USA
| |
Collapse
|
24
|
Endress AD, Langus A. Transitional probabilities count more than frequency, but might not be used for memorization. Cogn Psychol 2016; 92:37-64. [PMID: 27907807 DOI: 10.1016/j.cogpsych.2016.11.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2016] [Revised: 11/08/2016] [Accepted: 11/09/2016] [Indexed: 11/29/2022]
Abstract
Learners often need to extract recurring items from continuous sequences, in both vision and audition. The best-known example is probably found in word-learning, where listeners have to determine where words start and end in fluent speech. This could be achieved through universal and experience-independent statistical mechanisms, for example by relying on Transitional Probabilities (TPs). Further, these mechanisms might allow learners to store items in memory. However, previous investigations have yielded conflicting evidence as to whether a sensitivity to TPs is diagnostic of the memorization of recurring items. Here, we address this issue in the visual modality. Participants were familiarized with a continuous sequence of visual items (i.e., arbitrary or everyday symbols), and then had to choose between (i) high-TP items that appeared in the sequence, (ii) high-TP items that did not appear in the sequence, and (iii) low-TP items that appeared in the sequence. Items matched in TPs but differing in (chunk) frequency were much harder to discriminate than items differing in TPs (with no significant sensitivity to chunk frequency), and learners preferred unattested high-TP items over attested low-TP items. Contrary to previous claims, these results cannot be explained on the basis of the similarity of the test items. Learners thus weigh within-item TPs higher than the frequency of the chunks, even when the TP differences are relatively subtle. We argue that these results are problematic for distributional clustering mechanisms that analyze continuous sequences, and provide supporting computational results. We suggest that the role of TPs might not be to memorize items per se, but rather to prepare learners to memorize recurring items once they are presented in subsequent learning situations with richer cues.
Collapse
Affiliation(s)
| | - Alan Langus
- Cognitive Neuroscience Sector, International School for Advanced Studies, Trieste, Italy
| |
Collapse
|
25
|
Frost RLA, Monaghan P, Tatsumi T. Domain-general mechanisms for speech segmentation: The role of duration information in language learning. J Exp Psychol Hum Percept Perform 2016; 43:466-476. [PMID: 27893268 PMCID: PMC5327892 DOI: 10.1037/xhp0000325] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Speech segmentation is supported by multiple sources of information that may either inform language processing specifically, or serve learning more broadly. The Iambic/Trochaic Law (ITL), where increased duration indicates the end of a group and increased emphasis indicates the beginning of a group, has been proposed as a domain-general mechanism that also applies to language. However, language background has been suggested to modulate use of the ITL, meaning that these perceptual grouping preferences may instead be a consequence of language exposure. To distinguish between these accounts, we exposed native-English and native-Japanese listeners to sequences of speech (Experiment 1) and nonspeech stimuli (Experiment 2), and examined segmentation using a 2AFC task. Duration was manipulated over 3 conditions: sequences contained an initial-item duration increase, a final-item duration increase, or items of uniform duration. In Experiment 1, language background did not affect the use of duration as a cue for segmenting speech in a structured artificial language. In Experiment 2, the same results were found for grouping structured sequences of visual shapes. The results are consistent with proposals that duration information draws upon a domain-general mechanism that can apply to the special case of language acquisition. This study shows that adults prefer to group both sequences of shapes and sequences of speech with the final item as the one that has the longest duration. This suggests that final-item duration increase is a helpful cue for perceptual grouping in multiple domains, not just language processing. By testing native speakers of languages that use duration differently, we also show that this grouping preference does not arise from language experience.
Collapse
|
26
|
Filippi P. Emotional and Interactional Prosody across Animal Communication Systems: A Comparative Approach to the Emergence of Language. Front Psychol 2016; 7:1393. [PMID: 27733835 PMCID: PMC5039945 DOI: 10.3389/fpsyg.2016.01393] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2016] [Accepted: 08/31/2016] [Indexed: 01/29/2023] Open
Abstract
Across a wide range of animal taxa, prosodic modulation of the voice can express emotional information and is used to coordinate vocal interactions between multiple individuals. Within a comparative approach to animal communication systems, I hypothesize that the ability for emotional and interactional prosody (EIP) paved the way for the evolution of linguistic prosody, and perhaps also of music, and continues to play a vital role in the acquisition of language. In support of this hypothesis, I review three research fields: (i) empirical studies on the adaptive value of EIP in non-human primates, mammals, songbirds, anurans, and insects; (ii) the beneficial effects of EIP in scaffolding language learning and social development in human infants; (iii) the cognitive relationship between linguistic prosody and the ability for music, which has often been identified as the evolutionary precursor of language.
Collapse
Affiliation(s)
- Piera Filippi
- Department of Artificial Intelligence, Vrije Universiteit Brussel, Brussels, Belgium
| |
Collapse
|
27
|
Endress AD, Bonatti LL. Words, rules, and mechanisms of language acquisition. WILEY INTERDISCIPLINARY REVIEWS. COGNITIVE SCIENCE 2015; 7:19-35. [PMID: 26683248 DOI: 10.1002/wcs.1376] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/23/2015] [Revised: 09/21/2015] [Accepted: 11/17/2015] [Indexed: 11/10/2022]
Abstract
We review recent artificial language learning studies, especially those following Endress and Bonatti (Endress AD, Bonatti LL. Rapid learning of syllable classes from a perceptually continuous speech stream. Cognition 2007, 105:247-299), suggesting that humans can deploy a variety of learning mechanisms to acquire artificial languages. Several experiments provide evidence for multiple learning mechanisms that can be deployed in fluent speech: one mechanism encodes the positions of syllables within words and can be used to extract generalizations, while the other registers co-occurrence statistics of syllables and can be used to break a continuum into its components. We review dissociations between these mechanisms and their potential role in language acquisition. We then turn to recent criticisms of the multiple mechanisms hypothesis and show that they are inconsistent with the available data. Our results suggest that artificial and natural language learning is best understood by dissecting the underlying specialized learning abilities, and that these data provide a rare opportunity to link important language phenomena to basic psychological mechanisms. For further resources related to this article, please visit the WIREs website.
Collapse
|
28
|
Estes KG, Lew-Williams C. Listening through voices: Infant statistical word segmentation across multiple speakers. Dev Psychol 2015; 51:1517-28. [PMID: 26389607 PMCID: PMC4631842 DOI: 10.1037/a0039725] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
To learn from their environments, infants must detect structure behind pervasive variation. This presents substantial and largely untested learning challenges in early language acquisition. The current experiments address whether infants can use statistical learning mechanisms to segment words when the speech signal contains acoustic variation produced by changes in speakers' voices. In Experiment 1, 8- and 10-month-old infants listened to a continuous stream of novel words produced by 8 different female voices. The voices alternated frequently, potentially interrupting infants' detection of transitional probability patterns that mark word boundaries. Infants at both ages successfully segmented words in the speech stream. In Experiment 2, 8-month-olds demonstrated the ability to generalize their learning about the speech stream when presented with a new, acoustically distinct voice during testing. However, in Experiments 3 and 4, when the same speech stream was produced by only 2 female voices, infants failed to segment the words. The results of these experiments indicate that low acoustic variation may interfere with infants' efficiency in segmenting words from continuous speech, but that infants successfully use statistical cues to segment words in conditions of high acoustic variation. These findings contribute to our understanding of whether statistical learning mechanisms can scale up to meet the demands of natural learning environments.
Collapse
|
29
|
Kakouros S, Räsänen O. Perception of Sentence Stress in Speech Correlates With the Temporal Unpredictability of Prosodic Features. Cogn Sci 2015; 40:1739-1774. [DOI: 10.1111/cogs.12306] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2014] [Revised: 07/18/2015] [Accepted: 07/25/2015] [Indexed: 11/26/2022]
Affiliation(s)
| | - Okko Räsänen
- Department of Signal Processing and Acoustics, Aalto University
| |
Collapse
|
30
|
de Diego-Balaguer R, Rodríguez-Fornells A, Bachoud-Lévi AC. Prosodic cues enhance rule learning by changing speech segmentation mechanisms. Front Psychol 2015; 6:1478. [PMID: 26483731 PMCID: PMC4588126 DOI: 10.3389/fpsyg.2015.01478] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2015] [Accepted: 09/14/2015] [Indexed: 01/28/2023] Open
Abstract
Prosody has been claimed to have a critical role in the acquisition of grammatical information from speech. The exact mechanisms by which prosodic cues enhance learning remain largely unknown. Rules in language often require the extraction of non-adjacent dependencies (e.g., he plays, he sings, he speaks). It has been proposed that pauses enhance learning because they allow learners to compute non-adjacent relations, helping word segmentation by removing the need for adjacent computations. So far, only indirect evidence from behavioral and electrophysiological measures comparing learning effects after exposure to speech with and without pauses supports this claim. By recording event-related potentials during the acquisition of artificial languages with and without pauses between words with embedded non-adjacent rules, we provide direct evidence of how the presence of pauses modifies the way speech is processed during learning to enhance segmentation and rule generalization. The electrophysiological results indicate that pauses as short as 25 ms attenuated the N1 component irrespective of whether learning was possible or not. In addition, a P2 enhancement was present only when learning of non-adjacent dependencies was possible. The overall results support the claim that the simple presence of subtle pauses changed the segmentation mechanism used, reflected in an exogenously driven attenuation of the N1 component and improved segmentation at the behavioral level. This effect can be dissociated from the endogenous P2 enhancement, which is observed whenever non-adjacent dependencies are learned, irrespective of the presence of pauses.
Collapse
Affiliation(s)
- Ruth de Diego-Balaguer
- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain; Cognition and Brain Plasticity Unit, Bellvitge Biomedical Research Institute (IDIBELL), L'Hospitalet de Llobregat, Barcelona, Spain; Department of Basic Psychology, University of Barcelona, Barcelona, Spain
| | - Antoni Rodríguez-Fornells
- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain; Cognition and Brain Plasticity Unit, Bellvitge Biomedical Research Institute (IDIBELL), L'Hospitalet de Llobregat, Barcelona, Spain; Department of Basic Psychology, University of Barcelona, Barcelona, Spain
| | - Anne-Catherine Bachoud-Lévi
- INSERM U955, Equipe 01, Neuropsychologie Interventionnelle, Institut Mondor de Recherche Biomédicale, Créteil, France; Département d'Etudes Cognitives, École Normale Supérieure, Paris, France; Faculté de Médecine, Université Paris-Est, Créteil, France; Assistance Publique-Hôpitaux de Paris, Centre de Référence Maladie de Huntington, Unité de Neurologie Cognitive, Hôpital Henri Mondor-Albert Chenevier, Créteil, France
| |
Collapse
|
31
|
Azab SN, Ashour H. Studying some elicited verbal prosodic patterns in Egyptian specific language impaired children. Int J Pediatr Otorhinolaryngol 2015; 79:36-41. [PMID: 25468460 DOI: 10.1016/j.ijporl.2014.11.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/01/2014] [Revised: 10/31/2014] [Accepted: 11/01/2014] [Indexed: 10/24/2022]
Abstract
BACKGROUND Prosody is the aspect of language that conveys emotion through changes in tone, rhythm, and emphasis during speech. The term specific language impairment (SLI) refers to children whose language development is substantially below their chronological age, despite normal nonverbal intelligence and no obvious neurological or physiological impairments, or emotional and/or social difficulties that could impact language use. PURPOSE To assess prosodic skills in Arabic-speaking children with specific language impairment, in order to answer the question "Are SLI children dysprosodic?", and to inform the choice and application of training procedures, thereby improving rehabilitation programs. METHODS Thirty typically developing Egyptian children and 30 Egyptian children with specific language impairment (SLI), aged between 4 and 6 years, were included in this study and underwent psychometric evaluation, audiological assessment, an Arabic language test, an articulation test, and an assessment protocol of prosody. RESULTS Egyptian children with SLI scored lower on prosodic skills than the control group, with a significant positive correlation between their total language ages and total prosodic scores. CONCLUSION Egyptian children with SLI show impaired prosodic skills, and intervention programs must include a prosodic rehabilitation component in order to achieve a higher level of improvement.
Affiliation(s)
- Safinaz Nagib Azab
- Unit of Phoniatrics, Department of Otorhinolaryngology, Faculty of Medicine, Beni Suef University, Egypt.
- Heba Ashour
- Unit of Phoniatrics, Department of Otorhinolaryngology, Faculty of Medicine, Beni Suef University, Egypt.
32
Affiliation(s)
- Ansgar D. Endress
- Department of Technology, Universitat Pompeu Fabra
- Department of Psychology, City University London
33
Johnson EK, Seidl A, Tyler MD. The edge factor in early word segmentation: utterance-level prosody enables word form extraction by 6-month-olds. PLoS One 2014; 9:e83546. [PMID: 24421892 PMCID: PMC3885442 DOI: 10.1371/journal.pone.0083546] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2013] [Accepted: 11/06/2013] [Indexed: 11/18/2022] Open
Abstract
Past research has shown that English learners begin segmenting words from speech by 7.5 months of age. However, more recent research has begun to show that, in some situations, infants may exhibit rudimentary segmentation capabilities at an earlier age. Here, we report on four perceptual experiments and a corpus analysis further investigating the initial emergence of segmentation capabilities. In Experiments 1 and 2, 6-month-olds were familiarized with passages containing target words located either utterance medially or at utterance edges. Only those infants familiarized with passages containing target words aligned with utterance edges exhibited evidence of segmentation. In Experiments 3 and 4, 6-month-olds recognized familiarized words when they were presented in a new acoustically distinct voice (male rather than female), but not when they were presented in a phonologically altered manner (missing the initial segment). Finally, we report corpus analyses examining how often different word types occur at utterance boundaries in different registers. Our findings suggest that edge-aligned words likely play a key role in infants' early segmentation attempts, and also converge with recent reports suggesting that 6-month-olds have already started building a rudimentary lexicon.
Affiliation(s)
- Amanda Seidl
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Michael D. Tyler
- MARCS Institute and School of Social Sciences and Psychology, University of Western Sydney, Sydney, New South Wales, Australia
34
Mitchel AD, Weiss DJ. Visual speech segmentation: using facial cues to locate word boundaries in continuous speech. LANGUAGE AND COGNITIVE PROCESSES 2014; 29:771-780. [PMID: 25018577 PMCID: PMC4091796 DOI: 10.1080/01690965.2013.791703] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]
Abstract
Speech is typically a multimodal phenomenon, yet few studies have focused on the exclusive contributions of visual cues to language acquisition. To address this gap, we investigated whether visual prosodic information can facilitate speech segmentation. Previous research has demonstrated that language learners can use lexical stress and pitch cues to segment speech and that learners can extract this information from talking faces. Thus, we created an artificial speech stream that contained minimal segmentation cues and paired it with two synchronous facial displays in which visual prosody was either informative or uninformative for identifying word boundaries. Across three familiarisation conditions (audio stream alone, facial streams alone, and paired audiovisual), learning occurred only when the facial displays were informative about word boundaries, suggesting that facial cues can help learners solve the early challenges of language acquisition.
Affiliation(s)
- Aaron D. Mitchel
- Department of Psychology, Bucknell University, Lewisburg, PA 17837, USA
- Daniel J. Weiss
- Department of Psychology and Program in Linguistics, The Pennsylvania State University, 643 Moore Building, University Park, PA 16802, USA
35
Predictions in speech comprehension: fMRI evidence on the meter–semantic interface. Neuroimage 2013; 70:89-100. [DOI: 10.1016/j.neuroimage.2012.12.013] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2012] [Revised: 12/05/2012] [Accepted: 12/08/2012] [Indexed: 11/24/2022] Open
36
Zion Golumbic EM, Poeppel D, Schroeder CE. Temporal context in speech processing and attentional stream selection: a behavioral and neural perspective. BRAIN AND LANGUAGE 2012; 122:151-61. [PMID: 22285024 PMCID: PMC3340429 DOI: 10.1016/j.bandl.2011.12.010] [Citation(s) in RCA: 108] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/07/2011] [Revised: 12/14/2011] [Accepted: 12/16/2011] [Indexed: 05/04/2023]
Abstract
The human capacity for processing speech is remarkable, especially given that information in speech unfolds over multiple time scales concurrently. Similarly notable is our ability to filter out extraneous sounds and focus our attention on one conversation, epitomized by the 'Cocktail Party' effect. Yet, the neural mechanisms underlying on-line speech decoding and attentional stream selection are not well understood. We review findings from behavioral and neurophysiological investigations that underscore the importance of the temporal structure of speech for achieving these perceptual feats. We discuss the hypothesis that entrainment of ambient neuronal oscillations to speech's temporal structure, across multiple time-scales, serves to facilitate its decoding and underlies the selection of an attended speech stream over other competing input. In this regard, speech decoding and attentional stream selection are examples of 'Active Sensing', emphasizing an interaction between proactive and predictive top-down modulation of neuronal dynamics and bottom-up sensory input.
Affiliation(s)
- Elana M Zion Golumbic
- Department of Psychiatry, Columbia University Medical Center, 710 W 168th St., New York, NY 10032, USA.
37
Endress AD, Wood JN. From movements to actions: Two mechanisms for learning action sequences. Cogn Psychol 2011; 63:141-71. [DOI: 10.1016/j.cogpsych.2011.07.001] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2009] [Accepted: 07/07/2011] [Indexed: 10/17/2022]
38
Kuipers JR, Thierry G. Event-related potential correlates of language change detection in bilingual toddlers. Dev Cogn Neurosci 2011; 2:97-102. [PMID: 22682731 DOI: 10.1016/j.dcn.2011.08.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2010] [Revised: 07/29/2011] [Accepted: 08/05/2011] [Indexed: 11/19/2022] Open
Abstract
Children raised in a bilingual environment are faced with the daunting task of learning to extract meaning from language input that can differ between caregivers but, depending on the social context, also within caregivers. Here, we investigated monolingual and bilingual toddlers' brain responses to an unexpected language change. We presented 2-3 year old children with picture-word pairs and occasionally changed the language of the spoken word while recording event-related potentials (ERPs). In line with previous results obtained in adults, bilingual children differentiated between the languages of input faster than their monolingual peers, i.e., within 200 ms of spoken word onset, a time range previously associated with lexical access. However, while adult bilinguals displayed a late stimulus re-evaluation ERP response to a language change, no such modulation was found in bilingual toddlers. These results suggest that although bilingual individuals are sensitive to phonemic language cues from an early age, language awareness and language monitoring mechanisms probably develop later in life.
Affiliation(s)
- Jan Rouke Kuipers
- ESRC Centre for Research on Bilingualism in Theory and Practice, Bangor University, Bangor LL57 2GD, United Kingdom.
39
Shukla M, White KS, Aslin RN. Prosody guides the rapid mapping of auditory word forms onto visual objects in 6-mo-old infants. Proc Natl Acad Sci U S A 2011; 108:6038-43. [PMID: 21444800 PMCID: PMC3076873 DOI: 10.1073/pnas.1017617108] [Citation(s) in RCA: 144] [Impact Index Per Article: 11.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Human infants are predisposed to rapidly acquire their native language. The nature of these predispositions is poorly understood, but is crucial to our understanding of how infants unpack their speech input to recover the fundamental word-like units, assign them referential roles, and acquire the rules that govern their organization. Previous researchers have demonstrated the role of general distributional computations in prelinguistic infants' parsing of continuous speech. We extend these findings to more naturalistic conditions, and find that 6-mo-old infants can simultaneously segment a nonce auditory word form from prosodically organized continuous speech and associate it to a visual referent. Crucially, however, this mapping occurs only when the word form is aligned with a prosodic phrase boundary. Our findings suggest that infants are predisposed very early in life to hypothesize that words are aligned with prosodic phrase boundaries, thus facilitating the word learning process. Further, and somewhat paradoxically, we observed successful learning in a more complex context than previously studied, suggesting that learning is enhanced when the language input is well matched to the learner's expectations.
Affiliation(s)
- Mohinish Shukla
- Department of Brain and Cognitive Sciences, Meliora Hall, University of Rochester, Rochester, NY 14627, USA.