51
Safron A. The Radically Embodied Conscious Cybernetic Bayesian Brain: From Free Energy to Free Will and Back Again. Entropy (Basel, Switzerland) 2021; 23:783. [PMID: 34202965 PMCID: PMC8234656 DOI: 10.3390/e23060783] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 01/18/2021] [Revised: 05/12/2021] [Accepted: 05/27/2021] [Indexed: 11/24/2022]
Abstract
Drawing from both enactivist and cognitivist perspectives on mind, I propose that explaining teleological phenomena may require reappraising both "Cartesian theaters" and mental homunculi in terms of embodied self-models (ESMs), understood as body maps with agentic properties, functioning as predictive-memory systems and cybernetic controllers. Quasi-homuncular ESMs are suggested to constitute a major organizing principle for neural architectures due to their initial and ongoing significance for solutions to inference problems in cognitive (and affective) development. Embodied experiences provide foundational lessons in learning curriculums in which agents explore increasingly challenging problem spaces, so answering an unresolved question in Bayesian cognitive science: what are biologically plausible mechanisms for equipping learners with sufficiently powerful inductive biases to adequately constrain inference spaces? Drawing on models from neurophysiology, psychology, and developmental robotics, I describe how embodiment provides fundamental sources of empirical priors (as reliably learnable posterior expectations). If ESMs play this kind of foundational role in cognitive development, then bidirectional linkages will be found between all sensory modalities and frontal-parietal control hierarchies, so infusing all senses with somatic-motoric properties, thereby structuring all perception by relevant affordances, so solving frame problems for embodied agents. Drawing upon the Free Energy Principle and Active Inference framework, I describe a particular mechanism for intentional action selection via consciously imagined (and explicitly represented) goal realization, where contrasts between desired and present states influence ongoing policy selection via predictive coding mechanisms and backward-chained imaginings (as self-realizing predictions). 
This embodied developmental legacy suggests a mechanism by which imaginings can be intentionally shaped by (internalized) partially-expressed motor acts, so providing means of agentic control for attention, working memory, imagination, and behavior. I further describe the nature(s) of mental causation and self-control, and also provide an account of readiness potentials in Libet paradigms wherein conscious intentions shape causal streams leading to enaction. Finally, I provide neurophenomenological handlings of prototypical qualia including pleasure, pain, and desire in terms of self-annihilating free energy gradients via quasi-synesthetic interoceptive active inference. In brief, this manuscript is intended to illustrate how radically embodied minds may create foundations for intelligence (as capacity for learning and inference), consciousness (as somatically-grounded self-world modeling), and will (as deployment of predictive models for enacting valued goals).
Affiliation(s)
- Adam Safron
  - Center for Psychedelic and Consciousness Research, Johns Hopkins University School of Medicine, Baltimore, MD 21218, USA
  - Kinsey Institute, Indiana University, Bloomington, IN 47405, USA
  - Cognitive Science Program, Indiana University, Bloomington, IN 47405, USA
52
ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings. Behav Res Methods 2021; 53:818-835. [PMID: 32875399 PMCID: PMC8062390 DOI: 10.3758/s13428-020-01460-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 11/08/2022]
Abstract
Recordings captured by wearable microphones are a standard method for investigating young children's language environments. A key measure to quantify from such data is the amount of speech present in children's home environments. To this end, the LENA recorder and software, a popular system for measuring linguistic input, estimates the number of adult words that children may hear over the course of a recording. However, word count estimation is challenging to do in a language-independent manner; the relationship between observable acoustic patterns and language-specific lexical entities is far from uniform across human languages. In this paper, we ask whether some alternative linguistic units, namely phone(me)s or syllables, could be measured instead of, or in parallel with, words in order to achieve improved cross-linguistic applicability and comparability of an automated system for measuring child language input. We discuss the advantages and disadvantages of measuring different units from theoretical and technical points of view. We also investigate the practical applicability of measuring such units using a novel system called Automatic LInguistic unit Count Estimator (ALICE) together with audio from seven child-centered daylong audio corpora from diverse cultural and linguistic environments. We show that language-independent measurement of phoneme counts is somewhat more accurate than syllables or words, but all three are highly correlated with human annotations on the same data. We share an open-source implementation of ALICE for use by the language research community, enabling automatic phoneme, syllable, and word count estimation from child-centered audio recordings.
53
Abstract
Evidence is reviewed for widespread phonological and phonetic tendencies in contemporary languages. The evidence is based largely on the frequency of sound types in word lists and in phoneme inventories across the world's languages. The data reviewed point to likely tendencies in the languages of the Upper Palaeolithic. These tendencies include the reliance on specific nasal and voiceless stop consonants, the relative dispreference for posterior voiced consonants and the use of peripheral vowels. More tenuous hypotheses related to prehistoric languages are also reviewed. These include the propositions that such languages lacked labiodental consonants and relied more heavily on vowels, when contrasted to many contemporary languages. Such hypotheses suggest speech has adapted to subtle pressures that may in some cases vary across populations. This article is part of the theme issue 'Reconstructing prehistoric languages'.
Affiliation(s)
- Caleb Everett
  - Department of Anthropology, University of Miami, Miami, FL, USA
54
Levshina N, Moran S. Efficiency in human languages: Corpus evidence for universal principles. Linguistics Vanguard: Multimodal Online Journal 2021; 7:20200081. [PMID: 35879989 PMCID: PMC9052279 DOI: 10.1515/lingvan-2020-0081] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
Over the last few years, there has been a growing interest in communicative efficiency. It has been argued that language users act efficiently, saving effort for processing and articulation, and that language structure and use reflect this tendency. The emergence of new corpus data has brought to life numerous studies on efficient language use in the lexicon, in morphosyntax, and in discourse and phonology in different languages. In this introductory paper, we discuss communicative efficiency in human languages, focusing on evidence of efficient language use found in multilingual corpora. The evidence suggests that efficiency is a universal feature of human language. We provide an overview of different manifestations of efficiency on different levels of language structure, and we discuss the major questions and findings so far, some of which are addressed for the first time in the contributions in this special collection.
Affiliation(s)
- Natalia Levshina
  - Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Steven Moran
  - Language and Space Lab, University of Zurich, Zurich, Switzerland
55
Brysbaert M, Sui L, Duyck W, Dirix N. Improving reading rate prediction with word length information: Evidence from Dutch. Q J Exp Psychol (Hove) 2021; 74:2013-2018. [PMID: 33910411 DOI: 10.1177/17470218211017100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Previous research in English has suggested that reading rate predictions can be improved considerably by taking average word length into account. In this study, we investigated whether the same regularity holds for Dutch. The Dutch language is very similar to English, but words are on average half a letter longer: 5.1 letters per word (in non-fiction) instead of 4.6. We collected reading rates of 62 participants reading 12 texts with varying word lengths and examined which change in the English equation accounts for the Dutch findings. We observed that predictions were close to the best-fitting curve as soon as the average English word length was replaced by the average Dutch word length. The equation predicts that Dutch texts with an average word length of 5.1 letters will be read at a rate of 238 words per minute (wpm). Texts with an average word length of 4.5 letters will be read at 270 wpm, and texts with an average word length of 6.0 letters will be read at a rate of 202 wpm. The findings are in line with the assumption that the longer words in Dutch do not slow down silent reading relative to English and that the word length effect observed in each language is due to word processing effort and not to low-level visual factors.
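The abstract reports three predicted reading rates but does not reproduce the equation itself. As a rough illustration only, the sketch below interpolates linearly between those three reported points. The function name `interpolate_reading_rate` and the piecewise-linear form are assumptions made for this example; they are not the authors' equation, and the sketch deliberately refuses to extrapolate beyond the reported range.

```python
# Illustrative only: the three (average word length, wpm) pairs are taken
# from the abstract. The straight-line interpolation between them is an
# assumption of this sketch, not the equation used in the study.
REPORTED_POINTS = [(4.5, 270.0), (5.1, 238.0), (6.0, 202.0)]


def interpolate_reading_rate(avg_word_length: float) -> float:
    """Interpolate words per minute between the rates reported for Dutch."""
    points = sorted(REPORTED_POINTS)
    shortest, longest = points[0][0], points[-1][0]
    if not shortest <= avg_word_length <= longest:
        raise ValueError(
            f"average word length must lie in [{shortest}, {longest}] letters"
        )
    # Walk over neighbouring pairs and interpolate inside the matching one.
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= avg_word_length <= x1:
            fraction = (avg_word_length - x0) / (x1 - x0)
            return y0 + fraction * (y1 - y0)
    raise AssertionError("unreachable: range was checked above")


if __name__ == "__main__":
    for length in (4.5, 4.8, 5.1, 6.0):
        print(length, round(interpolate_reading_rate(length), 1))
```

The reported values show that the relation is not a single straight line: the drop per extra letter is steeper between 4.5 and 5.1 letters than between 5.1 and 6.0 letters, which is why the sketch interpolates segment by segment.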
Affiliation(s)
- Marc Brysbaert
  - Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Longjiao Sui
  - Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Wouter Duyck
  - Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Nicolas Dirix
  - Department of Experimental Psychology, Ghent University, Ghent, Belgium
56
Kogan VV, Reiterer SM. Eros, Beauty, and Phon-Aesthetic Judgements of Language Sound. We Like It Flat and Fast, but Not Melodious. Comparing Phonetic and Acoustic Features of 16 European Languages. Front Hum Neurosci 2021; 15:578594. [PMID: 33708080 PMCID: PMC7940689 DOI: 10.3389/fnhum.2021.578594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2020] [Accepted: 01/12/2021] [Indexed: 11/13/2022] Open
Abstract
This article concerns sound aesthetic preferences for European foreign languages. We investigated the phonetic-acoustic dimension of linguistic aesthetic pleasure to describe the "music" found in European languages. The Romance languages, French, Italian, and Spanish, take the lead when people talk about melodious language - the music-like effects in the language (a.k.a., phonetic chill). On the other end of the melodiousness spectrum are German and Arabic, which are often considered to sound harsh and unattractive. Despite the public interest, limited research has been conducted on the topic of phonaesthetics, i.e., the subfield of phonetics that is concerned with the aesthetic properties of speech sounds (Crystal, 2008). Our goal is to fill the existing research gap by identifying the acoustic features that drive the auditory perception of language sound beauty. What is so music-like in the language that makes people say "it is music in my ears"? We had 45 central European participants listen to 16 auditorily presented European languages and rate each language in terms of 22 binary characteristics (e.g., beautiful - ugly and funny - boring), and also indicate their language familiarities, L2 backgrounds, speaker voice liking, demographics, and musicality levels. Findings revealed that all factors in complex interplay explain a certain percentage of variance: familiarity and expertise in foreign languages, speaker voice characteristics, phonetic complexity, musical acoustic properties, and finally musical expertise of the listener. The most important discovery was the trade-off between speech tempo and so-called linguistic melody (pitch variance): the faster the language, the flatter/more atonal it is in terms of the pitch (speech melody), making it highly appealing acoustically (sounding beautiful and sexy), but not so melodious in a "musical" sense.
Affiliation(s)
- Vita V Kogan
  - School of European Culture and Languages, University of Kent, Kent, United Kingdom
- Susanne M Reiterer
  - Department of Linguistics, University of Vienna, Vienna, Austria
  - Teacher Education Centre, University of Vienna, Vienna, Austria
57
Tang K, Shaw JA. Prosody leaks into the memories of words. Cognition 2021; 210:104601. [PMID: 33508575 DOI: 10.1016/j.cognition.2021.104601] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/16/2020] [Revised: 01/11/2021] [Accepted: 01/11/2021] [Indexed: 11/30/2022]
Abstract
The average predictability (aka informativity) of a word in context has been shown to condition word duration (Seyfarth, 2014). All else being equal, words that tend to occur in more predictable environments are shorter than words that tend to occur in less predictable environments. One account of the informativity effect on duration is that the acoustic details of probabilistic reduction are stored as part of a word's mental representation. Other research has argued that predictability effects are tied to prosodic structure in integral ways. With the aim of assessing a potential prosodic basis for informativity effects in speech production, this study extends past work in two directions; it investigated informativity effects in another large language, Mandarin Chinese, and broadened the study beyond word duration to additional acoustic dimensions, pitch and intensity, known to index prosodic prominence. The acoustic information of content words was extracted from a large telephone conversation speech corpus with over 400,000 tokens and 6000 word types spoken by 1655 individuals and analyzed for the effect of informativity using frequency statistics estimated from a 431 million word subtitle corpus. Results indicated that words with low informativity have shorter durations, replicating the effect found in English. In addition, informativity had significant effects on maximum pitch and intensity, two phonetic dimensions related to prosodic prominence. Extending this interpretation, these results suggest that predictability is closely linked to prosodic prominence, and that the lexical representation of a word includes phonetic details associated with its average prosodic prominence in discourse. In other words, the lexicon absorbs prosodic influences on speech production.
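To make "average predictability of a word in context" concrete, one common formalization of informativity is the context-weighted surprisal $-\sum_c P(c \mid w)\,\log_2 P(w \mid c)$. The sketch below assumes that definition, takes the immediately preceding word as the context $c$, and uses unsmoothed bigram counts from a toy token list. The context direction, the estimator, and the function name `informativity` are assumptions of this example; the abstract does not specify them.

```python
import math
from collections import Counter


def informativity(tokens):
    """Return {word: informativity in bits} from unsmoothed bigram counts.

    Assumed definition: -sum over contexts c of P(c | w) * log2 P(w | c),
    where c is the word immediately preceding w. Other context choices
    (e.g. the following word) are equally possible.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_totals = Counter()  # how often each word occurs as a context
    word_totals = Counter()     # how often each word occurs after a context
    for (context, word), count in bigrams.items():
        context_totals[context] += count
        word_totals[word] += count

    scores = {}
    for (context, word), count in bigrams.items():
        p_context_given_word = count / word_totals[word]
        p_word_given_context = count / context_totals[context]
        surprisal = -math.log2(p_word_given_context)
        scores[word] = scores.get(word, 0.0) + p_context_given_word * surprisal
    return scores


if __name__ == "__main__":
    sample = "the cat sat the cat ran the dog sat".split()
    for word, bits in sorted(informativity(sample).items()):
        print(f"{word}: {bits:.3f} bits")
```

In the toy sample, "dog" follows "the" less often than "cat" does, so it receives the higher informativity. With real corpora some smoothing would be needed for rare bigrams, which this sketch omits.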
Affiliation(s)
- Kevin Tang
  - Department of Linguistics, University of Florida, Gainesville, FL 32611-5454, USA
- Jason A Shaw
  - Department of Linguistics, Yale University, New Haven, CT 06520, USA
58
Assaneo MF, Rimmele JM, Sanz Perl Y, Poeppel D. Speaking rhythmically can shape hearing. Nat Hum Behav 2021; 5:71-82. [PMID: 33046860 DOI: 10.1038/s41562-020-00962-0] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 01/03/2020] [Accepted: 09/09/2020] [Indexed: 01/28/2023]
Abstract
Evidence suggests that temporal predictions arising from the motor system can enhance auditory perception. However, in speech perception, we lack evidence of perception being modulated by production. Here we show a behavioural protocol that captures the existence of such auditory-motor interactions. Participants performed a syllable discrimination task immediately after producing periodic syllable sequences. Two speech rates were explored: a 'natural' (individually preferred) and a fixed 'non-natural' (2 Hz) rate. Using a decoding approach, we show that perceptual performance is modulated by the stimulus phase determined by a participant's own motor rhythm. Remarkably, for 'natural' and 'non-natural' rates, this finding is restricted to a subgroup of the population with quantifiable auditory-motor coupling. The observed pattern is compatible with a neural model assuming a bidirectional interaction of auditory and speech motor cortices. Crucially, the model matches the experimental results only if it incorporates individual differences in the strength of the auditory-motor connection.
Affiliation(s)
- M Florencia Assaneo
  - Department of Psychology, New York University, New York, NY, USA
  - Instituto de Neurobiología, Universidad Nacional Autónoma de México, Santiago de Querétaro, Mexico
- Johanna M Rimmele
  - Department of Neuroscience, Max-Planck-Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Yonatan Sanz Perl
  - Department of Physics, FCEyN, University of Buenos Aires, Buenos Aires, Argentina
  - National Scientific and Technical Research Council (CONICET), Buenos Aires, Argentina
  - University of San Andrés, Buenos Aires, Argentina
- David Poeppel
  - Department of Psychology, New York University, New York, NY, USA
  - Department of Neuroscience, Max-Planck-Institute for Empirical Aesthetics, Frankfurt am Main, Germany
59
Garcia M, Theunissen F, Sèbe F, Clavel J, Ravignani A, Marin-Cudraz T, Fuchs J, Mathevon N. Evolution of communication signals and information during species radiation. Nat Commun 2020; 11:4970. [PMID: 33009414 PMCID: PMC7532446 DOI: 10.1038/s41467-020-18772-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 01/20/2020] [Accepted: 09/09/2020] [Indexed: 01/22/2023] Open
Abstract
Communicating species identity is a key component of many animal signals. However, whether selection for species recognition systematically increases signal diversity during clade radiation remains debated. Here we show that in woodpecker drumming, a rhythmic signal used during mating and territorial defense, the amount of species identity information encoded remained stable during woodpeckers' radiation. Acoustic analyses and evolutionary reconstructions show interchange among six main drumming types despite strong phylogenetic contingencies, suggesting evolutionary tinkering of drumming structure within a constrained acoustic space. Playback experiments and quantification of species discriminability demonstrate sufficient signal differentiation to support species recognition in local communities. Finally, we only find character displacement in the rare cases where sympatric species are also closely related. Overall, our results illustrate how historical contingencies and ecological interactions can promote conservatism in signals during a clade radiation without impairing the effectiveness of information transfer relevant to inter-specific discrimination.
Affiliation(s)
- Maxime Garcia
  - Equipe Neuro-Ethologie Sensorielle ENES/CRNL, CNRS, INSERM, University of Lyon/Saint-Etienne, Saint-Étienne, France
  - Animal Behaviour, Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zürich, Switzerland
- Frédéric Theunissen
  - Helen Wills Neuroscience Institute, University of California, Berkeley, USA
  - Department of Psychology and Integrative Biology, University of California, Berkeley, USA
- Frédéric Sèbe
  - Equipe Neuro-Ethologie Sensorielle ENES/CRNL, CNRS, INSERM, University of Lyon/Saint-Etienne, Saint-Étienne, France
- Julien Clavel
  - Institut de Biologie de l'École Normale Supérieure, CNRS, INSERM, École Normale Supérieure, Paris Sciences et Lettres Research University, Paris, France
  - University of Lyon, Université Claude Bernard Lyon 1, CNRS, ENTPE, UMR 5023 LEHNA, F-69622, Villeurbanne, France
- Andrea Ravignani
  - Comparative Bioacoustics Group, Max Planck Institute for Psycholinguistics, 6525 XD, Nijmegen, The Netherlands
- Thibaut Marin-Cudraz
  - Equipe Neuro-Ethologie Sensorielle ENES/CRNL, CNRS, INSERM, University of Lyon/Saint-Etienne, Saint-Étienne, France
- Jérôme Fuchs
  - Institut de Systématique, Evolution, Biodiversité ISYEB, Muséum national d'Histoire naturelle, CNRS, Sorbonne Université, EPHE, Paris, France
- Nicolas Mathevon
  - Equipe Neuro-Ethologie Sensorielle ENES/CRNL, CNRS, INSERM, University of Lyon/Saint-Etienne, Saint-Étienne, France
  - Institut Universitaire de France, Paris, France
60
Risueno-Segovia C, Hage SR. Theta Synchronization of Phonatory and Articulatory Systems in Marmoset Monkey Vocal Production. Curr Biol 2020; 30:4276-4283.e3. [PMID: 32888481 DOI: 10.1016/j.cub.2020.08.019] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 06/02/2020] [Revised: 07/20/2020] [Accepted: 08/05/2020] [Indexed: 11/27/2022]
Abstract
Human speech shares a 3-8-Hz theta rhythm across all languages [1-3]. According to the frame/content theory of speech evolution, this rhythm corresponds to syllabic rates derived from natural mandibular-associated oscillations [4]. The underlying pattern originates from oscillatory movements of articulatory muscles [4, 5] tightly linked to periodic vocal fold vibrations [4, 6, 7]. Such phono-articulatory rhythms have been proposed as one of the crucial preadaptations for human speech evolution [3, 8, 9]. However, the evolutionary link in phono-articulatory rhythmicity between vertebrate vocalization and human speech remains unclear. From the phonatory perspective, theta oscillations might be phylogenetically preserved throughout all vertebrate clades [10-12]. From the articulatory perspective, theta oscillations are present in non-vocal lip smacking [1, 13, 14], teeth chattering [15], vocal lip smacking [16], and clicks and faux-speech [17] in non-human primates, potential evolutionary precursors for speech rhythmicity [1, 13]. Notably, a universal phono-articulatory rhythmicity similar to that in human speech is considered to be absent in non-human primate vocalizations, typically produced with sound modulations lacking concomitant articulatory movements [1, 9, 18]. Here, we challenge this view by investigating the coupling of phonatory and articulatory systems in marmoset vocalizations. Using quantitative measures of acoustic call structure, e.g., amplitude envelope, and call-associated articulatory movements, i.e., inter-lip distance, we show that marmosets display speech-like bi-motor rhythmicity. These oscillations are synchronized and phase locked at theta rhythms. Our findings suggest that oscillatory rhythms underlying speech production evolved early in the primate lineage, identifying marmosets as a suitable animal model to decipher the evolutionary and neural basis of coupled phono-articulatory movements.
Affiliation(s)
- Cristina Risueno-Segovia
  - Neurobiology of Social Communication, Department of Otolaryngology, Head and Neck Surgery, Hearing Research Centre, University of Tübingen Medical Center, Elfriede-Aulhorn-Str. 5, 72076 Tübingen, Germany
  - Werner Reichardt Centre for Integrative Neuroscience, University of Tübingen, Otfried-Müller-Str. 25, 72076 Tübingen, Germany
  - Graduate School of Neural & Behavioural Sciences - International Max Planck Research School, University of Tübingen, Österberg-Str. 3, 72074 Tübingen, Germany
- Steffen R Hage
  - Neurobiology of Social Communication, Department of Otolaryngology, Head and Neck Surgery, Hearing Research Centre, University of Tübingen Medical Center, Elfriede-Aulhorn-Str. 5, 72076 Tübingen, Germany
  - Werner Reichardt Centre for Integrative Neuroscience, University of Tübingen, Otfried-Müller-Str. 25, 72076 Tübingen, Germany
61
Haluts N, Trippa M, Friedmann N, Treves A. Professional or Amateur? The Phonological Output Buffer as a Working Memory Operator. Entropy 2020; 22:e22060662. [PMID: 33286434 PMCID: PMC7517200 DOI: 10.3390/e22060662] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/08/2020] [Revised: 06/10/2020] [Accepted: 06/12/2020] [Indexed: 11/16/2022]
Abstract
The Phonological Output Buffer (POB) is thought to be the stage in language production where phonemes are held in working memory and assembled into words. The neural implementation of the POB remains unclear despite a wealth of phenomenological data. Individuals with POB impairment make phonological errors when they produce words and non-words, including phoneme omissions, insertions, transpositions, substitutions and perseverations. Errors can apply to different kinds and sizes of units, such as phonemes, number words, morphological affixes, and function words, and evidence from POB impairments suggests that units tend to be substituted with units of the same kind, e.g., numbers with numbers and whole morphological affixes with other affixes. This suggests that different units are processed and stored in the POB in the same stage, but perhaps separately in different mini-stores. Further, similar impairments can affect the buffer used to produce Sign Language, which raises the question of whether it is instantiated in a distinct device with the same design. However, what appear as separate buffers may be distinct regions in the activity space of a single extended POB network, connected with a lexicon network. The self-consistency of this idea can be assessed by studying an autoassociative Potts network, as a model of memory storage distributed over several cortical areas, and testing whether the network can represent both units of words and signs, reflecting the types and patterns of errors made by individuals with POB impairment.
Affiliation(s)
- Neta Haluts
  - Language and Brain Lab, Sagol School of Neuroscience and School of Education, Tel Aviv University, Tel Aviv-Yafo 69978, Israel
- Naama Friedmann
  - Language and Brain Lab, Sagol School of Neuroscience and School of Education, Tel Aviv University, Tel Aviv-Yafo 69978, Israel
- Alessandro Treves
  - SISSA, Cognitive Neuroscience, Via Bonomea 265, 34136 Trieste, Italy
62
Gutierrez-Vasques X, Mijangos V. Productivity and Predictability for Measuring Morphological Complexity. Entropy 2019; 22:e22010048. [PMID: 33285823 PMCID: PMC7516478 DOI: 10.3390/e22010048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/31/2019] [Revised: 12/21/2019] [Accepted: 12/23/2019] [Indexed: 11/25/2022]
Abstract
We propose a quantitative approach for quantifying morphological complexity of a language based on text. Several corpus-based methods have focused on measuring the different word forms that a language can produce. We take into account not only the productivity of morphological processes but also the predictability of those morphological processes. We use a language model that predicts the probability of sub-word sequences within a word; we calculate the entropy rate of this model and use it as a measure of predictability of the internal structure of words. Our results show that it is important to integrate these two dimensions when measuring morphological complexity, since languages can be complex under one measure but simpler under another one. We calculated the complexity measures in two different parallel corpora for a typologically diverse set of languages. Our approach is corpus-based and it does not require the use of linguistic annotated data.
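The two dimensions named in the abstract can be illustrated with a deliberately simple stand-in. In the sketch below, productivity is approximated by the type-token ratio of word forms, and predictability by the average bits per symbol of an unsmoothed character-bigram model scored on the same words it was counted from. Both choices, the boundary markers `<` and `>`, and the function names are assumptions of this example; the abstract does not say which language model or sub-word units the authors used.

```python
import math
from collections import Counter


def type_token_ratio(words):
    """Distinct word forms divided by total tokens (a productivity proxy)."""
    if not words:
        raise ValueError("need at least one word")
    return len(set(words)) / len(words)


def char_bigram_entropy_rate(words):
    """Average bits per symbol under an unsmoothed within-word bigram model.

    Each word is wrapped in '<' and '>' boundary markers (hypothetical
    symbols chosen for this sketch). The model is estimated and scored on
    the same data, so the value is an optimistic, in-sample estimate.
    """
    if not words:
        raise ValueError("need at least one word")
    pair_counts = Counter()
    prev_counts = Counter()
    for word in words:
        symbols = ["<"] + list(word) + [">"]
        for prev, cur in zip(symbols, symbols[1:]):
            pair_counts[(prev, cur)] += 1
            prev_counts[prev] += 1
    total_transitions = sum(pair_counts.values())
    total_bits = -sum(
        count * math.log2(count / prev_counts[prev])
        for (prev, _cur), count in pair_counts.items()
    )
    return total_bits / total_transitions


if __name__ == "__main__":
    corpus = "walk walks walked walking talk talks talked talking".split()
    print("type-token ratio:", type_token_ratio(corpus))
    print("entropy rate (bits/symbol):", round(char_bigram_entropy_rate(corpus), 3))
```

Reporting the two numbers side by side reflects the abstract's point that a language can look complex on one measure and simpler on the other.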
Affiliation(s)
- Ximena Gutierrez-Vasques
  - Language and Space Lab, URPP Language and Space, University of Zurich, 8006 Zurich, Switzerland
- Victor Mijangos
  - Institute of Philological Research, National Autonomous University of Mexico, 04510 Mexico City, Mexico