1
Zaghir J, Bjelogrlic M, Goldman JP, Ehrsam J, Gaudet-Blavignac C, Lovis C. Human-machine interactions with clinical phrase prediction system, aligning with Zipf's least effort principle? PLoS One 2024; 19:e0316177. [PMID: 39739748 PMCID: PMC11687647 DOI: 10.1371/journal.pone.0316177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2024] [Accepted: 12/08/2024] [Indexed: 01/02/2025]
Abstract
The essence of language and its evolutionary determinants have long been research subjects with multifaceted explorations. This work reports on a large-scale observational study focused on the language use of clinicians interacting with a phrase prediction system in a clinical setting. By adopting principles of adaptation to evolutionary selection pressure, we attempt to identify the major determinants of language emergence specific to this context. The observed adaptation of clinicians' language behaviour to the technology has been compared against properties shaping language use, more specifically two driving forces: conciseness and distinctiveness. Our results suggest that users tailor their interactions to meet these specific forces to minimise the effort required to achieve their objective. At the same time, the study shows that the optimisation is mainly driven by the distinctive nature of interactions, favouring communication accuracy over ease. These results, to our knowledge the first from a large-scale observational study, offer novel fundamental qualitative and quantitative insights into the mechanisms underlying linguistic behaviour among clinicians and its potential implications for language adaptation in human-machine interactions.
Affiliation(s)
- Jamil Zaghir
  - Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Mina Bjelogrlic
  - Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Jean-Philippe Goldman
  - Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Julien Ehrsam
  - Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Christophe Gaudet-Blavignac
  - Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Christian Lovis
  - Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
2
Meylan SC, Griffiths TL. Word Forms Reflect Trade-Offs Between Speaker Effort and Robust Listener Recognition. Cogn Sci 2024; 48:e13478. [PMID: 38980972 DOI: 10.1111/cogs.13478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 06/03/2024] [Accepted: 06/11/2024] [Indexed: 07/11/2024]
Abstract
How do cognitive pressures shape the lexicons of natural languages? Here, we reframe George Kingsley Zipf's proposed "law of abbreviation" within a more general framework that relates it to cognitive pressures that affect speakers and listeners. In this new framework, speakers' drive to reduce effort (Zipf's proposal) is counteracted by the need for low-frequency words to have word forms that are sufficiently distinctive to allow for accurate recognition by listeners. To support this framework, we replicate and extend recent work using the prevalence of subword phonemic sequences (phonotactic probability) to measure speakers' production effort in place of Zipf's measure of length. Across languages and corpora, phonotactic probability is more strongly correlated with word frequency than word length. We also show this measure of ease of speech production (phonotactic probability) is strongly correlated with a measure of perceptual difficulty that indexes the degree of competition from alternative interpretations in word recognition. This is consistent with the claim that there must be trade-offs between these two factors, and is inconsistent with a recent proposal that phonotactic probability facilitates both perception and production. To our knowledge, this is the first work to offer an explanation why long, phonotactically improbable word forms remain in the lexicons of natural languages.
Affiliation(s)
- Stephan C Meylan
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
3
Haslett DA, Cai ZG. Systematic mappings of sound to meaning: A theoretical review. Psychon Bull Rev 2024; 31:627-648. [PMID: 37803232 DOI: 10.3758/s13423-023-02395-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/19/2023] [Indexed: 10/08/2023]
Abstract
The form of a word sometimes conveys semantic information. For example, the iconic word gurgle sounds like what it means, and busy is easy to identify as an English adjective because it ends in -y. Such links between form and meaning matter because they help people learn and use language. But gurgle also sounds like gargle and burble, and the -y in busy is morphologically and etymologically unrelated to the -y in crazy and watery. Whatever processing effects gurgle and busy have in common likely stem not from iconic, morphological, or etymological relationships but from systematicity more broadly: the phenomenon whereby semantically related words share a phonological or orthographic feature. In this review, we evaluate corpus evidence that spoken languages are systematic (even when controlling for iconicity, morphology, and etymology) and experimental evidence that systematicity impacts word processing (even in lieu of iconic, morphological, and etymological relationships). We conclude by drawing attention to the relationship between systematicity and low-frequency words and, consequently, the role that systematicity plays in natural language processing.
Affiliation(s)
- David A Haslett
  - Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, Hong Kong
- Zhenguang G Cai
  - Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, Hong Kong
  - Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, Hong Kong
4
Enfield NJ. Scale in Language. Cogn Sci 2023; 47:e13341. [PMID: 37823747 DOI: 10.1111/cogs.13341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 08/28/2023] [Accepted: 08/30/2023] [Indexed: 10/13/2023]
Abstract
A central concern of the cognitive science of language since its origins has been the concept of the linguistic system. Recent approaches to the system concept in language point to the exceedingly complex relations that hold between many kinds of interdependent systems, but it can be difficult to know how to proceed when "everything is connected." This paper offers a framework for tackling that challenge by identifying *scale* as a conceptual mooring for the interdisciplinary study of language systems. The paper begins by defining the scale concept: simply, the possibility for a measure to be larger or smaller in different instances of a system, such as a phonemic inventory, a word's frequency value in a corpus, or a speaker population. We review sites of scale difference in and across linguistic subsystems, drawing on findings from linguistic typology, grammatical description, morphosyntactic theory, psycholinguistics, computational corpus work, and social network demography. We consider possible explanations for scaling differences and constraints in language. We then turn to the question of *dependencies between* sites of scale difference in language, reviewing four sample domains of scale dependency: in phonological systems, across levels of grammatical structure (Menzerath's Law), in corpora (Zipf's Law and related issues), and in speaker population size. Finally, we consider the implications of the review, including the utility of a scale framework for generating new questions and inspiring methodological innovations and interdisciplinary collaborations in cognitive-scientific research on language.
Affiliation(s)
- N J Enfield
  - Discipline of Linguistics, The University of Sydney
5
Koshevoy A, Miton H, Morin O. Zipf's Law of Abbreviation holds for individual characters across a broad range of writing systems. Cognition 2023; 238:105527. [PMID: 37364507 DOI: 10.1016/j.cognition.2023.105527] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/12/2022] [Revised: 06/12/2023] [Accepted: 06/14/2023] [Indexed: 06/28/2023]
Abstract
Zipf's Law of Abbreviation - the idea that more frequent symbols in a code are simpler than less frequent ones - has been shown to hold at the level of words in many languages. We tested whether it holds at the level of individual written characters. Character complexity is similar to word length in that it requires more cognitive and motor effort for producing and processing more complex symbols. We built a dataset of character complexity and frequency measures covering 27 different writing systems. According to our data, Zipf's Law of Abbreviation holds for every writing system in our dataset - the more frequent characters have lower degrees of complexity and vice-versa. This result provides further evidence of optimization mechanisms shaping communication systems.
Affiliation(s)
- Alexey Koshevoy
  - Laboratoire de Psychologie Cognitive, Aix-Marseille Université, CNRS, 13003 Marseille, France; Institut Jean Nicod, Département d'études cognitives, ENS, EHESS, CNRS, PSL University, UMR 8129, 75005 Paris, France; Minds and Traditions Group, Max Planck Institute for Geoanthropology, 07745 Jena, Germany
- Olivier Morin
  - Institut Jean Nicod, Département d'études cognitives, ENS, EHESS, CNRS, PSL University, UMR 8129, 75005 Paris, France; Minds and Traditions Group, Max Planck Institute for Geoanthropology, 07745 Jena, Germany
6
Koplenig A, Kupietz M, Wolfer S. Testing the Relationship between Word Length, Frequency, and Predictability Based on the German Reference Corpus. Cogn Sci 2022; 46:e13090. [DOI: 10.1111/cogs.13090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2021] [Revised: 11/15/2021] [Accepted: 12/17/2021] [Indexed: 11/29/2022]
Affiliation(s)
- Alexander Koplenig
  - Department of Lexical Studies, Leibniz‐Institute for the German Language (IDS)
- Marc Kupietz
  - Department of Digital Linguistics, Leibniz‐Institute for the German Language (IDS)
- Sascha Wolfer
  - Department of Lexical Studies, Leibniz‐Institute for the German Language (IDS)
7
Levshina N. Frequency, Informativity and Word Length: Insights from Typologically Diverse Corpora. Entropy (Basel, Switzerland) 2022; 24:280. [PMID: 35205578 PMCID: PMC8870940 DOI: 10.3390/e24020280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2021] [Revised: 02/13/2022] [Accepted: 02/14/2022] [Indexed: 02/04/2023]
Abstract
Zipf's law of abbreviation, which posits a negative correlation between word frequency and length, is one of the most famous and robust cross-linguistic generalizations. At the same time, it has been shown that contextual informativity (average surprisal given previous context) is more strongly correlated with word length, although this tendency is not observed consistently, depending on several methodological choices. The present study examines a more diverse sample of languages than the previous studies (Arabic, Finnish, Hungarian, Indonesian, Russian, Spanish and Turkish). I use large web-based corpora from the Leipzig Corpora Collection to estimate word lengths in UTF-8 characters and in phonemes (for some of the languages), as well as word frequency, informativity given previous word and informativity given next word, applying different methods of bigram processing. The results show different correlations between word length and the corpus-based measure for different languages. I argue that these differences can be explained by the properties of noun phrases in a language, most importantly, by the order of heads and modifiers and their relative morphological complexity, as well as by orthographic conventions.
Affiliation(s)
- Natalia Levshina
  - Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands