1
Li Y, Breithaupt F, Hills T, Lin Z, Chen Y, Siew CSW, Hertwig R. How cognitive selection affects language change. Proc Natl Acad Sci U S A 2024; 121:e2220898120. [PMID: 38150495 PMCID: PMC10769849 DOI: 10.1073/pnas.2220898120] [Received: 12/08/2022] [Accepted: 10/12/2023]
Abstract
Like biological species, words in language must compete to survive. Previously, it has been shown that language changes in response to cognitive constraints and over time becomes more learnable. Here, we use two complementary research paradigms to demonstrate how the survival of existing word forms can be predicted by psycholinguistic properties that impact language production. In the first study, we analyzed the survival of words in the context of interpersonal communication. We analyzed data from a large-scale serial-reproduction experiment in which stories were passed down along a transmission chain over multiple participants. The results show that words that are acquired earlier in life, more concrete, more arousing, and more emotional are more likely to survive retellings. We reason that the same trend might scale up to language evolution over multiple generations of natural language users. If that is the case, the same set of psycholinguistic properties should also account for the change of word frequency in natural language corpora over historical time. That is what we found in two large historical-language corpora (Study 2): Early acquisition, concreteness, and high arousal all predict increasing word frequency over the past 200 y. However, the two studies diverge with respect to the impact of word valence and word length, which we take up in the discussion. By bridging micro-level behavioral preferences and macro-level language patterns, our investigation sheds light on the cognitive mechanisms underlying word competition.
Affiliation(s)
- Ying Li
  - Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China
  - Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin 14195, Germany
- Fritz Breithaupt
  - Department of Germanic Studies, Indiana University, Bloomington, IN001809
  - Program of Cognitive Science, Indiana University, Bloomington, IN001809
- Thomas Hills
  - Department of Psychology, University of Warwick, Coventry CV4 7AL, United Kingdom
- Ziyong Lin
  - Center for Life Span Psychology, Max Planck Institute for Human Development, Berlin 14195, Germany
- Yanyan Chen
  - Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China
  - Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China
- Cynthia S. W. Siew
  - Department of Psychology, National University of Singapore, Singapore 119077, Singapore
- Ralph Hertwig
  - Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin 14195, Germany
2
Hendrix P, Sun CC, Brighton H, Bender A. On the Connection Between Language Change and Language Processing. Cogn Sci 2023; 47:e13384. [PMID: 38071744 DOI: 10.1111/cogs.13384] [Received: 10/28/2021] [Revised: 10/22/2023] [Accepted: 11/06/2023]
Abstract
Previous studies provided evidence for a connection between language processing and language change. We add to these studies with an exploration of the influence of lexical-distributional properties of words in orthographic space, semantic space, and the mapping between orthographic and semantic space on the probability of lexical extinction. Through a binomial linear regression analysis, we investigated the probability of lexical extinction by the first decade of the twenty-first century (2000s) for words that existed in the first decade of the nineteenth-century (1800s) in eight data sets for five languages: English, French, German, Italian, and Spanish. The binomial linear regression analysis revealed that words that are more similar in form to other words are less likely to disappear from a language. By contrast, words that are more similar in meaning to other words are more likely to become extinct. In addition, a more consistent mapping between form and meaning protects a word from lexical extinction. A nonlinear time-to-event analysis furthermore revealed that the position of a word in orthographic and semantic space continues to influence the probability of it disappearing from a language for at least 200 years. Effects of the lexical-distributional properties of words under investigation here have been reported in the language processing literature as well. The results reported here, therefore, fit well with a usage-based approach to language change, which holds that language change is at least to some extent connected to cognitive mechanisms in the human brain.
Affiliation(s)
- Peter Hendrix
  - Department of Cognitive Science and Artificial Intelligence, Tilburg University
- Ching Chu Sun
  - Department of General Linguistics, Tübingen University
- Henry Brighton
  - Department of Cognitive Science and Artificial Intelligence, Tilburg University
- Andreas Bender
  - Department of Statistics, Ludwig-Maximilians-University Munich
3
Muraki EJ, Abdalla S, Brysbaert M, Pexman PM. Concreteness ratings for 62,000 English multiword expressions. Behav Res Methods 2023; 55:2522-2531. [PMID: 35867207 DOI: 10.3758/s13428-022-01912-6] [Accepted: 06/16/2022]
Abstract
Concreteness describes the degree to which a word's meaning is understood through perception and action. Many studies use the Brysbaert et al. (2014) concreteness ratings to investigate language processing and text analysis. However, these ratings are limited to English single words and a few two-word expressions. Increasingly, attention is focused on the importance of multiword expressions, given their centrality in everyday language use and language acquisition. We present concreteness ratings for 62,889 multiword expressions and examine their relationship to the existing concreteness ratings for single words and two-word expressions. These new ratings represent the first big dataset of multiword expressions, and will be useful for researchers interested in language acquisition and language processing, as well as natural language processing and text analysis.
Affiliation(s)
- Emiko J Muraki
  - Department of Psychology, University of Calgary, 2500 University Drive, Calgary, AB, T2N 1N4, Canada
- Summer Abdalla
  - School of Languages, Linguistics, Literatures and Cultures, University of Calgary, Calgary, Canada
- Marc Brysbaert
  - Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Penny M Pexman
  - Department of Psychology, University of Calgary, 2500 University Drive, Calgary, AB, T2N 1N4, Canada
4
Diachronic semantic change in language is constrained by how people use and learn language. Mem Cognit 2022; 50:1284-1298. [PMID: 35767153 PMCID: PMC9365724 DOI: 10.3758/s13421-022-01331-0] [Accepted: 05/20/2022]
Abstract
While it has long been understood that the human mind evolved to learn language, recent studies have begun to ask the inverted question: How has language evolved under the cognitive constraints of its users and become more learnable over time? In this paper, we explored how the semantic change of English words is shaped by the way humans acquire and process language. In Study 1, we quantified the extent of semantic change over the past 200 years and found that meaning change is more likely for words that are acquired later in life and are more difficult to process. We argue that it is human cognition that constrains the semantic evolution of words, rather than the other way around, because historical meanings of words were not easily accessible to people living today, and therefore could not have directly influenced how they learn and process language. In Study 2, we went further to show that semantic change, while bringing the benefit of meeting communicative needs, is cognitively costly for those who were born early enough to experience the change: Semantic change between 1970 and 2000 hindered processing speeds among middle-aged adults (ages 45–55) but not in younger adults (ages <25) in a semantic decision task. This hampering effect may have, in turn, curbed the rate of semantic change so that language does not change too fast for the human mind to catch up. Taken together, our research demonstrates that semantic change is shaped by processing and acquisition patterns across generations of language users.
5
Moral foundations tracked over 200 years of lexicographic data, and their predictors. Anthropological Review 2022. [DOI: 10.18778/1898-6773.85.2.04]
Abstract
The prediction that reduction of negative selection decreases group-level competitiveness, as reflected in increased individual-focused and diminished group-focused moral foundations, is tested. To measure this hypothesized shift in moral foundations, we conduct a culturomic analysis of the utilization frequencies of items sourced from the moral foundations item pool, tracked among Britannic populations from 1800 to 1999 using Google Ngram Viewer. The resultant higher-order factor, which tracks increasing individualizing values and decreasing binding values, is termed Asabiyyah (capturing social cohesion and collective purpose). Two predictors of this factor are examined: change in the strength of intergroup competition and change in levels of indicators of developmental instability. Both the strength of intergroup competition and levels of developmental instability associate with Asabiyyah. Rising developmental instability mediates the impact of inter-group competition, indicating that reduced between-group competition might have relaxed negative selection against mutations, which might reduce Asabiyyah via their effects on inter-genomic transactions. These results must be interpreted carefully, given the clear real-world evidence that explicit commitment to group-oriented values often features in harmful and maladaptive social and political ideologies of an extreme character.
6
Egeland J. The ups and downs of intelligence: The co-occurrence model and its associated research program. Intelligence 2022. [DOI: 10.1016/j.intell.2022.101643]
7
D’Aversa FM, Lugli L, Borghi AM, Barca L. Implicit effect of abstract/concrete components in the categorization of Chinese words. Journal of Cognitive Psychology 2022. [DOI: 10.1080/20445911.2022.2049279]
Affiliation(s)
- Luisa Lugli
  - Department of Philosophy and Communication, University of Bologna, Bologna, Italy
- Anna M. Borghi
  - Department of Dynamic and Clinical Psychology, Sapienza University of Rome, Rome, Italy
  - Institute of Cognitive Sciences and Technologies, National Research Council, Rome, Italy
- Laura Barca
  - Institute of Cognitive Sciences and Technologies, National Research Council, Rome, Italy
8
Sun K, Lu X. Assessing Lexical Psychological Properties in Second Language Production: A Dynamic Semantic Similarity Approach. Front Psychol 2021; 12:672243. [PMID: 34630198 PMCID: PMC8495422 DOI: 10.3389/fpsyg.2021.672243] [Received: 02/25/2021] [Accepted: 08/25/2021]
Abstract
Previous studies of the lexical psycholinguistic properties (LPPs) in second language (L2) production have assessed the degree of an LPP dimension of an L2 corpus by computing the mean ratings of unique content words in the corpus for that dimension, without considering the possibility that learners at different proficiency levels may perceive the degree of that dimension of the same words differently. This study extended a dynamic semantic similarity algorithm to estimate the degree of five different LPP dimensions of several sub-corpora of the Education First-Cambridge Open Language Database representing L2 English learners at different proficiency levels. Our findings provide initial evidence for the validity of the algorithm for assessing the LPPs in L2 production and contribute useful insights into between-proficiency relationships and cross-proficiency differences in the LPPs in L2 production as well as the relationships among different LPP dimensions.
Affiliation(s)
- Kun Sun
  - Department of Linguistics, University of Tübingen, Tübingen, Germany
- Xiaofei Lu
  - Department of Applied Linguistics, The Pennsylvania State University (PSU), University Park, PA, United States
9
Li Y, Hills TT. Language patterns of outgroup prejudice. Cognition 2021; 215:104813. [PMID: 34192608 DOI: 10.1016/j.cognition.2021.104813] [Received: 12/28/2020] [Revised: 06/06/2021] [Accepted: 06/12/2021]
Abstract
Although explicit verbal expression of prejudice and stereotypes may have become less common due to the recent rise of social norms against prejudice, prejudice in language still persists in more subtle forms. It remains unclear whether and how language patterns predict variance in prejudice across a large number of minority groups. Informed by construal level theory, intergroup-contact theory, and linguistic expectancy bias, we leverage a natural language corpus of 1.8 million newspaper articles to investigate patterns of language referencing 60 U.S. minority groups. We found that perception of social distance among immigrant groups is reflected in language production: Groups perceived as socially distant (vs. close) are also more likely to be mentioned in abstract (vs. concrete) language. Concreteness was also strongly positively correlated with sentiment, a phenomenon that was unique to language concerning minority groups, suggesting a strong tendency for more socially distant groups to be represented with more negative language. We also provide a qualitative exploration of the content of outgroup prejudice by applying Latent Dirichlet Allocation to language referencing minority groups in the context of immigration. We identified 15 immigrant-related topics (e.g., politics, arts, crime, illegal workers, museums, food) and the strength of their association and relationship with perceived sentiment for each minority group. This research demonstrates how perceived social distance and language concreteness are related and correlate with outgroup negativity, provides a practical and ecologically valid method for investigating perceptions of minority groups in language, and helps elaborate the connection between theoretical positions from social psychology with recent studies from computer science on prejudice embedded in natural language.
Affiliation(s)
- Ying Li
  - Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany
10
Cassani G, Bianchi F, Marelli M. Words with Consistent Diachronic Usage Patterns are Learned Earlier: A Computational Analysis Using Temporally Aligned Word Embeddings. Cogn Sci 2021; 45:e12963. [PMID: 33877700 PMCID: PMC8244097 DOI: 10.1111/cogs.12963] [Received: 07/08/2020] [Revised: 02/15/2021] [Accepted: 02/21/2021]
Abstract
In this study, we use temporally aligned word embeddings and a large diachronic corpus of English to quantify language change in a data-driven, scalable way, which is grounded in language use. We show a unique and reliable relation between measures of language change and age of acquisition (AoA) while controlling for frequency, contextual diversity, concreteness, length, dominant part of speech, orthographic neighborhood density, and diachronic frequency variation. We analyze measures of language change tackling both the change in lexical representations and the change in the relation between lexical representations and the words with the most similar usage patterns, showing that they capture different aspects of language change. Our results show a unique relation between language change and AoA, which is stronger when considering neighborhood-level measures of language change: Words with more coherent diachronic usage patterns tend to be acquired earlier. The results support theories positing a link between ontogenetic and ethnogenetic processes in language.
Affiliation(s)
- Giovanni Cassani
  - Department of Cognitive Science and Artificial Intelligence, Tilburg University
- Federico Bianchi
  - Bocconi Institute for Data Science and Analytics, Bocconi University
- Marco Marelli
  - Department of Psychology, University of Milano-Bicocca
11
Sun K, Liu H, Xiong W. The evolutionary pattern of language in scientific writings: A case study of Philosophical Transactions of Royal Society (1665–1869). Scientometrics 2020. [DOI: 10.1007/s11192-020-03816-8]
Abstract
Scientific writings, as one essential part of human culture, have evolved over centuries into their current form. Knowing how scientific writings evolved is particularly helpful in understanding how trends in scientific culture developed. It also allows us to better understand how scientific culture was interwoven with human culture generally. The availability of massive digitized texts and the progress in computational technologies today provide us with a convenient and credible way to discern the evolutionary patterns in scientific writings by examining the diachronic linguistic changes. The linguistic changes in scientific writings reflect the genre shifts that took place with historical changes in science and scientific writings. This study investigates a general evolutionary linguistic pattern in scientific writings. It does so by merging two credible computational methods: relative entropy; word-embedding concreteness and imageability. It thus creates a novel quantitative methodology and applies this to the examination of diachronic changes in the Philosophical Transactions of Royal Society (PTRS, 1665–1869). The data from two computational approaches can be well mapped to support the argument that this journal followed the evolutionary trend of increasing professionalization and specialization. But it also shows that language use in this journal was greatly influenced by historical events and other socio-cultural factors. This study, as a “culturomic” approach, demonstrates that the linguistic evolutionary patterns in scientific discourse have been interrupted by external factors even though this scientific discourse would likely have cumulatively developed into a professional and specialized genre. The approaches proposed by this study can make a great contribution to full-text analysis in scientometrics.
12
Abstract
Despite increasing life expectancy and high levels of welfare, health care, and public safety in most post-industrial countries, the public discourse often revolves around perceived threats. Terrorism, global pandemics, and environmental catastrophes are just a few of the risks that dominate media coverage. Is this public discourse on risk disconnected from reality? To examine this issue, we analyzed the dynamics of the risk discourse in two natural language text corpora. Specifically, we tracked latent semantic patterns over a period of 150 years to address four questions: First, we examined how the frequency of the word risk has changed over historical time. Is the construct of risk playing an ever-increasing role in the public discourse, as the sociological notion of a 'risk society' suggests? Second, we investigated how the sentiments for the words co-occurring with risk have changed. Are the connotations of risk becoming increasingly ominous? Third, how has the meaning of risk changed relative to close associates such as danger and hazard? Is risk more subject to semantic change? Finally, we decompose the construct of risk into the specific topics with which it has been associated and track those topics over historical time. This brief history of the semantics of risk reveals new and surprising insights (a fourfold increase in frequency, increasingly negative sentiment, a semantic drift toward forecasting and prevention, and a shift away from war toward chronic disease), reflecting the conceptual evolution of risk in the archeological records of public discourse.
Affiliation(s)
- Ying Li
  - Center for Adaptive Rationality, Max Planck Institute for Human Development, 14195 Berlin, Germany
- Thomas Hills
  - Department of Psychology, University of Warwick, University Road, Coventry CV4 7AL, United Kingdom
- Ralph Hertwig
  - Center for Adaptive Rationality, Max Planck Institute for Human Development, 14195 Berlin, Germany
13
Abstract
The recent rise in digitized historical text has made it possible to quantitatively study our psychological past. This involves understanding changes in what words meant, how words were used, and how these changes may have responded to changes in the environment, such as in healthcare, wealth disparity, and war. Here we make available a tool, the Macroscope, for studying historical changes in language over the last two centuries. The Macroscope uses over 155 billion words of historical text, which will grow as we include new historical corpora, and derives word properties from frequency-of-usage and co-occurrence patterns over time. Using co-occurrence patterns, the Macroscope can track changes in semantics, allowing researchers to identify semantically stable and unstable words in historical text and providing quantitative information about changes in a word’s valence, arousal, and concreteness, as well as information about new properties, such as semantic drift. The Macroscope provides information about both the local and global properties of words, as well as information about how these properties change over time, allowing researchers to visualize and download data in order to make inferences about historical psychology. Although quantitative historical psychology represents a largely new field of study, we see this work as complementing a wealth of other historical investigations, offering new insights and new approaches to understanding existing theory. The Macroscope is available online at http://www.macroscope.tech.
14
Historical analysis of national subjective wellbeing using millions of digitized books. Nat Hum Behav 2019; 3:1271-1275. [PMID: 31611658 DOI: 10.1038/s41562-019-0750-z] [Received: 04/20/2018] [Accepted: 08/30/2019]
Abstract
In addition to improving quality of life, higher subjective wellbeing leads to fewer health problems and higher productivity, making subjective wellbeing a focal issue among researchers and governments. However, it is difficult to estimate how happy people were during previous centuries. Here we show that a method based on the quantitative analysis of natural language published over the past 200 years captures reliable patterns in historical subjective wellbeing. Using sentiment analysis on the basis of psychological valence norms, we compute a national valence index for the United Kingdom, the United States, Germany and Italy, indicating relative happiness in response to national and international wars and in comparison to historical trends in longevity and gross domestic product. We validate our method using Eurobarometer survey data from the 1970s and demonstrate robustness using words with stable historical meanings, diverse corpora (newspapers, magazines and books) and additional word norms. By providing a window on quantitative historical psychology, this approach could inform policy and economic history.
15
Jon-And A, Aguilar E. A model of contact-induced language change: Testing the role of second language speakers in the evolution of Mozambican Portuguese. PLoS One 2019; 14:e0212303. [PMID: 31022194 PMCID: PMC6483184 DOI: 10.1371/journal.pone.0212303] [Received: 12/05/2017] [Accepted: 01/31/2019]
Abstract
Language change is accelerated by language contact, especially by contact that occurs when a group of speakers shifts from one language to another. This has commonly been explained by linguistic innovation occurring during second language acquisition. This hypothesis is based on historical reconstructions of instances of contact and has not been formally tested on empirical data. In this paper, we construct an agent-based model to formalize the hypothesis that second language speakers are responsible for accelerated language change during language shift. We compare model predictions to a unique combination of diachronic linguistic and demographic data from Maputo, Mozambique. The model correctly predicts an increased proportional use of the novel linguistic variants during the period we study. We find that a modified version of the model is a better fit to one of our two datasets and discuss plausible reasons for this. As a general conclusion concerning typological differences between contact-induced and non-contact-induced language change, we suggest that multiple introductions of a new linguistic variant by different individuals may be the mechanism by which language contact accelerates language change.
Affiliation(s)
- Anna Jon-And
  - Centre for the Study of Cultural Evolution, Stockholm University, Stockholm, Sweden
  - School of Humanities and Media Studies, Dalarna University, Falun, Sweden
- Elliot Aguilar
  - Centre for the Study of Cultural Evolution, Stockholm University, Stockholm, Sweden
  - Dept. of Biology, University of Pennsylvania, Philadelphia, PA, United States of America
16
Younes N, Reips UD. Guideline for improving the reliability of Google Ngram studies: Evidence from religious terms. PLoS One 2019; 14:e0213554. [PMID: 30901329 PMCID: PMC6430395 DOI: 10.1371/journal.pone.0213554] [Received: 11/30/2018] [Accepted: 02/24/2019]
Abstract
The Google Books Ngram Viewer (Google Ngram) is a search engine that charts word frequencies from a large corpus of books and thereby allows for the examination of cultural change as it is reflected in books. While the tool's massive corpus of data (about 8 million books or 6% of all books ever published) has been used in various scientific studies, concerns about the accuracy of results have simultaneously emerged. This paper reviews the literature and serves as a guideline for improving Google Ngram studies by suggesting five methodological procedures suited to increase the reliability of results. In particular, we recommend the use of (I) different language corpora, (II) cross-checks on different corpora from the same language, (III) word inflections, (IV) synonyms, and (V) a standardization procedure that accounts for both the influx of data and unequal weights of word frequencies. Further, we outline how to combine these procedures and address the risk of potential biases arising from censorship and propaganda. As an example of the proposed procedures, we examine the cross-cultural expression of religion via religious terms for the years 1900 to 2000. Special emphasis is placed on the situation during World War II. In line with the strand of literature that emphasizes the decline of collectivistic values, our results suggest an overall decrease of religion's importance. However, religion regains importance during times of crisis such as World War II. By comparing the results obtained through the different methods, we illustrate that applying and particularly combining our suggested procedures increases the reliability of results and prevents authors from deriving wrong assumptions.
Affiliation(s)
- Nadja Younes
  - Department of Psychology, University of Konstanz, Konstanz, Germany
17
Abstract
Humor ratings are provided for 4,997 English words collected from 821 participants using an online crowd-sourcing platform. Each participant rated 211 words on a scale from 1 (humorless) to 5 (humorous). To provide for comparisons across norms, words were chosen from a set common to a number of previously collected norms (e.g., arousal, valence, dominance, concreteness, age of acquisition, and reaction time). The complete dataset provides researchers with a list of humor ratings and includes information on gender, age, and educational differences. Results of analyses show that the ratings have reliability on a par with previous ratings and are not well predicted by existing norms.
18
Reali F, Chater N, Christiansen MH. Simpler grammar, larger vocabulary: How population size affects language. Proc Biol Sci 2019; 285:rspb.2017.2586. [PMID: 29367397 PMCID: PMC5805949 DOI: 10.1098/rspb.2017.2586] [Received: 11/16/2017] [Accepted: 01/02/2018]
Abstract
Languages with many speakers tend to be structurally simple while small communities sometimes develop languages with great structural complexity. Paradoxically, the opposite pattern appears to be observed for non-structural properties of language such as vocabulary size. These apparently opposite patterns pose a challenge for theories of language change and evolution. We use computational simulations to show that this inverse pattern can depend on a single factor: ease of diffusion through the population. A population of interacting agents was arranged on a network, passing linguistic conventions to one another along network links. Agents can invent new conventions, or replicate conventions that they have previously generated themselves or learned from other agents. Linguistic conventions are either Easy or Hard to diffuse, depending on how many times an agent needs to encounter a convention to learn it. In large groups, only linguistic conventions that are easy to learn, such as words, tend to proliferate, whereas small groups where everyone talks to everyone else allow for more complex conventions, like grammatical regularities, to be maintained. Our simulations thus suggest that language, and possibly other aspects of culture, may become simpler at the structural level as our world becomes increasingly interconnected.
Affiliation(s)
- Florencia Reali
- Department of Psychology, Universidad de los Andes, G230, Cra. 1 Nro. 18A-12, Bogotá 11001000, Colombia
- Nick Chater
- Behavioural Science Group, Warwick Business School, University of Warwick, Coventry CV4 7AL, UK
- Morten H Christiansen
- Department of Psychology, Cornell University, Uris Hall, Ithaca, NY 14853, USA; The Interacting Minds Centre and School for Culture and Communication, Aarhus University, 8000 Aarhus, Denmark
|
19
|
Abstract
There are well-understood psychological limits on our capacity to process information. As information proliferation (the consumption and sharing of information) increases through social media and other communications technology, these limits create an attentional bottleneck, favoring information that is more likely to be searched for, attended to, comprehended, encoded, and later reproduced. In information-rich environments, this bottleneck influences the evolution of information via four forces of cognitive selection, selecting for information that is belief-consistent, negative, social, and predictive. Selection for belief-consistent information leads balanced information to support increasingly polarized views. Selection for negative information amplifies information about downside risks and crowds out potential benefits. Selection for social information drives herding, impairs objective assessments, and reduces exploration for solutions to hard problems. Selection for predictive patterns drives overfitting, the replication crisis, and risk seeking. This article summarizes the negative implications of these forces of cognitive selection and presents eight warnings that represent severe pitfalls for the naive "informavore," accelerating extremism, hysteria, herding, and the proliferation of misinformation.
|
20
|
Mahowald K, Dautriche I, Gibson E, Piantadosi ST. Word Forms Are Structured for Efficient Use. Cogn Sci 2018; 42:3116-3134. [PMID: 30294810 DOI: 10.1111/cogs.12689] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 05/15/2015] [Revised: 08/11/2018] [Accepted: 08/30/2018] [Indexed: 11/28/2022]
Abstract
Zipf famously stated that, if natural language lexicons are structured for efficient communication, the words that are used the most frequently should require the least effort. This observation explains the famous finding that the most frequent words in a language tend to be short. A related prediction is that, even within words of the same length, the most frequent word forms should be the ones that are easiest to produce and understand. Using orthographics as a proxy for phonetics, we test this hypothesis using corpora of 96 languages from Wikipedia. We find that, across a variety of languages and language families and controlling for length, the most frequent forms in a language tend to be more orthographically well-formed and have more orthographic neighbors than less frequent forms. We interpret this result as evidence that lexicons are structured by language usage pressures to facilitate efficient communication.
Affiliation(s)
- Isabelle Dautriche
- School of Philosophy, Psychology and Language Sciences, University of Edinburgh
|
21
|
Jagiello RD, Hills TT. Bad News Has Wings: Dread Risk Mediates Social Amplification in Risk Communication. Risk Anal 2018; 38:2193-2207. [PMID: 29813185 DOI: 10.1111/risa.13117] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 06/08/2023]
Abstract
Social diffusion of information amplifies risk through processes of birth, death, and distortion of message content. Dread risk, which involves uncontrollable, fatal, involuntary, and catastrophic outcomes (e.g., terrorist attacks and nuclear accidents), may be particularly susceptible to amplification because of the psychological biases inherent in dread risk avoidance. To test this, initially balanced information about high or low dread topics was given to a set of individuals who then communicated this information through diffusion chains, each person passing a message to the next. A subset of these chains were also reexposed to the original information. We measured prior knowledge, perceived risk before and after transmission, and, at each link, number of positive and negative statements. Results showed that the more a message was transmitted the more negative statements it contained. This was highest for the high dread topic. Increased perceived risk and production of negative messages were closely related to the amount of negative information that was received, with domain knowledge mitigating this effect. Reexposure to the initial information was ineffectual in reducing bias, demonstrating the enhanced danger of socially transmitted information.
Affiliation(s)
- Robert D Jagiello
- Department of Psychology, University of Warwick, University Road, Coventry, UK
- Thomas T Hills
- Department of Psychology, University of Warwick, University Road, Coventry, UK
|
22
|
Monster I, Lev‐Ari S. The Effect of Social Network Size on Hashtag Adoption on Twitter. Cogn Sci 2018; 42:3149-3158. [DOI: 10.1111/cogs.12675] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/04/2017] [Revised: 01/18/2018] [Accepted: 07/20/2018] [Indexed: 11/28/2022]
Affiliation(s)
- Iris Monster
- Behavioural Science Institute, Radboud University
- Shiri Lev‐Ari
- Department of Psychology, Royal Holloway University of London
- Department of Psychology of Language, Max Planck Institute for Psycholinguistics
|
24
|
Cornish H, Dale R, Kirby S, Christiansen MH. Sequence Memory Constraints Give Rise to Language-Like Structure through Iterated Learning. PLoS One 2017; 12:e0168532. [PMID: 28118370 PMCID: PMC5261806 DOI: 10.1371/journal.pone.0168532] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 10/06/2015] [Accepted: 12/03/2016] [Indexed: 11/18/2022] Open
Abstract
Human language is composed of sequences of reusable elements. The origin of the sequential structure of language is a hotly debated topic in evolutionary linguistics. In this paper, we show that sets of sequences with language-like statistical properties can emerge from a process of cultural evolution under pressure from chunk-based memory constraints. We employ a novel experimental task that is non-linguistic and non-communicative in nature, in which participants are trained on and later asked to recall a set of sequences one by one. Recalled sequences from one participant become training data for the next participant. In this way, we simulate cultural evolution in the laboratory. Our results show a cumulative increase in structure, and by comparing this structure to data from existing linguistic corpora, we demonstrate a close parallel between the sets of sequences that emerge in our experiment and those seen in natural language.
Affiliation(s)
- Hannah Cornish
- Department of Psychology, The University of Stirling, Stirling, United Kingdom
- Rick Dale
- Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States of America
- Simon Kirby
- School of Philosophy, Psychology and Language Science, The University of Edinburgh, Edinburgh, United Kingdom
- Morten H. Christiansen
- Department of Psychology, Cornell University, Ithaca, NY, United States of America
- The Interacting Minds Centre, Aarhus University, Aarhus, Denmark
|
25
|
Murdock J, Allen C, DeDeo S. Exploration and exploitation of Victorian science in Darwin's reading notebooks. Cognition 2016; 159:117-126. [PMID: 27939837 DOI: 10.1016/j.cognition.2016.11.012] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 03/10/2016] [Revised: 11/23/2016] [Accepted: 11/28/2016] [Indexed: 11/24/2022]
Abstract
Search in an environment with an uncertain distribution of resources involves a trade-off between exploitation of past discoveries and further exploration. This extends to information foraging, where a knowledge-seeker shifts between reading in depth and studying new domains. To study this decision-making process, we examine the reading choices made by one of the most celebrated scientists of the modern era: Charles Darwin. From the full text of books listed in his chronologically organized reading journals, we generate topic models to quantify his local (text-to-text) and global (text-to-past) reading decisions using Kullback-Leibler divergence, a cognitively validated, information-theoretic measure of relative surprise. Rather than a pattern of surprise-minimization, corresponding to a pure exploitation strategy, Darwin's behavior shifts from early exploitation to later exploration, seeking unusually high levels of cognitive surprise relative to previous eras. These shifts, detected by an unsupervised Bayesian model, correlate with major intellectual epochs of his career as identified both by qualitative scholarship and Darwin's own self-commentary. Our methods allow us to compare his consumption of texts with their publication order. We find Darwin's consumption more exploratory than the culture's production, suggesting that underneath gradual societal changes are the explorations of individual synthesis and discovery. Our quantitative methods advance the study of cognitive search through a framework for testing interactions between individual and collective behavior and between short- and long-term consumption choices. This novel application of topic modeling to characterize individual reading complements widespread studies of collective scientific behavior.
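As a rough illustration of the surprise measure named in this abstract, the sketch below computes the Kullback-Leibler divergence between two discrete topic distributions. The function name, the three-topic vectors, and the choice of direction (new text relative to the previous text) are assumptions made for illustration; they are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """Return D_KL(p || q) in bits for two discrete distributions.

    p and q are sequences of probabilities over the same topics.
    Terms with p_i == 0 contribute nothing; if q_i == 0 where p_i > 0,
    the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue
        if q_i == 0.0:
            return math.inf
        total += p_i * math.log2(p_i / q_i)
    return total


# Hypothetical topic mixtures for two consecutive texts (illustrative values only).
previous_text = [0.5, 0.25, 0.25]
next_text = [0.25, 0.25, 0.5]

# "Local" surprise of the next text relative to the previous one.
surprise = kl_divergence(next_text, previous_text)
print(surprise)
```

A reader who keeps choosing texts with low divergence from what came before would look exploitative under this measure, while high values would indicate exploration.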
Affiliation(s)
- Jaimie Murdock
- Program in Cognitive Science, Indiana University, Bloomington, IN 47405, USA; School of Informatics and Computing, Indiana University, 919 E. 10th Street, Bloomington, IN 47408, USA.
- Colin Allen
- Program in Cognitive Science, Indiana University, Bloomington, IN 47405, USA; Department of History and Philosophy of Science and Medicine, Indiana University, Bloomington, IN 47405, USA; School of Humanities and Social Sciences, Xi'an Jiaotong University, Xi'an, China.
- Simon DeDeo
- Program in Cognitive Science, Indiana University, Bloomington, IN 47405, USA; School of Informatics and Computing, Indiana University, 919 E. 10th Street, Bloomington, IN 47408, USA; Department of Social and Decision Sciences, Carnegie Mellon University, 5000 Forbes Avenue, BP 208, Pittsburgh, PA 15213, USA; Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA.
|
26
|
Morin O, Acerbi A. Birth of the cool: a two-centuries decline in emotional expression in Anglophone fiction. Cogn Emot 2016; 31:1663-1675. [PMID: 27910735 DOI: 10.1080/02699931.2016.1260528] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 10/20/2022]
Abstract
The presence of emotional words and content in stories has been shown to enhance a story's memorability and its cultural success. Yet, recent cultural trends run in the opposite direction. Using the Google Books corpus, coupled with two metadata-rich corpora of Anglophone fiction books, we show a decrease in emotionality in English-speaking literature starting plausibly in the nineteenth century. We show that this decrease cannot be explained by changes unrelated to emotionality (such as demographic dynamics concerning age or gender balance, changes in vocabulary richness, or changes in the prevalence of literary genres), and that, in our three corpora, the decrease is driven almost entirely by a decline in the proportion of positive emotion-related words, while the frequency of negative emotion-related words shows little if any decline. Consistent with previous studies, we also find a link between ageing and negative emotionality at the individual level.
Affiliation(s)
- Olivier Morin
- Max Planck Institute for the Science of Human History, Jena, Germany
- Alberto Acerbi
- School of Innovation Sciences, Eindhoven University of Technology, Eindhoven, The Netherlands
|
27
|
Koplenig A, Müller-Spitzer C. Population Size Predicts Lexical Diversity, but so Does the Mean Sea Level - Why It Is Important to Correctly Account for the Structure of Temporal Data. PLoS One 2016; 11:e0150771. [PMID: 26938719 PMCID: PMC4777502 DOI: 10.1371/journal.pone.0150771] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 10/26/2015] [Accepted: 02/17/2016] [Indexed: 11/19/2022] Open
Abstract
In order to demonstrate why it is important to correctly account for the (serial dependent) structure of temporal data, we document an apparently spectacular relationship between population size and lexical diversity: for five out of seven investigated languages, there is a strong relationship between population size and lexical diversity of the primary language in this country. We show that this relationship is the result of a misspecified model that does not consider the temporal aspect of the data by presenting a similar but nonsensical relationship between the global annual mean sea level and lexical diversity. Given the fact that in the recent past, several studies were published that present surprising links between different economic, cultural, political and (socio-)demographical variables on the one hand and cultural or linguistic characteristics on the other hand, but seem to suffer from exactly this problem, we explain the cause of the misspecification and show that it has profound consequences. We demonstrate how simple transformation of the time series can often solve problems of this type and argue that the evaluation of the plausibility of a relationship is important in this context. We hope that our paper will help both researchers and reviewers to understand why it is important to use special models for the analysis of data with a natural temporal ordering.
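The misspecification described in this abstract can be sketched with synthetic data: two series that share nothing but an upward trend correlate almost perfectly in levels, and the correlation vanishes once each series is first-differenced. First differencing is used here only as one example of a simple time-series transformation; the series, the variable names, and the deterministic noise patterns are all hypothetical and are not the data or the exact procedure of the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def first_difference(series):
    """Return the year-on-year changes of a series."""
    return [b - a for a, b in zip(series, series[1:])]


# Two synthetic yearly series that only share a linear trend (illustrative values).
years = range(41)
noise_a = [1, -1]          # repeats every 2 years
noise_b = [1, 1, -1, -1]   # repeats every 4 years
population = [t + noise_a[t % 2] for t in years]
lexical_diversity = [2 * t + noise_b[t % 4] for t in years]

corr_levels = pearson(population, lexical_diversity)
corr_changes = pearson(first_difference(population),
                       first_difference(lexical_diversity))
print(corr_levels, corr_changes)
```

In this toy setup the strong correlation in levels comes entirely from the shared trend, which is the kind of relationship the abstract calls nonsensical when the temporal structure is ignored.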
Affiliation(s)
- Alexander Koplenig
- Department of Lexical Studies, Institute for the German language (IDS), Mannheim, Germany
- Carolin Müller-Spitzer
- Department of Lexical Studies, Institute for the German language (IDS), Mannheim, Germany
|
28
|
Vejdemo S, Hörberg T. Semantic Factors Predict the Rate of Lexical Replacement of Content Words. PLoS One 2016; 11:e0147924. [PMID: 26820737 PMCID: PMC4731055 DOI: 10.1371/journal.pone.0147924] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 04/16/2015] [Accepted: 01/11/2016] [Indexed: 11/19/2022] Open
Abstract
The rate of lexical replacement estimates the diachronic stability of word forms on the basis of how frequently a proto-language word is replaced or retained in its daughter languages. Lexical replacement rate has been shown to be highly related to word class and word frequency. In this paper, we argue that content words and function words behave differently with respect to lexical replacement rate, and we show that semantic factors predict the lexical replacement rate of content words. For the 167 content items in the Swadesh list, data were gathered on lexical replacement rate, word class, frequency, age of acquisition, synonyms, arousal, imageability and average mutual information, either from published databases or from corpora and lexica. A linear regression model shows that, in addition to frequency, synonyms, senses and imageability are significantly related to the lexical replacement rate of content words, in particular the number of synonyms that a word has. The model shows no differences in lexical replacement rate between word classes, and outperforms a model with word class and word frequency predictors only.
Affiliation(s)
- Susanne Vejdemo
- Department of Linguistics, Stockholm University, Stockholm, Sweden
- Thomas Hörberg
- Department of Linguistics, Stockholm University, Stockholm, Sweden
|