1
Xu M, Hu L, Hinnant A. Pseudo-events: Tracking mediatization with machine learning over 40 years. Computers in Human Behavior 2023. [DOI: 10.1016/j.chb.2023.107735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2023]
2
Vera J, Urbina F, Palma W. Formation of vocabularies in a decentralized graph-based approach to human language. Phys Rev E 2021; 103:022129. [PMID: 33736099 DOI: 10.1103/physreve.103.022129] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/19/2020] [Accepted: 01/26/2021] [Indexed: 11/07/2022]
Abstract
Zipf's law establishes a scaling behavior for word frequencies in large text corpora. The appearance of Zipfian properties in vocabularies (viewed as an intermediate phase between referentially useless one-word systems and one-to-one word-meaning vocabularies) has been previously explained as an optimization problem for the interests of speakers and hearers. Remarkably, humanlike vocabularies can be viewed also as bipartite graphs. Thus, the aim here is double: within a bipartite-graph approach to human vocabularies, to propose a decentralized language game model for the formation of Zipfian properties. To do this, we define a language game in which a population of artificial agents is involved in idealized linguistic interactions. Numerical simulations show the appearance of a drastic transition from an initially disordered state towards three kinds of vocabularies. Our results open ways to study Zipfian properties in language, reconciling models seeing communication as a global minima of information entropic energies and models focused on self-organization.
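As a rough illustration of the Zipfian scaling this abstract refers to (not the authors' language-game model), the sketch below estimates a rank-frequency exponent by ordinary least squares in log-log space. The function name `zipf_exponent` and the toy token list are assumptions made for this example only.

```python
import math
from collections import Counter

def zipf_exponent(tokens):
    """Estimate s in frequency ~ rank**(-s) by least squares on log-log axes.

    Needs at least two distinct words, otherwise the slope is undefined.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # sign flipped so a decaying law gives a positive exponent

# Hypothetical toy corpus whose word counts are exactly 60 / rank
toy_tokens = []
for rank in range(1, 7):
    toy_tokens += [f"w{rank}"] * (60 // rank)

print(zipf_exponent(toy_tokens))
```

On real corpora a plain log-log regression is only a first look; the tail of the distribution is noisy and the fit is sensitive to the rank range chosen.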
Affiliation(s)
- Javier Vera
  - Pontificia Universidad Católica de Valparaíso, Valparaíso 2340025, Chile
- Felipe Urbina
  - Centro de Investigación DAiTA Lab, Facultad de Estudios Interdisciplinarios, Universidad Mayor, Santiago 7560913, Chile
- Wenceslao Palma
  - Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Valparaíso 2362807, Chile
3
Abstract
Beauty is subjective, and as such it, of course, cannot be defined in absolute terms. But we all know or feel when something is beautiful to us personally. And in such instances, methods of statistical physics and network science can be used to quantify and to better understand what it is that evokes that pleasant feeling, be it when reading a book or looking at a painting. Indeed, recent large-scale explorations of digital data have lifted the veil on many aspects of our artistic expressions that would remain forever hidden in smaller samples. From the determination of complexity and entropy of art paintings to the creation of the flavour network and the principles of food pairing, fascinating research at the interface of art, physics and network science abounds. We here review the existing literature, focusing in particular on culinary, visual, musical and literary arts. We also touch upon cultural history and culturomics, as well as on the connections between physics and the social sciences in general. The review shows that the synergies between these fields yield highly entertaining results that can often be enjoyed by layman and experts alike. In addition to its wider appeal, the reviewed research also has many applications, ranging from improved recommendation to the detection of plagiarism.
Affiliation(s)
- Matjaž Perc
  - Faculty of Natural Sciences and Mathematics, University of Maribor, Koroška cesta 160, 2000 Maribor, Slovenia
  - Department of Medical Research, China Medical University Hospital, China Medical University, Taichung, Taiwan
  - Complexity Science Hub Vienna, Josefstädterstraße 39, 1080 Vienna, Austria
4
Pascual I, Aguirre J, Manrubia S, Cuesta JA. Epistasis between cultural traits causes paradigm shifts in cultural evolution. Royal Society Open Science 2020; 7:191813. [PMID: 32257337 PMCID: PMC7062103 DOI: 10.1098/rsos.191813] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/01/2019] [Accepted: 01/30/2020] [Indexed: 06/11/2023]
Abstract
Every now and then the cultural paradigm of a society changes. While current models of cultural shifts usually require a major exogenous or endogenous change, we propose that the mechanism underlying many paradigm shifts may just be an emergent feature of the inherent congruence among different cultural traits. We implement this idea through a population dynamics model in which individuals are defined by a vector of cultural traits that changes mainly through cultural contagion, biased by a 'cultural fitness' landscape, between contemporary individuals. Cultural traits reinforce or hinder each other (through a form of cultural epistasis) to prevent cognitive dissonance. Our main result is that abrupt paradigm shifts occur, in response to weak changes in the landscape, only in the presence of epistasis between cultural traits, and regardless of whether horizontal transmission is biased by homophily. A relevant consequence of this dynamics is the irreversible nature of paradigm shifts: the old paradigm cannot be restored even if the external changes are undone. Our model puts the phenomenon of paradigm shifts in cultural evolution in the same category as catastrophic shifts in ecology or phase transitions in physics, where minute causes lead to major collective changes.
Affiliation(s)
- Ignacio Pascual
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
  - Departamento de Matemáticas, Universidad Carlos III de Madrid, Spain
- Jacobo Aguirre
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
  - Centro Nacional de Biotecnología (CNB-CSIC), Madrid, Spain
- Susanna Manrubia
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
  - Centro Nacional de Biotecnología (CNB-CSIC), Madrid, Spain
- José A. Cuesta
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
  - Departamento de Matemáticas, Universidad Carlos III de Madrid, Spain
  - Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), Universidad de Zaragoza, Zaragoza, Spain
  - UC3M-BS Institute of Financial Big Data (IFiBiD), Madrid, Spain
5
Beyer R, Singarayer JS, Stock JT, Manica A. Environmental conditions do not predict diversification rates in the Bantu languages. Heliyon 2019; 5:e02630. [PMID: 31692645 PMCID: PMC6806388 DOI: 10.1016/j.heliyon.2019.e02630] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/23/2019] [Revised: 09/26/2019] [Accepted: 10/08/2019] [Indexed: 11/30/2022]
Abstract
The global distribution of language diversity mirrors that of several variables related to ecosystem productivity. It has been argued that this is driven by the size of social networks, which tend to be larger in harsher climates to ensure food security, leading to reduced language divergence. Is this pattern purely synchronic, or is there also a quantifiable relationship between environmental conditions and language diversification over time? We used a spatio-temporal phylogeny of the Bantu language family to estimate local diversification rates at the times and locations of language divergence. We compared these data against spatially-explicit reconstructions of several palaeoclimate and palaeovegetation variables (mean annual temperature and the temperature of the coldest and warmest quarter, annual precipitation and the precipitation of the wettest and driest quarter, growing degree days, the length of the growing season, and net primary production), to investigate a potential link between local environmental factors and diversification rates in the Bantu languages. A regression analysis does not suggest a statistically significant relationship between climatic or ecological variables and linguistic diversification over time. We find a strong positive correlation between pairwise linguistic and geographic distances in the Bantu languages, arguing for a dominant role of isolation as a result of the rapid Bantu expansion that might have overwhelmed any potential influence of local environmental factors.
Affiliation(s)
- Robert Beyer
  - Department of Zoology, University of Cambridge, Cambridge, CB2 3EJ, United Kingdom
  - PAVE Research Group, Department of Archaeology, University of Cambridge, Cambridge, CB2 3DZ, United Kingdom
- Joy S. Singarayer
  - Department of Meteorology and Centre for Past Climate Change, University of Reading, Whiteknights campus, PO Box 243, Reading, RG6 6BB, United Kingdom
- Jay T. Stock
  - PAVE Research Group, Department of Archaeology, University of Cambridge, Cambridge, CB2 3DZ, United Kingdom
  - Department of Anthropology, Western University, London, Ontario, N6A 5C2, Canada
  - Department of Archaeology, Max Planck Institute for the Science of Human History, Kahlaische Strasse 10, D-07745 Jena, Germany
- Andrea Manica
  - Department of Zoology, University of Cambridge, Cambridge, CB2 3EJ, United Kingdom
6
Liao J, Yang G, Kavaler D, Filkov V, Devanbu P. Status, identity, and language: A study of issue discussions in GitHub. PLoS One 2019; 14:e0215059. [PMID: 31199802 PMCID: PMC6568400 DOI: 10.1371/journal.pone.0215059] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/18/2018] [Accepted: 03/26/2019] [Indexed: 11/19/2022]
Abstract
Successful open source software (OSS) projects comprise freely observable, task-oriented social networks with hundreds or thousands of participants and large amounts of (textual and technical) discussion. The sheer volume of interactions and participants makes it challenging for participants to find relevant tasks, discussions and people. Tagging (e.g., @AmySmith) is a socio-technical practice that enables more focused discussion. By tagging important and relevant people, discussions can be advanced more effectively. However, for all but a few insiders, it can be difficult to identify important and/or relevant people. In this paper we study tagging in OSS projects from a socio-linguistics perspective. First we argue that textual content per se reveals a great deal about the status and identity of who is speaking and who is being addressed. Next, we suggest that this phenomenon can be usefully modeled using modern deep-learning methods. Finally, we illustrate the value of these approaches with tools that could assist people to find the important and relevant people for a discussion.
Affiliation(s)
- Jingxian Liao
  - Department of Computer Science, University of California Davis, Davis, California, United States of America
- Guowei Yang
  - Department of Computer Science, University of California Davis, Davis, California, United States of America
- David Kavaler
  - Department of Computer Science, University of California Davis, Davis, California, United States of America
- Vladimir Filkov
  - Department of Computer Science, University of California Davis, Davis, California, United States of America
- Prem Devanbu
  - Department of Computer Science, University of California Davis, Davis, California, United States of America
7
Sreedharan JK, Magner A, Grama A, Szpankowski W. Inferring Temporal Information from a Snapshot of a Dynamic Network. Sci Rep 2019; 9:3057. [PMID: 30816140 PMCID: PMC6395620 DOI: 10.1038/s41598-019-38912-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 10/17/2018] [Accepted: 01/14/2019] [Indexed: 11/09/2022]
Abstract
The problem of reverse-engineering the evolution of a dynamic network, known broadly as network archaeology, is one of profound importance in diverse application domains. In analysis of infection spread, it reveals the spatial and temporal processes underlying infection. In analysis of biomolecular interaction networks (e.g., protein interaction networks), it reveals early molecules that are known to be differentially implicated in diseases. In economic networks, it reveals flow of capital and associated actors. Beyond these recognized applications, it provides analytical substrates for novel studies - for instance, on the structural and functional evolution of the human brain connectome. In this paper, we model, formulate, and rigorously analyze the problem of inferring the arrival order of nodes in a dynamic network from a single snapshot. We derive limits on solutions to the problem, present methods that approach this limit, and demonstrate the methods on a range of applications, from inferring the evolution of the human brain connectome to conventional citation and social networks, where ground truth is known.
Affiliation(s)
- Jithin K Sreedharan
  - Center for Science of Information, Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Abram Magner
  - Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA
- Ananth Grama
  - Center for Science of Information, Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Wojciech Szpankowski
  - Center for Science of Information, Department of Computer Science, Purdue University, West Lafayette, IN, USA
8
The natural selection of words: Finding the features of fitness. PLoS One 2019; 14:e0211512. [PMID: 30689665 PMCID: PMC6349325 DOI: 10.1371/journal.pone.0211512] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/31/2018] [Accepted: 01/15/2019] [Indexed: 11/20/2022]
Abstract
We introduce a dataset for studying the evolution of words, constructed from WordNet and the Google Books Ngram Corpus. The dataset tracks the evolution of 4,000 synonym sets (synsets), containing 9,000 English words, from 1800 AD to 2000 AD. We present a supervised learning algorithm that is able to predict the future leader of a synset: the word in the synset that will have the highest frequency. The algorithm uses features based on a word’s length, the characters in the word, and the historical frequencies of the word. It can predict change of leadership (including the identity of the new leader) fifty years in the future, with an F-score considerably above random guessing. Analysis of the learned models provides insight into the causes of change in the leader of a synset. The algorithm confirms observations linguists have made, such as the trend to replace the -ise suffix with -ize, the rivalry between the -ity and -ness suffixes, and the struggle between economy (shorter words are easier to remember and to write) and clarity (longer words are more distinctive and less likely to be confused with one another). The results indicate that integration of the Google Books Ngram Corpus with WordNet has significant potential for improving our understanding of how language evolves.
9
Burridge J. Unifying models of dialect spread and extinction using surface tension dynamics. Royal Society Open Science 2018; 5:171446. [PMID: 29410847 PMCID: PMC5792924 DOI: 10.1098/rsos.171446] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 09/26/2017] [Accepted: 11/23/2017] [Indexed: 06/01/2023]
Abstract
We provide a unified mathematical explanation of two classical forms of spatial linguistic spread. The wave model describes the radiation of linguistic change outwards from a central focus. Changes can also jump between population centres in a process known as hierarchical diffusion. It has recently been proposed that the spatial evolution of dialects can be understood using surface tension at linguistic boundaries. Here we show that the inclusion of long-range interactions in the surface tension model generates both wave-like spread, and hierarchical diffusion, and that it is surface tension that is the dominant effect in deciding the stable distribution of dialect patterns. We generalize the model to allow population mixing which can induce shrinkage of linguistic domains, or destroy dialect regions from within.
10
Bao P, Zhang X. Uncovering and Predicting the Dynamic Process of Collective Attention with Survival Theory. Sci Rep 2017; 7:2621. [PMID: 28572618 PMCID: PMC5453944 DOI: 10.1038/s41598-017-02826-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/10/2017] [Accepted: 04/19/2017] [Indexed: 11/16/2022]
Abstract
The subject of collective attention is in the center of this era of information explosion. It is thus of great interest to understand the fundamental mechanism underlying attention in large populations within a complex evolving system. Moreover, an ability to predict the dynamic process of collective attention for individual items has important implications in an array of areas. In this report, we propose a generative probabilistic model using a self-excited Hawkes process with survival theory to model and predict the process through which individual items gain their attentions. This model explicitly captures three key ingredients: the intrinsic attractiveness of an item, characterizing its inherent competitiveness against other items; a reinforcement mechanism based on sum of each previous attention triggers; and a power-law temporal relaxation function, corresponding to the aging in the ability to attract new attentions. Experiments on two population-scale datasets demonstrate that this model consistently outperforms the state-of-the-art methods.
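The three ingredients named in this abstract (a baseline attractiveness, reinforcement summed over earlier attention events, and power-law relaxation) can be sketched as a generic self-exciting intensity function. This is an assumed textbook-style form, not the authors' model, which also incorporates survival theory; the parameter names `mu`, `alpha`, `c` and `theta` are hypothetical.

```python
def hawkes_intensity(t, event_times, mu=0.5, alpha=1.0, c=1.0, theta=0.5):
    """Intensity at time t for a self-exciting process with a power-law kernel.

    mu    : baseline rate, standing in for intrinsic attractiveness (assumed)
    alpha : weight of each earlier event's reinforcement (assumed)
    c     : offset that keeps the kernel finite at zero lag (assumed)
    theta : decay exponent of the power-law relaxation (assumed)
    """
    excitation = sum(
        alpha * (t - ti + c) ** (-(1.0 + theta))
        for ti in event_times
        if ti < t  # only events strictly before t contribute
    )
    return mu + excitation

# Hypothetical event history: the intensity decays back toward mu after a burst
history = [0.0, 0.5, 1.0]
for t in (1.5, 5.0, 50.0):
    print(t, hawkes_intensity(t, history))
```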
Affiliation(s)
- Peng Bao
  - School of Software Engineering, Beijing Jiaotong University, Beijing, China
- Xiaoxia Zhang
  - School of Economics and Management, Tsinghua University, Beijing, China
11
Skrebyte A, Garnett P, Kendal JR. Temporal Relationships Between Individualism–Collectivism and the Economy in Soviet Russia. Journal of Cross-Cultural Psychology 2016. [DOI: 10.1177/0022022116659540] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/15/2022]
Abstract
Collectivism and individualism are commonly used to delineate societies that differ in their cultural values and patterns of social behavior, prioritizing the relative importance of the group and the individual, respectively. Collectivist and individualist expression is likely to be intricately linked with the political and economic history of a society. Scholars have proposed mechanisms for both positive and negative correlations between economic growth and a culture of either individualism or collectivism. Here, we consider these relationships across the dramatic history of 20th- and early 21st-century Russia (1901-2009), spanning the late Russian Empire, the communist state, and the growth of capitalism. We sample Russian speakers to identify common Russian words expressing individualism or collectivism, and examine the changing frequencies of these terms in Russian publications collected in Google’s Ngram corpus. We correlate normalized individualism and collectivism expression against published estimates of economic growth (GDP and net material product [NMP]) available between 1961 and 1995, finding high collectivist expression and economic growth rate followed by the correlated decline of both prior to the end of Soviet system. Temporal trends in the published expression of individualism and collectivism, in addition to their correlations with estimated economic growth rates, are examined in relation to the change in economic and political structures, ideology and public discourse. We also compare our sampled Russian-language terms for individualism and collectivism with Twenge et al.’s equivalent collection from American English speakers.
12
Yun J, Shang SC, Wei XD, Liu S, Li ZJ. The possibility of coexistence and co-development in language competition: ecology-society computational model and simulation. SpringerPlus 2016; 5:855. [PMID: 27386304 PMCID: PMC4919202 DOI: 10.1186/s40064-016-2482-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/30/2016] [Accepted: 05/31/2016] [Indexed: 11/10/2022]
Abstract
Language is characterized by both ecological properties and social properties, and competition is the basic form of language evolution. The rise and decline of one language is a result of competition between languages. Moreover, this rise and decline directly influences the diversity of human culture. Mathematics and computer modeling for language competition has been a popular topic in the fields of linguistics, mathematics, computer science, ecology, and other disciplines. Currently, there are several problems in the research on language competition modeling. First, comprehensive mathematical analysis is absent in most studies of language competition models. Next, most language competition models are based on the assumption that one language in the model is stronger than the other. These studies tend to ignore cases where there is a balance of power in the competition. The competition between two well-matched languages is more practical, because it can facilitate the co-development of two languages. A third issue with current studies is that many studies have an evolution result where the weaker language inevitably goes extinct. From the integrated point of view of ecology and sociology, this paper improves the Lotka–Volterra model and basic reaction–diffusion model to propose an “ecology–society” computational model for describing language competition. Furthermore, a strict and comprehensive mathematical analysis was made for the stability of the equilibria. Two languages in competition may be either well-matched or greatly different in strength, which was reflected in the experimental design. The results revealed that language coexistence, and even co-development, are likely to occur during language competition.
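For orientation, the classic two-population Lotka–Volterra competition equations that this abstract takes as a starting point can be integrated with a simple Euler scheme. The sketch below is the textbook form only; it does not include the paper's "ecology–society" extensions or the reaction–diffusion term, and the function name and all parameter values are assumptions chosen so that the two "languages" are evenly matched.

```python
def simulate_competition(x0, y0, r1=1.0, r2=1.0, k1=1.0, k2=1.0,
                         a12=0.5, a21=0.5, dt=0.01, steps=20000):
    """Euler integration of classic Lotka-Volterra competition.

    dx/dt = r1 * x * (1 - (x + a12 * y) / k1)
    dy/dt = r2 * y * (1 - (y + a21 * x) / k2)
    x, y are the (normalized) speaker populations of two languages.
    """
    x, y = x0, y0
    for _ in range(steps):
        dx = r1 * x * (1.0 - (x + a12 * y) / k1)
        dy = r2 * y * (1.0 - (y + a21 * x) / k2)
        x, y = x + dt * dx, y + dt * dy
    return x, y

# With weak mutual competition (a12 * a21 < 1 and equal capacities)
# both populations settle at a shared positive level: coexistence.
print(simulate_competition(0.1, 0.2))
```

With these symmetric parameters the coexistence equilibrium is x = y = 1 / (1 + 0.5) = 2/3; raising `a12` and `a21` above 1 would instead drive one population toward extinction, which is the outcome the abstract says many earlier models assume.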
Affiliation(s)
- Jian Yun
  - School of Computer Science and Engineering, Dalian Nationalities University, Dalian, 116600 Liaoning China
- Song-Chao Shang
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 610054 Sichuan China
- Xiao-Dan Wei
  - School of Computer Science and Engineering, Dalian Nationalities University, Dalian, 116600 Liaoning China
- Shuang Liu
  - School of Computer Science and Engineering, Dalian Nationalities University, Dalian, 116600 Liaoning China
- Zhi-Jie Li
  - School of Computer Science and Engineering, Dalian Nationalities University, Dalian, 116600 Liaoning China
13
Letchford A, Preis T, Moat HS. Quantifying the Search Behaviour of Different Demographics Using Google Correlate. PLoS One 2016; 11:e0149025. [PMID: 26910464 PMCID: PMC4766235 DOI: 10.1371/journal.pone.0149025] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/04/2015] [Accepted: 01/26/2016] [Indexed: 11/18/2022]
Abstract
Vast records of our everyday interests and concerns are being generated by our frequent interactions with the Internet. Here, we investigate how the searches of Google users vary across U.S. states with different birth rates and infant mortality rates. We find that users in states with higher birth rates search for more information about pregnancy, while those in states with lower birth rates search for more information about cats. Similarly, we find that users in states with higher infant mortality rates search for more information about credit, loans and diseases. Our results provide evidence that Internet search data could offer new insight into the concerns of different demographics.
Affiliation(s)
- Adrian Letchford
  - Data Science Lab, Behavioural Science, Warwick Business School, University of Warwick, CV4 7AL, Coventry, United Kingdom
- Tobias Preis
  - Data Science Lab, Behavioural Science, Warwick Business School, University of Warwick, CV4 7AL, Coventry, United Kingdom
- Helen Susannah Moat
  - Data Science Lab, Behavioural Science, Warwick Business School, University of Warwick, CV4 7AL, Coventry, United Kingdom
14
Zambrano E, Hernando A, Fernández Bariviera A, Hernando R, Plastino A. Thermodynamics of firms' growth. J R Soc Interface 2015; 12:20150789. [PMID: 26510828 PMCID: PMC4685849 DOI: 10.1098/rsif.2015.0789] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/02/2015] [Accepted: 10/05/2015] [Indexed: 11/12/2022]
Abstract
The distribution of firms' growth and firms' sizes is a topic under intense scrutiny. In this paper, we show that a thermodynamic model based on the maximum entropy principle, with dynamical prior information, can be constructed that adequately describes the dynamics and distribution of firms' growth. Our theoretical framework is tested against a comprehensive database of Spanish firms, which covers, to a very large extent, Spain's economic activity, with a total of 1,155,142 firms evolving along a full decade. We show that the empirical exponent of Pareto's law, a rule often observed in the rank distribution of large-size firms, is explained by the capacity of economic system for creating/destroying firms, and that can be used to measure the health of a capitalist-based economy. Indeed, our model predicts that when the exponent is larger than 1, creation of firms is favoured; when it is smaller than 1, destruction of firms is favoured instead; and when it equals 1 (matching Zipf's law), the system is in a full macroeconomic equilibrium, entailing 'free' creation and/or destruction of firms. For medium and smaller firm sizes, the dynamical regime changes, the whole distribution can no longer be fitted to a single simple analytical form and numerical prediction is required. Our model constitutes the basis for a full predictive framework regarding the economic evolution of an ensemble of firms. Such a structure can be potentially used to develop simulations and test hypothetical scenarios, such as economic crisis or the response to specific policy measures.
Affiliation(s)
- Eduardo Zambrano
  - Social Thermodynamics Applied Research (SThAR), EPFL Innovation Park, Bâtiment C, 1015 Lausanne, Switzerland
- Alberto Hernando
  - Social Thermodynamics Applied Research (SThAR), EPFL Innovation Park, Bâtiment C, 1015 Lausanne, Switzerland
- Ricardo Hernando
  - Social Thermodynamics Applied Research (SThAR), EPFL Innovation Park, Bâtiment C, 1015 Lausanne, Switzerland
- Angelo Plastino
  - National University of La Plata, Physics Institute (IFLP-CCT-CONICET) C.C.737, 1900 La Plata, Argentina
15
Bochkarev V, Solovyev V, Wichmann S. Universals versus historical contingencies in lexical evolution. J R Soc Interface 2015; 11:20140841. [PMID: 25274040 DOI: 10.1098/rsif.2014.0841] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
The frequency with which we use different words changes all the time, and every so often, a new lexical item is invented or another one ceases to be used. Beyond a small sample of lexical items whose properties are well studied, little is known about the dynamics of lexical evolution. How do the lexical inventories of languages, viewed as entire systems, evolve? Is the rate of evolution of the lexicon contingent upon historical factors or is it driven by regularities, perhaps to do with universals of cognition and social interaction? We address these questions using the Google Books N-Gram Corpus as a source of data and relative entropy as a measure of changes in the frequency distributions of words. It turns out that there are both universals and historical contingencies at work. Across several languages, we observe similar rates of change, but only at timescales of at least around five decades. At shorter timescales, the rate of change is highly variable and differs between languages. Major societal transformations as well as catastrophic events such as wars lead to increased change in frequency distributions, whereas stability in society has a dampening effect on lexical evolution.
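The measure named in this abstract, relative entropy between word-frequency distributions, can be sketched as a Kullback–Leibler divergence between two bags of word counts. The additive smoothing used here to handle words missing from one period is an assumption of this example, not something the abstract specifies, and the function name and toy counts are hypothetical.

```python
import math

def relative_entropy(counts_p, counts_q, smoothing=1e-9):
    """KL divergence D(P || Q) in bits between two word-count dictionaries.

    A small additive constant (an assumption of this sketch) keeps the
    logarithm finite when a word occurs in only one of the two periods.
    """
    vocab = set(counts_p) | set(counts_q)
    total_p = sum(counts_p.values()) + smoothing * len(vocab)
    total_q = sum(counts_q.values()) + smoothing * len(vocab)
    divergence = 0.0
    for word in vocab:
        p = (counts_p.get(word, 0) + smoothing) / total_p
        q = (counts_q.get(word, 0) + smoothing) / total_q
        divergence += p * math.log2(p / q)
    return divergence

# Hypothetical counts for the same two words in two different years
year_a = {"a": 1, "b": 1}
year_b = {"a": 3, "b": 1}
print(relative_entropy(year_a, year_b))
```

Relative entropy is asymmetric, so `relative_entropy(year_a, year_b)` and `relative_entropy(year_b, year_a)` generally differ; which direction (or symmetrized variant) is appropriate depends on the study design.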
Affiliation(s)
- V Bochkarev
  - Kazan Federal University, Kremlevskaya Street 18, 420000 Kazan, Russia
- V Solovyev
  - Kazan Federal University, Kremlevskaya Street 18, 420000 Kazan, Russia
- S Wichmann
  - Kazan Federal University, Kremlevskaya Street 18, 420000 Kazan, Russia
  - Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103 Leipzig, Germany
16
Pham T, Sheridan P, Shimodaira H. PAFit: A Statistical Method for Measuring Preferential Attachment in Temporal Complex Networks. PLoS One 2015; 10:e0137796. [PMID: 26378457 PMCID: PMC4574777 DOI: 10.1371/journal.pone.0137796] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 04/01/2015] [Accepted: 08/21/2015] [Indexed: 11/24/2022]
Abstract
Preferential attachment is a stochastic process that has been proposed to explain certain topological features characteristic of complex networks from diverse domains. The systematic investigation of preferential attachment is an important area of research in network science, not only for the theoretical matter of verifying whether this hypothesized process is operative in real-world networks, but also for the practical insights that follow from knowledge of its functional form. Here we describe a maximum likelihood based estimation method for the measurement of preferential attachment in temporal complex networks. We call the method PAFit, and implement it in an R package of the same name. PAFit constitutes an advance over previous methods primarily because we based it on a nonparametric statistical framework that enables attachment kernel estimation free of any assumptions about its functional form. We show this results in PAFit outperforming the popular methods of Jeong and Newman in Monte Carlo simulations. What is more, we found that the application of PAFit to a publically available Flickr social network dataset yielded clear evidence for a deviation of the attachment kernel from the popularly assumed log-linear form. Independent of our main work, we provide a correction to a consequential error in Newman’s original method which had evidently gone unnoticed since its publication over a decade ago.
Affiliation(s)
- Thong Pham
- Division of Mathematical Science, Graduate School of Engineering Science, Osaka University, Osaka, Japan
| | - Paul Sheridan
- The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
| | - Hidetoshi Shimodaira
- Division of Mathematical Science, Graduate School of Engineering Science, Osaka University, Osaka, Japan
|
17
|
Hernández DG, Zanette DH, Samengo I. Information-theoretical analysis of the statistical dependencies among three variables: Applications to written language. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2015; 92:022813. [PMID: 26382460 DOI: 10.1103/physreve.92.022813] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/30/2015] [Indexed: 06/05/2023]
Abstract
We develop the information-theoretical concepts required to study the statistical dependencies among three variables. Some of these dependencies are pure triple interactions, in the sense that they cannot be explained in terms of a combination of pairwise correlations. We derive bounds for triple dependencies, and characterize the shape of the joint probability distribution of three binary variables with high triple interaction. The analysis also allows us to quantify the amount of redundancy in the mutual information between pairs of variables, and to assess whether the information between two variables is or is not mediated by a third variable. These concepts are applied to the analysis of written texts. We find that the probability that a given word is found in a particular location within the text is modulated not only by the presence or absence of other nearby words, but also by the presence or absence of nearby pairs of words. We identify the words enclosing the key semantic concepts of the text, the triplets of words with high pairwise and triple interactions, and the words that mediate the pairwise interactions between other words.
Affiliation(s)
- Damián G Hernández
- Centro Atómico Bariloche and Instituto Balseiro, (8400) San Carlos de Bariloche, Argentina
| | - Damián H Zanette
- Centro Atómico Bariloche and Instituto Balseiro, (8400) San Carlos de Bariloche, Argentina
| | - Inés Samengo
- Centro Atómico Bariloche and Instituto Balseiro, (8400) San Carlos de Bariloche, Argentina
|
18
|
Mauch M, MacCallum RM, Levy M, Leroi AM. The evolution of popular music: USA 1960-2010. ROYAL SOCIETY OPEN SCIENCE 2015; 2:150081. [PMID: 26064663 PMCID: PMC4453253 DOI: 10.1098/rsos.150081] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 02/17/2015] [Accepted: 04/09/2015] [Indexed: 05/19/2023]
Abstract
In modern societies, cultural change seems ceaseless. The flux of fashion is especially obvious for popular music. While much has been written about the origin and evolution of pop, most claims about its history are anecdotal rather than scientific in nature. To rectify this, we investigate the US Billboard Hot 100 between 1960 and 2010. Using music information retrieval and text-mining tools, we analyse the musical properties of approximately 17 000 recordings that appeared in the charts and demonstrate quantitative trends in their harmonic and timbral properties. We then use these properties to produce an audio-based classification of musical styles and study the evolution of musical diversity and disparity, testing, and rejecting, several classical theories of cultural change. Finally, we investigate whether pop musical evolution has been gradual or punctuated. We show that, although pop music has evolved continuously, it did so with particular rapidity during three stylistic 'revolutions' around 1964, 1983 and 1991. We conclude by discussing how our study points the way to a quantitative science of cultural change.
Affiliation(s)
- Matthias Mauch
- School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK
| | | | - Mark Levy
- Last.fm, 5-11 Lavingdon Street, London SE1 0NZ, UK
| | - Armand M. Leroi
- Division of Life Sciences, Imperial College London, London SW7 2AZ, UK
|
19
|
Cocho G, Flores J, Gershenson C, Pineda C, Sánchez S. Rank diversity of languages: generic behavior in computational linguistics. PLoS One 2015; 10:e0121898. [PMID: 25849150 PMCID: PMC4388647 DOI: 10.1371/journal.pone.0121898] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 10/14/2014] [Accepted: 02/05/2015] [Indexed: 11/19/2022] Open
Abstract
Statistical studies of languages have focused on the rank-frequency distribution of words. Instead, we introduce here a measure of how word ranks change in time and call this distribution rank diversity. We calculate this diversity for books published in six European languages since 1800, and find that it follows a universal lognormal distribution. Based on the mean and standard deviation associated with the lognormal distribution, we define three different word regimes of languages: "heads" consist of words which almost do not change their rank in time, "bodies" are words of general use, while "tails" are comprised by context-specific words and vary their rank considerably in time. The heads and bodies reflect the size of language cores identified by linguists for basic communication. We propose a Gaussian random walk model which reproduces the rank variation of words in time and thus the diversity. Rank diversity of words can be understood as the result of random variations in rank, where the size of the variation depends on the rank itself. We find that the core size is similar for all languages studied.
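As a rough illustration of the quantity this abstract describes, the sketch below computes a rank diversity from a sequence of yearly word rankings. It assumes the definition d(k) = (number of distinct words observed at rank k) / (number of time slices); the function name `rank_diversity` and the toy rankings are hypothetical and are not taken from the paper.

```python
def rank_diversity(rankings):
    """Rank diversity per rank position.

    `rankings` is a list of time slices; each slice is a list of words
    ordered from most to least frequent. For each rank k, count how many
    distinct words ever occupied that rank and divide by the number of
    slices (assumed definition, see lead-in).
    """
    n_slices = len(rankings)
    # Only ranks present in every slice can be compared across time.
    depth = min(len(slice_) for slice_ in rankings)
    return [
        len({slice_[k] for slice_ in rankings}) / n_slices
        for k in range(depth)
    ]


# Toy data: three "years" of a three-word ranking (illustrative only).
toy = [
    ["the", "of", "and"],
    ["the", "and", "of"],
    ["the", "of", "to"],
]
diversity = rank_diversity(toy)
```

In this toy case the top rank never changes hands, so its diversity is the minimum possible value (1/3), while the lowest rank is occupied by a different word in every slice and reaches 1.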
Affiliation(s)
- Germinal Cocho
- Instituto de Física, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Jorge Flores
- Instituto de Física, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Carlos Gershenson
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Carlos Pineda
- Instituto de Física, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Sergio Sánchez
- Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City, Mexico
|
20
|
Fushing H, Chen C, Hsieh YC, Farrell P. Lewis Carroll's Doublets net of English words: network heterogeneity in a complex system. PLoS One 2014; 9:e114177. [PMID: 25517974 PMCID: PMC4269387 DOI: 10.1371/journal.pone.0114177] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/22/2014] [Accepted: 10/30/2014] [Indexed: 11/19/2022] Open
Abstract
Lewis Carroll's English word game Doublets is represented as a system of networks with each node being an English word and each connectivity edge confirming that its two ending words are equal in letter length, but different by exactly one letter. We show that this system, which we call the Doublets net, constitutes a complex body of linguistic knowledge concerning English word structure that has computable multiscale features. Distributed morphological, phonological and orthographic constraints and the language's local redundancy are seen at the node level. Phonological communities are seen at the network level. And a balancing act between the language's global efficiency and redundancy is seen at the system level. We develop a new measure of intrinsic node-to-node distance and a computational algorithm, called community geometry, which reveal the implicit multiscale structure within binary networks. Because the Doublets net is a modular complex cognitive system, the community geometry and computable multi-scale structural information may provide a foundation for understanding computational learning in many systems whose network structure has yet to be fully analyzed.
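The edge rule stated in this abstract (two words are linked when they have the same length and differ in exactly one letter) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `doublets_edges`, the wildcard-bucket trick, and the word list are assumptions made for the example, and words are assumed not to contain the underscore character.

```python
from collections import defaultdict
from itertools import combinations


def doublets_edges(words):
    """Return the undirected edges of a Doublets-style word network.

    Each word is filed under every pattern obtained by replacing one
    letter with "_" (e.g. "heal" -> "_eal", "h_al", ...). Two distinct
    words sharing a pattern have equal length and differ in exactly
    that one position, so they are connected.
    """
    buckets = defaultdict(set)
    for word in set(words):
        for i in range(len(word)):
            buckets[word[:i] + "_" + word[i + 1:]].add(word)

    edges = set()
    for group in buckets.values():
        # Sort so each edge is stored once as an ordered pair.
        for a, b in combinations(sorted(group), 2):
            edges.add((a, b))
    return edges


# Carroll's classic HEAD -> TAIL chain, plus one unrelated word.
chain = ["head", "heal", "teal", "tell", "tall", "tail", "cat"]
edges = doublets_edges(chain)
```

Bucketing by wildcard pattern avoids comparing every pair of words directly, which matters once the vocabulary grows beyond a toy list.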
Affiliation(s)
- Hsieh Fushing
- Department of Statistics, University of California Davis, Davis, California, United States of America
| | - Chen Chen
- Department of Statistics, University of California Davis, Davis, California, United States of America
| | | | - Patrick Farrell
- Department of Linguistics, University of California Davis, Davis, California, United States of America
|
21
|
Rêgo HHA, Braunstein LA, D′Agostino G, Stanley HE, Miyazima S. When a text is translated does the complexity of its vocabulary change? Translations and target readerships. PLoS One 2014; 9:e110213. [PMID: 25353343 PMCID: PMC4212908 DOI: 10.1371/journal.pone.0110213] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 06/03/2014] [Accepted: 09/18/2014] [Indexed: 11/19/2022] Open
Abstract
In linguistic studies, the academic level of the vocabulary in a text can be described in terms of statistical physics by using a "temperature" concept related to the text's word-frequency distribution. We propose a "comparative thermo-linguistic" technique to analyze the vocabulary of a text to determine its academic level and its target readership in any given language. We apply this technique to a large number of books by several authors and examine how the vocabulary of a text changes when it is translated from one language to another. Unlike the uniform results produced using the Zipf law, using our "word energy" distribution technique we find variations in the power-law behavior. We also examine some common features that span across languages and identify some intriguing questions concerning how to determine when a text is suitable for its intended readership.
Affiliation(s)
- Hênio Henrique Aragão Rêgo
- Departamento de Física, Instituto Federal de Educação, Ciência e Tecnologia do Maranhão - IFMA, São Luís, Brazil
- Center for Polymer Studies, Boston University, Boston, Massachusetts, United States of America
| | - Lidia A. Braunstein
- Center for Polymer Studies, Boston University, Boston, Massachusetts, United States of America
- Departamento de Física, Facultad de Ciencias Exactas y Naturales, Instituto de Investigaciones Físicas de Mar del Plata (IFIMAR), Universidad Nacional de Mar del Plata-CONICET, Mar del Plata, Argentina
| | | | - H. Eugene Stanley
- Center for Polymer Studies, Boston University, Boston, Massachusetts, United States of America
| | - Sasuke Miyazima
- Center for Polymer Studies, Boston University, Boston, Massachusetts, United States of America
- Department of Natural Sciences, Chubu University, Kasugai, Aichi, Japan
|
22
|
Abstract
The Matthew effect describes the phenomenon that in societies, the rich tend to get richer and the potent even more powerful. It is closely related to the concept of preferential attachment in network science, where the more connected nodes are destined to acquire many more links in the future than the auxiliary nodes. Cumulative advantage and success-breeds-success also both describe the fact that advantage tends to beget further advantage. The concept is behind the many power laws and scaling behaviour in empirical data, and it is at the heart of self-organization across social and natural sciences. Here, we review the methodology for measuring preferential attachment in empirical data, as well as the observations of the Matthew effect in patterns of scientific collaboration, socio-technical and biological networks, the propagation of citations, the emergence of scientific progress and impact, career longevity, the evolution of common English words and phrases, as well as in education and brain development. We also discuss whether the Matthew effect is due to chance or optimization, for example related to homophily in social systems or efficacy in technological systems, and we outline possible directions for future research.
Affiliation(s)
- Matjaž Perc
- Faculty of Natural Sciences and Mathematics, University of Maribor, Koroška cesta 160, 2000 Maribor, Slovenia
|
23
|
Abstract
The quality of data plays an important role in business analysis and decision making, and data accuracy is an important aspect of data quality. Thus one necessary task for data quality management is to evaluate the accuracy of the data. And in order to solve the problem that the accuracy of the whole data set is low while that of a useful part may be high, it is also necessary to evaluate the accuracy of the query results, called relative accuracy. However, as far as we know, neither a measure nor effective methods for accuracy evaluation have been proposed. Motivated by this, we propose a systematic method for relative accuracy evaluation. We design a relative accuracy evaluation framework for relational databases based on a new metric to measure the accuracy using statistics. We apply the methods to evaluate the precision and recall of basic queries, which show the result's relative accuracy. We also propose a method to handle data updates and to improve accuracy evaluation using functional dependencies. Extensive experimental results show the effectiveness and efficiency of our proposed framework and algorithms.
Affiliation(s)
- Yan Zhang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Hongzhi Wang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Zhongsheng Yang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Jianzhong Li
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|
24
|
Internal and external dynamics in language: evidence from verb regularity in a historical corpus of English. PLoS One 2014; 9:e102882. [PMID: 25084006 PMCID: PMC4118841 DOI: 10.1371/journal.pone.0102882] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 05/13/2014] [Accepted: 06/24/2014] [Indexed: 11/19/2022] Open
Abstract
Human languages are rule governed, but almost invariably these rules have exceptions in the form of irregularities. Since rules in language are efficient and productive, the persistence of irregularity is an anomaly. How does irregularity linger in the face of internal (endogenous) and external (exogenous) pressures to conform to a rule? Here we address this problem by taking a detailed look at simple past tense verbs in the Corpus of Historical American English. The data show that the language is open, with many new verbs entering. At the same time, existing verbs might tend to regularize or irregularize as a consequence of internal dynamics, but overall, the amount of irregularity sustained by the language stays roughly constant over time. Despite continuous vocabulary growth, and presumably, an attendant increase in expressive power, there is no corresponding growth in irregularity. We analyze the set of irregulars, showing they may adhere to a set of minority rules, allowing for increased stability of irregularity over time. These findings contribute to the debate on how language systems become rule governed, and how and why they sustain exceptions to rules, providing insight into the interplay between the emergence and maintenance of rules and exceptions in language.
|
25
|
Dyas-Correia S, Alexopoulos M. Text and Data Mining: Searching for Buried Treasures. SERIALS REVIEW 2014. [DOI: 10.1080/00987913.2014.950041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
26
|
Kondor D, Pósfai M, Csabai I, Vattay G. Do the rich get richer? An empirical analysis of the Bitcoin transaction network. PLoS One 2014; 9:e86197. [PMID: 24505257 PMCID: PMC3914786 DOI: 10.1371/journal.pone.0086197] [Citation(s) in RCA: 199] [Impact Index Per Article: 19.9] [Received: 08/14/2013] [Accepted: 12/06/2013] [Indexed: 11/18/2022] Open
Abstract
The possibility to analyze everyday monetary transactions is limited by the scarcity of available data, as this kind of information is usually considered highly sensitive. Present econophysics models are usually employed on presumed random networks of interacting agents, and only some macroscopic properties (e.g. the resulting wealth distribution) are compared to real-world data. In this paper, we analyze Bitcoin, which is a novel digital currency system, where the complete list of transactions is publicly available. Using this dataset, we reconstruct the network of transactions and extract the time and amount of each payment. We analyze the structure of the transaction network by measuring network characteristics over time, such as the degree distribution, degree correlations and clustering. We find that linear preferential attachment drives the growth of the network. We also study the dynamics taking place on the transaction network, i.e. the flow of money. We measure temporal patterns and the wealth accumulation. Investigating the microscopic statistics of money movement, we find that sublinear preferential attachment governs the evolution of the wealth distribution. We report a scaling law between the degree and wealth associated to individual nodes.
Affiliation(s)
- Dániel Kondor
- Department of Physics of Complex Systems, Eötvös Loránd University, Budapest, Hungary
| | - Márton Pósfai
- Department of Physics of Complex Systems, Eötvös Loránd University, Budapest, Hungary
- Department of Theoretical Physics, Budapest University of Technology and Economics, Budapest, Hungary
| | - István Csabai
- Department of Physics of Complex Systems, Eötvös Loránd University, Budapest, Hungary
| | - Gábor Vattay
- Department of Physics of Complex Systems, Eötvös Loránd University, Budapest, Hungary
|
27
|
A corpus of consonant-vowel-consonant real words and nonwords: comparison of phonotactic probability, neighborhood density, and consonant age of acquisition. Behav Res Methods 2014; 45:1159-67. [PMID: 23307574 DOI: 10.3758/s13428-012-0309-7] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Indexed: 11/08/2022]
Abstract
A corpus of 5,765 consonant-vowel-consonant sequences (CVCs) was compiled, and phonotactic probability and neighborhood density were computed for both child and adult corpora. This corpus of CVCs, provided as supplementary materials, was analyzed to address the following questions: (1) Do computations based on a child corpus differ from those based on an adult corpus? (2) Do the phonotactic probability and/or the neighborhood density of real words differ from those of nonwords? (3) Do phonotactic probability and/or neighborhood density differ across CVCs that vary in consonant age of acquisition? The results showed significant differences in phonotactic probability and neighborhood density for the child versus adult corpora, replicating prior findings. The impact of this difference on future studies will depend on the level of precision needed when specifying probability and density. In addition, significant and large differences in phonotactic probability and neighborhood density were detected between real words and nonwords, which may present methodological challenges for future research. Finally, CVCs composed of earlier-acquired sounds differed significantly in probability and density from those composed of later-acquired sounds, although this effect was relatively small and is less likely to present significant methodological challenges to future studies.
|
28
|
Abstract
For the 20th century since the Depression, we find a strong correlation between a ‘literary misery index’ derived from English language books and a moving average of the previous decade of the annual U.S. economic misery index, which is the sum of inflation and unemployment rates. We find a peak in the goodness of fit at 11 years for the moving average. The fit between the two misery indices holds when using different techniques to measure the literary misery index, and this fit is significantly better than other possible correlations with different emotion indices. To check the robustness of the results, we also analysed books written in German language and obtained very similar correlations with the German economic misery index. The results suggest that millions of books published every year average the authors' shared economic experiences over the past decade.
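The economic side of this comparison can be written down directly from the abstract: the annual misery index is inflation plus unemployment, and it is then smoothed with a trailing moving average. The sketch below is illustrative only. The function names and the numbers are hypothetical, and it assumes that the window ending in a given year includes that year, which the abstract does not specify.

```python
def misery_index(inflation, unemployment):
    """Annual economic misery index: inflation rate plus unemployment rate."""
    return [i + u for i, u in zip(inflation, unemployment)]


def trailing_mean(values, window):
    """Mean of the `window` most recent values at each position.

    Positions without enough history are reported as None. The window
    is assumed to include the current year (see lead-in).
    """
    out = []
    for t in range(len(values)):
        if t + 1 < window:
            out.append(None)
        else:
            recent = values[t - window + 1 : t + 1]
            out.append(sum(recent) / window)
    return out


# Made-up yearly rates in percent, not real economic data.
inflation = [1, 2, 3, 4]
unemployment = [5, 5, 5, 5]
index = misery_index(inflation, unemployment)
smoothed = trailing_mean(index, window=2)
```

With real data, `window=11` would correspond to the 11-year average at which the abstract reports the best fit.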
|
29
|
Quantifying trading behavior in financial markets using Google Trends. Sci Rep 2013; 3:1684. [PMID: 23619126 PMCID: PMC3635219 DOI: 10.1038/srep01684] [Citation(s) in RCA: 187] [Impact Index Per Article: 17.0] [Received: 02/25/2013] [Accepted: 04/03/2013] [Indexed: 11/09/2022] Open
Abstract
Crises in financial markets affect humans worldwide. Detailed market data on trading decisions reflect some of the complex human behavior that has led to these crises. We suggest that massive new data sources resulting from human interaction with the Internet may offer a new perspective on the behavior of market participants in periods of large market movements. By analyzing changes in Google query volumes for search terms related to finance, we find patterns that may be interpreted as "early warning signs" of stock market moves. Our results illustrate the potential that combining extensive behavioral data sets offers for a better understanding of collective human behavior.
|
30
|
Huang J. A common construction pattern of English words and Chinese characters. PLoS One 2013; 8:e74515. [PMID: 24023946 PMCID: PMC3759465 DOI: 10.1371/journal.pone.0074515] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 07/05/2013] [Accepted: 08/05/2013] [Indexed: 12/04/2022] Open
Abstract
Rankings are ubiquitous around the world. Here I investigate spatial ranking patterns of English Words and Chinese Characters, and reveal a common construction pattern related to phase separation. In detail, I analyze a list of different words in the English language, and find that the frequency of the number of letters per word linearly or nonlinearly decays over its rank in the frequency table. I interpret the linearly decaying area as a linear phase that covers 96.4% words, which is in sharp contrast to a nonlinear phase (representing the nonlinearly decaying area) that covers the remaining 3.6% words. Amazingly, the phase separation phenomenon with the same two percentages of 96.4% and 3.6% holds also for the relation between strokes and characters in the Chinese language although English and Chinese are two distinctly different language systems. The common construction pattern originates from the log-normal distributions of frequencies of words or characters, which can be understood by the joint effect of both the Weber-Fechner law in psychophysics and the principle of maximum entropy in information theory.
Affiliation(s)
- Jiping Huang
- Department of Physics and State Key Laboratory of Surface Physics, Fudan University, Shanghai, China
|
31
|
Efficient learning strategy of Chinese characters based on network approach. PLoS One 2013; 8:e69745. [PMID: 23990887 PMCID: PMC3749196 DOI: 10.1371/journal.pone.0069745] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 05/09/2013] [Accepted: 06/12/2013] [Indexed: 11/19/2022] Open
Abstract
We develop an efficient learning strategy of Chinese characters based on the network of the hierarchical structural relations between Chinese characters. A more efficient strategy is that of learning the same number of useful Chinese characters in less effort or time. We construct a node-weighted network of Chinese characters, where character usage frequencies are used as node weights. Using this hierarchical node-weighted network, we propose a new learning method, the distributed node weight (DNW) strategy, which is based on a new measure of nodes' importance that considers both the weight of the nodes and its location in the network hierarchical structure. Chinese character learning strategies, particularly their learning order, are analyzed as dynamical processes over the network. We compare the efficiency of three theoretical learning methods and two commonly used methods from mainstream Chinese textbooks, one for Chinese elementary school students and the other for students learning Chinese as a second language. We find that the DNW method significantly outperforms the others, implying that the efficiency of current learning methods of major textbooks can be greatly improved.
|
32
|
Moat HS, Curme C, Avakian A, Kenett DY, Stanley HE, Preis T. Quantifying Wikipedia Usage Patterns Before Stock Market Moves. Sci Rep 2013. [PMCID: PMC3647164 DOI: 10.1038/srep01801] [Citation(s) in RCA: 173] [Impact Index Per Article: 15.7] [Indexed: 11/09/2022] Open
|
33
|
Abstract
We make use of information provided in the titles and abstracts of over half a million publications that were published by the American Physical Society during the past 119 years. By identifying all unique words and phrases and determining their monthly usage patterns, we obtain quantifiable insights into the trends of physics discovery from the end of the 19th century to today. We show that the magnitudes of upward and downward trends yield heavy-tailed distributions, and that their emergence is due to the Matthew effect. This indicates that both the rise and fall of scientific paradigms are driven by robust principles of self-organization. Data also confirm that periods of war decelerate scientific progress, and that the latter is very much subject to globalisation.
|
34
|
Abstract
Recent studies of urban scaling show that important socioeconomic city characteristics such as wealth and innovation capacity exhibit a nonlinear, particularly a power law scaling with population size. These nonlinear effects are common to all cities, with similar power law exponents. These findings mean that the larger the city, the more disproportionately it is a place of wealth and innovation. Local properties of cities cause a deviation from the expected behavior as predicted by the power law scaling. In this paper we demonstrate that universities show behavior similar to that of cities in the distribution of the ‘gross university income’ in terms of total number of citations over ‘size’ in terms of total number of publications. Moreover, the power law exponents for university scaling are comparable to those for urban scaling. We find that deviations from the expected behavior can indeed be explained by specific local properties of universities, particularly the field-specific composition of a university, and its quality in terms of field-normalized citation impact. By studying both the set of the 500 largest universities worldwide and a specific subset of these 500 universities (the top-100 European universities) we are also able to distinguish between properties of universities with as well as without selection of one specific local property, the quality of a university in terms of its average field-normalized citation impact. It also reveals an interesting observation concerning the working of a crucial property in networked systems, preferential attachment.
Affiliation(s)
- Anthony F J van Raan
- Centre for Science and Technology Studies (CWTS), Leiden University, Leiden, The Netherlands.
|
35
|
Petersen AM, Tenenbaum JN, Havlin S, Stanley HE, Perc M. Languages cool as they expand: allometric scaling and the decreasing need for new words. Sci Rep 2012; 2:943. [PMID: 23230508 PMCID: PMC3517984 DOI: 10.1038/srep00943] [Citation(s) in RCA: 142] [Impact Index Per Article: 11.8] [Received: 10/16/2012] [Accepted: 10/24/2012] [Indexed: 11/23/2022] Open
Abstract
We analyze the occurrence frequencies of over 15 million words recorded in millions of books published during the past two centuries in seven different languages. For all languages and chronological subsets of the data we confirm that two scaling regimes characterize the word frequency distributions, with only the more common words obeying the classic Zipf law. Using corpora of unprecedented size, we test the allometric scaling relation between the corpus size and the vocabulary size of growing languages to demonstrate a decreasing marginal need for new words, a feature that is likely related to the underlying correlations between words. We calculate the annual growth fluctuations of word use which has a decreasing trend as the corpus size increases, indicating a slowdown in linguistic evolution following language expansion. This "cooling pattern" forms the basis of a third statistical regularity, which unlike the Zipf and the Heaps law, is dynamical in nature.
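The allometric relation between corpus size N and vocabulary size V mentioned in this abstract is commonly written as V ≈ a·N^b with b < 1 (Heaps' law), so that each additional word of text contributes fewer new vocabulary items. The sketch below is a minimal illustration under that assumed power-law form, fitted by ordinary least squares in log-log space; the function name `fit_power_law` and the synthetic data are hypothetical and do not reproduce the authors' procedure or results.

```python
import math


def fit_power_law(sizes, vocab):
    """Fit vocab ≈ a * sizes**b by least squares on log-transformed data.

    Returns (a, b). All inputs must be positive, and at least two
    distinct corpus sizes are needed for the slope to be defined.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in vocab]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope = scaling exponent
    a = math.exp(mean_y - b * mean_x)  # intercept back-transformed
    return a, b


# Synthetic corpus sizes with an exact square-root law (illustrative only).
corpus_sizes = [10**3, 10**4, 10**5, 10**6]
vocab_sizes = [2 * n**0.5 for n in corpus_sizes]
a, b = fit_power_law(corpus_sizes, vocab_sizes)
```

An exponent b below 1 in such a fit is what "decreasing marginal need for new words" refers to: doubling the corpus less than doubles the vocabulary.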
Affiliation(s)
- Alexander M. Petersen
- Laboratory for the Analysis of Complex Economic Systems, IMT Lucca Institute for Advanced Studies, Lucca 55100, Italy
| | - Joel N. Tenenbaum
- Center for Polymer Studies and Department of Physics, Boston University, Boston, Massachusetts 02215, USA
- Operations and Technology Management, School of Management, Boston University, Boston, Massachusetts 02215, USA
| | - Shlomo Havlin
- Minerva Center and Department of Physics, Bar-Ilan University, Ramat-Gan 52900, Israel
| | - H. Eugene Stanley
- Center for Polymer Studies and Department of Physics, Boston University, Boston, Massachusetts 02215, USA
| | - Matjaž Perc
- Department of Physics, Faculty of Natural Sciences and Mathematics, University of Maribor, Koroška cesta 160, SI-2000 Maribor, Slovenia
|