1
Monti C, Pangallo M, De Francisci Morales G, Bonchi F. On learning agent-based models from data. Sci Rep 2023; 13:9268. [PMID: 37286576 DOI: 10.1038/s41598-023-35536-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Accepted: 05/19/2023] [Indexed: 06/09/2023]
Abstract
Agent-Based Models (ABMs) are used in several fields to study the evolution of complex systems from micro-level assumptions. However, a significant drawback of ABMs is their inability to estimate agent-specific (or "micro") variables, which hinders their ability to make accurate predictions using micro-level data. In this paper, we propose a protocol to learn the latent micro-variables of an ABM from data. We begin by translating an ABM into a probabilistic model characterized by a computationally tractable likelihood. Next, we use a gradient-based expectation maximization algorithm to maximize the likelihood of the latent variables. We showcase the efficacy of our protocol on an ABM of the housing market, where agents with different incomes bid higher prices to live in high-income neighborhoods. Our protocol produces accurate estimates of the latent variables while preserving the general behavior of the ABM. Moreover, our estimates substantially improve the out-of-sample forecasting capabilities of the ABM compared to simpler heuristics. Our protocol encourages modelers to articulate assumptions, consider the inferential process, and spot potential identification problems, thus making it a useful alternative to black-box data assimilation methods.
2
Preti G, De Francisci Morales G, Riondato M. MaNIACS: Approximate Mining of Frequent Subgraph Patterns through Sampling. ACM Trans Intell Syst Technol 2023. [DOI: 10.1145/3587254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2023]
Abstract
We present MaNIACS, a sampling-based randomized algorithm for computing high-quality approximations of the collection of subgraph patterns that are frequent in a single, large, vertex-labeled graph, according to the Minimum Node Image-based (MNI) frequency measure. The output of MaNIACS comes with strong probabilistic guarantees, obtained by using the empirical Vapnik-Chervonenkis (VC) dimension, a key concept from statistical learning theory, together with probabilistic tail bounds on the difference between the frequency of a pattern in the sample and its exact frequency. MaNIACS leverages properties of the MNI frequency to aggressively prune the pattern search space, and thus to reduce the time spent exploring subspaces that contain no frequent patterns. In turn, this pruning leads to better bounds on the maximum frequency estimation error, which leads to increased pruning, resulting in a beneficial feedback effect. The results of our experimental evaluation of MaNIACS on real graphs show that it returns high-quality collections of frequent patterns in large graphs up to two orders of magnitude faster than the exact algorithm.
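The MNI measure named in this abstract can be illustrated with a short Python sketch. It computes the classic absolute MNI support from an explicit list of embeddings (maps from pattern vertices to graph vertices), which is assumed to be given; MaNIACS itself works on a sample and may use a normalized variant of the measure, so this illustrates the definition only, not the paper's algorithm. The function name `mni_support` and the input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set


def mni_support(
    embeddings: Iterable[Dict[Hashable, Hashable]],
    pattern_vertices: List[Hashable],
) -> int:
    """Classic Minimum Node Image (MNI) support of a pattern (hypothetical helper).

    Each embedding maps every pattern vertex to a graph vertex. The image of a
    pattern vertex is the set of distinct graph vertices it is mapped to across
    all embeddings; the support is the size of the smallest image.
    Assumes pattern_vertices is non-empty.
    """
    images: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for embedding in embeddings:
        for v in pattern_vertices:
            images[v].add(embedding[v])
    # With no embeddings every image is empty, so the support is 0.
    return min(len(images[v]) for v in pattern_vertices)


# A "hub-leaf" edge pattern matched in a star with center 0 and leaves 1, 2, 3:
# the hub is always mapped to vertex 0, so the support is 1, not 3.
star_embeddings = [{"hub": 0, "leaf": 1}, {"hub": 0, "leaf": 2}, {"hub": 0, "leaf": 3}]
print(mni_support(star_embeddings, ["hub", "leaf"]))  # 1
```

Every embedding of a larger pattern restricts to an embedding of any sub-pattern, so images can only shrink as a pattern grows; this anti-monotonicity is the kind of property that makes search-space pruning possible.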
Affiliation(s)
- Matteo Riondato
- Assistant Professor, Department of Computer Science, Amherst College, USA
3
Monti C, Aiello LM, De Francisci Morales G, Bonchi F. The language of opinion change on social media under the lens of communicative action. Sci Rep 2022; 12:17920. [PMID: 36289251 PMCID: PMC9605949 DOI: 10.1038/s41598-022-21720-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/02/2022] [Accepted: 09/30/2022] [Indexed: 01/20/2023]
Abstract
Which messages are more effective at inducing a change of opinion in the listener? We approach this question within the frame of Habermas' theory of communicative action, which posits that the illocutionary intent of the message (its pragmatic meaning) is the key. Thanks to recent advances in natural language processing, we are able to operationalize this theory by extracting the latent social dimensions of a message, namely archetypes of social intent of language, that come from social exchange theory. We identify key ingredients to opinion change by looking at more than 46k posts and more than 3.5M comments on Reddit's r/ChangeMyView, a debate forum where people try to change each other's opinion and explicitly mark opinion-changing comments with a special flag called delta. Comments that express no intent are about 77% less likely to change the mind of the recipient, compared to comments that convey at least one social dimension. Among the various social dimensions, the ones that are most likely to produce an opinion change are knowledge, similarity, and trust, which resonates with Habermas' theory of communicative action. We also find other new important dimensions, such as appeals to power or empathetic expressions of support. Finally, in line with theories of constructive conflict, yet contrary to the popular characterization of conflict as the bane of modern social media, our findings show that voicing conflict in the context of a structured public debate can promote integration, especially when it is used to counter another conflictive stance. By leveraging recent advances in natural language processing, our work provides an empirical framework for Habermas' theory, finds concrete examples of its effects in the wild, and suggests its possible extension with a more faceted understanding of intent interpreted as social dimensions of language.
Affiliation(s)
- Luca Maria Aiello
- IT University of Copenhagen, Copenhagen, Denmark; Pioneer Centre for AI, Copenhagen, Denmark
4
Lucchini L, Aiello LM, Alessandretti L, De Francisci Morales G, Starnini M, Baronchelli A. From Reddit to Wall Street: the role of committed minorities in financial collective action. R Soc Open Sci 2022; 9:211488. [PMID: 35425623 PMCID: PMC8984357 DOI: 10.1098/rsos.211488] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/17/2021] [Accepted: 02/22/2022] [Indexed: 05/03/2023]
Abstract
In January 2021, retail investors coordinated on Reddit to target short-selling activity by hedge funds on GameStop shares, causing a surge in the share price and triggering significant losses for the funds involved. Such an effective collective action was unprecedented in finance, and its dynamics remain unclear. Here, we analyse Reddit and financial data and rationalize the events based on recent findings describing how a small fraction of committed individuals may trigger behavioural cascades. First, we operationalize the concept of individual commitment in financial discussions. Second, we show that the increase of commitment within Reddit pre-dated the initial surge in price. Third, we reveal that initial committed users occupied a central position in the network of Reddit conversations. Finally, we show that the social identity of the broader Reddit community grew as the collective action unfolded. These findings shed light on financial collective action, as several observers anticipate it will grow in importance.
Affiliation(s)
- Lorenzo Lucchini
- Bocconi University, Milano 20100, Italy
- FBK—Fondazione Bruno Kessler, Trento 38123, Italy
- Andrea Baronchelli
- Department of Mathematics, City University of London, London EC1V 0HB, UK
- The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK
- UCL Centre for Blockchain Technologies, University College London, London, UK
5
Berloco C, De Francisci Morales G, Frassineti D, Greco G, Kumarasinghe H, Lamieri M, Massaro E, Miola A, Yang S. Predicting corporate credit risk: Network contagion via trade credit. PLoS One 2021; 16:e0250115. [PMID: 33914764 PMCID: PMC8084139 DOI: 10.1371/journal.pone.0250115] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/13/2020] [Accepted: 03/30/2021] [Indexed: 11/18/2022]
Abstract
Trade credit is a payment extension granted by a selling firm to its customer. Companies typically respond to late payments from their customers by delaying payments to their own suppliers, thus generating a ripple through the transaction network. Trade credit is therefore a potential vehicle for the propagation of losses in case of default events. The goal of this work is to leverage information on the trade credit among connected firms to predict imminent defaults of firms. We use a unique dataset of client firms of a major Italian bank to investigate firm bankruptcy between October 2016 and March 2018. We develop a model that captures network spillover effects originating from the supply chain on the probability of default of each firm via a sequential approach: the output of a first model component, based on single-firm features, is used in a subsequent model that captures network spillovers. While the first component follows the standard econometric approach to predicting such dynamics, the network module represents an innovative way to examine the effect of trade credit on default probability. This module looks at the transaction network of the firm, as inferred from the payments transiting via the bank, in order to identify the firm's trade partners. By using several features extracted from the network of transactions, this model is able to predict a large fraction of the defaults, thus showing the value hidden in the network information. Finally, we merge firm and network features with a machine learning model to create a ‘hybrid’ model, which improves recall on the task by almost 20 percentage points over the baseline.
Affiliation(s)
- Claudia Berloco
- Intesa Sanpaolo, Torino, Italy
- Università degli Studi di Torino, Torino, Italy
- Marco Lamieri
- Intesa Sanpaolo, Torino, Italy
- Shuyi Yang
- Intesa Sanpaolo, Torino, Italy
- Università degli Studi di Torino, Torino, Italy
6
Betti L, De Francisci Morales G, Gauvin L, Kalimeri K, Mejova Y, Paolotti D, Starnini M. Detecting adherence to the recommended childhood vaccination schedule from user-generated content in a US parenting forum. PLoS Comput Biol 2021; 17:e1008919. [PMID: 33901170 PMCID: PMC8075195 DOI: 10.1371/journal.pcbi.1008919] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/25/2020] [Accepted: 03/26/2021] [Indexed: 12/03/2022]
Abstract
Vaccine hesitancy is considered one of the leading causes of the resurgence of vaccine-preventable diseases. A non-negligible minority of parents does not fully adhere to the recommended vaccination schedule, leaving their children partially immunized and at higher risk of contracting vaccine-preventable diseases. Here, we leverage more than one million comments by 201,986 users posted from March 2008 to April 2019 on the public online forum BabyCenter US to learn more about such parents. For the 32% of users with a geographic location, we find that the number of mapped users in each US state closely matches the census population distribution. We employ Natural Language Processing to identify 6884 and 10,131 users expressing their intention of following the recommended and the alternative vaccination schedule, respectively (RSUs and ASUs). From the analysis of their activity on the forum, we find that ASUs have distinctly different interests and previous experiences with vaccination than RSUs. In particular, ASUs are more likely to follow groups focused on alternative medicine, and are two times more likely to have experienced adverse events following immunization and to mention more serious adverse reactions, such as seizures or developmental regression. Content analysis of comments shows that the resources most frequently shared by both groups point to governmental domains (.gov). Finally, network analysis shows that RSUs and ASUs communicate with each other (indicating the absence of echo chambers), although the latter group is more endogamic and favors interactions with other ASUs. While our findings are limited to the specific platform analyzed, our approach may provide additional insights for the development of campaigns targeting parents on digital platforms.
The importance and effectiveness of vaccines are generally high, but concerns about vaccination contribute to eroding confidence in it. Recently, alternative vaccination schedules have become popular, as they allow parents to selectively delay or refuse certain vaccines depending on their specific concerns. Because they are not expressly anti-vaccination, these parents are challenging to identify on social media; however, understanding the determinants of their hesitancy toward vaccines could help address parents' concerns through targeted interventions. In this work, we create a Natural Language Processing pipeline to automatically identify parents who state their adherence to the recommended or alternative vaccination schedule on a popular parenting forum, BabyCenter US. We find that these users have distinct interests and different experiences with vaccination, although they frequently share similar sources of information (e.g., .gov websites). Unlike on the most popular digital platforms such as Facebook or Twitter, where users communicate mainly with like-minded users, BabyCenter users communicate with each other regardless of the vaccination schedule they adopt. These observations suggest that parenting fora may be a more suitable medium for developing interventions aimed at positively influencing the vaccination behavior of parents.
7
Abstract
Social media may limit exposure to diverse perspectives and favor the formation of groups of like-minded users framing and reinforcing a shared narrative, that is, echo chambers. However, the interaction paradigms among users and the feed algorithms vary greatly across social media platforms. This paper explores the key differences between the main social media platforms and how they are likely to influence information spreading and the formation of echo chambers. We perform a comparative analysis of more than 100 million pieces of content concerning several controversial topics (e.g., gun control, vaccination, abortion) from Gab, Facebook, Reddit, and Twitter. We quantify echo chambers on social media by two main ingredients: 1) homophily in the interaction networks and 2) bias in information diffusion toward like-minded peers. Our results show that the aggregation of users in homophilic clusters dominates online interactions on Facebook and Twitter. We conclude the paper by directly comparing news consumption on Facebook and Reddit, finding higher segregation on Facebook.
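One way to operationalize the first ingredient, homophily in the interaction network, is to compare each user's leaning x_i with the average leaning of the users they interact with. The Python sketch below assumes an undirected interaction network given as an adjacency dict and a numeric leaning per user (for example in [-1, 1]); the helper names and the same-sign summary are hypothetical illustrations, not the paper's exact procedure.

```python
from typing import Dict, Hashable, Set


def neighbor_leaning(
    adj: Dict[Hashable, Set[Hashable]], leaning: Dict[Hashable, float]
) -> Dict[Hashable, float]:
    """Average leaning of each user's neighbours, x_i^N = (1/k_i) * sum_j A_ij x_j.

    Users without neighbours are skipped (their average is undefined).
    """
    return {
        user: sum(leaning[j] for j in nbrs) / len(nbrs)
        for user, nbrs in adj.items()
        if nbrs
    }


def same_side_fraction(
    leaning: Dict[Hashable, float], nbr_leaning: Dict[Hashable, float]
) -> float:
    """Fraction of users whose own leaning has the same sign as their neighbours' average.

    Assumes nbr_leaning is non-empty.
    """
    same = sum(1 for u, xn in nbr_leaning.items() if leaning[u] * xn > 0)
    return same / len(nbr_leaning)


# Two like-minded pairs joined by a single cross-cutting link (b-c).
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
x = {"a": -1.0, "b": -0.8, "c": 0.9, "d": 1.0}
x_nbr = neighbor_leaning(adj, x)
print(same_side_fraction(x, x_nbr))  # 1.0
```

A value close to 1 indicates that users mostly sit among like-minded neighbours; the second ingredient (diffusion bias) would require cascade data and is not covered by this sketch.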
Affiliation(s)
- Matteo Cinelli
- Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University of Venice, 30172 Venice, Italy
- Alessandro Galeazzi
- Department of Information Engineering, University of Brescia, 25123 Brescia, Italy
- Michele Starnini
- Institute for Scientific Interchange (ISI) Foundation, 10126 Torino, Italy
8
Abstract
Echo chambers in online social networks, whereby users' beliefs are reinforced by interactions with like-minded peers and insulation from others' points of view, have been decried as a cause of political polarization. Here, we investigate their role in the debate around the 2016 US elections on Reddit, a fundamental platform for the success of Donald Trump. We identify Trump vs Clinton supporters and reconstruct their political interaction network. We observe a preference for cross-cutting political interactions between the two communities rather than within-group interactions, thus contradicting the echo chamber narrative. Furthermore, these interactions are asymmetrical: Clinton supporters are particularly eager to answer comments by Trump supporters. Besides asymmetric heterophily, users show assortative behavior for activity, and disassortative, asymmetric behavior for popularity. Our findings are tested against a null model of random interactions by using two different approaches: a network rewiring that preserves the activity of nodes, and a logit regression that takes into account possible confounding factors. Finally, we explore possible socio-demographic implications. Users show a tendency toward geographical homophily and a small positive correlation between cross-interactions and voter abstention. Our findings shed light on public opinion formation on social media, calling for a better understanding of the social dynamics at play in this context.
9
Abstract
Which topics spark the most heated debates on social media? Identifying those topics is not only interesting from a societal point of view but also allows the filtering and aggregation of social media content for disseminating news stories. In this article, we perform a systematic methodological study of controversy detection by using the content and the network structure of social media.
Unlike previous work, rather than studying controversy in a single hand-picked topic and using domain-specific knowledge, we take a general approach to studying topics in any domain. Our approach to quantifying controversy is based on a graph-based three-stage pipeline, which involves (i) building a conversation graph about a topic, (ii) partitioning the conversation graph to identify potential sides of the controversy, and (iii) measuring the amount of controversy from characteristics of the graph.
We perform an extensive comparison of controversy measures, different graph-building approaches, and data sources. We use both controversial and non-controversial topics on Twitter, as well as other external datasets. We find that our new random-walk-based measure outperforms existing ones in capturing the intuitive notion of controversy and show that content features are vastly less helpful in this task.
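The last stage of the pipeline above, scoring a graph that has already been built and partitioned into two sides X and Y, can be sketched in Python. This is a simplified Monte Carlo version of a random-walk controversy score, P_XX·P_YY - P_XY·P_YX, assuming walks are absorbed at the k highest-degree nodes of each side; the function name, parameters, and defaults are hypothetical, and the article's measure may differ in details such as how start and absorbing nodes are chosen.

```python
import random
from typing import Dict, Hashable, List, Optional


def random_walk_controversy(
    adj: Dict[Hashable, List[Hashable]],
    side: Dict[Hashable, str],
    k: int = 1,
    walks: int = 2000,
    max_steps: int = 1000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P_XX * P_YY - P_XY * P_YX (hypothetical sketch).

    adj maps each node to its neighbours (undirected graph); side maps each
    node to "X" or "Y". Walks start at a random node of one side and stop
    when they reach one of the k highest-degree nodes of either side.
    P_AB is the estimated probability that a walk started on side B ends on side A.
    """
    rng = random.Random(seed)
    members = {s: [n for n in adj if side[n] == s] for s in ("X", "Y")}
    hubs = set()
    for s in ("X", "Y"):
        by_degree = sorted(members[s], key=lambda n: len(adj[n]), reverse=True)
        hubs.update(by_degree[:k])

    def end_side(node: Hashable) -> Optional[str]:
        for _ in range(max_steps):
            if node in hubs:
                return side[node]
            if not adj[node]:  # isolated node: the walk cannot continue
                return None
            node = rng.choice(adj[node])
        return None  # gave up without reaching a hub

    p = {}
    for start in ("X", "Y"):
        ends = [end_side(rng.choice(members[start])) for _ in range(walks)]
        ends = [e for e in ends if e is not None]
        for end in ("X", "Y"):
            p[(end, start)] = ends.count(end) / len(ends) if ends else 0.0
    return p[("X", "X")] * p[("Y", "Y")] - p[("X", "Y")] * p[("Y", "X")]


# Two triangles with no link between them: every walk stays on its own side.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
sides = {0: "X", 1: "X", 2: "X", 3: "Y", 4: "Y", 5: "Y"}
print(random_walk_controversy(two_triangles, sides))  # 1.0
```

A score near 1 means walks rarely cross between the two sides, as in two separated camps; well-mixed graphs give markedly lower scores.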
10