1. Rossetti CSL, Hauser OP, Hilbe C. Dynamics of cooperation in concurrent games. Nat Commun 2025; 16:1524. PMID: 39934104. DOI: 10.1038/s41467-025-56083-7.
Abstract
People frequently encounter situations where individually optimal decisions conflict with group interests. To navigate such social dilemmas, they often employ simple heuristics based on direct reciprocity: cooperate when others do and cease cooperation when partners defect. However, prior research typically assumes that individuals only interact in one game at a time. In reality, people engage in multiple games concurrently, and the outcome of one interaction can influence behavior in another. Here, we introduce a theoretical framework to study the resulting cross-over and spill-over effects. Participants repeatedly engage in two independent stage games, either with the same or different partners, adapting their strategies over time through an evolutionary learning process. Our findings indicate that individuals often link their behavior across games, particularly under cognitive constraints like imperfect recall. A behavioral experiment with 316 UK-based students suggests that concurrent games negatively affect cooperation, highlighting how strategic motives and spillovers impact reciprocity.
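To make the notion of linked strategies concrete, here is a minimal sketch (my own toy model, not the authors' framework: the donation-game payoffs b and c, the noisy tit-for-tat partners, and the all-or-nothing linkage rule are illustrative assumptions) contrasting an agent who reciprocates within each concurrent game separately with one whose cooperation in either game requires cooperation in both:

```python
import random

b, c = 2.0, 1.0  # donation game: cooperation costs c, delivers b to the partner

def payoff(me, other):
    """Focal payoff for one round of the donation game ('C' or 'D')."""
    return (b if other == 'C' else 0.0) - (c if me == 'C' else 0.0)

def play(rounds, linked, noise=0.05, seed=0):
    """Average per-round payoff of a focal agent facing tit-for-tat partners
    in two concurrent donation games.

    independent: focal repeats each partner's last move in that game only.
    linked: focal cooperates in a game only if partners cooperated in BOTH
            games last round (an illustrative spillover rule).
    """
    rng = random.Random(seed)
    my_last = {'g1': 'C', 'g2': 'C'}
    partner_last = {'g1': 'C', 'g2': 'C'}
    total = 0.0
    for _ in range(rounds):
        for g in ('g1', 'g2'):
            if linked:
                me = 'C' if all(v == 'C' for v in partner_last.values()) else 'D'
            else:
                me = partner_last[g]
            # partner plays noisy tit-for-tat within its own game
            other = my_last[g] if rng.random() > noise else rng.choice('CD')
            total += payoff(me, other)
            my_last[g], partner_last[g] = me, other
    return total / (2 * rounds)

print("independent strategy:", round(play(50_000, linked=False), 3))
print("linked strategy:     ", round(play(50_000, linked=True), 3))
```

Under the linked rule, a single defection spills over into both games, which is one way concurrent play can depress overall cooperation.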
Affiliation(s)
- Charlotte S L Rossetti
- Max Planck Research Group on the Dynamics of Social Behavior, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany.
- Department of Psychology, University of Zürich, 8050, Zürich, Switzerland.
- Oliver P Hauser
- Department of Economics, University of Exeter, Exeter, EX4 4PU, UK.
- Christian Hilbe
- Max Planck Research Group on the Dynamics of Social Behavior, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany.
2. Meylahn BV, Meylahn JM. How social reinforcement learning can lead to metastable polarisation and the voter model. PLoS One 2024; 19:e0313951. PMID: 39689073. DOI: 10.1371/journal.pone.0313951.
Abstract
Previous explanations for the persistence of polarization of opinions have typically included modelling assumptions that predispose the possibility of polarization (i.e., assumptions allowing a pair of agents to drift apart in their opinions, such as repulsive interactions or bounded confidence). An exception is a recent simulation study showing that polarization is persistent when agents form their opinions using social reinforcement learning. Our goal is to highlight the usefulness of reinforcement learning in the context of modelling opinion dynamics, while showing that caution is required when selecting the tools used to study such a model. We show that the polarization observed in the model of the simulation study cannot persist indefinitely, and that the model exhibits consensus asymptotically with probability one. By constructing a link between the reinforcement learning model and the voter model, we argue that the observed polarization is metastable. Finally, we show that a slight modification in the learning process of the agents changes the model from being non-ergodic to being ergodic. Our results show that reinforcement learning may be a powerful method for modelling polarization in opinion dynamics, but that the tools appropriate for analysing such models (objects of study such as the stationary distribution, or the time to absorption, for example) crucially depend on their properties (such as ergodicity or transience). These properties are determined by the details of the learning process and may be difficult to identify based solely on simulations.
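A minimal sketch of social reinforcement learning in opinion dynamics may help fix ideas (a toy Pólya-urn-style variant of my own, not the model analysed in the paper): agents voice opinions with probabilities proportional to urn weights and reinforce an opinion whenever a randomly sampled peer agrees. Transient camps of both opinions can persist for very long times even when, as the paper argues for the original model, consensus is the eventual outcome:

```python
import random

def simulate(n_agents=50, steps=200_000, seed=1):
    """Each agent holds positive weights for opinions 0 and 1; it voices an
    opinion with probability proportional to its weights and adds +1 to that
    opinion's weight whenever a randomly chosen other agent agrees."""
    rng = random.Random(seed)
    w = [[1.0, 1.0] for _ in range(n_agents)]  # initial urn weights
    for _ in range(steps):
        i, j = rng.sample(range(n_agents), 2)
        oi = 0 if rng.random() < w[i][0] / sum(w[i]) else 1
        oj = 0 if rng.random() < w[j][0] / sum(w[j]) else 1
        if oi == oj:            # social reinforcement on agreement
            w[i][oi] += 1.0
    # fraction of agents currently leaning towards opinion 0
    return sum(1 for wi in w if wi[0] > wi[1]) / n_agents

print("share leaning to opinion 0:", simulate())
```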
Affiliation(s)
- Benedikt V Meylahn
- Korteweg-de Vries Institute for Mathematics, University of Amsterdam, Amsterdam, The Netherlands
- Janusz M Meylahn
- Department of Applied Mathematics, University of Twente, Enschede, The Netherlands
3. Bergerot C, Barfuss W, Romanczuk P. Moderate confirmation bias enhances decision-making in groups of reinforcement-learning agents. PLoS Comput Biol 2024; 20:e1012404. PMID: 39231162. PMCID: PMC11404843. DOI: 10.1371/journal.pcbi.1012404.
Abstract
Humans tend to give more weight to information confirming their beliefs than to information that disconfirms them. Nevertheless, this apparent irrationality has been shown to improve individual decision-making under uncertainty. However, little is known about this bias' impact on decision-making in a social context. Here, we investigate the conditions under which confirmation bias is beneficial or detrimental to decision-making under social influence. To do so, we develop a Collective Asymmetric Reinforcement Learning (CARL) model in which artificial agents observe others' actions and rewards, and update this information asymmetrically. We use agent-based simulations to study how confirmation bias affects collective performance on a two-armed bandit task, and how resource scarcity, group size and bias strength modulate this effect. We find that a confirmation bias benefits group learning across a wide range of resource-scarcity conditions. Moreover, we discover that, past a critical bias strength, resource abundance favors the emergence of two different performance regimes, one of which is suboptimal. In addition, we find that this regime bifurcation comes with polarization in small groups of agents. Overall, our results suggest the existence of an optimal, moderate level of confirmation bias for decision-making in a social context.
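The asymmetric update at the heart of such models fits in a few lines. The sketch below is a single-agent rendering on a two-armed bandit with illustrative parameters (arm probabilities, exploration rate are my assumptions); the CARL model additionally updates from observed actions and rewards of other agents:

```python
import random

def asymmetric_bandit(alpha_plus=0.2, alpha_minus=0.05, steps=5000, seed=2):
    """Value learning on a two-armed bandit with a confirmation-bias-style
    asymmetry: positive prediction errors are weighted by alpha_plus,
    negative ones by alpha_minus (alpha_plus > alpha_minus = bias)."""
    rng = random.Random(seed)
    p = [0.4, 0.6]          # assumed reward probabilities of the two arms
    q = [0.5, 0.5]
    for _ in range(steps):
        # epsilon-greedy choice with 10% exploration
        a = max(range(2), key=lambda k: q[k]) if rng.random() > 0.1 else rng.randrange(2)
        r = 1.0 if rng.random() < p[a] else 0.0
        delta = r - q[a]
        q[a] += (alpha_plus if delta > 0 else alpha_minus) * delta
    return q

print(asymmetric_bandit())
```

With alpha_plus > alpha_minus, belief-confirming prediction errors are weighted more heavily, which is the confirmation-bias asymmetry the paper varies.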
Affiliation(s)
- Clémence Bergerot
- Department of Biology, Humboldt Universität zu Berlin, Berlin, Germany
- Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Berlin, Germany
- Wolfram Barfuss
- Transdisciplinary Research Area: Sustainable Futures, University of Bonn, Bonn, Germany
- Center for Development Research (ZEF), University of Bonn, Bonn, Germany
- Pawel Romanczuk
- Department of Biology, Humboldt Universität zu Berlin, Berlin, Germany
- Science of Intelligence, Research Cluster of Excellence, Berlin, Germany
4. Zhang Z, Jiang X, Xia C. STP-based control of networked evolutionary games with multi-channel structure. Chaos 2024; 34:093112. PMID: 39236108. DOI: 10.1063/5.0223029.
Abstract
The channel delay in the game process has an important influence on its evolutionary dynamics. This paper aims to optimize strategies in networked evolutionary games with general information delays, including the state delay considered in previous work and a control delay, introduced here for the first time to capture the time consumed by strategy propagation in reality. Specifically, the dynamics of networked evolutionary games are transformed into an algebraic form by means of the semi-tensor product of matrices, which extends ordinary matrix multiplication. Subsequently, according to the values of the control and state delays, the strategy optimization problem can be divided into six different cases, and via the constructed algebraic equation we obtain necessary and sufficient conditions for the existence of an optimal strategy. Meanwhile, based on a reachable-set method, the corresponding feedback controllers are designed. Finally, an illustrative example demonstrates the feasibility of our model. The results of this paper will be helpful for investigating game-based control issues in complex networked environments.
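For readers unfamiliar with it, the semi-tensor product (STP) generalizes matrix multiplication to factors with mismatched inner dimensions by padding both with Kronecker identity factors. A small sketch of the standard definition (not tied to this paper's specific controllers):

```python
import numpy as np
from math import lcm

def stp(A, B):
    """Semi-tensor product of A (m x n) and B (p x q):
    A ⋉ B = (A ⊗ I_{t/n}) @ (B ⊗ I_{t/p}) with t = lcm(n, p).
    Reduces to ordinary matrix multiplication when n == p."""
    n, p = A.shape[1], B.shape[0]
    t = lcm(n, p)
    return np.kron(A, np.eye(t // n)) @ np.kron(B, np.eye(t // p))

A = np.array([[1.0, 2.0]])                   # 1 x 2
B = np.array([[1.0], [0.0], [2.0], [1.0]])   # 4 x 1
print(stp(A, B))                             # 2 x 1 result despite the mismatch
```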
Affiliation(s)
- Zhipeng Zhang
- School of Artificial Intelligence, Tiangong University, Tianjin 300387, People's Republic of China
- Xiaotong Jiang
- School of Artificial Intelligence, Tiangong University, Tianjin 300387, People's Republic of China
- Chengyi Xia
- School of Artificial Intelligence, Tiangong University, Tianjin 300387, People's Republic of China
5. Khajuria R, Sarwar A. Review of reinforcement learning applications in segmentation, chemotherapy, and radiotherapy of cancer. Micron 2024; 178:103583. PMID: 38185018. DOI: 10.1016/j.micron.2023.103583.
Abstract
With early diagnosis and treatment of cancer having become a prerequisite in recent times, the role of machine learning has grown substantially. Mathematically powerful and optimized solutions for the detection and cure of cancer are constantly being explored, and novel models built on standard algorithms are being developed. One such solution is Reinforcement Learning (RL), a semi-supervised type of learning. The paper presents a detailed discussion of the various RL techniques, algorithms, and open issues, in addition to a review of the literature on the diagnosis and treatment of cancer. Relatively few publications on the diagnosis and treatment of cancer were reported before 2011, but after the success of Deep Learning (DL) and the advent of Deep Reinforcement Learning (DRL), the number of publications has grown from 2017 onwards. The scope of RL for cancer diagnosis and treatment is also demystified, providing the research community with insights into how to formulate a cancer diagnosis task as an RL problem. RL has been found successful for landmark detection in medical images and for optimal control of drugs and radiation.
6. Stetsenko A, Koos T. Neuronal implementation of the temporal difference learning algorithm in the midbrain dopaminergic system. Proc Natl Acad Sci U S A 2023; 120:e2309015120. PMID: 37903252. PMCID: PMC10636325. DOI: 10.1073/pnas.2309015120.
Abstract
The temporal difference learning (TDL) algorithm has been essential to conceptualizing the role of dopamine in reinforcement learning (RL). Despite its theoretical importance, it remains unknown whether a neuronal implementation of this algorithm exists in the brain. Here, we provide an interpretation of the recently described signaling properties of ventral tegmental area (VTA) GABAergic neurons and show that a circuitry of these neurons implements the TDL algorithm. Specifically, we identified the neuronal mechanism of three key components of the TDL model: a sustained state value signal encoded by an afferent input to the VTA, a temporal differentiation circuit formed by two types of VTA GABAergic neurons the combined output of which computes momentary reward prediction (RP) as the derivative of the state value, and the computation of reward prediction errors (RPEs) in dopamine neurons utilizing the output of the differentiation circuit. Using computational methods, we also show that this mechanism is optimally adapted to the biophysics of RPE signaling in dopamine neurons, mechanistically links the emergence of conditioned reinforcement to RP, and can naturally account for the temporal discounting of reinforcement. Elucidating the implementation of the TDL algorithm may further the investigation of RL in biological and artificial systems.
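For reference, the three TDL components identified in the abstract correspond to the textbook TD(0) quantities (standard notation, not the paper's circuit-level variables):

```latex
\begin{aligned}
\text{RP}_t &\approx \gamma V(s_{t+1}) - V(s_t) && \text{(reward prediction: discrete-time derivative of the state value)}\\
\delta_t &= r_{t+1} + \gamma V(s_{t+1}) - V(s_t) && \text{(reward prediction error, RPE)}\\
V(s_t) &\leftarrow V(s_t) + \alpha\,\delta_t && \text{(state-value update)}
\end{aligned}
```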
Affiliation(s)
- Anya Stetsenko
- Center for Molecular and Behavioral Neuroscience, Rutgers University, Newark, NJ 07102
- Tibor Koos
- Center for Molecular and Behavioral Neuroscience, Rutgers University, Newark, NJ 07102
7. Kleshnina M, Hilbe C, Šimsa Š, Chatterjee K, Nowak MA. The effect of environmental information on evolution of cooperation in stochastic games. Nat Commun 2023; 14:4153. PMID: 37438341. PMCID: PMC10338504. DOI: 10.1038/s41467-023-39625-9.
Abstract
Many human interactions feature the characteristics of social dilemmas, where individual actions have consequences for the group and the environment. The feedback between behavior and environment can be studied within the framework of stochastic games. In stochastic games, the state of the environment can change, depending on the choices made by group members. Past work suggests that such feedback can reinforce cooperative behaviors. In particular, cooperation can evolve in stochastic games even if it is infeasible in each separate repeated game. In stochastic games, participants have an interest in conditioning their strategies on the state of the environment. Yet in many applications, precise information about the state may be scarce. Here, we study how the availability of information (or lack thereof) shapes the evolution of cooperation. Already for simple examples of two-state games, we find surprising effects. In some cases, cooperation is only possible if there is precise information about the state of the environment. In other cases, cooperation is most abundant when there is no information about the state. We systematically analyze all stochastic games of a given complexity class to determine when receiving information about the environment is better, neutral, or worse for the evolution of cooperation.
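To make the setting concrete, here is a minimal two-state stochastic game of my own construction (illustrative payoffs and transitions, not taken from the paper): stage-game payoffs depend on the environmental state, the state degrades after defection, and only an informed strategy can condition its action on the state:

```python
# Two-state stochastic game sketch: state 0 = rich, state 1 = degraded.
# Stage payoffs payoff[state][my_action][other_action]; actions 0=C, 1=D.
payoff = [
    [[3.0, 0.0], [4.0, 1.0]],   # rich state (assumed values)
    [[1.5, 0.0], [2.0, 0.5]],   # degraded state
]

def transition(state, a1, a2):
    """Environment feedback: mutual cooperation restores/keeps the rich
    state; any defection pushes the game into the degraded state."""
    return 0 if (a1, a2) == (0, 0) else 1

def avg_payoff(strategy1, strategy2, rounds=10_000):
    """strategy: dict state -> action; informed players may differ by state."""
    s, total = 0, 0.0
    for _ in range(rounds):
        a1, a2 = strategy1[s], strategy2[s]
        total += payoff[s][a1][a2]
        s = transition(s, a1, a2)
    return total / rounds

informed = {0: 0, 1: 1}        # cooperate when rich, defect when degraded
always_defect = {0: 1, 1: 1}
print("informed vs informed:     ", avg_payoff(informed, informed))
# defection pays once, then traps both players in the degraded state
print("always-defect vs informed:", avg_payoff(always_defect, informed))
```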
Affiliation(s)
- Christian Hilbe
- Max Planck Research Group Dynamics of Social Behavior, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Štěpán Šimsa
- IST Austria, Klosterneuburg, Austria
- Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
- Martin A Nowak
- Department of Mathematics, Harvard University, Cambridge, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
8. Sawicki J, Berner R, Loos SAM, Anvari M, Bader R, Barfuss W, Botta N, Brede N, Franović I, Gauthier DJ, Goldt S, Hajizadeh A, Hövel P, Karin O, Lorenz-Spreen P, Miehl C, Mölter J, Olmi S, Schöll E, Seif A, Tass PA, Volpe G, Yanchuk S, Kurths J. Perspectives on adaptive dynamical systems. Chaos 2023; 33:071501. PMID: 37486668. DOI: 10.1063/5.0147231.
Abstract
Adaptivity is a dynamical feature that is omnipresent in nature, socio-economics, and technology. For example, adaptive couplings appear in various real-world systems, such as the power grid, social, and neural networks, and they form the backbone of closed-loop control strategies and machine learning algorithms. In this article, we provide an interdisciplinary perspective on adaptive systems. We reflect on the notion and terminology of adaptivity in different disciplines and discuss which role adaptivity plays for various fields. We highlight common open challenges and give perspectives on future research directions, looking to inspire interdisciplinary approaches.
Affiliation(s)
- Jakub Sawicki
- Potsdam Institute for Climate Impact Research, Telegrafenberg, 14473 Potsdam, Germany
- Akademie Basel, Fachhochschule Nordwestschweiz FHNW, Leonhardsstrasse 6, 4009 Basel, Switzerland
- Rico Berner
- Department of Physics, Humboldt-Universität zu Berlin, Newtonstraße 15, 12489 Berlin, Germany
- Sarah A M Loos
- DAMTP, University of Cambridge, Wilberforce Road, Cambridge CB3 0WA, United Kingdom
- Mehrnaz Anvari
- Potsdam Institute for Climate Impact Research, Telegrafenberg, 14473 Potsdam, Germany
- Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, 53757 Sankt-Augustin, Germany
- Rolf Bader
- Institute of Systematic Musicology, University of Hamburg, Hamburg, Germany
- Wolfram Barfuss
- Transdisciplinary Research Area: Sustainable Futures, University of Bonn, 53113 Bonn, Germany
- Center for Development Research (ZEF), University of Bonn, 53113 Bonn, Germany
- Nicola Botta
- Potsdam Institute for Climate Impact Research, Telegrafenberg, 14473 Potsdam, Germany
- Department of Computer Science and Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
- Nuria Brede
- Potsdam Institute for Climate Impact Research, Telegrafenberg, 14473 Potsdam, Germany
- Department of Computer Science, University of Potsdam, An der Bahn 2, 14476 Potsdam, Germany
- Igor Franović
- Scientific Computing Laboratory, Center for the Study of Complex Systems, Institute of Physics Belgrade, University of Belgrade, Pregrevica 118, 11080 Belgrade, Serbia
- Daniel J Gauthier
- Potsdam Institute for Climate Impact Research, Telegrafenberg, 14473 Potsdam, Germany
- Sebastian Goldt
- Department of Physics, International School of Advanced Studies (SISSA), Trieste, Italy
- Aida Hajizadeh
- Research Group Comparative Neuroscience, Leibniz Institute for Neurobiology, Magdeburg, Germany
- Philipp Hövel
- Potsdam Institute for Climate Impact Research, Telegrafenberg, 14473 Potsdam, Germany
- Omer Karin
- Department of Mathematics, Imperial College London, London SW7 2AZ, United Kingdom
- Philipp Lorenz-Spreen
- Center for Adaptive Rationality, Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin, Germany
- Christoph Miehl
- Akademie Basel, Fachhochschule Nordwestschweiz FHNW, Leonhardsstrasse 6, 4009 Basel, Switzerland
- Jan Mölter
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Boltzmannstraße 3, 85748 Garching bei München, Germany
- Simona Olmi
- Akademie Basel, Fachhochschule Nordwestschweiz FHNW, Leonhardsstrasse 6, 4009 Basel, Switzerland
- Eckehard Schöll
- Potsdam Institute for Climate Impact Research, Telegrafenberg, 14473 Potsdam, Germany
- Akademie Basel, Fachhochschule Nordwestschweiz FHNW, Leonhardsstrasse 6, 4009 Basel, Switzerland
- Alireza Seif
- Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois 60637, USA
- Peter A Tass
- Department of Neurosurgery, Stanford University School of Medicine, Stanford, California 94304, USA
- Giovanni Volpe
- Department of Physics, University of Gothenburg, Gothenburg, Sweden
- Serhiy Yanchuk
- Potsdam Institute for Climate Impact Research, Telegrafenberg, 14473 Potsdam, Germany
- Department of Physics, Humboldt-Universität zu Berlin, Newtonstraße 15, 12489 Berlin, Germany
- Jürgen Kurths
- Potsdam Institute for Climate Impact Research, Telegrafenberg, 14473 Potsdam, Germany
- Department of Physics, Humboldt-Universität zu Berlin, Newtonstraße 15, 12489 Berlin, Germany
9. Barfuss W, Meylahn JM. Intrinsic fluctuations of reinforcement learning promote cooperation. Sci Rep 2023; 13:1309. PMID: 36693872. PMCID: PMC9873645. DOI: 10.1038/s41598-023-27672-7.
Abstract
In this work, we ask, and answer, what makes classical temporal-difference reinforcement learning with ε-greedy strategies cooperative. Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory has revealed a range of mechanisms promoting cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which individual elements of the multi-agent learning setting lead to cooperation, and how. We use the iterated Prisoner's Dilemma with one-period memory as a testbed. Each of the two learning agents learns a strategy that conditions its following action choices on both agents' action choices of the last round. We find that, alongside a strong regard for future rewards, a low exploration rate, and a small learning rate, it is primarily the intrinsic stochastic fluctuations of the reinforcement learning process that double the final rate of cooperation to up to 80%. Thus, inherent noise is not a necessary evil of the iterative learning process; it is a critical asset for the learning of cooperation. However, we also point out the trade-off between a high likelihood of cooperative behavior and achieving it in a reasonable amount of time. Our findings are relevant for purposefully designing cooperative algorithms and regulating undesired collusive effects.
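The testbed is straightforward to reproduce in outline. Below is a minimal sketch of two ε-greedy temporal-difference learners in the iterated Prisoner's Dilemma with one-period memory (standard Q-learning; the payoff values and parameters are common defaults, not the paper's calibration):

```python
import random

R, S, T, P = 3.0, 0.0, 5.0, 1.0        # prisoner's dilemma payoffs
ACTIONS = (0, 1)                       # 0 = cooperate, 1 = defect

def run(alpha=0.05, gamma=0.95, eps=0.02, steps=200_000, seed=3):
    rng = random.Random(seed)
    # state = joint actions of the previous round; one Q-table per agent
    Q = [{(a, b): {0: 0.0, 1: 0.0} for a in ACTIONS for b in ACTIONS}
         for _ in range(2)]
    state = (0, 0)
    coop = 0
    for _ in range(steps):
        acts = []
        for i in range(2):
            s = state if i == 0 else (state[1], state[0])  # own action first
            if rng.random() < eps:
                acts.append(rng.choice(ACTIONS))
            else:
                acts.append(max(ACTIONS, key=lambda a: Q[i][s][a]))
        a0, a1 = acts
        pay = {(0, 0): (R, R), (0, 1): (S, T),
               (1, 0): (T, S), (1, 1): (P, P)}[(a0, a1)]
        nxt = (a0, a1)
        for i, (a, r) in enumerate(zip(acts, pay)):
            s = state if i == 0 else (state[1], state[0])
            ns = nxt if i == 0 else (nxt[1], nxt[0])
            best_next = max(Q[i][ns].values())
            Q[i][s][a] += alpha * (r + gamma * best_next - Q[i][s][a])
        state = nxt
        coop += (a0 == 0) + (a1 == 0)
    return coop / (2 * steps)

print("cooperation rate:", round(run(), 3))
```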
Affiliation(s)
- Wolfram Barfuss
- Tübingen AI Center, University of Tübingen, Tübingen, Germany
- Janusz M Meylahn
- Department of Applied Mathematics, University of Twente, Enschede, The Netherlands
- Dutch Institute of Emergent Phenomena, University of Amsterdam, Amsterdam, The Netherlands
10. Barfuss W, Mann RP. Modeling the effects of environmental and perceptual uncertainty using deterministic reinforcement learning dynamics with partial observability. Phys Rev E 2022; 105:034409. PMID: 35428165. DOI: 10.1103/physreve.105.034409.
Abstract
Assessing the systemic effects of uncertainty that arises from agents' partial observation of the true states of the world is critical for understanding a wide range of scenarios, from navigation and foraging behavior to the provision of renewable resources and public infrastructures. Yet previous modeling work on agent learning and decision-making either lacks a systematic way to describe this source of uncertainty or puts the focus on obtaining optimal policies using complex models of the world that would impose an unrealistically high cognitive demand on real agents. In this work we aim to efficiently describe the emergent behavior of biologically plausible and parsimonious learning agents faced with partially observable worlds. Therefore we derive and present deterministic reinforcement learning dynamics where the agents observe the true state of the environment only partially. We showcase the broad applicability of our dynamics across different classes of partially observable agent-environment systems. We find that partial observability creates unintuitive benefits in several specific contexts, pointing the way to further research on a general understanding of such effects. For instance, partially observant agents can learn better outcomes faster, in a more stable way, and even overcome social dilemmas. Furthermore, our method allows the application of dynamical systems theory to partially observable multiagent learning. In this regard we find the emergence of catastrophic limit cycles, a critical slowing down of the learning processes between reward regimes, and the separation of the learning dynamics into fast and slow directions, all caused by partial observability. Therefore, the presented dynamics have the potential to become a formal, yet practical, lightweight and robust tool for researchers in biology, social science, and machine learning to systematically investigate the effects of interacting partially observant agents.
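A compact way to see what partial observability adds: the agent conditions not on the state s but on an observation o drawn from an observation channel O(o|s), and learns values over observations. The sketch below is an illustrative sampled simulation of my own; the paper instead derives deterministic learning equations for this setting:

```python
import random

def pomdp_agent(noise=0.2, steps=50_000, alpha=0.1, seed=4):
    """Two states, two actions; action 0 is rewarded in state 0 and action 1
    in state 1. The agent sees the true state only with prob. 1 - noise and
    therefore learns Q(observation, action) instead of Q(state, action)."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]          # Q[observation][action]
    for _ in range(steps):
        s = rng.randrange(2)
        o = s if rng.random() > noise else 1 - s   # noisy observation channel
        a = max((0, 1), key=lambda k: Q[o][k]) if rng.random() > 0.1 else rng.randrange(2)
        r = 1.0 if a == s else 0.0
        Q[o][a] += alpha * (r - Q[o][a])           # stateless (bandit-style) update
    return Q

print(pomdp_agent())
```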
Affiliation(s)
- Wolfram Barfuss
- Institute for Theoretical Physics, University of Tübingen, 72076 Tübingen, Germany
- Department of Statistics, School of Mathematics, University of Leeds, Leeds LS2 9JT, United Kingdom
- Richard P Mann
- Department of Statistics, School of Mathematics, University of Leeds, Leeds LS2 9JT, United Kingdom
11. Barfuss W. Dynamical systems as a level of cognitive analysis of multi-agent learning: Algorithmic foundations of temporal-difference learning dynamics. Neural Comput Appl 2022; 34:1653-1671. PMID: 35221541. PMCID: PMC8827307. DOI: 10.1007/s00521-021-06117-0.
Abstract
A dynamical systems perspective on multi-agent learning, based on the link between evolutionary game theory and reinforcement learning, provides an improved, qualitative understanding of the emerging collective learning dynamics. However, confusion exists with respect to how this dynamical systems account of multi-agent learning should be interpreted. In this article, I propose to embed the dynamical systems description of multi-agent learning into different abstraction levels of cognitive analysis. The purpose of this work is to make the connections between these levels explicit in order to gain improved insight into multi-agent learning. I demonstrate the usefulness of this framework with the general and widespread class of temporal-difference reinforcement learning. I find that its deterministic dynamical systems description follows a minimum free-energy principle and unifies a boundedly rational account of game theory with decision-making under uncertainty. I then propose an on-line sample-batch temporal-difference algorithm which is characterized by the combination of applying a memory-batch and separated state-action value estimation. I find that this algorithm serves as a micro-foundation of the deterministic learning equations by showing that its learning trajectories approach the ones of the deterministic learning equations under large batch sizes. Ultimately, this framework of embedding a dynamical systems description into different abstraction levels gives guidance on how to unleash the full potential of the dynamical systems approach to multi-agent learning.
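The algorithmic combination named here (a memory batch plus separated value estimation) can be sketched as follows: gather a batch of transitions under the current policy, then apply one update from batch-averaged temporal-difference errors. This is a schematic reading of such a sample-batch scheme, with a toy two-state chain as environment, not the paper's exact algorithm:

```python
from collections import defaultdict

def batch_td_step(env_step, policy, V, alpha=0.5, gamma=0.9, batch=1000):
    """One sample-batch TD(0) update: collect `batch` transitions with the
    current policy, average the TD errors per state, then update once.
    env_step(s, a) -> (next_state, reward); policy(s) -> action."""
    errs, counts = defaultdict(float), defaultdict(int)
    s = 0
    for _ in range(batch):
        a = policy(s)
        s2, r = env_step(s, a)
        errs[s] += r + gamma * V[s2] - V[s]
        counts[s] += 1
        s = s2
    for st in errs:                       # batch-averaged update
        V[st] += alpha * errs[st] / counts[st]
    return V

# toy 2-state chain: action is ignored, state flips, reward 1 while in state 1
V = {0: 0.0, 1: 0.0}
step = lambda s, a: (1 - s, float(s == 1))
for _ in range(50):
    V = batch_td_step(step, policy=lambda s: 0, V=V)
print(V)   # approaches V(1) = 1/(1 - gamma^2), V(0) = gamma * V(1)
```

Large batches average out sampling noise, which is why such an algorithm approaches the deterministic learning equations the abstract refers to.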
Affiliation(s)
- Wolfram Barfuss
- School of Mathematics, University of Leeds, Leeds, UK
- Tübingen AI Center, University of Tübingen, Tübingen, Germany
12. Yazidi A, Pinto-Orellana MA, Hammer H, Mirtaheri P, Herrera-Viedma E. Solving sensor identification problem without knowledge of the ground truth using replicator dynamics. IEEE Trans Cybern 2022; 52:16-24. PMID: 31905160. DOI: 10.1109/TCYB.2019.2958627.
Abstract
In this article, we consider an emergent problem in the sensor fusion area in which unreliable sensors need to be identified in the absence of the ground truth. We devise a novel solution to the problem using the theory of replicator dynamics that requires only mild conditions compared with the available state-of-the-art approaches. The solution has a low computational complexity that is linear in the number of involved sensors. We provide sound theoretical results that catalog the convergence of our approach to a solution where we can clearly unveil the sensor type. Furthermore, we present experimental results that demonstrate the convergence of our approach in concordance with our theoretical findings.
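The driving mechanism, discrete-time replicator dynamics, fits in a few lines: each hypothesis's weight grows in proportion to its fitness relative to the population average. In the sketch below, the fitness of "sensor i is reliable" is its mean agreement with the other sensors; this is an illustrative stand-in for the authors' construction, not their exact scheme:

```python
import numpy as np

def replicator_step(x, f):
    """Discrete replicator update: weights grow with relative fitness."""
    return x * f / (x @ f)

# toy data: binary ground truth observed by three sensors of varying accuracy
rng = np.random.default_rng(5)
truth = rng.integers(0, 2, size=1000)
readings = [np.where(rng.random(1000) < acc, truth, 1 - truth)
            for acc in (0.9, 0.85, 0.55)]        # two good sensors, one bad
agree = np.array([[np.mean(a == b) for b in readings] for a in readings])
f = (agree.sum(axis=1) - 1) / (len(readings) - 1)  # mean agreement with others

x = np.full(3, 1 / 3)                            # uniform initial weights
for _ in range(50):
    x = replicator_step(x, f)
print("sensor weights:", x.round(3))             # the unreliable sensor decays
```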
13. Mann RP. Evolution of heterogeneous perceptual limits and indifference in competitive foraging. PLoS Comput Biol 2021; 17:e1008734. PMID: 33621223. PMCID: PMC7901736. DOI: 10.1371/journal.pcbi.1008734.
Abstract
The collective behaviour of animal and human groups emerges from the individual decisions and actions of their constituent members. Recent research has revealed many ways in which the behaviour of groups can be influenced by differences amongst their constituent individuals. The existence of individual differences that have implications for collective behaviour raises important questions. How are these differences generated and maintained? Are individual differences driven by exogenous factors, or are they a response to the social dilemmas these groups face? Here I consider the classic case of patch selection by foraging agents under conditions of social competition. I introduce a multilevel model wherein the perceptual sensitivities of agents evolve in response to their foraging success or failure over repeated patch selections. This model reveals a bifurcation in the population, creating a class of agents with no perceptual sensitivity. These agents exploit the social environment to avoid the costs of accurate perception, relying on other agents to make fitness rewards insensitive to the choice of foraging patch. This provides an individual-based evolutionary basis for models incorporating perceptual limits that have been proposed to explain observed deviations from the Ideal Free Distribution (IFD) in empirical studies, while showing that the common assumption in such models that agents share identical sensory limits is likely false. Further analysis of the model shows how agents develop perceptual strategic niches in response to environmental variability. The emergence of agents insensitive to reward differences also has implications for societal resource allocation problems, including the use of financial and prediction markets as mechanisms for aggregating collective wisdom.
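To fix ideas, the competitive patch-choice setting can be sketched as follows (patch qualities, the Gaussian perception noise, and group sizes are my illustrative assumptions): agents perceive patch quality through noise scaled by their sensitivity, choose the apparently best patch, and share each patch's true quality with its occupants:

```python
import random

def forage(sigmas, qualities=(6.0, 3.0, 1.0), rounds=2000, seed=6):
    """Mean payoff per agent: agent k perceives quality q + Normal(0, sigma_k)
    and chooses the patch that appears richest; the realized payoff is the
    true quality shared equally among the patch's occupants."""
    rng = random.Random(seed)
    gains = [0.0] * len(sigmas)
    for _ in range(rounds):
        choices = [max(range(len(qualities)),
                       key=lambda p: qualities[p] + rng.gauss(0, s))
                   for s in sigmas]
        for k, p in enumerate(choices):
            gains[k] += qualities[p] / choices.count(p)
    return [round(g / rounds, 3) for g in gains]

# ten sharp-eyed agents (sigma = 0.1) and ten near-random ones (sigma = 50)
print(forage([0.1] * 10 + [50.0] * 10))
```

Because competitors crowd the genuinely rich patch, near-random ("indifferent") agents can match or beat sharp-eyed ones, which is the intuition behind the bifurcation described above.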
Affiliation(s)
- Richard P. Mann
- Department of Statistics, School of Mathematics, University of Leeds, Leeds, United Kingdom
14. Huang F, Cao M, Wang L. Learning enables adaptation in cooperation for multi-player stochastic games. J R Soc Interface 2020; 17:20200639. PMID: 33202177. DOI: 10.1098/rsif.2020.0639.
Abstract
Interactions among individuals in natural populations often occur in a dynamically changing environment. Understanding the role of environmental variation in population dynamics has long been a central topic in theoretical ecology and population biology. However, the key question of how individuals, in the middle of challenging social dilemmas (e.g. the 'tragedy of the commons'), modulate their behaviours to adapt to the fluctuation of the environment has not yet been addressed satisfactorily. Using evolutionary game theory, we develop a framework of stochastic games that incorporates the adaptive mechanism of reinforcement learning to investigate whether cooperative behaviours can evolve in the ever-changing group interaction environment. When the action choices of players are just slightly influenced by past reinforcements, we construct an analytical condition to determine whether cooperation can be favoured over defection. Intuitively, this condition reveals why and how the environment can mediate cooperative dilemmas. Under our model architecture, we also compare this learning mechanism with two non-learning decision rules, and we find that learning significantly improves the propensity for cooperation in weak social dilemmas, and, in sharp contrast, hinders cooperation in strong social dilemmas. Our results suggest that in complex social-ecological dilemmas, learning enables the adaptation of individuals to varying environments.
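A minimal sketch of payoff-based reinforcement in a group game may help (generic Erev-Roth-style reinforcement with forgetting, written by me as an illustration; the paper's stochastic-game framework and its analytical weak-influence condition are more general): attractions to Cooperate and Defect are reinforced by realized payoffs, and a small beta means action choices are only slightly influenced by past reinforcements:

```python
import math, random

def reinforcement_group_game(n=5, r=3.0, cost=1.0, beta=0.05, decay=0.01,
                             rounds=20_000, seed=7):
    """n-player public goods game with payoff-based reinforcement:
    attractions for Cooperate/Defect grow with realized payoffs (with
    forgetting, which keeps them bounded), and beta scales how strongly
    past reinforcements influence action choices."""
    rng = random.Random(seed)
    attr = [[0.0, 0.0] for _ in range(n)]        # attractions for (C, D)
    coop = 0
    for _ in range(rounds):
        acts = []
        for a in attr:
            p_c = 1.0 / (1.0 + math.exp(-beta * (a[0] - a[1])))
            acts.append(0 if rng.random() < p_c else 1)
        pot = r * cost * acts.count(0) / n        # shared return on contributions
        for i, act in enumerate(acts):
            pay = pot - (cost if act == 0 else 0.0)
            attr[i][0] *= 1 - decay               # forgetting
            attr[i][1] *= 1 - decay
            attr[i][act] += pay                   # reinforce the chosen action
        coop += acts.count(0)
    return coop / (n * rounds)

print("cooperation frequency:", round(reinforcement_group_game(), 3))
```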
Affiliation(s)
- Feng Huang
- Center for Systems and Control, College of Engineering, Peking University, Beijing 100871, People's Republic of China
- Center for Data Science and System Complexity, Faculty of Science and Engineering, University of Groningen, Groningen 9747 AG, The Netherlands
- Ming Cao
- Center for Data Science and System Complexity, Faculty of Science and Engineering, University of Groningen, Groningen 9747 AG, The Netherlands
- Long Wang
- Center for Systems and Control, College of Engineering, Peking University, Beijing 100871, People's Republic of China
15. Mittal S, Mukhopadhyay A, Chakraborty S. Evolutionary dynamics of the delayed replicator-mutator equation: Limit cycle and cooperation. Phys Rev E 2020; 101:042410. PMID: 32422824. DOI: 10.1103/physreve.101.042410.
Abstract
Game theory deals with strategic interactions among players, and evolutionary game dynamics tracks the fate of the players' populations under selection. In this paper, we consider the replicator equation for two-player-two-strategy games involving cooperators and defectors. We modify the equation to include the effect of mutation and also of a delay, which corresponds either to delayed information about the population state or to a lag in realizing the effect of interactions among players. By focusing on the four exhaustive classes of symmetric games (the Stag Hunt game, the Snowdrift game, the Prisoners' Dilemma game, and the Harmony game), we analytically and numerically analyze the delayed replicator-mutator equation to find the explicit condition for the Hopf bifurcation that brings forth a stable limit cycle. The existence of an asymptotically stable limit cycle implies the coexistence of cooperators and defectors; in the games where defection is a stable Nash strategy, a stable limit cycle thus provides a mechanism for the evolution of cooperation. We find that while mutation alone can never lead to an oscillatory cooperation state in two-player-two-strategy games, delay can change the scenario. On the other hand, there are situations when delay alone cannot lead to the Hopf bifurcation in the absence of mutation in the selection dynamics.
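For orientation, a generic way to write a delayed replicator-mutator equation for strategy frequencies x_i, fitnesses f_i, and mutation kernel q_{ji} is shown below (one common placement of the delay tau; the paper analyses where delay and mutation enter and how they interact, so this is a representative form rather than its exact equation):

```latex
\dot{x}_i(t) \;=\; \sum_{j} q_{ji}\, x_j(t-\tau)\, f_j\big(x(t-\tau)\big) \;-\; x_i(t)\,\bar{f}\big(x(t-\tau)\big),
\qquad \bar{f}(x) \;=\; \sum_{k} x_k f_k(x).
```

Setting tau = 0 recovers the replicator-mutator equation, and additionally taking q_{ji} to be the identity recovers the plain replicator equation.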
Affiliation(s)
- Sourabh Mittal
- Department of Physics, Indian Institute of Technology Kanpur, Uttar Pradesh 208016, India
- Archan Mukhopadhyay
- Department of Physics, Indian Institute of Technology Kanpur, Uttar Pradesh 208016, India
- Sagar Chakraborty
- Department of Physics, Indian Institute of Technology Kanpur, Uttar Pradesh 208016, India
16. Caring for the future can turn tragedy into comedy for long-term collective action under risk of collapse. Proc Natl Acad Sci U S A 2020; 117:12915-12922. PMID: 32434908. DOI: 10.1073/pnas.1916545117.
Abstract
We will need collective action to avoid catastrophic climate change, and this will require valuing the long term as well as the short term. Shortsightedness and uncertainty have hindered progress in resolving this collective action problem and have been recognized as important barriers to cooperation among humans. Here, we propose a coupled social-ecological dilemma to investigate the interdependence of three well-identified components of this cooperation problem: 1) timescales of collapse and recovery in relation to time preferences regarding future outcomes, 2) the magnitude of the impact of collapse, and 3) the number of actors in the collective. We find that, under a sufficiently severe and time-distant collapse, how much the actors care for the future can transform the game from a tragedy of the commons into one of coordination, and even into a comedy of the commons in which cooperation dominates. Conversely, we also find conditions under which even strong concern for the future still does not transform the problem from tragedy to comedy. For a large number of participating actors, we find that the critical collapse impact, at which these game regime changes happen, converges to a fixed value of collapse impact per actor that is independent of the enhancement factor of the public good, which is usually regarded as the driver of the dilemma. Our results not only call for experimental testing but also help explain why polarization in beliefs about human-caused climate change can threaten global cooperation agreements.
17. Strnad FM, Barfuss W, Donges JF, Heitzig J. Deep reinforcement learning in World-Earth system models to discover sustainable management strategies. Chaos 2019; 29:123122. PMID: 31893656. DOI: 10.1063/1.5124673.
Abstract
Increasingly complex nonlinear World-Earth system models are used for describing the dynamics of the biophysical Earth system and the socioeconomic and sociocultural World of human societies and their interactions. Identifying pathways toward a sustainable future in these models for informing policymakers and the wider public, e.g., pathways leading to robust mitigation of dangerous anthropogenic climate change, is a challenging and widely investigated task in the field of climate research and broader Earth system science. This problem is particularly difficult when constraints on avoiding transgressions of planetary boundaries and social foundations need to be taken into account. In this work, we propose to combine recently developed machine learning techniques, namely, deep reinforcement learning (DRL), with classical analysis of trajectories in the World-Earth system. Based on the concept of the agent-environment interface, we develop an agent that is generally able to act and learn in variable manageable environment models of the Earth system. We demonstrate the potential of our framework by applying DRL algorithms to two stylized World-Earth system models. Conceptually, we explore thereby the feasibility of finding novel global governance policies leading into a safe and just operating space constrained by certain planetary and socioeconomic boundaries. The artificially intelligent agent learns that the timing of a specific mix of taxing carbon emissions and subsidies on renewables is of crucial relevance for finding World-Earth system trajectories that are sustainable in the long term.
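The agent-environment interface the authors build on is the standard reinforcement-learning loop. A minimal sketch with a placeholder environment (the single "excess emissions" variable, its dynamics, and the reward are invented stand-ins; the actual World-Earth models, observables, and tax/subsidy action sets are far richer):

```python
class ToyEarthEnv:
    """Placeholder environment: one 'excess emissions' level the agent can
    tax down; the episode ends if a planetary-boundary proxy is crossed."""
    def reset(self):
        self.e = 0.5
        return self.e

    def step(self, action):            # action 0 = business as usual, 1 = tax carbon
        self.e += 0.05 if action == 0 else -0.03
        self.e = max(0.0, self.e)
        done = self.e > 1.0            # boundary transgression ends the episode
        reward = (1.0 - self.e) - (0.1 if action == 1 else 0.0)  # welfare minus policy cost
        return self.e, reward, done

def run_episode(env, policy, max_steps=200):
    s, total = env.reset(), 0.0
    for _ in range(max_steps):
        s, r, done = env.step(policy(s))
        total += r
        if done:
            break
    return total

env = ToyEarthEnv()
print("always BAU:   ", round(run_episode(env, lambda s: 0), 2))
print("tax when high:", round(run_episode(env, lambda s: int(s > 0.4)), 2))
```

A DRL agent would replace the hand-written policies with a learned mapping from observations to actions, which is the role the paper's framework assigns to the agent-environment interface.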
Affiliation(s)
- Felix M Strnad
- FutureLab on Game Theory and Networks of Interacting Agents, Research Department 4: Complexity Science, Potsdam Institute for Climate Impact Research, 14473 Potsdam, Germany
- Wolfram Barfuss
- FutureLab on Earth Resilience in the Anthropocene, Research Department 1: Earth System Analysis, Potsdam Institute for Climate Impact Research, 14473 Potsdam, Germany
- Jonathan F Donges
- FutureLab on Earth Resilience in the Anthropocene, Research Department 1: Earth System Analysis, Potsdam Institute for Climate Impact Research, 14473 Potsdam, Germany
- Jobst Heitzig
- FutureLab on Game Theory and Networks of Interacting Agents, Research Department 4: Complexity Science, Potsdam Institute for Climate Impact Research, 14473 Potsdam, Germany