1
Knight V, Harper M, Glynatsi NE, Gillard J. Recognising and evaluating the effectiveness of extortion in the Iterated Prisoner's Dilemma. PLoS One 2024; 19:e0304641. [PMID: 39058703 PMCID: PMC11280246 DOI: 10.1371/journal.pone.0304641] [Received: 12/08/2023] [Accepted: 05/16/2024] [Indexed: 07/28/2024]
Abstract
Establishing and maintaining mutual cooperation in agent-to-agent interactions can be viewed as a question of direct reciprocity and readily applied to the Iterated Prisoner's Dilemma. Agents cooperate, at a small cost to themselves, in the hope of obtaining a future benefit. Zero-determinant strategies, introduced in 2012, include a subclass of strategies that are provably extortionate. In the established literature, most studies of the effectiveness (or lack thereof) of zero-determinant strategies place some zero-determinant strategy in a specific scenario (a collection of agents) and evaluate its performance either numerically or theoretically. Extortionate strategies are algebraically rigid and memory-one by definition, and their definition requires complete knowledge of a strategy (its memory-one cooperation probabilities). The contribution of this work is a method to detect extortionate behaviour from the history of play of an arbitrary strategy. This inverts the paradigm of most studies: instead of observing the effectiveness of some theoretically extortionate strategies, the largest known collection of strategies is observed and the intensity of each strategy's extortion is quantified empirically. Moreover, we show that the lack of adaptability of extortionate strategies also holds under this broader definition.
Affiliation(s)
- Vincent Knight
- School of Mathematics, Cardiff University, Cardiff, United Kingdom
- Marc Harper
- Google Inc., Mountain View, CA, United States of America
- Nikoleta E. Glynatsi
- Max Planck Research Group on the Dynamics of Social Behavior, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Jonathan Gillard
- School of Mathematics, Cardiff University, Cardiff, United Kingdom
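The detection idea in the Knight et al. abstract can be illustrated with a short sketch. This is not the authors' implementation. It assumes the conventional payoffs (R, S, T, P) = (3, 0, 5, 1), estimates memory-one cooperation probabilities from a history of play, and fits them by least squares to the Press and Dyson extortion condition p_tilde = phi * [(S_X - P) - chi * (S_Y - P)], where p_tilde = (p_CC - 1, p_CD - 1, p_DC, p_DD). The function names `estimate_memory_one` and `extortion_fit` are hypothetical.

```python
from collections import Counter

# Assumed conventional payoffs: reward, sucker, temptation, punishment.
R, S, T, P = 3, 0, 5, 1
# Previous-round states from the focal player's view: (own move, co-player's move).
STATES = ["CC", "CD", "DC", "DD"]


def estimate_memory_one(history):
    """Estimate P(cooperate | previous state) from a list of (own, co-player) moves."""
    seen, cooperated = Counter(), Counter()
    for (mine, theirs), (next_mine, _) in zip(history, history[1:]):
        state = mine + theirs
        seen[state] += 1
        cooperated[state] += next_mine == "C"
    # Unvisited states get 0.5, an arbitrary placeholder chosen for this sketch.
    return [cooperated[s] / seen[s] if seen[s] else 0.5 for s in STATES]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def extortion_fit(p):
    """Least-squares fit of p_tilde ~ phi * [(S_X - P) - chi * (S_Y - P)].

    Returns (chi, phi, squared_residual). A residual near 0 with chi > 1
    suggests extortionate behaviour; chi == 1 corresponds to a fair split.
    """
    u = [R - P, S - P, T - P, P - P]  # focal player's payoffs minus P
    v = [R - P, T - P, S - P, P - P]  # co-player's payoffs minus P
    p_tilde = [p[0] - 1, p[1] - 1, p[2], p[3]]
    # Solve the 2x2 normal equations for p_tilde ~ a*u + b*v by Cramer's rule.
    uu, vv, uv = dot(u, u), dot(v, v), dot(u, v)
    up, vp = dot(u, p_tilde), dot(v, p_tilde)
    det = uu * vv - uv * uv
    a = (up * vv - uv * vp) / det
    b = (uu * vp - uv * up) / det
    fitted = [a * x + b * y for x, y in zip(u, v)]
    residual = sum((f - t) ** 2 for f, t in zip(fitted, p_tilde))
    # Matching a*u + b*v with phi*u - phi*chi*v gives phi = a and chi = -b / a.
    return -b / a, a, residual


history = [("C", "C"), ("C", "D"), ("D", "C"), ("C", "C")]
print(estimate_memory_one(history))  # [1.0, 0.0, 1.0, 0.5]
print(extortion_fit([11 / 13, 1 / 2, 7 / 26, 0]))
```

Under these assumptions, `extortion_fit([11/13, 1/2, 7/26, 0])` recovers chi = 3 and phi = 1/26 with a residual of essentially zero, while Tit-for-Tat, `[1, 0, 1, 0]`, fits exactly with chi = 1. Real histories are finite and noisy, so in practice the residual and chi would be read as graded measures rather than a yes/no test.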
2
Chen X, Fu F. Outlearning extortioners: unbending strategies can foster reciprocal fairness and cooperation. PNAS Nexus 2023; 2:pgad176. [PMID: 37287707 PMCID: PMC10244001 DOI: 10.1093/pnasnexus/pgad176] [Received: 10/20/2022] [Revised: 05/14/2023] [Accepted: 05/16/2023] [Indexed: 06/09/2023]
Abstract
Recent theory shows that extortioners taking advantage of the zero-determinant (ZD) strategy can unilaterally claim an unfair share of the payoffs in the Iterated Prisoner's Dilemma. It is thus suggested that, against a fixed extortioner, any adapting co-player should be subdued, with full cooperation as its best response. In contrast, recent experiments demonstrate that human players often choose not to accede to extortion out of concern for fairness, actually causing extortioners to lose more than the human players themselves. In light of this, here we reveal fair-minded strategies that are unbending to extortion, such that any payoff-maximizing extortioner will ultimately concede in its own interest by offering a fair split in head-to-head matches. We find and characterize multiple general classes of such unbending strategies, including generous ZD strategies and Win-Stay, Lose-Shift (WSLS) as particular examples. Against fixed unbending players, extortioners incur consequentially increasing losses whenever they attempt to demand a more unfair share. Our analysis also highlights the importance of the payoff structure in determining the superiority of ZD strategies, and in particular their extortion ability. We show that an extortionate ZD player can even be outperformed by, for example, WSLS if the total payoff of unilateral cooperation is smaller than that of mutual defection. Unbending strategies can be used to outlearn evolutionary extortioners and to catalyze the evolution of Tit-for-Tat-like strategies out of ZD players. Our work has implications for promoting fairness and resisting extortion so as to uphold a just and cooperative society.
Affiliation(s)
- Xingru Chen
- School of Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Department of Mathematics, Dartmouth College, Hanover, 03755 NH, USA
- Feng Fu
- Department of Mathematics, Dartmouth College, Hanover, 03755 NH, USA
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, 03756 NH, USA
3
Schmid L, Hilbe C, Chatterjee K, Nowak MA. Direct reciprocity between individuals that use different strategy spaces. PLoS Comput Biol 2022; 18:e1010149. [PMID: 35700167 PMCID: PMC9197081 DOI: 10.1371/journal.pcbi.1010149] [Received: 11/26/2021] [Accepted: 04/28/2022] [Indexed: 12/04/2022]
Abstract
In repeated interactions, players can use strategies that respond to the outcome of previous rounds. Much of the existing literature on direct reciprocity assumes that all competing individuals use the same strategy space. Here, we study both the learning and the evolutionary dynamics of players that differ in the strategy space they explore. We focus on the infinitely repeated donation game and compare three natural strategy spaces: memory-1 strategies, which consider the last moves of both players; reactive strategies, which respond to the last move of the co-player; and unconditional strategies. These three strategy spaces differ in the memory capacity they require. We compute the long-term average payoff achieved in a pairwise learning process. We find that smaller strategy spaces can dominate larger ones. For weak selection, unconditional players dominate both reactive and memory-1 players. For intermediate selection, reactive players dominate memory-1 players. Only for strong selection and a low cost-to-benefit ratio do memory-1 players dominate the others. We observe that the supergame between strategy spaces can be a social dilemma: maximum payoff is achieved if both players explore a larger strategy space, but smaller strategy spaces dominate.
Affiliation(s)
- Christian Hilbe
- Max Planck Research Group Dynamics of Social Behavior, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Martin A. Nowak
- Department of Mathematics, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
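The comparison of strategy spaces in the Schmid et al. abstract rests on long-term average payoffs between memory-1, reactive and unconditional players. A minimal sketch of that kind of computation is below: reactive and unconditional strategies are written as restricted memory-1 strategies, and payoffs come from the stationary distribution of the induced Markov chain. This is a standard computation, not the authors' learning process; the benefit b = 3, cost c = 1, the execution-error rate `eps` (added so the chain has a unique stationary distribution) and all function names are illustrative assumptions.

```python
from itertools import product

# States seen from a player's own perspective: (own last move, co-player's last move).
STATES = ["CC", "CD", "DC", "DD"]


def reactive(p_c, p_d):
    """Memory-1 strategy that only reacts to the co-player's last move."""
    return [p_c, p_d, p_c, p_d]


def unconditional(p):
    """Memory-1 strategy that ignores the previous round entirely."""
    return [p, p, p, p]


def long_run_payoffs(p, q, b=3.0, c=1.0, eps=0.01, iterations=10_000):
    """Average per-round donation-game payoffs of memory-1 strategies p and q.

    p[s] and q[s] are cooperation probabilities after state STATES[s], each seen
    from that player's own perspective. With eps > 0 every transition has positive
    probability, so power iteration converges to the unique stationary distribution.
    """
    p = [(1 - eps) * x + eps * (1 - x) for x in p]
    q = [(1 - eps) * x + eps * (1 - x) for x in q]
    swap = [0, 2, 1, 3]  # X's state CD is Y's state DC, and vice versa
    transition = []
    for s in range(4):
        px, py = p[s], q[swap[s]]
        # product() yields (C, C), (C, D), (D, C), (D, D), matching STATES.
        transition.append([
            (px if x_coop else 1 - px) * (py if y_coop else 1 - py)
            for x_coop, y_coop in product([True, False], repeat=2)
        ])
    dist = [0.25] * 4
    for _ in range(iterations):
        dist = [sum(dist[s] * transition[s][t] for s in range(4)) for t in range(4)]
    payoff_x = [b - c, -c, b, 0.0]  # X's payoff in CC, CD, DC, DD
    payoff_y = [b - c, b, -c, 0.0]  # Y's payoff in the same states
    return (sum(d * u for d, u in zip(dist, payoff_x)),
            sum(d * u for d, u in zip(dist, payoff_y)))


tft = reactive(1.0, 0.0)
print(long_run_payoffs(tft, tft))  # approximately (1.0, 1.0)
print(long_run_payoffs(unconditional(1.0), unconditional(0.0)))  # approximately (-0.96, 2.96)
```

With these assumptions, Tit-for-Tat against itself earns (b - c)/2 = 1.0 per round, because errors spread the chain uniformly over the four states. Power iteration is used for readability; solving the 4x4 linear system directly would be the more robust choice.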
4
Lindig-León C, Schmid G, Braun DA. Nash equilibria in human sensorimotor interactions explained by Q-learning with intrinsic costs. Sci Rep 2021; 11:20779. [PMID: 34675336 PMCID: PMC8531365 DOI: 10.1038/s41598-021-99428-0] [Received: 04/07/2021] [Accepted: 09/01/2021] [Indexed: 11/09/2022]
Abstract
The Nash equilibrium concept has previously been shown to be an important tool for understanding human sensorimotor interactions, in which different actors vie to minimize their respective effort while engaging in a multi-agent motor task. However, it is not clear how such equilibria are reached. Here, we compare different reinforcement learning models with human behavior in sensorimotor interactions with haptic feedback based on three classic games: the prisoner's dilemma and the symmetric and asymmetric matching pennies games. We find that a discrete analysis, which reduces the continuous sensorimotor interaction to binary choices as in classical matrix games, does not allow one to distinguish between the different learning algorithms, whereas a more detailed continuous analysis, with continuous formulations of the learning algorithms and the game-theoretic solutions, affords different predictions. In particular, we find that Q-learning with intrinsic costs that disfavor deviations from average behavior explains the observed data best, even though all learning algorithms converge equally to admissible Nash equilibrium solutions. We therefore conclude that it is important to study different learning algorithms to understand sensorimotor interactions: such behavior cannot be inferred from a game-theoretic analysis that focuses only on the Nash equilibrium concept, because different learning algorithms impose preferences on the set of possible equilibrium solutions through their inherent learning dynamics.
Affiliation(s)
- Cecilia Lindig-León
- Institute of Neural Information Processing, Faculty of Engineering, Computer Science and Psychology, Ulm University, Ulm, Germany
- Gerrit Schmid
- Institute of Neural Information Processing, Faculty of Engineering, Computer Science and Psychology, Ulm University, Ulm, Germany
- Daniel A Braun
- Institute of Neural Information Processing, Faculty of Engineering, Computer Science and Psychology, Ulm University, Ulm, Germany
5
Glynatsi NE, Knight VA. Using a theory of mind to find best responses to memory-one strategies. Sci Rep 2020; 10:17287. [PMID: 33057134 PMCID: PMC7560663 DOI: 10.1038/s41598-020-74181-y] [Received: 02/19/2020] [Accepted: 09/22/2020] [Indexed: 11/19/2022]
Abstract
Memory-one strategies are a set of Iterated Prisoner's Dilemma strategies that have been praised for their mathematical tractability and their performance against single opponents. This manuscript investigates best-response memory-one strategies that use a theory of mind of their opponents. The results add to the literature showing that extortionate play is not always optimal, by showing that optimal play is often not extortionate. They also provide evidence that memory-one strategies suffer from their limited memory in multi-agent interactions and can be outperformed by optimised strategies with longer memory. We have developed a theory that has allowed us to explore the entire space of memory-one strategies. The framework presented is suitable for studying memory-one strategies in the Prisoner's Dilemma, and also in evolutionary processes such as the Moran process. Results on the stability of defection in populations of memory-one strategies are also obtained.
Affiliation(s)
- Nikoleta E Glynatsi
- School of Mathematics, Cardiff University, Cardiff, CF24 4AG, UK
- Max Planck Institute for Evolutionary Biology, Plön, 24306, Germany
- Vincent A Knight
- School of Mathematics, Cardiff University, Cardiff, CF24 4AG, UK
6
Park YJ, Cho YS, Kim SB. Multi-agent reinforcement learning with approximate model learning for competitive games. PLoS One 2019; 14:e0222215. [PMID: 31509568 PMCID: PMC6739057 DOI: 10.1371/journal.pone.0222215] [Received: 06/13/2019] [Accepted: 08/23/2019] [Indexed: 11/18/2022]
Abstract
We propose a method for learning multi-agent policies to compete against multiple opponents. The method consists of recurrent neural network-based actor-critic networks and deterministic policy gradients that promote cooperation between agents through communication. The learning process does not require access to the opponents' parameters or observations because the agents are trained separately from the opponents. The actor networks enable the agents to communicate using forward and backward paths, while the critic network helps to train the actors by delivering gradient signals based on each actor's contribution to the global reward. Moreover, to address the nonstationarity caused by the evolving behavior of other agents, we propose approximate model learning using auxiliary prediction networks that model the state transitions, the reward function, and opponent behavior. In the test phase, we use competitive multi-agent environments to demonstrate, by comparison, the usefulness and superiority of the proposed method in terms of learning efficiency and goal achievement. The comparison results show that the proposed method outperforms the alternatives.
Affiliation(s)
- Young Joon Park
- School of Industrial Management Engineering, Korea University, Seoul, Republic of Korea
- Yoon Sang Cho
- School of Industrial Management Engineering, Korea University, Seoul, Republic of Korea
- Seoung Bum Kim
- School of Industrial Management Engineering, Korea University, Seoul, Republic of Korea
7
Viola TW, Niederauer JPO, Kluwe-Schiavon B, Sanvicente-Vieira B, Grassi-Oliveira R. Cocaine use disorder in females is associated with altered social decision-making: a study with the prisoner's dilemma and the ultimatum game. BMC Psychiatry 2019; 19:211. [PMID: 31277620 PMCID: PMC6612218 DOI: 10.1186/s12888-019-2198-0] [Received: 02/09/2019] [Accepted: 06/25/2019] [Indexed: 12/16/2022]
Abstract
BACKGROUND: Chronic cocaine use is associated with cognitive deficits, including poor performance on neuropsychological tasks of memory, executive functions, theory of mind and decision-making. However, the relationship between cocaine use disorder and social decision-making remains unclear. This is particularly relevant given that many cocaine abusers present impairments in social functioning. Game theory paradigms have helped researchers understand the behavior of psychiatric patients when they engage directly in social situations, which may better approximate many of their real-life choices.
METHODS: The present study investigated social decision-making in individuals with or without cocaine use disorder, examining their behavior in the Prisoner's Dilemma and in the Ultimatum Game. In total, 129 females diagnosed with cocaine use disorder and 55 females with no history of substance abuse were recruited and performed both social decision-making tasks. Additional assessments covered demographics, patterns of substance consumption and executive function performance.
RESULTS: Females with cocaine use disorder more often chose not to defect in the Prisoner's Dilemma, while in the Ultimatum Game they frequently chose, as responders, to accept the first, unfair offer. These effects were more pronounced among females with a long-term history of cocaine use. Associations between cocaine use disorder and altered social decision-making were independent of demographic and executive function variables.
CONCLUSIONS: The influence of cocaine use disorder on social decision-making was detected in both game paradigms, resulting in more cooperative behavior in the Prisoner's Dilemma and a higher acceptance rate of unfair offers in the Ultimatum Game. Further studies should investigate these associations to shed light on the putative biopsychosocial factors underlying the observed effects.
Affiliation(s)
- Thiago Wendt Viola
- Pontifical Catholic University of Rio Grande do Sul (PUCRS), Developmental Cognitive Neuroscience Lab, Avenida Ipiranga 6690 – Prédio 63, Jardim Botânico, Porto Alegre, RS, Brazil
- João Paulo Otolia Niederauer
- Pontifical Catholic University of Rio Grande do Sul (PUCRS), Developmental Cognitive Neuroscience Lab, Avenida Ipiranga 6690 – Prédio 63, Jardim Botânico, Porto Alegre, RS, Brazil
- Bruno Kluwe-Schiavon
- Experimental and Clinical Pharmacopsychology Laboratory, Department of Psychiatry, Psychotherapy and Psychosomatics, Psychiatric Hospital, University of Zurich, Zürich, Switzerland
- Breno Sanvicente-Vieira
- Pontifical Catholic University of Rio Grande do Sul (PUCRS), Developmental Cognitive Neuroscience Lab, Avenida Ipiranga 6690 – Prédio 63, Jardim Botânico, Porto Alegre, RS, Brazil
- Rodrigo Grassi-Oliveira
- Pontifical Catholic University of Rio Grande do Sul (PUCRS), Developmental Cognitive Neuroscience Lab, Avenida Ipiranga 6690 – Prédio 63, Jardim Botânico, Porto Alegre, RS, Brazil
8
Knight V, Harper M, Glynatsi NE, Campbell O. Evolution reinforces cooperation with the emergence of self-recognition mechanisms: An empirical study of strategies in the Moran process for the iterated prisoner's dilemma. PLoS One 2018; 13:e0204981. [PMID: 30359381 PMCID: PMC6201880 DOI: 10.1371/journal.pone.0204981] [Received: 05/07/2018] [Accepted: 09/18/2018] [Indexed: 11/18/2022]
Abstract
We present insights and empirical results from an extensive numerical study of the evolutionary dynamics of the iterated prisoner's dilemma. Fixation probabilities for Moran processes are obtained for all pairs of 164 different strategies, including classics such as TitForTat, zero-determinant strategies, and many more sophisticated strategies. Players with long memories and sophisticated behaviours outperform many strategies that perform well in a two-player setting. Moreover, we introduce several strategies trained with evolutionary algorithms to excel at the Moran process. These strategies are excellent invaders and resistors of invasion, and in some cases they naturally evolve handshaking mechanisms to resist invasion. The best invaders were those trained to maximize total payoff, while the best resistors invoke handshake mechanisms. This suggests that while maximizing individual payoff can lead to the evolution of cooperation through invasion, payoff-maximizing strategies, because of their relatively weak resistance to invasion, are not as evolutionarily stable as strategies employing handshake mechanisms.
Affiliation(s)
- Vincent Knight
- Cardiff University, School of Mathematics, Cardiff, United Kingdom
- Marc Harper
- Google Inc., Mountain View, CA, United States of America
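The abstract above reports Moran fixation probabilities for pairs of strategies. For a single mutant in a well-mixed population with fixed pairwise payoffs, a standard closed-form expression is rho = 1 / (1 + sum over k = 1..N-1 of the product over j = 1..k of g_j / f_j), where f_j and g_j are the mutant and resident fitnesses when there are j mutants. The sketch below implements that formula; it is not necessarily how the paper obtained its probabilities. Assumptions: a, b, c, d are the mutant-vs-mutant, mutant-vs-resident, resident-vs-mutant and resident-vs-resident payoffs (for instance, average match scores), individuals do not play themselves, fitness is 1 - w + w * payoff with w = 1 by default (so payoffs must be positive), and the function name is hypothetical.

```python
def fixation_probability(a, b, c, d, n, w=1.0):
    """Fixation probability of one mutant among n - 1 residents (Moran process).

    a: mutant vs mutant, b: mutant vs resident,
    c: resident vs mutant, d: resident vs resident (per-interaction payoffs).
    Fitness is 1 - w + w * average payoff; individuals do not play themselves.
    """
    total, ratio_product = 1.0, 1.0
    for j in range(1, n):  # j = current number of mutants
        f = 1 - w + w * (a * (j - 1) + b * (n - j)) / (n - 1)  # mutant fitness
        g = 1 - w + w * (c * j + d * (n - j - 1)) / (n - 1)  # resident fitness
        ratio_product *= g / f
        total += ratio_product
    return 1 / total


print(fixation_probability(2, 2, 1, 1, 10))  # constant relative fitness r = 2
print(fixation_probability(1, 1, 1, 1, 10))  # neutral case, 1/N
```

As a sanity check, constant fitness (a = b = 2, c = d = 1) reduces the formula to the classic (1 - 1/r) / (1 - 1/r^N) with r = 2, which gives 512/1023 for N = 10, and equal payoffs give the neutral value 1/N.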
9
García J, van Veelen M. No Strategy Can Win in the Repeated Prisoner's Dilemma: Linking Game Theory and Computer Simulations. Front Robot AI 2018; 5:102. [PMID: 33500981 PMCID: PMC7805755 DOI: 10.3389/frobt.2018.00102] [Received: 02/20/2018] [Accepted: 08/06/2018] [Indexed: 11/13/2022]
Abstract
Computer simulations are regularly used to study the evolution of strategies in repeated games. These simulations rarely pay attention to game-theoretical results that can illuminate the data analysis or the questions being asked. Results from evolutionary game theory imply that for every Nash equilibrium there are sequences of mutants that would destabilize it. If strategies are not limited to a finite set, populations move between a variety of Nash equilibria with different levels of cooperation. This instability is inescapable, regardless of how strategies are represented. We present algorithms that show that simulations do agree with the theory. This implies that cognition itself may have only a limited impact on the cycling dynamics. We argue that the role of mutations or exploration is more important in determining levels of cooperation.
Affiliation(s)
- Julián García
- Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
10
Harper M, Knight V, Jones M, Koutsovoulos G, Glynatsi NE, Campbell O. Reinforcement learning produces dominant strategies for the Iterated Prisoner's Dilemma. PLoS One 2017; 12:e0188046. [PMID: 29228001 PMCID: PMC5724862 DOI: 10.1371/journal.pone.0188046] [Received: 08/11/2017] [Accepted: 10/27/2017] [Indexed: 12/02/2022]
Abstract
We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.
Affiliation(s)
- Marc Harper
- Google Inc., Mountain View, CA, United States of America
- Vincent Knight
- Cardiff University, School of Mathematics, Cardiff, United Kingdom
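The standard tournaments mentioned in the Harper et al. abstract are round robins in which every strategy meets every other. A minimal sketch with four classic strategies is below. The payoffs (R, S, T, P) = (3, 0, 5, 1), the 10-turn matches, the inclusion of self-play (credited once) and all function names are illustrative choices for this sketch, not the setup used in the paper.

```python
from itertools import combinations_with_replacement

# Assumed payoffs (row player, column player) for each pair of moves.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own, other):
    """Cooperate first, then copy the co-player's previous move."""
    return other[-1] if other else "C"


def always_cooperate(own, other):
    return "C"


def always_defect(own, other):
    return "D"


def grudger(own, other):
    """Cooperate until the co-player defects once, then defect forever."""
    return "D" if "D" in other else "C"


def play_match(s1, s2, turns):
    h1, h2 = [], []
    score1 = score2 = 0
    for _ in range(turns):
        # Each strategy sees its own history first, then the co-player's.
        m1, m2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFFS[(m1, m2)]
        score1 += p1
        score2 += p2
        h1.append(m1)
        h2.append(m2)
    return score1, score2


def round_robin(strategies, turns=10):
    totals = {s.__name__: 0 for s in strategies}
    for s1, s2 in combinations_with_replacement(strategies, 2):
        score1, score2 = play_match(s1, s2, turns)
        totals[s1.__name__] += score1
        if s1 is not s2:  # a self-play match is credited once
            totals[s2.__name__] += score2
    return totals


print(round_robin([tit_for_tat, always_cooperate, always_defect, grudger]))
# {'tit_for_tat': 99, 'always_cooperate': 90, 'always_defect': 88, 'grudger': 99}
```

With these choices, Tit-for-Tat and Grudger tie on 99 points, ahead of Always Cooperate (90) and Always Defect (88). Noisy tournaments, as in the abstract, would additionally flip each intended move with some small probability.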