1
Turner G, Ferguson AM, Katiyar T, Palminteri S, Orben A. Old Strategies, New Environments: Reinforcement Learning on Social Media. Biol Psychiatry 2025; 97:989-1001. [PMID: 39725300 DOI: 10.1016/j.biopsych.2024.12.012] [Received: 05/21/2024] [Revised: 12/05/2024] [Accepted: 12/17/2024] [Indexed: 12/28/2024]
Abstract
The rise of social media has profoundly altered the social world, introducing new behaviors that can satisfy our social needs. However, it is not yet known whether human social strategies, which are well adapted to the offline world we developed in, operate as effectively within this new social environment. Here, we describe how the computational framework of reinforcement learning (RL) can help us to precisely frame this problem and diagnose where behavior-environment mismatches emerge. The RL framework describes a process by which an agent can learn to maximize their long-term reward. RL, which has proven to be successful in characterizing human social behavior, consists of 3 stages: updating expected reward, valuating expected reward by integrating subjective costs such as effort, and selecting an action. Specific social media affordances, such as the quantifiability of social feedback, may interact with the RL process at each of these stages. In some cases, affordances can exploit RL biases that are beneficial offline by violating the environmental conditions under which such biases are optimal, such as when algorithmic personalization of content interacts with confirmation bias. Characterizing the impact of specific aspects of social media through this lens can improve our understanding of how digital environments shape human behavior. Ultimately, this formal framework could help address pressing open questions about social media use, including its changing role across human development and its impact on outcomes such as mental health.
Affiliation(s)
- Georgia Turner
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom.
- Amanda M Ferguson
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Tanay Katiyar
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom; Département d'Études Cognitives, École Normale Supérieure, Paris, France
- Stefano Palminteri
- Département d'Études Cognitives, École Normale Supérieure, Paris, France; Laboratoire de Neurosciences Cognitives et Computationnelles, INSERM, Paris, France
- Amy Orben
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
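The Turner et al. abstract names three stages of the RL process: updating expected reward, valuation that folds in subjective costs such as effort, and action selection. They can be sketched as one learning step. This is a generic illustration, not the authors' model. The delta-rule update, the linear effort cost weighted by `kappa`, the softmax choice rule, and the action names, effort levels and parameter values are all assumptions.

```python
import math

def update_expected_reward(q, reward, alpha):
    """Stage 1 (updating): delta rule, moving the estimate toward the outcome."""
    return q + alpha * (reward - q)

def net_value(q, effort, kappa):
    """Stage 2 (valuation): expected reward minus an effort cost weighted by kappa."""
    return q - kappa * effort

def softmax(values, beta):
    """Stage 3 (selection): softmax choice rule with inverse temperature beta."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical actions, effort levels and parameters (illustrative only)
actions = ("post", "scroll")
q = {"post": 0.0, "scroll": 0.0}
effort = {"post": 1.0, "scroll": 0.2}
alpha, kappa, beta = 0.3, 0.5, 3.0

# Assume a post earned social feedback rescaled to a reward of 0.8
q["post"] = update_expected_reward(q["post"], 0.8, alpha)
values = [net_value(q[a], effort[a], kappa) for a in actions]
probs = softmax(values, beta)
print(dict(zip(actions, (round(p, 3) for p in probs))))
```

In this toy setting, one rewarded post is not yet enough to outweigh its higher effort cost. Further rewarded posts, or a smaller `kappa`, would shift choice probability toward posting.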
2
Ferguson TD, Fyshe A, White A. Electrophysiological signatures of the effect of context on exploration: Greater attentional and learning signals when exploration is costly. Brain Res 2025; 1851:149471. [PMID: 39863243 DOI: 10.1016/j.brainres.2025.149471] [Received: 10/23/2024] [Revised: 12/21/2024] [Accepted: 01/19/2025] [Indexed: 01/27/2025]
Abstract
Humans are excellent at modifying our behaviour depending on context. For example, we will change how we explore when losses are possible compared to when losses are not possible. However, it remains unclear what specific cognitive and neural processes are modulated when exploring in different contexts. Here, we had participants learn within two different contexts: in one the participants could lose points while in the other the participants could not. Our goal was to determine how the inclusion of losses impacted human exploratory behaviour (experiment one), and whether we could explain the neural basis of these effects using EEG (experiment two). In experiment one, we found that participants preferred less-variable choices and explored less often when losses were possible. In addition, computational modelling revealed that participants engaged in less random exploration, had a lower rate of learning, and showed lower choice stickiness when losses were possible. In experiment two, we replicated these effects while examining a series of neural signals involved in exploration. During exploration, signals tied to working memory and learning (P3b), attention orienting (P3a) and motivation (late positive potential; an exploratory analysis) were enhanced when losses were possible. These neural differences contribute to why exploratory behaviour is changed by different learning contexts and can be explained by the theoretical claim that losses recruit attention and lead to increased task focus. These results provide insight into the cognitive processes that underlie exploration, and how exploratory behaviour changes across contexts.
Affiliation(s)
- Thomas D Ferguson
- Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada; Alberta Machine Intelligence Institute, Edmonton, Alberta, Canada.
- Alona Fyshe
- Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada; Alberta Machine Intelligence Institute, Edmonton, Alberta, Canada; Department of Psychology, University of Alberta, Edmonton, Alberta, Canada; Canadian Institute for Advanced Research (CIFAR) AI Chair, Canada
- Adam White
- Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada; Alberta Machine Intelligence Institute, Edmonton, Alberta, Canada; Canadian Institute for Advanced Research (CIFAR) AI Chair, Canada
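The Ferguson et al. abstract attributes part of the loss-context effect to less random exploration and lower choice stickiness. The sketch below assumes a softmax rule whose inverse temperature `beta` controls random exploration, plus an additive `stickiness` bonus for repeating the previous choice. Both are common modelling choices and not necessarily the authors' exact parameterization. The sketch shows how these two parameters shift choices toward the higher-valued option. The lower learning rate they also report would act through the value update, which is omitted here.

```python
import math

def choice_probs(q_values, prev_choice, beta, stickiness):
    """Softmax over values plus an additive bonus for repeating the previous choice.

    beta:       inverse temperature; higher beta means less random exploration
    stickiness: bonus for the option chosen on the previous trial
    """
    logits = [beta * q + (stickiness if i == prev_choice else 0.0)
              for i, q in enumerate(q_values)]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical parameters chosen only to mirror the reported directions
# (higher beta, lower stickiness when losses are possible); not fitted values.
contexts = {
    "gain_only": {"beta": 2.0, "stickiness": 0.8},
    "loss_possible": {"beta": 5.0, "stickiness": 0.3},
}

q = [0.5, 0.3]   # option 0 currently looks better
prev_choice = 1  # but option 1 was chosen on the last trial
probs = {name: choice_probs(q, prev_choice, **p) for name, p in contexts.items()}
for name, p in probs.items():
    print(name, [round(x, 3) for x in p])
```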
3
Bruckner R, Heekeren HR, Nassar MR. Understanding learning through uncertainty and bias. Commun Psychol 2025; 3:24. [PMID: 39948273 PMCID: PMC11825852 DOI: 10.1038/s44271-025-00203-y] [Received: 10/15/2024] [Accepted: 01/28/2025] [Indexed: 02/16/2025]
Abstract
Learning allows humans and other animals to make predictions about the environment that facilitate adaptive behavior. Casting learning as predictive inference can shed light on normative cognitive mechanisms that improve predictions under uncertainty. Drawing on normative learning models, we illustrate how learning should be adjusted to different sources of uncertainty, including perceptual uncertainty, risk, and uncertainty due to environmental changes. Such models explain many hallmarks of human learning in terms of specific statistical considerations that come into play when updating predictions under uncertainty. However, humans also display systematic learning biases that deviate from normative models, as studied in computational psychiatry. Some biases can be explained as normative inference conditioned on inaccurate prior assumptions about the environment, while others reflect approximations to Bayesian inference aimed at reducing cognitive demands. These biases offer insights into cognitive mechanisms underlying learning and how they might go awry in psychiatric illness.
Affiliation(s)
- Rasmus Bruckner
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany.
- Institute of Psychology, University of Hamburg, Hamburg, Germany.
- Hauke R Heekeren
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- Executive University Board, University of Hamburg, Hamburg, Germany
- Matthew R Nassar
- Robert J. & Nancy D. Carney Institute for Brain Science, Brown University, Providence, RI, USA
- Department of Neuroscience, Brown University, Providence, RI, USA
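Bruckner et al. describe how learning should adjust to different sources of uncertainty. A standard normative example is a Kalman filter tracking a drifting mean. Its learning rate (the Kalman gain) rises with uncertainty about a changing environment and falls with outcome noise. The sketch below assumes Gaussian outcomes and random-walk drift. It stands in for the broader family of normative models discussed in the review (change-point models, for instance, behave differently) and does not model perceptual uncertainty.

```python
def kalman_step(mean, var, observation, noise_var, drift_var):
    """One trial of a Kalman filter tracking a randomly drifting latent mean.

    mean, var:  current estimate of the latent mean and uncertainty about it
    noise_var:  outcome noise ("risk"), variability that learning cannot remove
    drift_var:  assumed per-trial drift of the latent mean (environmental change)
    Returns the updated mean, the updated variance and the learning rate used.
    """
    prior_var = var + drift_var                          # the world may have moved
    learning_rate = prior_var / (prior_var + noise_var)  # Kalman gain
    prediction_error = observation - mean
    new_mean = mean + learning_rate * prediction_error
    new_var = (1.0 - learning_rate) * prior_var
    return new_mean, new_var, learning_rate


# Same prior state and observation, two hypothetical environments
_, _, lr_noisy = kalman_step(0.0, 1.0, 1.0, noise_var=10.0, drift_var=0.1)
_, _, lr_volatile = kalman_step(0.0, 1.0, 1.0, noise_var=1.0, drift_var=5.0)
print(f"noisy but stable: {lr_noisy:.3f}, volatile: {lr_volatile:.3f}")
```

Here the same prediction error drives a learning rate of about 0.10 in the noisy but stable environment and about 0.86 in the volatile one. This illustrates why a single fixed learning rate cannot be optimal across environments.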
4
Bergerot C, Barfuss W, Romanczuk P. Moderate confirmation bias enhances decision-making in groups of reinforcement-learning agents. PLoS Comput Biol 2024; 20:e1012404. [PMID: 39231162 PMCID: PMC11404843 DOI: 10.1371/journal.pcbi.1012404] [Received: 03/19/2024] [Revised: 09/16/2024] [Accepted: 08/09/2024] [Indexed: 09/06/2024]
Abstract
Humans tend to give more weight to information confirming their beliefs than to information that disconfirms them. Nevertheless, this apparent irrationality has been shown to improve individual decision-making under uncertainty. However, little is known about this bias' impact on decision-making in a social context. Here, we investigate the conditions under which confirmation bias is beneficial or detrimental to decision-making under social influence. To do so, we develop a Collective Asymmetric Reinforcement Learning (CARL) model in which artificial agents observe others' actions and rewards, and update this information asymmetrically. We use agent-based simulations to study how confirmation bias affects collective performance on a two-armed bandit task, and how resource scarcity, group size and bias strength modulate this effect. We find that a confirmation bias benefits group learning across a wide range of resource-scarcity conditions. Moreover, we discover that, past a critical bias strength, resource abundance favors the emergence of two different performance regimes, one of which is suboptimal. In addition, we find that this regime bifurcation comes with polarization in small groups of agents. Overall, our results suggest the existence of an optimal, moderate level of confirmation bias for decision-making in a social context.
Affiliation(s)
- Clémence Bergerot
- Department of Biology, Humboldt Universität zu Berlin, Berlin, Germany
- Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Berlin, Germany
- Wolfram Barfuss
- Transdisciplinary Research Area: Sustainable Futures, University of Bonn, Bonn, Germany
- Center for Development Research (ZEF), University of Bonn, Bonn, Germany
- Pawel Romanczuk
- Department of Biology, Humboldt Universität zu Berlin, Berlin, Germany
- Science of Intelligence, Research Cluster of Excellence, Berlin, Germany
5
Homma S, Takezawa M. Risk preference as an outcome of evolutionarily adaptive learning mechanisms: An evolutionary simulation under diverse risky environments. PLoS One 2024; 19:e0307991. [PMID: 39088544 PMCID: PMC11293680 DOI: 10.1371/journal.pone.0307991] [Received: 09/27/2023] [Accepted: 07/15/2024] [Indexed: 08/03/2024]
Abstract
The optimization of cognitive and learning mechanisms can reveal complicated behavioral phenomena. In this study, we focused on reinforcement learning, which uses different learning rules for positive and negative reward prediction errors. We attempted to relate the evolved learning bias to the complex features of risk preference such as domain-specific behavior manifests and the relatively stable domain-general factor underlying behaviors. The simulations of the evolution of the two learning rates under diverse risky environments showed that the positive learning rate evolved on average to be higher than the negative one, when agents experienced both tasks where risk aversion was more rewarding and risk seeking was more rewarding. This evolution enabled agents to flexibly choose more reward behaviors depending on the task type. The evolved agents also demonstrated behavioral patterns described by the prospect theory. Our simulations captured two aspects of the evolution of risk preference: the domain-specific aspect, behavior acquired through learning in a specific context; and the implicit domain-general aspect, corresponding to the learning rates shaped through evolution to adaptively behave in a wide range of environments. These results imply that our framework of learning under the innate constraint may be useful in understanding the complicated behavioral phenomena.
Affiliation(s)
- Shogo Homma
- Department of Behavioral Science, Graduate School of Humanities and Human Sciences, Hokkaido University, Sapporo, Hokkaido, Japan
- Japan Society for the Promotion of Science, Tokyo, Japan
- Department of Cognitive and Psychological Sciences, Graduate School of Informatics, Nagoya University, Nagoya, Aichi, Japan
- Masanori Takezawa
- Department of Behavioral Science, Graduate School of Humanities and Human Sciences, Hokkaido University, Sapporo, Hokkaido, Japan
- Center for Experimental Research in Social Sciences, Hokkaido University, Sapporo, Hokkaido, Japan
- Center for Human Nature, Artificial Intelligence and Neuroscience, Hokkaido University, Sapporo, Hokkaido, Japan
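The asymmetric rule in the Homma and Takezawa abstract uses separate learning rates for positive and negative prediction errors. One consequence can be worked out by hand. Take an option that pays 1 with probability p and 0 otherwise. The expected update vanishes when p·α+·(1 − V*) = (1 − p)·α−·V*, so the estimate settles at V* = p·α+ / (p·α+ + (1 − p)·α−). When α+ > α−, V* exceeds the true mean p. A safe option paying p on every trial converges to p itself, so in this toy setting the risky option is overvalued. The sketch checks the closed form against a simulation. The 0/1 reward scheme and the parameter values are assumptions, and the sketch does not reproduce the authors' evolutionary simulation.

```python
import random

def asymptotic_value(p, alpha_pos, alpha_neg):
    """Closed-form mean estimate under asymmetric learning rates for a 0/1 reward."""
    return p * alpha_pos / (p * alpha_pos + (1 - p) * alpha_neg)

def simulate_value(p, alpha_pos, alpha_neg, n_trials=20000, seed=0):
    """Run the asymmetric delta rule and average the estimate over the second half."""
    rng = random.Random(seed)
    v, history = 0.5, []
    for _ in range(n_trials):
        reward = 1.0 if rng.random() < p else 0.0
        delta = reward - v
        alpha = alpha_pos if delta > 0 else alpha_neg  # positive vs. negative PE
        v += alpha * delta
        history.append(v)
    tail = history[n_trials // 2:]
    return sum(tail) / len(tail)

print(asymptotic_value(0.5, 0.3, 0.1))  # optimistic: above the true mean of 0.5
print(simulate_value(0.5, 0.3, 0.1))
```

With p = 0.5, α+ = 0.3 and α− = 0.1, the closed form gives 0.15 / 0.20 = 0.75. The simulated long-run average should land close to that value.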
6
Colas JT, O’Doherty JP, Grafton ST. Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts. PLoS Comput Biol 2024; 20:e1011950. [PMID: 38552190 PMCID: PMC10980507 DOI: 10.1371/journal.pcbi.1011950] [Received: 09/13/2023] [Accepted: 02/26/2024] [Indexed: 04/01/2024]
Abstract
Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as those of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of actions chosen previously? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting from multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions. In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- John P. O’Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
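Colas et al. separate learned value from two other influences: action bias (a preference for an action per se) and hysteresis driven by several previous actions. The sketch below assumes all three enter a softmax additively and that choice history is summarized by an exponentially decaying trace. That trace form is an assumption, not necessarily the one fitted in the paper. The sketch shows how a negative hysteresis weight produces the alternation bias that the abstract reports as the more common direction.

```python
import math

def choice_probs(q_values, bias, trace, beta, hysteresis_weight):
    """Softmax over learned value, action bias and choice hysteresis (all additive).

    bias:              fixed offset per action (preference for the action per se)
    trace:             decaying record of past choices, one entry per action
    hysteresis_weight: > 0 favours repeating recent actions, < 0 favours alternating
    """
    logits = [beta * q + b + hysteresis_weight * h
              for q, b, h in zip(q_values, bias, trace)]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def update_trace(trace, choice, decay):
    """Decay every entry, then add 1 for the action just taken (memory of several trials)."""
    return [decay * h + (1.0 if i == choice else 0.0) for i, h in enumerate(trace)]

# Hypothetical example: equal values, no bias, action 0 chosen on the last two trials
trace = [0.0, 0.0]
for choice in (0, 0):
    trace = update_trace(trace, choice, decay=0.5)
probs = choice_probs([0.0, 0.0], bias=[0.0, 0.0], trace=trace,
                     beta=3.0, hysteresis_weight=-1.0)
print(trace, [round(p, 3) for p in probs])
```

Because the trace carries over several trials, the tendency to alternate depends on more than the single most recent action. This matches the abstract's point that biases persisted "from multiple previous actions".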
7
Lefebvre G, Deroy O, Bahrami B. The roots of polarization in the individual reward system. Proc Biol Sci 2024; 291:20232011. [PMID: 38412967 PMCID: PMC10898967 DOI: 10.1098/rspb.2023.2011] [Indexed: 02/29/2024]
Abstract
Polarization raises concerns for democracy and society, which have expanded in the internet era where (mis)information has become ubiquitous, its transmission faster than ever, and the freedom and means of opinion expressions are expanding. The origin of polarization however remains unclear, with multiple social and emotional factors and individual reasoning biases likely to explain its current forms. In the present work, we adopt a principled approach and show that polarization tendencies can take root in biased reward processing of new information in favour of choice confirmatory evidence. Through agent-based simulations, we show that confirmation bias in individual learning is an independent mechanism and could be sufficient for creating polarization at group level independently of any additional assumptions about the opinions themselves, a priori beliefs about them, information transmission mechanisms or the structure of social relationship between individuals. This generative process can interact with polarization mechanisms described elsewhere, but constitutes an entrenched biological tendency that helps explain the extraordinary resilience of polarization against mitigating efforts such as dramatic informational change in the environment.
Affiliation(s)
- Germain Lefebvre
- Crowd Cognition Group, Ludwig Maximilian Universität, Gabelsbergerstr 62, Munich 80333, Bavaria, Germany
- Ophélia Deroy
- Philosophy, LMU, Geschwister Scholl Platz 1, Munich 80539, Bavaria, Germany
- Bahador Bahrami
- Crowd Cognition Group, Ludwig Maximilian Universität, Gabelsbergerstr 62, Munich 80333, Bavaria, Germany
8
Pupillo F, Bruckner R. Signed and unsigned effects of prediction error on memory: Is it a matter of choice? Neurosci Biobehav Rev 2023; 153:105371. [PMID: 37633626 DOI: 10.1016/j.neubiorev.2023.105371] [Received: 04/27/2023] [Revised: 08/18/2023] [Accepted: 08/23/2023] [Indexed: 08/28/2023]
Abstract
Adaptive decision-making is governed by at least two types of memory processes. On the one hand, learned predictions through integrating multiple experiences, and on the other hand, one-shot episodic memories. These two processes interact, and predictions - particularly prediction errors - influence how episodic memories are encoded. However, studies using computational models disagree on the exact shape of this relationship, with some findings showing an effect of signed prediction errors and others showing an effect of unsigned prediction errors on episodic memory. We argue that the choice-confirmation bias, which reflects stronger learning from choice-confirming compared to disconfirming outcomes, could explain these seemingly diverging results. Our perspective implies that the influence of prediction errors on episodic encoding critically depends on whether people can freely choose between options (i.e., instrumental learning tasks) or not (Pavlovian learning tasks). The choice-confirmation bias on memory encoding might have evolved to prioritize memory representations that optimize reward-guided decision-making. We conclude by discussing open issues and implications for future studies.
Affiliation(s)
- Francesco Pupillo
- Department of Psychology, Goethe-Universität Frankfurt, Germany; Tilburg School of Social and Behavioral Sciences, Tilburg University, Netherlands.
- Rasmus Bruckner
- Department of Education and Psychology, Freie Universität Berlin, Germany; Max Planck Research Group NeuroCode, Max Planck Institute for Human Development, Berlin, Germany
9
Towner E, Chierchia G, Blakemore SJ. Sensitivity and specificity in affective and social learning in adolescence. Trends Cogn Sci 2023:S1364-6613(23)00092-X. [PMID: 37198089 DOI: 10.1016/j.tics.2023.04.002] [Received: 03/24/2022] [Revised: 03/23/2023] [Accepted: 04/05/2023] [Indexed: 05/19/2023]
Abstract
Adolescence is a period of heightened affective and social sensitivity. In this review we address how this increased sensitivity influences associative learning. Based on recent evidence from human and rodent studies, as well as advances in computational biology, we suggest that, compared to other age groups, adolescents show features of heightened Pavlovian learning but tend to perform worse than adults at instrumental learning. Because Pavlovian learning does not involve decision-making, whereas instrumental learning does, we propose that these developmental differences might be due to heightened sensitivity to rewards and threats in adolescence, coupled with a lower specificity of responding. We discuss the implications of these findings for adolescent mental health and education.
Affiliation(s)
- Emily Towner
- Department of Psychology, University of Cambridge, Downing Street, Cambridge, UK.
- Gabriele Chierchia
- Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy; Department of Psychology, University of Cambridge, Downing Street, Cambridge, UK
10
Chierchia G, Soukupová M, Kilford EJ, Griffin C, Leung J, Palminteri S, Blakemore SJ. Confirmatory reinforcement learning changes with age during adolescence. Dev Sci 2023; 26:e13330. [PMID: 36194156 PMCID: PMC7615280 DOI: 10.1111/desc.13330] [Received: 10/08/2021] [Revised: 07/26/2022] [Accepted: 09/20/2022] [Indexed: 11/26/2022]
Abstract
Understanding how learning changes during human development has been one of the long-standing objectives of developmental science. Recently, advances in computational biology have demonstrated that humans display a bias when learning to navigate novel environments through rewards and punishments: they learn more from outcomes that confirm their expectations than from outcomes that disconfirm them. Here, we ask whether confirmatory learning is stable across development, or whether it might be attenuated in developmental stages in which exploration is beneficial, such as in adolescence. In a reinforcement learning (RL) task, 77 participants aged 11-32 years (four men, mean age = 16.26) attempted to maximize monetary rewards by repeatedly sampling different pairs of novel options, which varied in their reward/punishment probabilities. Mixed-effect models showed an age-related increase in accuracy as long as learning contingencies remained stable across trials, but less so when they reversed halfway through the trials. Age was also associated with a greater tendency to stay with an option that had just delivered a reward, more than to switch away from an option that had just delivered a punishment. At the computational level, a confirmation model provided increasingly better fit with age. This model showed that age differences are captured by decreases in noise or exploration, rather than in the magnitude of the confirmation bias. These findings provide new insights into how learning changes during development and could help better tailor learning environments to people of different ages.
RESEARCH HIGHLIGHTS:
- Reinforcement learning shows age-related improvement during adolescence, but more in stable learning environments compared with volatile learning environments.
- People tend to stay with an option after a win more than they shift from an option after a loss, and this asymmetry increases with age during adolescence.
- Computationally, these changes are captured by a developing confirmatory learning style, in which people learn more from outcomes that confirm rather than disconfirm their choices.
- Age-related differences in confirmatory learning are explained by decreases in stochasticity, rather than changes in the magnitude of the confirmation bias.
Affiliation(s)
- Gabriele Chierchia
- Department of Psychology, University of Cambridge, UK
- Institute of Cognitive Neuroscience, University College London, UK
- Emma J. Kilford
- Institute of Cognitive Neuroscience, University College London, UK
- Department of Clinical, Educational and Health Psychology, University College London, UK
- Cait Griffin
- Institute of Cognitive Neuroscience, University College London, UK
- Jovita Leung
- Institute of Cognitive Neuroscience, University College London, UK
- Stefano Palminteri
- Institute of Cognitive Neuroscience, University College London, UK
- Department of Cognitive Science, École Normale Supérieure, FR
- Institute of Cognitive Neuroscience, HSE, Moscow, Russian Federation
- Sarah-Jayne Blakemore
- Department of Psychology, University of Cambridge, UK
- Institute of Cognitive Neuroscience, University College London, UK
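The confirmation model mentioned in the Chierchia et al. abstract is usually formulated with two learning rates. Outcomes that confirm the current choice are learned from with `alpha_conf`: a positive prediction error for the chosen option, or a negative one for the unchosen option when its outcome is shown. Disconfirming outcomes are learned from with `alpha_disc`. The sketch below follows that common formulation, which is an assumption here rather than the paper's exact code. It also adds a helper for the win-stay/lose-shift proportions described in the highlights. It assumes a single pair of options and treats any positive outcome as a win.

```python
def confirmatory_update(q, chosen, unchosen, r_chosen, r_unchosen,
                        alpha_conf, alpha_disc):
    """One trial of an asymmetric ("confirmatory") delta-rule update.

    Chosen option: positive prediction errors confirm the choice.
    Unchosen option (only if its outcome is shown): negative prediction errors confirm it.
    Pass r_unchosen=None for partial feedback.
    """
    q = list(q)
    pe = r_chosen - q[chosen]
    q[chosen] += (alpha_conf if pe > 0 else alpha_disc) * pe
    if r_unchosen is not None:
        pe_u = r_unchosen - q[unchosen]
        q[unchosen] += (alpha_conf if pe_u < 0 else alpha_disc) * pe_u
    return q

def win_stay_lose_shift(choices, outcomes):
    """Proportion of stays after a win (outcome > 0) and of shifts after a non-win."""
    stays = shifts = wins = losses = 0
    for prev_c, prev_o, c in zip(choices, outcomes, choices[1:]):
        if prev_o > 0:
            wins += 1
            stays += c == prev_c
        else:
            losses += 1
            shifts += c != prev_c
    win_stay = stays / wins if wins else float("nan")
    lose_shift = shifts / losses if losses else float("nan")
    return win_stay, lose_shift

# Hypothetical trial: chosen option rewarded, unchosen option punished
q = confirmatory_update([0.0, 0.0], chosen=0, unchosen=1, r_chosen=1.0,
                        r_unchosen=-1.0, alpha_conf=0.4, alpha_disc=0.1)
ws, ls = win_stay_lose_shift([0, 0, 1, 1, 1], [1, -1, 1, -1, -1])
print(q, ws, ls)
```

With `alpha_conf > alpha_disc`, both outcomes in the example confirm the choice and are learned from at the higher rate. In the toy sequence, the agent stays after every win but shifts after only half of its losses, the kind of asymmetry the abstract describes.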
11
Drevet J, Drugowitsch J, Wyart V. Efficient stabilization of imprecise statistical inference through conditional belief updating. Nat Hum Behav 2022; 6:1691-1704. [PMID: 36138224 PMCID: PMC7617215 DOI: 10.1038/s41562-022-01445-0] [Received: 07/19/2021] [Accepted: 08/11/2022] [Indexed: 01/14/2023]
Abstract
Statistical inference is the optimal process for forming and maintaining accurate beliefs about uncertain environments. However, human inference comes with costs due to its associated biases and limited precision. Indeed, biased or imprecise inference can trigger variable beliefs and unwarranted changes in behaviour. Here, by studying decisions in a sequential categorization task based on noisy visual stimuli, we obtained converging evidence that humans reduce the variability of their beliefs by updating them only when the reliability of incoming sensory information is judged as sufficiently strong. Instead of integrating the evidence provided by all stimuli, participants actively discarded as much as a third of stimuli. This conditional belief updating strategy shows good test-retest reliability, correlates with perceptual confidence and explains human behaviour better than previously described strategies. This seemingly suboptimal strategy not only reduces the costs of imprecise computations but also, counterintuitively, increases the accuracy of resulting decisions.
Affiliation(s)
- Julie Drevet
- Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale (Inserm), Paris, France.
- Département d'Études Cognitives, École Normale Supérieure, Université PSL, Paris, France.
- Jan Drugowitsch
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA
- Valentin Wyart
- Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale (Inserm), Paris, France.
- Département d'Études Cognitives, École Normale Supérieure, Université PSL, Paris, France.
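The conditional belief updating strategy in the Drevet et al. abstract can be illustrated with log-likelihood-ratio accumulation that skips samples whose evidence is too weak. Two simplifications here are hypothetical: the fixed `threshold` on |LLR|, and the Gaussian `noise_sd` added to each performed update as a stand-in for imprecise inference. In the abstract, by contrast, reliability is judged by the participants themselves rather than compared against a known cutoff.

```python
import random

def conditional_belief_updating(llrs, threshold, noise_sd=0.0, seed=0):
    """Accumulate evidence, skipping samples whose strength |llr| is below threshold.

    llrs:      per-stimulus log-likelihood ratios (category A vs. B)
    threshold: hypothetical reliability criterion; 0 integrates every sample
    noise_sd:  hypothetical Gaussian noise added to each update that is performed,
               a stand-in for imprecise inference
    Returns the final belief (log-odds) and the number of samples integrated.
    """
    rng = random.Random(seed)
    belief = 0.0
    n_used = 0
    for llr in llrs:
        if abs(llr) < threshold:
            continue  # sample discarded: belief left unchanged
        belief += llr + rng.gauss(0.0, noise_sd)
        n_used += 1
    return belief, n_used

samples = [0.1, 1.2, -0.05, 0.8]
print(conditional_belief_updating(samples, threshold=0.0))  # integrates all four
print(conditional_belief_updating(samples, threshold=0.5))  # integrates two
```

With `noise_sd > 0`, every skipped sample is one fewer noisy update. That is the intuition for why discarding weak evidence can make beliefs less variable. Whether it also improves accuracy, as the abstract reports for human participants, depends in this toy version on the noise level and the threshold.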
12
Nussenbaum K, Velez JA, Washington BT, Hamling HE, Hartley CA. Flexibility in valenced reinforcement learning computations across development. Child Dev 2022; 93:1601-1615. [PMID: 35596654 PMCID: PMC9831067 DOI: 10.1111/cdev.13791] [Indexed: 01/12/2023]
Abstract
Optimal integration of positive and negative outcomes during learning varies depending on an environment's reward statistics. The present study investigated the extent to which children, adolescents, and adults (N = 142 8-25 year-olds, 55% female, 42% White, 31% Asian, 17% mixed race, and 8% Black; data collected in 2021) adapt their weighting of better-than-expected and worse-than-expected outcomes when learning from reinforcement. Participants made choices across two contexts: one in which weighting positive outcomes more heavily than negative outcomes led to better performance, and one in which the reverse was true. Reinforcement learning modeling revealed that across age, participants shifted their valence biases in accordance with environmental structure. Exploratory analyses revealed strengthening of context-dependent flexibility with increasing age.
13
Doricchi F, Lasaponara S, Pazzaglia M, Silvetti M. Left and right temporal-parietal junctions (TPJs) as "match/mismatch" hedonic machines: A unifying account of TPJ function. Phys Life Rev 2022; 42:56-92. [PMID: 35901654 DOI: 10.1016/j.plrev.2022.07.001] [Received: 06/29/2022] [Accepted: 07/06/2022] [Indexed: 11/17/2022]
Abstract
Experimental and theoretical studies have tried to gain insights into the involvement of the Temporal Parietal Junction (TPJ) in a broad range of cognitive functions like memory, attention, language, self-agency and theory of mind. Recent investigations have demonstrated the partition of the TPJ in discrete subsectors. Nonetheless, whether these subsectors play different roles or implement an overarching function remains debated. Here, based on a review of available evidence, we propose that the left TPJ codes both matches and mismatches between expected and actual sensory, motor, or cognitive events while the right TPJ codes mismatches. These operations help keep track of statistical contingencies in personal, environmental, and conceptual space. We show that this hypothesis can account for the participation of the TPJ in disparate cognitive functions, including "humour", and explain: a) the higher incidence of spatial neglect in right brain damage; b) the different emotional reactions that follow left and right brain damage; c) the hemispheric lateralisation of optimistic bias mechanisms; d) the lateralisation of mechanisms that regulate routine and novelty behaviours. We propose that match and mismatch operations are aimed at approximating "free energy", in terms of the free energy principle of decision-making. By approximating "free energy", the match/mismatch TPJ system supports both information seeking to update one's own beliefs and the pleasure of being right in one's own current choices. This renewed view of the TPJ has relevant clinical implications because the malfunctioning of TPJ-related "match" and "mismatch" circuits in unilateral brain damage can produce low-dimensional deficits of active-inference and predictive coding that can be associated with different neuropsychological disorders.
Affiliation(s)
- Fabrizio Doricchi
- Dipartimento di Psicologia 39, Università degli Studi di Roma 'La Sapienza', Roma, Italy; Fondazione Santa Lucia IRCCS, Roma, Italy.
- Stefano Lasaponara
- Dipartimento di Psicologia 39, Università degli Studi di Roma 'La Sapienza', Roma, Italy; Fondazione Santa Lucia IRCCS, Roma, Italy
- Mariella Pazzaglia
- Dipartimento di Psicologia 39, Università degli Studi di Roma 'La Sapienza', Roma, Italy; Fondazione Santa Lucia IRCCS, Roma, Italy
- Massimo Silvetti
- Computational and Translational Neuroscience Lab (CTNLab), Institute of Cognitive Sciences and Technologies, National Research Council (CNR), Rome, Italy
14
Glitz L, Juechems K, Summerfield C, Garrett N. Model Sharing in the Human Medial Temporal Lobe. J Neurosci 2022; 42:5410-5426. [PMID: 35606146 PMCID: PMC7613027 DOI: 10.1523/jneurosci.1978-21.2022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/01/2021] [Revised: 04/20/2022] [Accepted: 04/23/2022] [Indexed: 11/21/2022]
Abstract
Effective planning involves knowing where different actions take us. However, natural environments are rich and complex, leading to an exponential increase in memory demand as a plan grows in depth. One potential solution is to filter out features of the environment irrelevant to the task at hand. This enables a shared model of transition dynamics to be used for planning over a range of different input features. Here, we asked human participants (13 male, 16 female) to perform a sequential decision-making task, designed so that knowledge should be integrated independently of the input features (visual cues) in one case but not in the other. Participants efficiently switched between using a low-dimensional (cue-independent) and a high-dimensional (cue-specific) representation of state transitions. fMRI data identified the medial temporal lobe as a locus for learning state transitions. Within this region, multivariate patterns of BOLD responses were less correlated between trials with differing input features but similar state associations in the high-dimensional than in the low-dimensional case, suggesting that these patterns switched between separable (specific to input features) and shared (invariant to input features) transition models. Finally, we show that transition models are updated more strongly following the receipt of positive compared with negative outcomes, a finding that challenges conventional theories of planning. Together, these findings propose a computational and neural account of how information relevant for planning can be shared and segmented in response to the vast array of contextual features we encounter in our world.
SIGNIFICANCE STATEMENT
Effective planning involves maintaining an accurate model of which actions take us to which locations. But in a world awash with information, mapping actions to states with the right level of complexity is critical. Using a new decision-making "heist task" in conjunction with computational modeling and fMRI, we show that patterns of BOLD responses in the medial temporal lobe, a brain region key for prospective planning, become less sensitive to the presence of visual features when these are irrelevant to the task at hand. By flexibly adapting the complexity of task-state representations in this way, state-action mappings learned under one set of features can be used to plan in the presence of others.
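The contrast between a shared (cue-independent) and a separable (cue-specific) transition model can be sketched with a tabular learner that either ignores or includes the visual cue when looking up P(next state | state, action). Everything in this sketch is a hypothetical illustration rather than the authors' computational model: the class name, the delta-rule update, the uniform starting estimate, and the two outcome-dependent learning rates (`alpha_pos > alpha_neg`, loosely echoing the reported stronger updating after positive outcomes) are all assumptions.

```python
from collections import defaultdict


class TransitionModel:
    """Hypothetical tabular estimate of P(next_state | state, action).

    shared=True ignores the visual cue (low-dimensional, cue-independent);
    shared=False keeps a separate table per cue (high-dimensional,
    cue-specific).
    """

    def __init__(self, n_states, shared, alpha_pos=0.3, alpha_neg=0.1):
        self.n_states = n_states
        self.shared = shared
        self.alpha_pos = alpha_pos  # assumed rate after a positive outcome
        self.alpha_neg = alpha_neg  # assumed rate after a negative outcome
        # Unseen keys start from a uniform next-state distribution.
        self.table = defaultdict(lambda: [1.0 / n_states] * n_states)

    def _key(self, cue, state, action):
        # The cue enters the lookup key only in the cue-specific model.
        return (state, action) if self.shared else (cue, state, action)

    def predict(self, cue, state, action):
        """Return a copy of the predicted next-state distribution."""
        return list(self.table[self._key(cue, state, action)])

    def update(self, cue, state, action, next_state, positive_outcome):
        """Delta-rule step toward the observed transition; the step size
        depends on whether the trial ended with a positive outcome."""
        probs = self.table[self._key(cue, state, action)]
        alpha = self.alpha_pos if positive_outcome else self.alpha_neg
        for s in range(self.n_states):
            target = 1.0 if s == next_state else 0.0
            probs[s] += alpha * (target - probs[s])  # sum stays equal to 1
```

Learning a transition under the cue "red" changes the prediction under the cue "blue" only when `shared=True`, which is the sense in which a low-dimensional model lets state-action knowledge transfer across input features; with `shared=False`, the "blue" prediction stays at the uniform starting value.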
Affiliation(s)
- Leonie Glitz
- Department of Experimental Psychology, University of Oxford, Oxford OX2 6HG, United Kingdom
- Keno Juechems
- Department of Experimental Psychology, University of Oxford, Oxford OX2 6HG, United Kingdom
- Neil Garrett
- Department of Experimental Psychology, University of Oxford, Oxford OX2 6HG, United Kingdom
- School of Psychology, University of East Anglia, Norwich NR4 7TJ, United Kingdom
15
Palminteri S, Lebreton M. The computational roots of positivity and confirmation biases in reinforcement learning. Trends Cogn Sci 2022; 26:607-621. [PMID: 35662490 DOI: 10.1016/j.tics.2022.04.005] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 11/29/2021] [Revised: 04/13/2022] [Accepted: 04/18/2022] [Indexed: 12/16/2022]
Abstract
Humans do not integrate new information objectively: outcomes carrying a positive affective value and evidence confirming one's own prior belief are overweighted. Until recently, theoretical and empirical accounts of the positivity and confirmation biases assumed them to be specific to 'high-level' belief updates. We present evidence against this account. Learning rates in reinforcement learning (RL) tasks, estimated across different contexts and species, generally present the same characteristic asymmetry, suggesting that belief and value updating processes share key computational principles and distortions. This bias generates over-optimistic expectations about the probability of making the right choices and, consequently, generates over-optimistic reward expectations. We discuss the normative and neurobiological roots of these RL biases and their position within the broader picture of behavioral decision-making theories.
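The learning-rate asymmetry described above can be illustrated with a minimal sketch, assuming a single option that pays 1 with probability `p_reward` and 0 otherwise, learned by a delta rule with separate rates for positive (`alpha_pos`) and negative (`alpha_neg`) prediction errors. The function names and parameter values are hypothetical and chosen for illustration, not taken from the review. Because the expected update is zero where `p * alpha_pos * (1 - q) = (1 - p) * alpha_neg * q`, setting `alpha_pos > alpha_neg` makes the long-run estimate settle above the true expected reward.

```python
import random


def asymmetric_update(q, reward, alpha_pos, alpha_neg):
    """One delta-rule step with separate learning rates for positive
    and negative reward prediction errors."""
    delta = reward - q  # reward prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta


def simulate(p_reward, alpha_pos, alpha_neg, n_trials=5000, seed=0):
    """Learn the value of one option that pays 1 with probability
    p_reward (else 0); return the mean estimate over the second half."""
    rng = random.Random(seed)
    q = 0.5
    late = []
    for t in range(n_trials):
        reward = 1.0 if rng.random() < p_reward else 0.0
        q = asymmetric_update(q, reward, alpha_pos, alpha_neg)
        if t >= n_trials // 2:
            late.append(q)
    return sum(late) / len(late)


def fixed_point(p_reward, alpha_pos, alpha_neg):
    """Value at which the expected update is zero:
    p * alpha_pos * (1 - q) == (1 - p) * alpha_neg * q."""
    gain = p_reward * alpha_pos
    return gain / (gain + (1 - p_reward) * alpha_neg)
```

With `p_reward = 0.5`, `alpha_pos = 0.3` and `alpha_neg = 0.1`, `fixed_point` gives about 0.75 and `simulate` returns a value close to it, whereas equal learning rates of 0.1 recover roughly the true value of 0.5.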
Affiliation(s)
- Stefano Palminteri
- Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et Recherche Médicale, Paris, France; Département d'Études Cognitives, Ecole Normale Supérieure, Paris, France; Université de Recherche Paris Sciences et Lettres, Paris, France.
- Maël Lebreton
- Paris School of Economics, Paris, France; LabNIC, Department of Fundamental Neurosciences, University of Geneva, Geneva, Switzerland; Swiss Center for Affective Science, Geneva, Switzerland.
16
Barakchian Z, Vahabie AH, Nili Ahmadabadi M. Implicit Counterfactual Effect in Partial Feedback Reinforcement Learning: Behavioral and Modeling Approach. Front Neurosci 2022; 16:631347. [PMID: 35620668 PMCID: PMC9127865 DOI: 10.3389/fnins.2022.631347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2020] [Accepted: 03/28/2022] [Indexed: 11/13/2022]
Abstract
Context markedly affects learning behavior by adjusting option values according to the distribution of available options. Displaying counterfactual outcomes (the outcomes of the unchosen option alongside the chosen one, i.e., providing complete feedback) would increase the contextual effect by inducing participants to compare the two outcomes during learning. However, when the context consists only of the juxtaposition of several options and there is no such explicit counterfactual factor (i.e., only partial feedback is provided), it is not clear whether and how the contextual effect emerges. In this research, we employ Partial and Complete feedback paradigms in which options are associated with different reward distributions. Our modeling analysis shows that a model that uses the outcome of the chosen option to update the values of both chosen and unchosen options in opposing directions better accounts for the behavioral data. This is also in line with the diffusive effect of dopamine on the striatum. Furthermore, our data show that the contextual effect is not limited to probabilistic rewards but also extends to rewards that vary in magnitude. These results suggest that by extending the counterfactual concept to include the effect of the chosen outcome on the unchosen option, we can better explain why a contextual effect arises in situations where there is no extra information about the unchosen outcome.
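The kind of update favoured by the modeling analysis can be sketched as a tabular learner in which, under partial feedback, the chosen option's prediction error `delta` moves the chosen value by `alpha_c * delta` and every unchosen value by `-alpha_u * delta`. This exact functional form, the function name and the two rates are assumptions for illustration; the abstract states only that the chosen outcome updates chosen and unchosen values in opposing directions.

```python
def opposing_update(q, chosen, reward, alpha_c, alpha_u):
    """Partial-feedback update: only the chosen outcome is observed, and
    its prediction error moves the chosen value one way and every
    unchosen value the other way. Returns a new list of values."""
    delta = reward - q[chosen]  # prediction error of the chosen option
    new_q = list(q)
    for i, value in enumerate(q):
        if i == chosen:
            new_q[i] = value + alpha_c * delta
        else:
            new_q[i] = value - alpha_u * delta  # opposite direction
    return new_q
```

For example, with two options both valued at 0.5, a reward of 1 on option 0, `alpha_c = 0.2` and `alpha_u = 0.1`, the values become about 0.6 and 0.45: the unchosen option loses value without its own outcome ever being shown, which is one way a contextual effect could emerge under partial feedback.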
Affiliation(s)
- Zahra Barakchian
- Department of Cognitive Neuroscience, Institute for Research in Fundamental Sciences, Tehran, Iran
- *Correspondence: Zahra Barakchian
- Abdol-Hossein Vahabie
- Cognitive Systems Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
- Department of Psychology, Faculty of Psychology and Education, University of Tehran, Tehran, Iran
- Majid Nili Ahmadabadi
- Cognitive Systems Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
17
Ciranka S, Linde-Domingo J, Padezhki I, Wicharz C, Wu CM, Spitzer B. Asymmetric reinforcement learning facilitates human inference of transitive relations. Nat Hum Behav 2022; 6:555-564. [PMID: 35102348 PMCID: PMC9038534 DOI: 10.1038/s41562-021-01263-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/01/2021] [Accepted: 11/25/2021] [Indexed: 12/16/2022]
Abstract
Humans and other animals are capable of inferring never-experienced relations (for example, A > C) from other relational observations (for example, A > B and B > C). The processes behind such transitive inference are subject to intense research. Here we demonstrate a new aspect of relational learning, building on previous evidence that transitive inference can be accomplished through simple reinforcement learning mechanisms. We show in simulations that inference of novel relations benefits from an asymmetric learning policy, where observers update only their belief about the winner (or loser) in a pair. Across four experiments (n = 145), we find substantial empirical support for such asymmetries in inferential learning. The learning policy favoured by our simulations and experiments gives rise to a compression of values that is routinely observed in psychophysics and behavioural economics. In other words, a seemingly biased learning strategy that yields well-known cognitive distortions can be beneficial for transitive inferential judgements. Ciranka, Linde-Domingo et al. show that inference of transitive orderings from pairwise relations benefits from a seemingly biased learning strategy, where observers update their belief about one of the pair members but not the other.
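The winner-only learning policy can be sketched as follows, assuming (hypothetically) a logistic comparison model in which the predicted probability that item `w` beats item `l` is `sigmoid(v[w] - v[l])`. Only adjacent pairs are trained (A > B, B > C, and so on), and after each pair the asymmetric policy updates the winner's value alone, while the symmetric alternative also lowers the loser's value. The comparison model, function names, learning rate and epoch count are illustrative assumptions, not the authors' exact model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(n_items, n_epochs=500, alpha=0.1, winner_only=True):
    """Learn item values from adjacent pairs only (item i beats item i+1;
    item 0 plays the role of A, item 1 of B, ...). Returns the values."""
    v = [0.0] * n_items
    pairs = [(i, i + 1) for i in range(n_items - 1)]  # (winner, loser)
    for _ in range(n_epochs):
        for w, l in pairs:
            delta = 1.0 - sigmoid(v[w] - v[l])  # surprise that w won
            v[w] += alpha * delta
            if not winner_only:
                v[l] -= alpha * delta  # symmetric policy also demotes loser
    return v


def infer(v, a, b):
    """Predicted probability that item a beats item b, including pairs
    that were never presented during training."""
    return sigmoid(v[a] - v[b])
```

After `v = train(5)`, the learned values decrease strictly from item 0 to item 4, so `infer(v, 0, 2)` exceeds 0.5 even though items 0 and 2 (A and C) were never compared directly. Under the winner-only policy the last item never wins and keeps its starting value of 0.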
Affiliation(s)
- Simon Ciranka
- Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany; Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Berlin, Germany
- Juan Linde-Domingo
- Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany
- Ivan Padezhki
- Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany
- Clara Wicharz
- Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany
- Charley M Wu
- Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany; Human and Machine Cognition Lab, University of Tübingen, Tübingen, Germany
- Bernhard Spitzer
- Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany; Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Berlin, Germany.