1
Danwitz L, von Helversen B. Observational learning of exploration-exploitation strategies in bandit tasks. Cognition 2025; 259:106124. [PMID: 40117983 DOI: 10.1016/j.cognition.2025.106124] [Received: 02/12/2024] [Revised: 02/07/2025] [Accepted: 03/12/2025] [Indexed: 03/23/2025]
Abstract
In decision-making scenarios, individuals often face the challenge of balancing the exploration of new options against the exploitation of known ones, a dynamic known as the exploration-exploitation trade-off. In such situations, people frequently have the opportunity to observe others' actions. Yet little is known about when, how, and from whom individuals use observational learning in the exploration-exploitation dilemma. In two experiments, participants completed multiple nine-armed bandit tasks, either independently or while observing a fictitious agent using either an explorative or an equally successful exploitative strategy. To analyze participants' behaviors, we used a reinforcement learning model (a simplified Kalman filter) to extract parameters for both copying and exploration at the individual level. Results showed that participants copied the observed agents' choices by adding a bonus to the individually estimated value of the observed action. While most participants appeared to use an unconditional copying approach, a subset of participants adopted a copy-when-uncertain approach, that is, copying more when uncertain about the optimal action based on their individually acquired knowledge. Further, participants adjusted their exploration strategies in alignment with those observed. We discuss to what extent this can be understood as a form of emulation. Results on participants' preferences to copy from explorative versus exploitative agents are ambiguous. Contrary to expectations, similarity or dissimilarity between participants' and agents' exploration tendencies had no impact on observational learning. These results shed light on humans' processing of social and non-social information in exploration scenarios and conditions of observational learning.
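The copying mechanism described in this abstract (a bonus added to the individually estimated value of the observed action) can be illustrated with a minimal sketch. It assumes a standard scalar Kalman filter update per arm, an uncertainty bonus proportional to the posterior standard deviation, and a softmax choice rule. The function names, the parameters `copy_bonus`, `explore_bonus`, `temperature`, and `noise_var`, and all default values are illustrative assumptions, not the authors' implementation.

```python
import math


def kalman_update(mean, var, reward, noise_var=1.0):
    """Update one arm's posterior mean and variance after observing a reward.

    noise_var is an assumed, fixed observation-noise variance.
    """
    gain = var / (var + noise_var)            # Kalman gain in [0, 1)
    new_mean = mean + gain * (reward - mean)  # move the mean toward the reward
    new_var = (1.0 - gain) * var              # uncertainty shrinks after sampling
    return new_mean, new_var


def choice_probabilities(means, variances, observed_arm=None,
                         copy_bonus=0.0, explore_bonus=0.0, temperature=1.0):
    """Softmax over value + uncertainty bonus + copy bonus (all assumed forms)."""
    utilities = []
    for arm, (m, v) in enumerate(zip(means, variances)):
        u = m + explore_bonus * math.sqrt(v)  # assumed uncertainty-directed bonus
        if arm == observed_arm:
            u += copy_bonus                   # bonus for the arm the agent chose
        utilities.append(u / temperature)
    top = max(utilities)                      # subtract the max for stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Nine arms with identical beliefs; the observed agent chose arm 3.
probs = choice_probabilities([0.0] * 9, [1.0] * 9, observed_arm=3, copy_bonus=1.0)
```

In this sketch, a copy-when-uncertain variant could scale `copy_bonus` by a measure of the learner's own uncertainty; that scaling is left out because the abstract does not specify its form.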
Affiliation(s)
- Ludwig Danwitz
  - Department of Psychology, University of Bremen, Germany

2
Turner G, Ferguson AM, Katiyar T, Palminteri S, Orben A. Old Strategies, New Environments: Reinforcement Learning on Social Media. Biol Psychiatry 2025; 97:989-1001. [PMID: 39725300 DOI: 10.1016/j.biopsych.2024.12.012] [Received: 05/21/2024] [Revised: 12/05/2024] [Accepted: 12/17/2024] [Indexed: 12/28/2024]
Abstract
The rise of social media has profoundly altered the social world, introducing new behaviors that can satisfy our social needs. However, it is not yet known whether human social strategies, which are well adapted to the offline world we developed in, operate as effectively within this new social environment. Here, we describe how the computational framework of reinforcement learning (RL) can help us to precisely frame this problem and diagnose where behavior-environment mismatches emerge. The RL framework describes a process by which an agent can learn to maximize their long-term reward. RL, which has proven to be successful in characterizing human social behavior, consists of 3 stages: updating expected reward, valuating expected reward by integrating subjective costs such as effort, and selecting an action. Specific social media affordances, such as the quantifiability of social feedback, may interact with the RL process at each of these stages. In some cases, affordances can exploit RL biases that are beneficial offline by violating the environmental conditions under which such biases are optimal, such as when algorithmic personalization of content interacts with confirmation bias. Characterizing the impact of specific aspects of social media through this lens can improve our understanding of how digital environments shape human behavior. Ultimately, this formal framework could help address pressing open questions about social media use, including its changing role across human development and its impact on outcomes such as mental health.
Affiliation(s)
- Georgia Turner
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Amanda M Ferguson
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Tanay Katiyar
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom; Département d'Études Cognitives, École Normale Supérieure, Paris, France
- Stefano Palminteri
  - Département d'Études Cognitives, École Normale Supérieure, Paris, France; Laboratoire de Neurosciences Cognitives et Computationnelles, INSERM, Paris, France
- Amy Orben
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom

3
Ferguson TD, Fyshe A, White A. Electrophysiological signatures of the effect of context on exploration: Greater attentional and learning signals when exploration is costly. Brain Res 2025; 1851:149471. [PMID: 39863243 DOI: 10.1016/j.brainres.2025.149471] [Received: 10/23/2024] [Revised: 12/21/2024] [Accepted: 01/19/2025] [Indexed: 01/27/2025]
Abstract
Humans are excellent at modifying our behaviour depending on context. For example, we will change how we explore when losses are possible compared to when losses are not possible. However, it remains unclear what specific cognitive and neural processes are modulated when exploring in different contexts. Here, we had participants learn within two different contexts: in one the participants could lose points while in the other the participants could not. Our goal was to determine how the inclusion of losses impacted human exploratory behaviour (experiment one), and whether we could explain the neural basis of these effects using EEG (experiment two). In experiment one, we found that participants preferred less-variable choices and explored less often when losses were possible. In addition, computational modelling revealed that participants engaged in less random exploration, had a lower rate of learning, and showed lower choice stickiness when losses were possible. In experiment two, we replicated these effects while examining a series of neural signals involved in exploration. During exploration, signals tied to working memory and learning (P3b), attention orienting (P3a) and motivation (late positive potential; an exploratory analysis) were enhanced when losses were possible. These neural differences contribute to why exploratory behaviour is changed by different learning contexts and can be explained by the theoretical claim that losses recruit attention and lead to increased task focus. These results provide insight into the cognitive processes that underlie exploration, and how exploratory behaviour changes across contexts.
Affiliation(s)
- Thomas D Ferguson
  - Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada; Alberta Machine Intelligence Institute, Edmonton, Alberta, Canada
- Alona Fyshe
  - Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada; Alberta Machine Intelligence Institute, Edmonton, Alberta, Canada; Department of Psychology, University of Alberta, Edmonton, Alberta, Canada; Canada Institute for Advanced Research (CIFAR) AI Chair, Canada
- Adam White
  - Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada; Alberta Machine Intelligence Institute, Edmonton, Alberta, Canada; Canada Institute for Advanced Research (CIFAR) AI Chair, Canada

4
Muzik O, Diwadkar VA. Human regulatory systems in the age of abundance: A predictive processing perspective. Ann N Y Acad Sci 2025; 1545:16-27. [PMID: 40022426 DOI: 10.1111/nyas.15302] [Indexed: 03/03/2025]
Abstract
Human regulatory systems largely evolved under conditions of food and information scarcity but are now being forced to deal with abundance. The impact of abundance and the inability of human regulatory systems to adapt to it have fed a surge in dual health challenges: (1) a rise in obesity related to food abundance and (2) a rise in stress and anxiety related to information abundance. No single framework has been developed to describe why and how the transition from scarcity to abundance has been so challenging. Here, we provide a speculative model based on predictive processing. We suggest that whereas scarcity (above destructive lower bounds like famine or information voids) preserves the fidelity of the relationship between prediction errors and predictions, abundance distorts this relationship. Furthermore, prediction error minimization is enhanced under scarcity (as the number of competing states in the niche is restricted), whereas the opposite is true under abundance. We also discuss how abundance warps the fundamental drive for seeking novelty by fueling the brain's exploration (as opposed to exploitation) mode. Ameliorative strategies for regulating food and information abundance may largely depend on simulating scarcity, that environmental condition to which human regulatory systems have adapted over millennia.
Affiliation(s)
- Otto Muzik
  - Department of Pediatrics, Wayne State University School of Medicine, Detroit, Michigan, USA
  - Department of Radiology, Wayne State University School of Medicine, Detroit, Michigan, USA
- Vaibhav A Diwadkar
  - Department of Psychiatry and Behavioral Neurosciences, Wayne State University School of Medicine, Detroit, Michigan, USA

5
Bruckner R, Heekeren HR, Nassar MR. Understanding learning through uncertainty and bias. Commun Psychol 2025; 3:24. [PMID: 39948273 PMCID: PMC11825852 DOI: 10.1038/s44271-025-00203-y] [Received: 10/15/2024] [Accepted: 01/28/2025] [Indexed: 02/16/2025]
Abstract
Learning allows humans and other animals to make predictions about the environment that facilitate adaptive behavior. Casting learning as predictive inference can shed light on normative cognitive mechanisms that improve predictions under uncertainty. Drawing on normative learning models, we illustrate how learning should be adjusted to different sources of uncertainty, including perceptual uncertainty, risk, and uncertainty due to environmental changes. Such models explain many hallmarks of human learning in terms of specific statistical considerations that come into play when updating predictions under uncertainty. However, humans also display systematic learning biases that deviate from normative models, as studied in computational psychiatry. Some biases can be explained as normative inference conditioned on inaccurate prior assumptions about the environment, while others reflect approximations to Bayesian inference aimed at reducing cognitive demands. These biases offer insights into cognitive mechanisms underlying learning and how they might go awry in psychiatric illness.
Affiliation(s)
- Rasmus Bruckner
  - Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
  - Institute of Psychology, University of Hamburg, Hamburg, Germany
- Hauke R Heekeren
  - Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
  - Executive University Board, University of Hamburg, Hamburg, Germany
- Matthew R Nassar
  - Robert J. & Nancy D. Carney Institute for Brain Science, Brown University, Providence, RI, USA
  - Department of Neuroscience, Brown University, Providence, RI, USA

6
Wang MC, Soltani A. Contributions of Attention to Learning in Multidimensional Reward Environments. J Neurosci 2025; 45:e2300232024. [PMID: 39681464 PMCID: PMC11823339 DOI: 10.1523/jneurosci.2300-23.2024] [Received: 12/10/2023] [Revised: 10/09/2024] [Accepted: 11/08/2024] [Indexed: 12/18/2024]
Abstract
Real-world choice options have many features or attributes, whereas the reward outcome from those options only depends on a few features or attributes. It has been shown that humans learn and combine feature-based with more complex conjunction-based learning to tackle challenges of learning in naturalistic reward environments. However, it remains unclear how different learning strategies interact to determine what features or conjunctions should be attended to and control choice behavior, and how subsequent attentional modulations influence future learning and choice. To address these questions, we examined the behavior of male and female human participants during a three-dimensional learning task in which reward outcomes for different stimuli could be predicted based on a combination of an informative feature and conjunction. Using multiple approaches, we found that both choice behavior and reward probabilities estimated by participants were most accurately described by attention-modulated models that learned the predictive values of both the informative feature and the informative conjunction. Specifically, in the reinforcement learning model that best fit choice data, attention was controlled by the difference in the integrated feature and conjunction values. The resulting attention weights modulated learning by increasing the learning rate on attended features and conjunctions. Critically, modulating decision-making by attention weights did not improve the fit of data, providing little evidence for direct attentional effects on choice. These results suggest that in multidimensional environments, humans direct their attention not only to selectively process reward-predictive attributes but also to find parsimonious representations of the reward contingencies for more efficient learning.
Affiliation(s)
- Michael Chong Wang
  - Department of Psychological and Brain Sciences, Dartmouth College, Hanover 03755, New Hampshire
- Alireza Soltani
  - Department of Psychological and Brain Sciences, Dartmouth College, Hanover 03755, New Hampshire

7
Ohta H, Nozawa T, Higuchi K, Meredith AL, Morimoto Y, Satoh Y, Ishizuka T. Altered trial-to-trial responses to reward outcomes in KCNMA1 knockout mice during probabilistic learning tasks. Behav Brain Funct 2024; 20:36. [PMID: 39731174 DOI: 10.1186/s12993-024-00262-x] [Received: 08/13/2024] [Accepted: 12/06/2024] [Indexed: 12/29/2024]
Abstract
The large-conductance calcium- and voltage-activated potassium (BK) channels, encoded by the KCNMA1 gene, play important roles in neuronal function. Mutations in KCNMA1 have been found in patients with various neurodevelopmental features, including intellectual disability, autism spectrum disorder (ASD), or attention deficit hyperactivity disorder (ADHD). Previous studies of KCNMA1 knockout mice have suggested altered activity patterns and behavioral flexibility, but it remained unclear whether these changes primarily affect immediate behavioral adaptation or longer-term learning processes. Using a 5-armed bandit task (5-ABT) and a novel Δrepeat rate analysis method that considers individual baseline choice tendencies, we investigated immediate trial-by-trial Win-Stay-Lose-Shift (WSLS) strategies and learning rates across multiple trials in KCNMA1 knockout (KCNMA1-/-) mice. Three key findings emerged: (1) Unlike wildtype mice, which showed increased Δrepeat rates after rewards and decreased rates after losses, KCNMA1-/- mice exhibited impaired WSLS behavior, (2) KCNMA1-/- mice displayed shortened response intervals after unrewarded trials, and (3) despite these short-term behavioral impairments, their learning rates and task accuracy remained comparable to wildtype mice, with significantly shorter task completion times. These results suggest that BK channel dysfunction primarily alters immediate behavioral responses to outcomes in the next trial rather than affecting long-term learning capabilities. These findings and our analytical method may help identify behavioral phenotypes in animal models of both BK channel-related and other neurodevelopmental disorders.
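A baseline-corrected repeat-rate analysis of the kind this abstract mentions can be sketched as follows. The exact definition of the Δrepeat rate is not given here, so this sketch assumes one plausible form: the probability of repeating the previous choice after a rewarded (or unrewarded) trial, minus the animal's overall repeat probability. The function name and the returned keys are hypothetical.

```python
def delta_repeat_rates(choices, rewards):
    """Return repeat rates after wins and losses relative to the baseline repeat rate.

    choices: chosen arm on each trial; rewards: truthy if that trial was rewarded.
    Assumed definition: delta = P(repeat | previous outcome) - P(repeat overall).
    """
    # repeats[i] tells whether trial i + 1 repeated the choice made on trial i.
    repeats = [choices[t] == choices[t - 1] for t in range(1, len(choices))]
    if not repeats:
        return {"baseline": None, "after_win": None, "after_loss": None}

    def rate(flags):
        return sum(flags) / len(flags) if flags else None

    baseline = rate(repeats)
    # Pair each repeat flag with the outcome of the preceding trial.
    after_win = rate([r for r, won in zip(repeats, rewards[:-1]) if won])
    after_loss = rate([r for r, won in zip(repeats, rewards[:-1]) if not won])
    return {
        "baseline": baseline,
        "after_win": None if after_win is None else after_win - baseline,
        "after_loss": None if after_loss is None else after_loss - baseline,
    }


# Toy session: the animal always stays after a win and always shifts after a loss.
result = delta_repeat_rates([0, 0, 1, 1, 2, 2], [1, 0, 1, 0, 1, 0])
```

Under this assumed definition, a win-stay-lose-shift pattern shows up as a positive value after wins and a negative value after losses, while an animal that ignores outcomes would have both values near zero.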
Affiliation(s)
- Hiroyuki Ohta
  - Department of Pharmacology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama, 359-8513, Japan
- Takashi Nozawa
  - Department of Pharmacology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama, 359-8513, Japan
- Kohki Higuchi
  - Tokyo Denki University, Ishizaka, Hiki, Saitama, Hatoyama, 359-0394, Japan
- Andrea L Meredith
  - Department of Physiology, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Yuji Morimoto
  - Department of Physiology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama, 359-8513, Japan
- Yasushi Satoh
  - Department of Biochemistry, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama, 359-8513, Japan
- Toshiaki Ishizuka
  - Department of Pharmacology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama, 359-8513, Japan

8
Zhang W, Li Y, Zhou C, Li B, Schwieter JW, Liu H, Liu M. Expectation to rewards modulates learning emotional words: Evidence from a hierarchical Bayesian model. Biol Psychol 2024; 193:108895. [PMID: 39481632 DOI: 10.1016/j.biopsycho.2024.108895] [Received: 08/01/2024] [Revised: 10/13/2024] [Accepted: 10/24/2024] [Indexed: 11/02/2024]
Abstract
In language acquisition, individuals learn the emotional value of words through external feedback. Previous studies have used emotional words as experimental materials to explore the cognitive mechanisms underlying emotional language processing, but have failed to recognize that languages are acquired in changing environments. To address this gap, this study combines reinforcement learning with emotional word learning, using a probabilistic reversal learning task to explore how individuals acquire the valence of emotional words in a dynamically changing environment. Computational modeling of both behavioral and event-related potential (ERP) data revealed that individuals' expectations of reward modulated the learning speed and temporal processing of emotional words, demonstrating a clear negative bias. Specifically, as the expected value increased, individuals responded faster and exhibited higher amplitudes for negative emotional words. These findings shed light on the neural mechanisms of emotional word learning in a volatile environment, highlighting the crucial role of expectations in this process and a preference for learning negative information.
Affiliation(s)
- Weiwei Zhang
  - Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University, Dalian 116029, China; Key Laboratory of Brain and Cognitive Neuroscience, Dalian, Liaoning Province 116029, China
- Yingyu Li
  - Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University, Dalian 116029, China; Key Laboratory of Brain and Cognitive Neuroscience, Dalian, Liaoning Province 116029, China
- Chuan Zhou
  - Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University, Dalian 116029, China; Key Laboratory of Brain and Cognitive Neuroscience, Dalian, Liaoning Province 116029, China
- Baike Li
  - School of Psychology, Liaoning Normal University, Dalian, China
- John W Schwieter
  - Language Acquisition, Cognition, and Multilingualism Laboratory, Bilingualism Matters, Wilfrid Laurier University, Canada; Department of Linguistics and Languages, McMaster University, Canada
- Huanhuan Liu
  - Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University, Dalian 116029, China; Key Laboratory of Brain and Cognitive Neuroscience, Dalian, Liaoning Province 116029, China
- Meng Liu
  - School of Psychology, Liaoning Normal University, Dalian, China

9
Ohta H, Nozawa T, Nakano T, Morimoto Y, Ishizuka T. Nonlinear age-related differences in probabilistic learning in mice: A 5-armed bandit task study. Neurobiol Aging 2024; 142:8-16. [PMID: 39029360 DOI: 10.1016/j.neurobiolaging.2024.06.004] [Received: 02/03/2024] [Revised: 06/17/2024] [Accepted: 06/19/2024] [Indexed: 07/21/2024]
Abstract
This study explores the impact of aging on reinforcement learning in mice, focusing on changes in learning rates and behavioral strategies. A 5-armed bandit task (5-ABT) and a computational Q-learning model were used to evaluate the positive and negative learning rates and the inverse temperature across three age groups (3, 12, and 18 months). Results showed a significant decline in the negative learning rate of 18-month-old mice, which was not observed for the positive learning rate. This suggests that older mice maintain the ability to learn from successful experiences while decreasing the ability to learn from negative outcomes. We also observed a significant age-dependent variation in inverse temperature, reflecting a shift in action selection policy. Middle-aged mice (12 months) exhibited higher inverse temperature, indicating a higher reliance on previous rewarding experiences and reduced exploratory behaviors, when compared to both younger and older mice. This study provides new insights into aging research by demonstrating that there are age-related differences in specific components of reinforcement learning, which exhibit a non-linear pattern.
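The three quantities this abstract reports (a positive learning rate, a negative learning rate, and an inverse temperature) can be illustrated with a minimal sketch of a Q-learning model with outcome-dependent learning rates and a softmax choice rule. The function names, parameter names, and example values are assumptions for illustration; the study's exact model specification is not given here.

```python
import math


def update_q(q, reward, alpha_pos, alpha_neg):
    """Update one action value with separate learning rates by prediction-error sign."""
    delta = reward - q                               # reward prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg   # assumed asymmetric learning
    return q + alpha * delta


def softmax_policy(q_values, beta):
    """Choice probabilities; a larger inverse temperature beta gives greedier choices."""
    top = max(q_values)                              # subtract the max for stability
    weights = [math.exp(beta * (q - top)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


# Five arms, as in a 5-armed bandit; the values are arbitrary examples.
q_values = [0.2, 0.8, 0.5, 0.1, 0.3]
exploratory = softmax_policy(q_values, beta=1.0)
exploitative = softmax_policy(q_values, beta=10.0)
```

In this sketch, a reduced `alpha_neg` leaves values nearly unchanged after unrewarded trials, and a higher `beta` concentrates choices on the currently best-valued arm, which corresponds to the reduced exploration the abstract describes for the 12-month group.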
Affiliation(s)
- Hiroyuki Ohta
  - Department of Pharmacology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan
- Takashi Nozawa
  - Mejiro University, 4-31-1 Naka-Ochiai, Shinjuku, Tokyo 161-8539, Japan
- Takashi Nakano
  - Department of Computational Biology, School of Medicine, Fujita Health University, 1-98 Dengakugakubo, Kutsukake, Toyoake, Aichi 470-1192, Japan; International Center for Brain Science (ICBS), Fujita Health University, 1-98 Dengakugakubo, Kutsukake, Toyoake, Aichi 470-1192, Japan
- Yuji Morimoto
  - Department of Physiology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan
- Toshiaki Ishizuka
  - Department of Pharmacology, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan

10
Binz M, Dasgupta I, Jagadish A, Botvinick M, Wang JX, Schulz E. Meta-learning: Data, architecture, and both. Behav Brain Sci 2024; 47:e170. [PMID: 39311510 DOI: 10.1017/s0140525x24000311] [Indexed: 09/25/2024]
Abstract
We are encouraged by the many positive commentaries on our target article. In this response, we recapitulate some of the points raised and identify synergies between them. We have arranged our response based on the tension between data and architecture that arises in the meta-learning framework. We additionally provide a short discussion that touches upon connections to foundation models.
Affiliation(s)
- Marcel Binz
  - Max Planck Institute for Biological Cybernetics, Tübingen, Germany
  - Helmholtz Institute for Human-Centered AI, Munich, Germany
- Akshay Jagadish
  - Max Planck Institute for Biological Cybernetics, Tübingen, Germany
  - Helmholtz Institute for Human-Centered AI, Munich, Germany
- Eric Schulz
  - Max Planck Institute for Biological Cybernetics, Tübingen, Germany
  - Helmholtz Institute for Human-Centered AI, Munich, Germany

11
Bergerot C, Barfuss W, Romanczuk P. Moderate confirmation bias enhances decision-making in groups of reinforcement-learning agents. PLoS Comput Biol 2024; 20:e1012404. [PMID: 39231162 PMCID: PMC11404843 DOI: 10.1371/journal.pcbi.1012404] [Received: 03/19/2024] [Revised: 09/16/2024] [Accepted: 08/09/2024] [Indexed: 09/06/2024]
Abstract
Humans tend to give more weight to information confirming their beliefs than to information that disconfirms them. Nevertheless, this apparent irrationality has been shown to improve individual decision-making under uncertainty. However, little is known about this bias' impact on decision-making in a social context. Here, we investigate the conditions under which confirmation bias is beneficial or detrimental to decision-making under social influence. To do so, we develop a Collective Asymmetric Reinforcement Learning (CARL) model in which artificial agents observe others' actions and rewards, and update this information asymmetrically. We use agent-based simulations to study how confirmation bias affects collective performance on a two-armed bandit task, and how resource scarcity, group size and bias strength modulate this effect. We find that a confirmation bias benefits group learning across a wide range of resource-scarcity conditions. Moreover, we discover that, past a critical bias strength, resource abundance favors the emergence of two different performance regimes, one of which is suboptimal. In addition, we find that this regime bifurcation comes with polarization in small groups of agents. Overall, our results suggest the existence of an optimal, moderate level of confirmation bias for decision-making in a social context.
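The asymmetric updating this abstract describes can be sketched with a single value-update rule. The sketch assumes a common formalization of confirmation bias in reinforcement learning: information counts as confirmatory when it favors the agent's own choice (a positive prediction error on the arm it chose, or a negative one on an arm it did not choose) and is then weighted by a larger learning rate. Whether the CARL model uses exactly this rule is an assumption, and the names `asymmetric_update`, `alpha_conf`, and `alpha_disc` are hypothetical.

```python
def asymmetric_update(q, reward, is_own_choice, alpha_conf, alpha_disc):
    """Update an arm's value, weighting confirmatory evidence more strongly.

    q: current value estimate of the arm the outcome belongs to.
    is_own_choice: True if the agent itself chose this arm on this trial,
        False if the outcome was observed from another agent's different choice.
    alpha_conf / alpha_disc: assumed learning rates for confirmatory and
        disconfirmatory information (bias strength grows with their gap).
    """
    delta = reward - q                      # prediction error for this arm
    confirmatory = (is_own_choice and delta > 0) or (not is_own_choice and delta < 0)
    alpha = alpha_conf if confirmatory else alpha_disc
    return q + alpha * delta


# An agent that chose arm A sees a reward on A, and sees a peer rewarded on arm B.
q_a = asymmetric_update(0.5, 1.0, is_own_choice=True, alpha_conf=0.3, alpha_disc=0.1)
q_b = asymmetric_update(0.5, 1.0, is_own_choice=False, alpha_conf=0.3, alpha_disc=0.1)
```

With these example rates, the same reward raises the value of the agent's own arm more than the value of the arm a peer chose, which is the kind of asymmetry whose strength the simulations in the abstract vary.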
Affiliation(s)
- Clémence Bergerot
  - Department of Biology, Humboldt Universität zu Berlin, Berlin, Germany
  - Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Berlin, Germany
- Wolfram Barfuss
  - Transdisciplinary Research Area: Sustainable Futures, University of Bonn, Bonn, Germany
  - Center for Development Research (ZEF), University of Bonn, Bonn, Germany
- Pawel Romanczuk
  - Department of Biology, Humboldt Universität zu Berlin, Berlin, Germany
  - Science of Intelligence, Research Cluster of Excellence, Berlin, Germany

12
Nussenbaum K, Katzman PL, Lu H, Zorowitz S, Hartley CA. Sensitivity to the Instrumental Value of Choice Increases Across Development. Psychol Sci 2024; 35:933-947. [PMID: 38900963 PMCID: PMC11693699 DOI: 10.1177/09567976241256961] [Received: 09/13/2023] [Accepted: 04/25/2024] [Indexed: 06/22/2024]
Abstract
Across development, people tend to demonstrate a preference for contexts in which they have the opportunity to make choices. However, it is not clear how children, adolescents, and adults learn to calibrate this preference based on the costs and benefits of agentic choice. Here, in both a primary, in-person, reinforcement-learning experiment (N = 92; age range = 10-25 years) and a preregistered online replication study (N = 150; age range = 8-25 years), we found that participants overvalued agentic choice but also calibrated their agency decisions to the reward structure of the environment, increasingly selecting agentic choice when choice had greater instrumental value. Regression analyses and computational modeling of participant choices revealed that participants' bias toward agentic choice, reflecting its intrinsic value, remained consistent across age, whereas sensitivity to the instrumental value of agentic choice increased from childhood to early adulthood.
Affiliation(s)
- Kate Nussenbaum
  - Department of Psychology, New York University
  - Princeton Neuroscience Institute, Princeton University
- Hanxiao Lu
  - Department of Psychology, New York University
- Catherine A. Hartley
  - Department of Psychology, New York University
  - Center for Neural Science, New York University

13
Koch C, Zika O, Bruckner R, Schuck NW. Influence of surprise on reinforcement learning in younger and older adults. PLoS Comput Biol 2024; 20:e1012331. [PMID: 39141681 PMCID: PMC11346965 DOI: 10.1371/journal.pcbi.1012331] [Received: 11/28/2023] [Revised: 08/26/2024] [Accepted: 07/16/2024] [Indexed: 08/16/2024]
Abstract
Surprise is a key component of many learning experiences, and yet its precise computational role, and how it changes with age, remain debated. One major challenge is that surprise often occurs jointly with other variables, such as uncertainty and outcome probability. To assess how humans learn from surprising events, and whether aging affects this process, we studied choices while participants learned from bandits with either Gaussian or bimodal outcome distributions, which decoupled outcome probability, uncertainty, and surprise. A total of 102 participants (51 older, aged 50-73; 51 younger, aged 19-30) chose between three bandits, one of which had a bimodal outcome distribution. Behavioral analyses showed that both age groups learned the average of the bimodal bandit less well. A trial-by-trial analysis indicated that participants performed choice reversals immediately following large absolute prediction errors, consistent with heightened sensitivity to surprise. This effect was stronger in older adults. Computational models indicated that learning rates in younger as well as older adults were influenced by surprise, rather than uncertainty, but also suggested large interindividual variability in the process underlying learning in our task. Our work bridges behavioral economics research, which has focused on how outcomes with low probability affect choice in older adults, and reinforcement learning work, which has investigated age differences in the effects of uncertainty. It suggests that older adults overly adapt to surprising events, even when accounting for probability and uncertainty effects.
Collapse
Affiliation(s)
- Christoph Koch
- Max Planck Institute for Human Development, Berlin, Germany
- Institute of Psychology, Universität Hamburg, Hamburg, Germany
| | - Ondrej Zika
- Max Planck Institute for Human Development, Berlin, Germany
- Max Planck UCL Centre for Computational Psychiatry and Aging Research, Berlin, Germany, and London, United Kingdom
| | - Rasmus Bruckner
- Max Planck Institute for Human Development, Berlin, Germany
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
| | - Nicolas W. Schuck
- Max Planck Institute for Human Development, Berlin, Germany
- Institute of Psychology, Universität Hamburg, Hamburg, Germany
- Max Planck UCL Centre for Computational Psychiatry and Aging Research, Berlin, Germany, and London, United Kingdom
| |
|
14
|
Anlló H, Bavard S, Benmarrakchi F, Bonagura D, Cerrotti F, Cicue M, Gueguen M, Guzmán EJ, Kadieva D, Kobayashi M, Lukumon G, Sartorio M, Yang J, Zinchenko O, Bahrami B, Silva Concha J, Hertz U, Konova AB, Li J, O'Madagain C, Navajas J, Reyes G, Sarabi-Jamab A, Shestakova A, Sukumaran B, Watanabe K, Palminteri S. Comparing experience- and description-based economic preferences across 11 countries. Nat Hum Behav 2024; 8:1554-1567. [PMID: 38877287 DOI: 10.1038/s41562-024-01894-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 04/19/2024] [Indexed: 06/16/2024]
Abstract
Recent evidence indicates that reward value encoding in humans is highly context dependent, leading to suboptimal decisions in some cases, but whether this computational constraint on valuation is a shared feature of human cognition remains unknown. Here we studied the behaviour of n = 561 individuals from 11 countries of markedly different socioeconomic and cultural makeup. Our findings show that context sensitivity was present in all 11 countries. Suboptimal decisions generated by context manipulation were not explained by risk aversion, as estimated through a separate description-based choice task (that is, lotteries) consisting of matched decision offers. Conversely, risk aversion significantly differed across countries. Overall, our findings suggest that context-dependent reward value encoding is a feature of human cognition that remains consistently present across different countries, as opposed to description-based decision-making, which is more permeable to cultural factors.
Affiliation(s)
- Hernán Anlló
- Human Reinforcement Learning Team, Laboratory of Cognitive and Computational Neuroscience, Paris, France.
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan.
- Intercultural Cognitive Network, Paris, France.
| | - Sophie Bavard
- Human Reinforcement Learning Team, Laboratory of Cognitive and Computational Neuroscience, Paris, France
- Intercultural Cognitive Network, Paris, France
- General Psychology Lab, Hamburg University, Hamburg, Germany
| | - FatimaEzzahra Benmarrakchi
- Intercultural Cognitive Network, Paris, France
- School of Collective Intelligence, Université Mohammed VI Polytechnique, Rabat, Morocco
| | - Darla Bonagura
- Intercultural Cognitive Network, Paris, France
- Department of Psychiatry, University Behavioral Health Care and Brain Health Institute, Rutgers University-New Brunswick, Piscataway, NJ, USA
| | - Fabien Cerrotti
- Human Reinforcement Learning Team, Laboratory of Cognitive and Computational Neuroscience, Paris, France
- Intercultural Cognitive Network, Paris, France
| | - Mirona Cicue
- Department of Cognitive Sciences, University of Haifa, Haifa, Israel
| | - Maelle Gueguen
- Intercultural Cognitive Network, Paris, France
- Department of Psychiatry, University Behavioral Health Care and Brain Health Institute, Rutgers University-New Brunswick, Piscataway, NJ, USA
| | - Eugenio José Guzmán
- Facultad de Psicología, Universidad del Desarrollo, Santiago de Chile, Chile
| | - Dzerassa Kadieva
- International Laboratory for Social Neurobiology, Institute for Cognitive Neuroscience, HSE University, Moscow, Russia
| | - Maiko Kobayashi
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan
| | - Gafari Lukumon
- School of Collective Intelligence, Université Mohammed VI Polytechnique, Rabat, Morocco
| | - Marco Sartorio
- Laboratorio de Neurociencia, Universidad Torcuato Di Tella, Buenos Aires, Argentina
| | - Jiong Yang
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
| | - Oksana Zinchenko
- Intercultural Cognitive Network, Paris, France
- Centre for Cognition and Decision Making, Institute for Cognitive Neuroscience, HSE University, Moscow, Russia
| | - Bahador Bahrami
- Intercultural Cognitive Network, Paris, France
- Department of Psychology, Ludwig Maximilian University, Munich, Germany
| | - Jaime Silva Concha
- Intercultural Cognitive Network, Paris, France
- Facultad de Psicología, Universidad del Desarrollo, Santiago de Chile, Chile
| | - Uri Hertz
- Intercultural Cognitive Network, Paris, France
- Department of Cognitive Sciences, University of Haifa, Haifa, Israel
| | - Anna B Konova
- Intercultural Cognitive Network, Paris, France
- Department of Psychiatry, University Behavioral Health Care and Brain Health Institute, Rutgers University-New Brunswick, Piscataway, NJ, USA
| | - Jian Li
- Intercultural Cognitive Network, Paris, France
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- IDG/McGovern Institute for Brain Research, Peking University, Beijing, China
| | - Cathal O'Madagain
- Intercultural Cognitive Network, Paris, France
- School of Collective Intelligence, Université Mohammed VI Polytechnique, Rabat, Morocco
| | - Joaquin Navajas
- Intercultural Cognitive Network, Paris, France
- Laboratorio de Neurociencia, Universidad Torcuato Di Tella, Buenos Aires, Argentina
- Escuela de Negocios, Universidad Torcuato Di Tella, Buenos Aires, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
| | - Gabriel Reyes
- Intercultural Cognitive Network, Paris, France
- Facultad de Psicología, Universidad del Desarrollo, Santiago de Chile, Chile
| | - Atiye Sarabi-Jamab
- Intercultural Cognitive Network, Paris, France
- School of Cognitive Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran
| | - Anna Shestakova
- Intercultural Cognitive Network, Paris, France
- Centre for Cognition and Decision Making, Institute for Cognitive Neuroscience, HSE University, Moscow, Russia
| | - Bhasi Sukumaran
- Intercultural Cognitive Network, Paris, France
- Department of Clinical Psychology, SRM Medical College Hospital and Research Centre, Chennai, India
| | - Katsumi Watanabe
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan
- Intercultural Cognitive Network, Paris, France
| | - Stefano Palminteri
- Human Reinforcement Learning Team, Laboratory of Cognitive and Computational Neuroscience, Paris, France.
- Intercultural Cognitive Network, Paris, France.
- Departement d'études cognitives, Ecole normale supérieure, PSL Research University, Paris, France.
| |
|
15
|
Vodret M. Irreversibility in belief dynamics: Unraveling the link to cognitive effort. Phys Rev E 2024; 110:014304. [PMID: 39160952 DOI: 10.1103/physreve.110.014304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 06/25/2024] [Indexed: 08/21/2024]
Abstract
The relationship between time irreversibility in neuronal dynamics and cognitive effort is a subject of growing interest in the scientific literature. Although correlations between proxies of both concepts have been experimentally observed, the precise underlying linkage between them remains elusive. Here we investigate the case of learning in decision-making tasks; we do so by introducing a thermodynamically grounded metric, inspired by Landauer's principle, which connects time-irreversible information processing to energy consumption. Equipped with this metric, we investigate the role of macroscopic time-reversal symmetry breaking in belief dynamics for the case of an agent with finite sensitivity while performing a static two-armed bandit task, a standard setup in cognitive neuroscience. To gain insights into the belief dynamics, we analogize it to the dynamics of an active particle subject to state-dependent noise and living in a two-dimensional space. This mapping allows an analytical description of learning-induced biases. We explore in depth the case of Q-learning with forgetting of the nonchosen option. In this case, learning-induced risk aversion is formally equivalent to standard thermophoresis, i.e., the net motion towards low-temperature regions. Finally, we quantify the irreversibility of belief dynamics in the steady state for different bandit configurations, sensitivity levels, and exploitative behavior. We found a strong correlation in high-sensitivity learning between heightened irreversibility in belief dynamics and improved decision-making outcomes. Notably, as the task's difficulty increases, a greater degree of irreversibility in belief dynamics becomes necessary for superior performance; this explicitly unravels a plausible connection between time irreversibility and cognitive effort.
In conclusion, our investigation reveals that irreversibility in belief dynamics bridges out-of-equilibrium statistical physics concepts and cognitive neuroscience. In decision-making contexts, this perspective offers insights into the notion of cognitive effort, suggesting a potential mechanism driving the evolution of living systems toward out-of-equilibrium structures.
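For readers unfamiliar with the learning rule named in this abstract, the following is a minimal sketch of Q-learning with forgetting of the nonchosen option in a two-armed bandit. It is not the author's implementation. The function names, the parameter names (`alpha`, `phi`, `beta`), the logistic choice rule, and the assumption that the nonchosen value decays toward zero are all illustrative choices made here.

```python
import math


def choice_probability(q, beta):
    """Probability of picking arm 0 under a logistic (two-arm softmax) rule.

    beta is the sensitivity (inverse temperature); this choice rule is an
    assumption of the sketch, not taken from the cited paper.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))


def q_update_with_forgetting(q, chosen, reward, alpha, phi):
    """Return updated values for a two-armed bandit.

    The chosen arm moves toward the reward by a fraction alpha of the
    prediction error; the nonchosen arm decays toward zero at rate phi
    (the decay target of zero is an assumption of this sketch).
    """
    updated = list(q)
    updated[chosen] = q[chosen] + alpha * (reward - q[chosen])
    other = 1 - chosen
    updated[other] = (1.0 - phi) * q[other]
    return updated


# Hypothetical usage: one trial in which arm 0 is chosen and rewarded.
values = q_update_with_forgetting([0.5, 0.5], chosen=0, reward=1.0, alpha=0.2, phi=0.1)
p_arm0 = choice_probability(values, beta=3.0)
```

Because the nonchosen value keeps decaying while it is not sampled, an agent of this kind can drift away from options it has stopped choosing, which is one way learning-induced biases of the sort discussed in the abstract can arise.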
|
16
|
Shimomura K, Morita K, Nishiguchi Y, Huffman JC, Millstein RA. Intraindividual Fluctuation in Optimism Under Daily Life Circumstances: A Longitudinal Study. AFFECTIVE SCIENCE 2024; 5:1-12. [PMID: 39050035 PMCID: PMC11264638 DOI: 10.1007/s42761-023-00224-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 10/29/2023] [Indexed: 07/27/2024]
Abstract
Optimism is typically conceptualized as a relatively static tendency regarding positive expectations about one's future. However, recent studies suggest that optimism may meaningfully fluctuate within individuals over time. To date, little is known about the characteristics of such state optimism and potential cultural difference in state optimism. Accordingly, we developed a Japanese version of the State Optimism Measure (J-SOM) and examined its validity and the nature of intraindividual state optimism fluctuations; we also examined relationships between the J-SOM and other measures of mental health, including trait optimism. We conducted two online longitudinal surveys with different time intervals (weekly, n = 97; monthly, n = 99) targeting university students. Results were largely consistent between the two surveys. We confirmed high factor validity and internal consistency of the J-SOM. The J-SOM showed significant correlations in expected directions with other measures such as depressive mood and subjective happiness. In addition, intraindividual changes in the J-SOM were associated with changes in mood and quality of daily life. Importantly, these associations between intraindividual change in optimism and in other variables were minimal for trait optimism. We also found that state optimism, compared with trait optimism, tended to show larger intraindividual changes over 1, 2, 3, 4, and 8 weeks. In summary, this study developed a translated version of the SOM and validated it, and then showed, for the first time, that state optimism can fluctuate within individuals in daily life over a span of several weeks. Supplementary Information The online version contains supplementary material available at 10.1007/s42761-023-00224-y.
Affiliation(s)
- Kanji Shimomura
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
| | - Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, Tokyo, Japan
| | | | - Jeff C. Huffman
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA USA
- Harvard Medical School, Boston, MA USA
| | - Rachel A. Millstein
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA USA
- Harvard Medical School, Boston, MA USA
| |
|
17
|
Avila Chauvet L, Mejía Cruz D. Computational modeling of decision-making in substance abusers: testing Bechara's hypotheses. Front Psychol 2024; 15:1281082. [PMID: 38882514 PMCID: PMC11178135 DOI: 10.3389/fpsyg.2024.1281082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 04/29/2024] [Indexed: 06/18/2024] Open
Abstract
One of the cognitive abilities most affected by substance abuse is decision-making. Behavioral tasks such as the Iowa Gambling Task (IGT) provide a means to measure the learning process involved in decision-making. To comprehend this process, three hypotheses have emerged: (1) participants prioritize gains over losses, (2) they exhibit insensitivity to losses, and (3) the capacity of operational storage or working memory comes into play. A dynamic model was developed to examine these hypotheses, simulating sensitivity to gains and losses. The Linear Operator model served as the learning rule, wherein net gains depend on the ratio of gains to losses, weighted by the sensitivity to both. The study further proposes a comparison between the performance of simulated agents and that of substance abusers (n = 20) and control adults (n = 20). The findings indicate that as the memory factor increases, along with high sensitivity to losses and low sensitivity to gains, agents prefer advantageous alternatives, particularly those with a lower frequency of punishments. Conversely, when sensitivity to gains increases and the memory factor decreases, agents prefer disadvantageous alternatives, especially those that result in larger losses. Human participants confirmed the agents' performance, particularly when contrasting optimal and sub-optimal outcomes. In conclusion, we emphasize the importance of evaluating the parameters of the linear operator model across diverse clinical and community samples.
Affiliation(s)
| | - Diana Mejía Cruz
- Psychology Department, Sonora Institute of Technology, Obregon City, Sonora, Mexico
| |
|
18
|
Cowan RL, Davis T, Kundu B, Rahimpour S, Rolston JD, Smith EH. More widespread and rigid neuronal representation of reward expectation underlies impulsive choices. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.11.588637. [PMID: 38645037 PMCID: PMC11030340 DOI: 10.1101/2024.04.11.588637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Impulsive choices prioritize smaller, more immediate rewards over larger, delayed, or potentially uncertain rewards. Impulsive choices are a critical aspect of substance use disorders and maladaptive decision-making across the lifespan. Here, we sought to understand the neuronal underpinnings of expected reward and risk estimation on a trial-by-trial basis during impulsive choices. To do so, we acquired electrical recordings from the human brain while participants carried out a risky decision-making task designed to measure choice impulsivity. Behaviorally, we found a reward-accuracy tradeoff, whereby more impulsive choosers were more accurate at the task, opting for a more immediate reward while compromising overall task performance. We then examined how neuronal populations across frontal, temporal, and limbic brain regions parametrically encoded reinforcement learning model variables, namely reward and risk expectation and surprise, across trials. We found more widespread representations of reward value expectation and prediction error in more impulsive choosers, whereas less impulsive choosers preferentially represented risk expectation. A regional analysis of reward and risk encoding highlighted the anterior cingulate cortex for value expectation, the anterior insula for risk expectation and surprise, and distinct regional encoding between impulsivity groups. Beyond describing trial-by-trial population neuronal representations of reward and risk variables, these results suggest impaired inhibitory control and model-free learning underpinnings of impulsive choice. These findings shed light on neural processes underlying reinforced learning and decision-making in uncertain environments and how these processes may function in psychiatric disorders.
Affiliation(s)
- Rhiannon L Cowan
- Department of Neurosurgery, University of Utah, Salt Lake City, UT 84132, USA
| | - Tyler Davis
- Department of Neurosurgery, University of Utah, Salt Lake City, UT 84132, USA
| | - Bornali Kundu
- Department of Neurosurgery, University of Missouri, Columbia, MO 65212, USA
| | - Shervin Rahimpour
- Department of Neurosurgery, University of Utah, Salt Lake City, UT 84132, USA
| | - John D Rolston
- Department of Neurosurgery, Brigham & Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Elliot H Smith
- Department of Neurosurgery, University of Utah, Salt Lake City, UT 84132, USA
| |
|
19
|
Karnick AT, Bauer BW, Capron DW. Negative mood and optimism bias: An experimental investigation of sadness and belief updating. J Behav Ther Exp Psychiatry 2024; 82:101910. [PMID: 37714798 DOI: 10.1016/j.jbtep.2023.101910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 08/18/2023] [Accepted: 09/02/2023] [Indexed: 09/17/2023]
Abstract
BACKGROUND AND OBJECTIVES: Understanding how individuals integrate new information to form beliefs under changing emotional conditions is crucial to describing decision-making processes. Previous research suggests that although most people demonstrate bias toward optimistic appraisals of new information when updating beliefs, individuals with dysphoric psychiatric conditions (e.g., major depression) do not demonstrate this same bias. Despite these findings, limited research has investigated the relationship between affective states and belief updating processes. METHODS: We induced neutral and sad moods in participants and had them complete a belief-updating paradigm by estimating the likelihood of negative future events happening to them, viewing the actual likelihood, and then re-estimating their perceived likelihood. RESULTS: We observed that individuals updated their beliefs more after receiving desirable information relative to undesirable information under neutral conditions. Further, we found that individuals did not demonstrate unrealistic optimism under negative affective conditions. LIMITATIONS: This study incorporated a population of university students under laboratory conditions and would benefit from replication and extension in clinical populations and naturalistic settings. CONCLUSIONS: These findings suggest that momentary fluctuations in mood affect how individuals integrate information to form beliefs.
Affiliation(s)
- Aleksandr T Karnick
- Department of Psychology, University of Southern Mississippi, Hattiesburg, MS, USA.
| | - Brian W Bauer
- Department of Psychology, University of Southern Mississippi, Hattiesburg, MS, USA; Department of Psychology, University of Georgia, Athens, GA, USA
| | - Daniel W Capron
- Department of Psychology, University of Southern Mississippi, Hattiesburg, MS, USA
| |
|
20
|
Colas JT, O’Doherty JP, Grafton ST. Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts. PLoS Comput Biol 2024; 20:e1011950. [PMID: 38552190 PMCID: PMC10980507 DOI: 10.1371/journal.pcbi.1011950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 02/26/2024] [Indexed: 04/01/2024] Open
Abstract
Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as those of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of actions chosen previously? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting from multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions.
In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
| | - John P. O’Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
| | - Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
| |
|
21
|
Simoens J, Verguts T, Braem S. Learning environment-specific learning rates. PLoS Comput Biol 2024; 20:e1011978. [PMID: 38517916 PMCID: PMC10990245 DOI: 10.1371/journal.pcbi.1011978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 04/03/2024] [Accepted: 03/09/2024] [Indexed: 03/24/2024] Open
Abstract
People often have to switch back and forth between different environments that come with different problems and volatilities. While volatile environments require fast learning (i.e., high learning rates), stable environments call for lower learning rates. Previous studies have shown that people adapt their learning rates, but it remains unclear whether they can also learn about environment-specific learning rates, and instantaneously retrieve them when revisiting environments. Here, using optimality simulations and hierarchical Bayesian analyses across three experiments, we show that people can learn to use different learning rates when switching back and forth between two different environments. We even observe a signature of these environment-specific learning rates when the volatility of both environments is suddenly the same. We conclude that humans can flexibly adapt and learn to associate different learning rates to different environments, offering important insights for developing theories of meta-learning and context-specific control.
Affiliation(s)
- Jonas Simoens
- Department of Experimental Psychology, Ghent University, Belgium
| | - Tom Verguts
- Department of Experimental Psychology, Ghent University, Belgium
| | - Senne Braem
- Department of Experimental Psychology, Ghent University, Belgium
| |
|
22
|
Jin F, Yang L, Yang L, Li J, Li M, Shang Z. Dynamics Learning Rate Bias in Pigeons: Insights from Reinforcement Learning and Neural Correlates. Animals (Basel) 2024; 14:489. [PMID: 38338131 PMCID: PMC10854969 DOI: 10.3390/ani14030489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 01/23/2024] [Accepted: 01/30/2024] [Indexed: 02/12/2024] Open
Abstract
Research in reinforcement learning indicates that animals respond differently to positive and negative reward prediction errors, which can be calculated by assuming learning rate bias. Many studies have shown that humans and other animals have learning rate bias during learning, but it is unclear whether and how the bias changes throughout the entire learning process. Here, we recorded behavioral data and local field potentials (LFPs) in the striatum of five pigeons performing a probabilistic learning task. Reinforcement learning models with and without learning rate biases were used to dynamically fit the pigeons' choice behavior and estimate the option values. Furthermore, the correlation between striatal LFP power and the model-estimated option values was explored. We found that the pigeons' learning rate bias shifted from negative to positive during the learning process, and the striatal gamma (31 to 80 Hz) power correlated with the option values modulated by dynamic learning rate bias. In conclusion, our results support the hypothesis that pigeons employ a dynamic learning strategy in the learning process from both behavioral and neural aspects, providing valuable insights into reinforcement learning mechanisms of non-human animals.
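As an illustration of what "learning rate bias" means in this abstract, the sketch below shows a delta-rule value update that uses one learning rate for positive prediction errors and another for negative ones. It is a generic textbook-style form, not the model fitted in the cited study. The function names, the parameter names (`alpha_pos`, `alpha_neg`), and the normalized bias index are assumptions introduced here.

```python
def update_value(value, reward, alpha_pos, alpha_neg):
    """One delta-rule update with valence-dependent learning rates.

    alpha_pos is applied when the outcome is at least as good as expected,
    alpha_neg when it is worse.
    """
    prediction_error = reward - value
    alpha = alpha_pos if prediction_error >= 0 else alpha_neg
    return value + alpha * prediction_error


def learning_rate_bias(alpha_pos, alpha_neg):
    """Illustrative bias index in [-1, 1] (an assumption of this sketch).

    Positive values mean stronger learning from positive prediction errors,
    negative values mean stronger learning from negative ones.
    """
    return (alpha_pos - alpha_neg) / (alpha_pos + alpha_neg)


# Hypothetical usage: the same starting value after a reward and after an omission.
after_reward = update_value(0.5, reward=1.0, alpha_pos=0.4, alpha_neg=0.1)
after_omission = update_value(0.5, reward=0.0, alpha_pos=0.4, alpha_neg=0.1)
bias = learning_rate_bias(0.4, 0.1)
```

Under this reading, a bias that shifts from negative to positive over training would correspond to `alpha_neg` exceeding `alpha_pos` early on and the reverse later.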
Affiliation(s)
- Fuli Jin
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; (F.J.); (L.Y.); (L.Y.); (J.L.)
- Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
| | - Lifang Yang
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; (F.J.); (L.Y.); (L.Y.); (J.L.)
- Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
| | - Long Yang
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; (F.J.); (L.Y.); (L.Y.); (J.L.)
- Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
| | - Jiajia Li
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; (F.J.); (L.Y.); (L.Y.); (J.L.)
- Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
| | - Mengmeng Li
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; (F.J.); (L.Y.); (L.Y.); (J.L.)
- Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
| | - Zhigang Shang
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; (F.J.); (L.Y.); (L.Y.); (J.L.)
- Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
- Institute of Medical Engineering Technology and Data Mining, Zhengzhou University, Zhengzhou 450001, China
| |
|
23
|
Ting CC, Salem-Garcia N, Palminteri S, Engelmann JB, Lebreton M. Neural and computational underpinnings of biased confidence in human reinforcement learning. Nat Commun 2023; 14:6896. [PMID: 37898640 PMCID: PMC10613217 DOI: 10.1038/s41467-023-42589-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/08/2023] [Accepted: 10/16/2023] [Indexed: 10/30/2023] Open
Abstract
While navigating a fundamentally uncertain world, humans and animals constantly evaluate the probability of their decisions, actions or statements being correct. When explicitly elicited, these confidence estimates typically correlate positively with neural activity in a ventromedial-prefrontal (VMPFC) network and negatively in a dorsolateral and dorsomedial prefrontal network. Here, combining fMRI with a reinforcement-learning paradigm, we leverage the fact that humans are more confident in their choices when seeking gains than avoiding losses to reveal a functional dissociation: whereas the dorsal prefrontal network correlates negatively with a condition-specific confidence signal, the VMPFC network positively encodes a task-wide confidence signal incorporating the valence-induced bias. Challenging dominant neuro-computational models, we found that decision-related VMPFC activity better correlates with confidence than with option-values inferred from reinforcement-learning models. Altogether, these results identify the VMPFC as a key node in the neuro-computational architecture that builds global feeling-of-confidence signals from latent decision variables and contextual biases during reinforcement-learning.
Affiliation(s)
- Chih-Chung Ting
- General Psychology, Universität Hamburg, Von-Melle-Park 11, 20146, Hamburg, Germany.
- CREED, Amsterdam School of Economics (ASE), Universiteit van Amsterdam, Roetersstraat 11, 1018 WB, Amsterdam, the Netherlands.
| | - Nahuel Salem-Garcia
- Swiss Center for Affective Science, Faculty of Psychology and Educational Sciences, University of Geneva, Chem. des Mines 9, 1202, Genève, Switzerland
| | - Stefano Palminteri
- Département d'Études Cognitives, École Normale Supérieure, PSL Research University, 29 rue d'Ulm, 75230, Paris cedex 05, France
- Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale, 29 rue d'Ulm 75230, Paris cedex 05, France
| | - Jan B Engelmann
- CREED, Amsterdam School of Economics (ASE), Universiteit van Amsterdam, Roetersstraat 11, 1018 WB, Amsterdam, the Netherlands.
- The Tinbergen Institute, Gustav Mahlerplein 117, 1082 MS, Amsterdam, the Netherlands.
| | - Maël Lebreton
- Swiss Center for Affective Science, Faculty of Psychology and Educational Sciences, University of Geneva, Chem. des Mines 9, 1202, Genève, Switzerland.
- Economics of Human Behavior group, Paris-Jourdan Sciences Économiques UMR8545, Paris School of Economics, 48 Boulevard Jourdan, 75014, Paris, France.
| |
|
24
|
Ben-Artzi I, Kessler Y, Nicenboim B, Shahar N. Computational mechanisms underlying latent value updating of unchosen actions. SCIENCE ADVANCES 2023; 9:eadi2704. [PMID: 37862419 PMCID: PMC10588947 DOI: 10.1126/sciadv.adi2704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Accepted: 09/20/2023] [Indexed: 10/22/2023]
Abstract
Current studies suggest that individuals estimate the value of their choices based on observed feedback. Here, we ask whether individuals also update the value of their unchosen actions, even when the associated feedback remains unknown. One hundred seventy-eight individuals completed a multi-armed bandit task, making choices to gain rewards. We found robust evidence suggesting latent value updating of unchosen actions based on the chosen action's outcome. Computational modeling results suggested that this effect is mainly explained by a value updating mechanism whereby individuals integrate the outcome history for choosing an option with that of rejecting the alternative. Properties of the deliberation (i.e., duration/difficulty) did not moderate the latent value updating of unchosen actions, suggesting that memory traces generated during deliberation might take a smaller role in this specific phenomenon than previously thought. We discuss the mechanisms facilitating credit assignment to unchosen actions and their implications for human decision-making.
Affiliation(s)
- Ido Ben-Artzi
- School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Minducate Science of Learning Research and Innovation Center of the Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
| | - Yoav Kessler
- Department of Psychology and School of Brain Sciences and Cognition, Ben Gurion University of the Negev, Be'er Sheva, Israel
| | - Bruno Nicenboim
- Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, Netherlands
| | - Nitzan Shahar
- School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
| |
|
25
|
Pupillo F, Bruckner R. Signed and unsigned effects of prediction error on memory: Is it a matter of choice? Neurosci Biobehav Rev 2023; 153:105371. [PMID: 37633626 DOI: 10.1016/j.neubiorev.2023.105371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 08/18/2023] [Accepted: 08/23/2023] [Indexed: 08/28/2023]
Abstract
Adaptive decision-making is governed by at least two types of memory processes: on the one hand, predictions learned by integrating multiple experiences, and on the other hand, one-shot episodic memories. These two processes interact, and predictions - particularly prediction errors - influence how episodic memories are encoded. However, studies using computational models disagree on the exact shape of this relationship, with some findings showing an effect of signed prediction errors and others showing an effect of unsigned prediction errors on episodic memory. We argue that the choice-confirmation bias, which reflects stronger learning from choice-confirming compared to disconfirming outcomes, could explain these seemingly diverging results. Our perspective implies that the influence of prediction errors on episodic encoding critically depends on whether people can freely choose between options (i.e., instrumental learning tasks) or not (Pavlovian learning tasks). The choice-confirmation bias on memory encoding might have evolved to prioritize memory representations that optimize reward-guided decision-making. We conclude by discussing open issues and implications for future studies.
Affiliation(s)
- Francesco Pupillo
- Department of Psychology, Goethe-Universität Frankfurt, Germany; Tilburg School of Social and Behavioral Sciences, Tilburg University, Netherlands.
| | - Rasmus Bruckner
- Department of Education and Psychology, Freie Universität Berlin, Germany; Max Planck Research Group NeuroCode, Max Planck Institute for Human Development, Berlin, Germany
| |
|
26
|
Peng XR, Bundil I, Schulreich S, Li SC. Neural correlates of valence-dependent belief and value updating during uncertainty reduction: An fNIRS study. Neuroimage 2023; 279:120327. [PMID: 37582418 DOI: 10.1016/j.neuroimage.2023.120327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 08/07/2023] [Accepted: 08/11/2023] [Indexed: 08/17/2023] Open
Abstract
Selective use of new information is crucial for adaptive decision-making. Combining a gamble bidding task with assessing cortical responses using functional near-infrared spectroscopy (fNIRS), we investigated potential effects of information valence on behavioral and neural processes of belief and value updating during uncertainty reduction in young adults. By modeling changes in the participants' expressed subjective values using a Bayesian model, we dissociated processes of (i) updating beliefs about statistical properties of the gamble, (ii) updating values of a gamble based on new information about its winning probabilities, as well as (iii) expectancy violation. The results showed that participants used new information to update their beliefs and values about the gambles in a quasi-optimal manner, as reflected in the selective updating only in situations with reducible uncertainty. Furthermore, their updating was valence-dependent: information indicating an increase in winning probability was underweighted, whereas information about a decrease in winning probability was updated in good agreement with predictions of the Bayesian decision theory. Results of model-based and moderation analyses showed that this valence-dependent asymmetry was associated with a distinct contribution of expectancy violation, besides belief updating, to value updating after experiencing new positive information regarding winning probabilities. In line with the behavioral results, we replicated previous findings showing involvements of frontoparietal brain regions in the different components of updating. Furthermore, this study provided novel results suggesting a valence-dependent recruitment of brain regions. Individuals with stronger oxyhemoglobin responses during value updating were more in line with predictions of the Bayesian model while integrating new information that indicates an increase in winning probability.
Taken together, this study provides first results showing expectancy violation as a contributing factor to sub-optimal valence-dependent updating during uncertainty reduction and suggests limitations of normative Bayesian decision theory.
Affiliation(s)
- Xue-Rui Peng
- Chair of Lifespan Developmental Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop, Technische Universität Dresden, Dresden, Germany.
| | - Indra Bundil
- Cardiff University Brain Research Imaging Centre, School of Psychology, Cardiff University, Cardiff, United Kingdom
| | - Stefan Schulreich
- Department of Nutritional Sciences, Faculty of Life Sciences, University of Vienna, Vienna, Austria; Department of Cognitive Psychology, Faculty of Psychology and Human Movement Science, Universität Hamburg, Hamburg, Germany
| | - Shu-Chen Li
- Chair of Lifespan Developmental Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop, Technische Universität Dresden, Dresden, Germany.
| |
|
27
|
Garrett N, Sharot T. There is no belief update bias for neutral events: failure to replicate Burton et al. (2022). JOURNAL OF COGNITIVE PSYCHOLOGY 2023; 35:876-886. [PMID: 38013976 PMCID: PMC10591604 DOI: 10.1080/20445911.2023.2245112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 08/01/2023] [Indexed: 11/29/2023]
Abstract
In a recent paper, Burton et al. claim that individuals update beliefs to a greater extent when learning an event is less likely compared to more likely than expected. Here, we investigate Burton et al.'s findings. First, we show how Burton et al.'s data do not in fact support a belief update bias for neutral events. Next, in an attempt to replicate their findings, we collect a new data set employing the original belief update task design, but with neutral events. A belief update bias for neutral events is not observed. Finally, we highlight the statistical errors and confounds in Burton et al.'s design and analysis. This includes mis-specifying a reinforcement learning approach to model the data and failing to follow standard computational model fitting sanity checks such as parameter recovery, model comparison and out of sample prediction. Together, the results find little evidence for biased updating for neutral events.
Affiliation(s)
- Neil Garrett
- School of Psychology, University of East Anglia, Norwich, UK
| | - Tali Sharot
- Affective Brain Lab, Department of Experimental Psychology, University College London, London, UK
- The Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
| |
|
28
|
Ni Y, Sun J, Li J. The shadowing effect of initial expectation on learning asymmetry. PLoS Comput Biol 2023; 19:e1010751. [PMID: 37486955 PMCID: PMC10399892 DOI: 10.1371/journal.pcbi.1010751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Revised: 08/03/2023] [Accepted: 07/04/2023] [Indexed: 07/26/2023] Open
Abstract
Evidence for positivity and optimism bias abounds in high-level belief updates. However, no consensus has been reached regarding whether learning asymmetries exist in more elementary forms of updates such as reinforcement learning (RL). In RL, the learning asymmetry concerns the sensitivity difference in incorporating positive and negative prediction errors (PE) into value estimation, namely the asymmetry of learning rates associated with positive and negative PEs. Although RL has been established as a canonical framework in characterizing interactions between agent and environment, the direction of learning asymmetry remains controversial. Here, we propose that part of the controversy stems from the fact that people may have different value expectations before entering the learning environment. Such a default value expectation influences how PEs are calculated and consequently biases subjects' choices. We test this hypothesis in two learning experiments with stable or varying reinforcement probabilities, across monetary gains, losses, and gain-loss mixed environments. Our results consistently support the model incorporating both asymmetric learning rates and the initial value expectation, highlighting the role of initial expectation in value updating and choice preference. Further simulation and model parameter recovery analyses confirm the unique contribution of initial value expectation in assessing learning rate asymmetry.
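Illustrative sketch (not the authors' code): the interaction between asymmetric learning rates and the initial value expectation described in this abstract can be made concrete with a single Rescorla-Wagner style update. The names `update_value`, `learn`, `alpha_pos`, `alpha_neg`, and `q0` are assumptions introduced here for illustration; the abstract does not give the model's exact parameterization.

```python
def update_value(q, reward, alpha_pos, alpha_neg):
    """One value update with separate learning rates for positive and negative PEs."""
    pe = reward - q  # prediction error relative to the current expectation
    alpha = alpha_pos if pe >= 0 else alpha_neg
    return q + alpha * pe


def learn(rewards, q0, alpha_pos, alpha_neg):
    """Run the update over a reward sequence, starting from initial expectation q0."""
    q = q0
    for r in rewards:
        q = update_value(q, r, alpha_pos, alpha_neg)
    return q


# The same outcome (0.5) is a positive PE for a low initial expectation
# and a negative PE for a high one, so a different learning rate applies.
low_start = learn([0.5], q0=0.0, alpha_pos=0.4, alpha_neg=0.1)
high_start = learn([0.5], q0=1.0, alpha_pos=0.4, alpha_neg=0.1)
```

Under these assumptions, the sign of each PE, and therefore which learning rate is used, depends on where the value estimate starts, which is why the initial expectation has to be accounted for when estimating the asymmetry.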
Affiliation(s)
- Yinmei Ni
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing, China
| | - Jingwei Sun
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Lenovo Research, Lenovo Group, Beijing, China
| | - Jian Li
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing, China
| |
|
29
|
Towner E, Chierchia G, Blakemore SJ. Sensitivity and specificity in affective and social learning in adolescence. Trends Cogn Sci 2023:S1364-6613(23)00092-X. [PMID: 37198089 DOI: 10.1016/j.tics.2023.04.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/24/2022] [Revised: 03/23/2023] [Accepted: 04/05/2023] [Indexed: 05/19/2023]
Abstract
Adolescence is a period of heightened affective and social sensitivity. In this review we address how this increased sensitivity influences associative learning. Based on recent evidence from human and rodent studies, as well as advances in computational biology, we suggest that, compared to other age groups, adolescents show features of heightened Pavlovian learning but tend to perform worse than adults at instrumental learning. Because Pavlovian learning does not involve decision-making, whereas instrumental learning does, we propose that these developmental differences might be due to heightened sensitivity to rewards and threats in adolescence, coupled with a lower specificity of responding. We discuss the implications of these findings for adolescent mental health and education.
Affiliation(s)
- Emily Towner
- Department of Psychology, University of Cambridge, Downing Street, Cambridge, UK.
| | - Gabriele Chierchia
- Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy; Department of Psychology, University of Cambridge, Downing Street, Cambridge, UK
| | | |
|
30
|
Chierchia G, Soukupová M, Kilford EJ, Griffin C, Leung J, Palminteri S, Blakemore SJ. Confirmatory reinforcement learning changes with age during adolescence. Dev Sci 2023; 26:e13330. [PMID: 36194156 PMCID: PMC7615280 DOI: 10.1111/desc.13330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2021] [Revised: 07/26/2022] [Accepted: 09/20/2022] [Indexed: 11/26/2022]
Abstract
Understanding how learning changes during human development has been one of the long-standing objectives of developmental science. Recently, advances in computational biology have demonstrated that humans display a bias when learning to navigate novel environments through rewards and punishments: they learn more from outcomes that confirm their expectations than from outcomes that disconfirm them. Here, we ask whether confirmatory learning is stable across development, or whether it might be attenuated in developmental stages in which exploration is beneficial, such as in adolescence. In a reinforcement learning (RL) task, 77 participants aged 11-32 years (four men, mean age = 16.26) attempted to maximize monetary rewards by repeatedly sampling different pairs of novel options, which varied in their reward/punishment probabilities. Mixed-effect models showed an age-related increase in accuracy as long as learning contingencies remained stable across trials, but less so when they reversed halfway through the trials. Age was also associated with a greater tendency to stay with an option that had just delivered a reward, more than to switch away from an option that had just delivered a punishment. At the computational level, a confirmation model provided increasingly better fit with age. This model showed that age differences are captured by decreases in noise or exploration, rather than in the magnitude of the confirmation bias. These findings provide new insights into how learning changes during development and could help better tailor learning environments to people of different ages. RESEARCH HIGHLIGHTS: Reinforcement learning shows age-related improvement during adolescence, but more in stable learning environments compared with volatile learning environments. People tend to stay with an option after a win more than they shift from an option after a loss, and this asymmetry increases with age during adolescence. 
Computationally, these changes are captured by a developing confirmatory learning style, in which people learn more from outcomes that confirm rather than disconfirm their choices. Age-related differences in confirmatory learning are explained by decreases in stochasticity, rather than changes in the magnitude of the confirmation bias.
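Illustrative sketch (not the authors' code): the abstract attributes age differences to decreases in noise or exploration rather than to the size of the confirmation bias. One common way to express choice noise in RL models is a softmax rule with an inverse temperature; whether the paper uses exactly this rule is an assumption here, and the names `p_choose_a` and `beta` are hypothetical.

```python
import math


def p_choose_a(q_a, q_b, beta):
    """Softmax probability of choosing option A over B.

    beta is the inverse temperature: beta = 0 gives random choice,
    larger beta gives less noisy (more value-driven) choice.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))


# Same learned values, different noise levels.
noisy = p_choose_a(0.6, 0.4, beta=1.0)
precise = p_choose_a(0.6, 0.4, beta=10.0)
```

With identical value estimates, the higher `beta` yields a higher probability of picking the better option, so accuracy can improve with age even if the learning-rate asymmetry itself stays constant.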
Affiliation(s)
- Gabriele Chierchia
- Department of Psychology, University of Cambridge, UK
- Institute of Cognitive Neuroscience, University College London, UK
| | | | - Emma J. Kilford
- Institute of Cognitive Neuroscience, University College London, UK
- Department of Clinical, Educational and Health Psychology, University College London, UK
| | - Cait Griffin
- Institute of Cognitive Neuroscience, University College London, UK
| | - Jovita Leung
- Institute of Cognitive Neuroscience, University College London, UK
| | - Stefano Palminteri
- Institute of Cognitive Neuroscience, University College London, UK
- Department of Cognitive Science, École Normale Supérieure, FR
- Institute of Cognitive Neuroscience, HSE, Moscow, Federation of Russia
| | - Sarah-Jayne Blakemore
- Department of Psychology, University of Cambridge, UK
- Institute of Cognitive Neuroscience, University College London, UK
| |
|
31
|
Sandhu TR, Xiao B, Lawson RP. Transdiagnostic computations of uncertainty: towards a new lens on intolerance of uncertainty. Neurosci Biobehav Rev 2023; 148:105123. [PMID: 36914079 DOI: 10.1016/j.neubiorev.2023.105123] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/11/2022] [Revised: 02/21/2023] [Accepted: 03/08/2023] [Indexed: 03/13/2023]
Abstract
People radically differ in how they cope with uncertainty. Clinical researchers describe a dispositional characteristic known as "intolerance of uncertainty", a tendency to find uncertainty aversive, reported to be elevated across psychiatric and neurodevelopmental conditions. Concurrently, recent research in computational psychiatry has leveraged theoretical work to characterise individual differences in uncertainty processing. Under this framework, differences in how people estimate different forms of uncertainty can contribute to mental health difficulties. In this review, we briefly outline the concept of intolerance of uncertainty within its clinical context, and we argue that the mechanisms underlying this construct may be further elucidated through modelling how individuals make inferences about uncertainty. We will review the evidence linking psychopathology to different computationally specified forms of uncertainty and consider how these findings might suggest distinct mechanistic routes towards intolerance of uncertainty. We also discuss the implications of this computational approach for behavioural and pharmacological interventions, as well as the importance of different cognitive domains and subjective experiences in studying uncertainty processing.
Affiliation(s)
- Timothy R Sandhu
- Department of Psychology, Downing Place, University of Cambridge, CB2 3EB, UK; MRC Cognition and Brain Sciences Unit, 15 Chaucer Road, CB2 7EF, UK.
| | - Bowen Xiao
- Department of Psychology, Downing Place, University of Cambridge, CB2 3EB, UK
| | - Rebecca P Lawson
- Department of Psychology, Downing Place, University of Cambridge, CB2 3EB, UK; MRC Cognition and Brain Sciences Unit, 15 Chaucer Road, CB2 7EF, UK
| |
|
32
|
Goldway N, Eldar E, Shoval G, Hartley CA. Computational Mechanisms of Addiction and Anxiety: A Developmental Perspective. Biol Psychiatry 2023; 93:739-750. [PMID: 36775050 PMCID: PMC10038924 DOI: 10.1016/j.biopsych.2023.02.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/20/2022] [Revised: 02/05/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023]
Abstract
A central goal of computational psychiatry is to identify systematic relationships between transdiagnostic dimensions of psychiatric symptomatology and the latent learning and decision-making computations that inform individuals' thoughts, feelings, and choices. Most psychiatric disorders emerge prior to adulthood, yet little work has extended these computational approaches to study the development of psychopathology. Here, we lay out a roadmap for future studies implementing this approach by developing empirically and theoretically informed hypotheses about how developmental changes in model-based control of action and Pavlovian learning processes may modulate vulnerability to anxiety and addiction. We highlight how insights from studies leveraging computational approaches to characterize the normative developmental trajectories of clinically relevant learning and decision-making processes may suggest promising avenues for future developmental computational psychiatry research.
Affiliation(s)
- Noam Goldway
- Department of Psychology, New York University, New York, New York
| | - Eran Eldar
- Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel; Department of Cognitive and Brain Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Gal Shoval
- Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey; Child and Adolescent Division, Geha Mental Health Center, Petah Tikva, Israel; Department of Psychiatry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Catherine A Hartley
- Department of Psychology, New York University, New York, New York; Center for Neural Science, New York University, New York, New York.
| |
|
33
|
Balliet D, Lindström B. Inferences about interdependence shape cooperation. Trends Cogn Sci 2023; 27:583-595. [PMID: 37055313 DOI: 10.1016/j.tics.2023.03.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/15/2022] [Revised: 03/13/2023] [Accepted: 03/14/2023] [Indexed: 04/15/2023]
Abstract
During social interactions in daily life, people possess imperfect knowledge of their interdependence (i.e., how behaviors affect each person's outcomes), and what people infer about their interdependence can shape their behaviors. We review theory and research that suggests people can infer their interdependence with others along several dimensions, including mutual dependence, power, and corresponding-versus-conflicting interests. We discuss how perceptions of interdependence affect how people cooperate and punish others' defection in everyday life. We propose that people understand their interdependence with others through knowledge of the action space, cues during social interactions (e.g., partner behaviors), and priors based on experience. Finally, we describe how learning interdependence could occur through domain-specific and domain-general mechanisms.
Affiliation(s)
- Daniel Balliet
- Department of Experimental and Applied Psychology, Institute for Brain and Behaviour Amsterdam (IBBA), Vrije Universiteit Amsterdam, Amsterdam 1081BT, The Netherlands.
| | - Björn Lindström
- Department of Experimental and Applied Psychology, Institute for Brain and Behaviour Amsterdam (IBBA), Vrije Universiteit Amsterdam, Amsterdam 1081BT, The Netherlands
| |
|
34
|
Origins and consequences of mood flexibility: a computational perspective. Neurosci Biobehav Rev 2023; 147:105084. [PMID: 36764635 DOI: 10.1016/j.neubiorev.2023.105084] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/24/2022] [Revised: 01/21/2023] [Accepted: 02/02/2023] [Indexed: 02/11/2023]
Abstract
A stable and neutral mood (euthymia) is commended by both economic and clinical perspectives, because it enables rational decisions and avoids mental illnesses. Here we suggest, on the contrary, that a flexible mood responsive to life events may be more adaptive for natural selection, because it can help adjust the behavior to fluctuations in the environment. In our model (dubbed MAGNETO), mood represents a global expected value that biases decisions to forage for a particular reward. When flexible, mood is updated every time an action is taken, by aggregating incurred costs and obtained rewards. Model simulations show that, across a large range of parameters, flexible agents outperform cold agents (with stable neutral mood), particularly when rewards and costs are correlated in time, as naturally occurring across seasons. However, with more extreme parameters, simulations generate short manic episodes marked by incessant foraging and lasting depressive episodes marked by persistent inaction. The MAGNETO model therefore accounts for both the function of mood fluctuations and the emergence of mood disorders.
|
35
|
Villano WJ, Kraus NI, Reneau TR, Jaso BA, Otto AR, Heller AS. Individual differences in naturalistic learning link negative emotionality to the development of anxiety. SCIENCE ADVANCES 2023; 9:eadd2976. [PMID: 36598977 PMCID: PMC9812386 DOI: 10.1126/sciadv.add2976] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/02/2022] [Accepted: 11/30/2022] [Indexed: 06/17/2023]
Abstract
Organisms learn from prediction errors (PEs) to predict the future. Laboratory studies using small financial outcomes find that humans use PEs to update expectations and link individual differences in PE-based learning to internalizing disorders. Because of the low-stakes outcomes in most tasks, it is unclear whether PE learning emerges in naturalistic, high-stakes contexts and whether individual differences in PE learning predict psychopathology risk. Using experience sampling to assess 625 college students' expected exam grades, we found evidence of PE-based learning and a general tendency to discount negative PEs, an "optimism bias." However, individuals with elevated negative emotionality, a personality trait linked to the development of anxiety disorders, displayed a global pessimism and learning differences that impeded accurate expectations and predicted future anxiety symptoms. A sensitivity to PEs combined with an aversion to negative PEs may result in a pessimistic and inaccurate model of the world, leading to anxiety.
Affiliation(s)
| | - Noah I. Kraus
- Department of Psychology, University of Miami, Coral Gables, FL, USA
| | - Travis R. Reneau
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, MO, USA
| | - Brittany A. Jaso
- Center for Anxiety and Related Disorders, Boston University, Boston, MA, USA
| | - A. Ross Otto
- Department of Psychology, McGill University, Montreal, Canada
| | - Aaron S. Heller
- Department of Psychology, University of Miami, Coral Gables, FL, USA
| |
|
36
|
De Martino B, Cortese A. Goals, usefulness and abstraction in value-based choice. Trends Cogn Sci 2023; 27:65-80. [PMID: 36446707 DOI: 10.1016/j.tics.2022.11.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 08/23/2022] [Revised: 10/26/2022] [Accepted: 11/01/2022] [Indexed: 11/27/2022]
Abstract
Colombian drug lord Pablo Escobar, while on the run, purportedly burned two million dollars in banknotes to keep his daughter warm. A stark reminder that, in life, circumstances and goals can quickly change, forcing us to reassess and modify our values on-the-fly. Studies in decision-making and neuroeconomics have often implicitly equated value to reward, emphasising the hedonic and automatic aspect of the value computation, while overlooking its functional (concept-like) nature. Here we outline the computational and biological principles that enable the brain to compute the usefulness of an option or action by creating abstractions that flexibly adapt to changing goals. We present different algorithmic architectures, comparing ideas from artificial intelligence (AI) and cognitive neuroscience with psychological theories and, when possible, drawing parallels.
Affiliation(s)
- Benedetto De Martino
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK; Computational Neuroscience Laboratories, ATR Institute International, 619-0288 Kyoto, Japan.
| | - Aurelio Cortese
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK; Computational Neuroscience Laboratories, ATR Institute International, 619-0288 Kyoto, Japan.
| |
|
37
|
Sugawara M, Katahira K. Choice perseverance underlies pursuing a hard-to-get target in an avatar choice task. Front Psychol 2022; 13:924578. [PMID: 36148109 PMCID: PMC9488557 DOI: 10.3389/fpsyg.2022.924578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2022] [Accepted: 08/01/2022] [Indexed: 11/22/2022] Open
Abstract
People sometimes persistently pursue hard-to-get targets. Why people pursue such targets is unclear. Here, we hypothesized that choice perseverance, which is the tendency to repeat the same choice independent of the obtained outcomes, leads individuals to repeatedly choose a hard-to-get target, which consequently increases their preference for the target. To investigate this hypothesis, we conducted an online experiment involving an avatar choice task in which the participants repeatedly selected one avatar, and the selected avatar expressed their valence reactions through facial expressions and voice. We defined “hard-to-get” and “easy-to-get” avatars by manipulating the outcome probability such that the hard-to-get avatars rarely provided a positive reaction when selected, while the easy-to-get avatars frequently did. We found that some participants repeatedly selected hard-to-get avatars (Pursuit group). Based on a simulation, we found that higher choice perseverance accounted for the pursuit of hard-to-get avatars and that the Pursuit group had significantly higher choice perseverance than the No-pursuit group. Model fitting to the choice data also supported that choice perseverance can account for the pursuit of hard-to-get avatars in the Pursuit group. Moreover, we found that although baseline attractiveness was comparable among all avatars used in the choice task, the attractiveness of the hard-to-get avatars was significantly increased only in the Pursuit group. Taken together, we conclude that people with high choice perseverance pursue hard-to-get targets, rendering such targets more attractive. The tolerance for negative outcomes might be an important factor for succeeding in our lives but sometimes triggers problematic behavior, such as stalking. 
The present findings may contribute to understanding the psychological mechanisms of passion and perseverance for one’s long-term goals, which are more general than the romantic context imitated in avatar choice.
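Illustrative sketch (not the authors' code): choice perseverance, defined in this abstract as the tendency to repeat a choice independent of outcomes, is often modeled by adding a choice-history trace to the softmax decision rule. The version below assumes that form; the names `choice_probs`, `update_trace`, `beta`, `phi`, and `tau` are hypothetical and the abstract does not specify the fitted model's equations.

```python
import math


def choice_probs(q, c, beta, phi):
    """Softmax over learned values q plus a perseverance bonus phi * choice trace c."""
    logits = [beta * qi + phi * ci for qi, ci in zip(q, c)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def update_trace(c, chosen, tau):
    """Move the trace toward 1 for the chosen option and toward 0 for the others."""
    return [ci + tau * ((1.0 if i == chosen else 0.0) - ci) for i, ci in enumerate(c)]


# Option 0 is the "hard-to-get" one (lower value) but was just chosen.
q = [0.2, 0.8]
c = update_trace([0.0, 0.0], chosen=0, tau=1.0)
without_perseverance = choice_probs(q, c, beta=1.0, phi=0.0)
with_perseverance = choice_probs(q, c, beta=1.0, phi=3.0)
```

In this sketch a large enough `phi` makes the previously chosen, lower-valued option the more likely next choice, which is the qualitative pattern the abstract describes for the Pursuit group.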
Affiliation(s)
- Michiyo Sugawara
- Department of Cognitive and Psychological Sciences, Nagoya University, Nagoya, Japan
- Japan Society for the Promotion of Science, Chiyoda-ku, Japan
- Faculty of Letters, Arts and Sciences, Waseda University, Shinjuku-ku, Japan
- *Correspondence: Michiyo Sugawara,
- Kentaro Katahira
- Department of Cognitive and Psychological Sciences, Nagoya University, Nagoya, Japan
- National Institute of Advanced Industrial Science and Technology (AIST), Human Informatics and Interaction Research Institute, Tsukuba, Japan
|
38
|
Sakai Y, Sakai Y, Abe Y, Narumoto J, Tanaka SC. Memory trace imbalance in reinforcement and punishment systems can reinforce implicit choices leading to obsessive-compulsive behavior. Cell Rep 2022; 40:111275. [PMID: 36044850 DOI: 10.1016/j.celrep.2022.111275] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/27/2022] [Revised: 06/09/2022] [Accepted: 08/05/2022] [Indexed: 11/03/2022] Open
Abstract
We may view most of our daily activities as rational action selections; however, we sometimes reinforce maladaptive behaviors despite having explicit environmental knowledge. In this study, we model obsessive-compulsive disorder (OCD) symptoms as implicitly learned maladaptive behaviors. Simulations in the reinforcement learning framework show that agents implicitly learn to respond to intrusive thoughts when the memory trace signal for past actions decays differently for positive and negative prediction errors. Moreover, this model extends our understanding of therapeutic effects of behavioral therapy in OCD. Using empirical data, we confirm that patients with OCD show extremely imbalanced traces, which are normalized by serotonin enhancers. We find that healthy participants also vary in their obsessive-compulsive tendencies, consistent with the degree of imbalanced traces. These behavioral characteristics can be generalized to variations in the healthy population beyond the spectrum of clinical phenotypes.
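The idea of a memory trace that decays differently for positive and negative prediction errors can be sketched with a temporal-difference update that keeps two eligibility traces. This is a minimal sketch and not the model used in the paper: the state-value formulation, the function name `td_update`, and the parameters `nu_pos`, `nu_neg`, `alpha`, and `gamma` (with their default values) are all assumptions made for illustration.

```python
def td_update(values, trace_pos, trace_neg, state, next_state, reward,
              alpha=0.1, gamma=0.9, nu_pos=0.9, nu_neg=0.5):
    """One temporal-difference step with sign-dependent trace decay.

    values    : list of state-value estimates (updated in place)
    trace_pos : eligibility trace used when the prediction error is positive
    trace_neg : eligibility trace used when the prediction error is not positive
    nu_pos, nu_neg : hypothetical per-step decay factors of the two traces
    Returns the prediction error for this step.
    """
    n = len(values)
    # Decay both traces, then mark the state that was just visited.
    for s in range(n):
        trace_pos[s] *= nu_pos
        trace_neg[s] *= nu_neg
    trace_pos[state] = 1.0
    trace_neg[state] = 1.0

    # Standard TD prediction error.
    delta = reward + gamma * values[next_state] - values[state]

    # Credit past states through the trace that matches the error's sign.
    trace = trace_pos if delta > 0 else trace_neg
    for s in range(n):
        values[s] += alpha * delta * trace[s]
    return delta
```

With the default values above, a positive surprise arriving one step after leaving a state changes that state's value by a factor of 0.9 of the full update, while an equally large negative surprise changes it by a factor of 0.5. How far back credit and blame reach therefore differs between the two signs, which is the kind of imbalance the abstract refers to.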
Affiliation(s)
- Yuki Sakai
- ATR Brain Information Communication Research Laboratory Group, 2-2-2 Hikaridai Seika-Cho, Soraku-Gun, Kyoto 619-0288, Japan; Department of Psychiatry, Graduate School of Medical Science, Kyoto Prefectural University of Medicine, 465 Kajii-Cho, Kawaramachi-Hirokoji, Kamigyo-Ku, Kyoto 602-8566, Japan
- Yutaka Sakai
- Brain Science Institute, Tamagawa University, 6-1-1, Tamagawa-Gakuen, Machida, Tokyo 194-8610, Japan
- Yoshinari Abe
- Department of Psychiatry, Graduate School of Medical Science, Kyoto Prefectural University of Medicine, 465 Kajii-Cho, Kawaramachi-Hirokoji, Kamigyo-Ku, Kyoto 602-8566, Japan
- Jin Narumoto
- Department of Psychiatry, Graduate School of Medical Science, Kyoto Prefectural University of Medicine, 465 Kajii-Cho, Kawaramachi-Hirokoji, Kamigyo-Ku, Kyoto 602-8566, Japan
- Saori C Tanaka
- ATR Brain Information Communication Research Laboratory Group, 2-2-2 Hikaridai Seika-Cho, Soraku-Gun, Kyoto 619-0288, Japan; Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-Cho, Ikoma, Nara 630-0192, Japan.
|