1. Gregorová K, Eldar E, Deserno L, Reiter AMF. A cognitive-computational account of mood swings in adolescence. Trends Cogn Sci 2024; 28:290-303. PMID: 38503636; DOI: 10.1016/j.tics.2024.02.006.
Abstract
Teenagers have a reputation for being fickle, in both their choices and their moods. This variability may help adolescents as they begin to independently navigate novel environments. Recently, however, adolescent moodiness has also been linked to psychopathology. Here, we consider adolescents' mood swings from a novel computational perspective, grounded in reinforcement learning (RL). This model proposes that mood is determined by surprises about outcomes in the environment, and how much we learn from these surprises. It additionally suggests that mood biases learning and choice in a bidirectional manner. Integrating independent lines of research, we sketch a cognitive-computational account of how adolescents' mood, learning, and choice dynamics influence each other, with implications for normative and psychopathological development.
Affiliation(s)
- Klára Gregorová
- Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, University Hospital, Würzburg 97080, Germany; Department of Psychology, Julius-Maximilians-Universität, Würzburg 97070, Germany; German Center of Prevention Research on Mental Health, Würzburg 97080, Germany
- Eran Eldar
- Department of Psychology, Hebrew University of Jerusalem, Jerusalem 9190501, Israel; Department of Cognitive & Brain Sciences, Hebrew University of Jerusalem, Jerusalem 9190501, Israel
- Lorenz Deserno
- Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, University Hospital, Würzburg 97080, Germany; Department of Psychology, Julius-Maximilians-Universität, Würzburg 97070, Germany; Department of Cognitive & Brain Sciences, Hebrew University of Jerusalem, Jerusalem 9190501, Israel; Department of Psychiatry and Psychotherapy, Technical University of Dresden, Dresden 01069, Germany
- Andrea M F Reiter
- Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, University Hospital, Würzburg 97080, Germany; Department of Psychology, Julius-Maximilians-Universität, Würzburg 97070, Germany; German Center of Prevention Research on Mental Health, Würzburg 97080, Germany; Collaborative Research Centre 940 Volition and Cognitive Control, Technical University of Dresden, Dresden 01069, Germany
2. Karnick AT, Bauer BW, Capron DW. Negative mood and optimism bias: An experimental investigation of sadness and belief updating. J Behav Ther Exp Psychiatry 2024; 82:101910. PMID: 37714798; DOI: 10.1016/j.jbtep.2023.101910.
Abstract
BACKGROUND AND OBJECTIVES: Understanding how individuals integrate new information to form beliefs under changing emotional conditions is crucial to describing decision-making processes. Previous research suggests that although most people demonstrate bias toward optimistic appraisals of new information when updating beliefs, individuals with dysphoric psychiatric conditions (e.g., major depression) do not demonstrate this same bias. Despite these findings, limited research has investigated the relationship between affective states and belief updating processes. METHODS: We induced neutral and sad moods in participants and had them complete a belief-updating paradigm by estimating the likelihood of negative future events happening to them, viewing the actual likelihood, and then re-estimating their perceived likelihood. RESULTS: We observed that individuals updated their beliefs more after receiving desirable information relative to undesirable information under neutral conditions. Further, we found that individuals did not demonstrate unrealistic optimism under negative affective conditions. LIMITATIONS: This study incorporated a population of university students under laboratory conditions and would benefit from replication and extension in clinical populations and naturalistic settings. CONCLUSIONS: These findings suggest that momentary fluctuations in mood affect how individuals integrate information to form beliefs.
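The belief-updating paradigm described here is typically modeled with separate update weights for desirable and undesirable estimation errors. A minimal sketch, with illustrative weights (`lam_desirable`, `lam_undesirable`) rather than the authors' fitted values:

```python
def belief_update(prior, evidence, lam_desirable=0.6, lam_undesirable=0.3):
    """Move a risk estimate toward the base rate shown to the participant.
    Optimism bias = a larger update weight when the news is desirable
    (base rate lower than one's own estimate). Weights are illustrative."""
    error = evidence - prior
    lam = lam_desirable if error < 0 else lam_undesirable  # lower risk = good news
    return prior + lam * error

# Desirable news (risk lower than feared) moves the belief more:
after_good = belief_update(prior=40.0, evidence=20.0)  # 40 + 0.6 * (-20) = 28.0
after_bad = belief_update(prior=40.0, evidence=60.0)   # 40 + 0.3 * (+20) = 46.0
```

Under the neutral-mood condition the abstract describes, the desirable weight exceeds the undesirable one; under sad mood the asymmetry disappears (both weights roughly equal).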
Affiliation(s)
- Aleksandr T Karnick
- Department of Psychology, University of Southern Mississippi, Hattiesburg, MS, USA
- Brian W Bauer
- Department of Psychology, University of Southern Mississippi, Hattiesburg, MS, USA; Department of Psychology, University of Georgia, Athens, GA, USA
- Daniel W Capron
- Department of Psychology, University of Southern Mississippi, Hattiesburg, MS, USA
3. Aster HC, Waltmann M, Busch A, Romanos M, Gamer M, Maria van Noort B, Beck A, Kappel V, Deserno L. Impaired flexible reward learning in ADHD patients is associated with blunted reinforcement sensitivity and neural signals in ventral striatum and parietal cortex. Neuroimage Clin 2024; 42:103588. PMID: 38471434; PMCID: PMC10943992; DOI: 10.1016/j.nicl.2024.103588.
Abstract
Reward-based learning and decision-making are prime candidates for understanding symptoms of attention deficit hyperactivity disorder (ADHD). However, only limited evidence is available regarding the neurocomputational underpinnings of the alterations seen in ADHD. This concerns flexible behavioral adaptation in dynamically changing environments, which is challenging for individuals with ADHD. One previous study points to elevated choice switching in adolescent ADHD, which was accompanied by disrupted learning signals in medial prefrontal cortex. Here, we investigated young adults with ADHD (n = 17) as compared to age- and sex-matched controls (n = 17) using a probabilistic reversal learning experiment during functional magnetic resonance imaging (fMRI). The task requires continuous learning to guide flexible behavioral adaptation to changing reward contingencies. To disentangle the neurocomputational underpinnings of the behavioral data, we used reinforcement learning (RL) models, which informed the analysis of the fMRI data. ADHD patients performed worse than controls particularly in trials before reversals, i.e., when reward contingencies were stable. This pattern resulted from 'noisy' choice switching regardless of previous feedback. RL modelling showed decreased reinforcement sensitivity and enhanced learning rates for negative feedback in ADHD patients. At the neural level, this was reflected in a diminished representation of choice probability in the left posterior parietal cortex in ADHD. Moreover, modelling showed a marginal reduction of learning about the unchosen option, which was paralleled by a marginal reduction in learning signals incorporating the unchosen option in the left ventral striatum. Taken together, we show that impaired flexible behavior in ADHD is due to excessive choice switching ('hyper-flexibility'), which can be detrimental or beneficial depending on the learning environment. Computationally, this resulted from blunted sensitivity to reinforcement, for which we detected neural correlates in the attention-control network, specifically in the parietal cortex. These neurocomputational findings remain preliminary due to the relatively small sample size.
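The two model components the abstract highlights, reinforcement sensitivity and valence-specific learning rates, can be combined in a single update rule. This is an illustrative reconstruction, not the authors' code; parameter values are placeholders:

```python
def update_value(q, outcome, rho=1.0, alpha_pos=0.3, alpha_neg=0.3):
    """One Rescorla-Wagner step with a reinforcement-sensitivity weight (rho)
    scaling the outcome and valence-specific learning rates.
    Values are placeholders, not fitted estimates."""
    pe = rho * outcome - q
    return q + (alpha_pos if pe > 0 else alpha_neg) * pe

# Blunted sensitivity (small rho) compresses the learned values,
# flattening value differences and so making softmax choices noisier:
q_typical, q_blunted = 0.0, 0.0
for _ in range(200):
    q_typical = update_value(q_typical, 1.0, rho=1.0)
    q_blunted = update_value(q_blunted, 1.0, rho=0.4)
```

With repeated rewards the value estimate converges to `rho`, so a blunted agent never separates good from bad options as cleanly as a typical one.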
Affiliation(s)
- Hans-Christoph Aster
- Department of Child and Adolescent Psychiatry, Psychotherapy and Psychosomatics, University Hospital Würzburg, Würzburg, Germany
- Maria Waltmann
- Department of Child and Adolescent Psychiatry, Psychotherapy and Psychosomatics, University Hospital Würzburg, Würzburg, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Anika Busch
- Department of Child and Adolescent Psychiatry, Psychotherapy and Psychosomatics, University Hospital Würzburg, Würzburg, Germany
- Marcel Romanos
- Department of Child and Adolescent Psychiatry, Psychotherapy and Psychosomatics, University Hospital Würzburg, Würzburg, Germany
- Matthias Gamer
- Department of Psychology, University of Würzburg, Würzburg, Germany
- Betteke Maria van Noort
- Department of Child and Adolescent Psychiatry, Charité University Medicine, Campus Virchow Klinikum, Berlin, Germany; MSB Medical School Berlin, Department of Psychology, Germany
- Anne Beck
- Department of Psychiatry and Neurosciences, Charité University Medicine, Berlin, Germany; Department of Psychology, Faculty of Health, Health and Medical University, Potsdam, Germany
- Viola Kappel
- Department of Child and Adolescent Psychiatry, Charité University Medicine, Campus Virchow Klinikum, Berlin, Germany
- Lorenz Deserno
- Department of Child and Adolescent Psychiatry, Psychotherapy and Psychosomatics, University Hospital Würzburg, Würzburg, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; Department of Psychiatry and Psychotherapy, Technische Universität Dresden, Dresden, Germany
4. Colas JT, O’Doherty JP, Grafton ST. Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts. PLoS Comput Biol 2024; 20:e1011950. PMID: 38552190; PMCID: PMC10980507; DOI: 10.1371/journal.pcbi.1011950.
Abstract
Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as those of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of actions chosen previously? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting from multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions. In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.
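A mixture of expert (value-based) and nonexpert (bias, hysteresis) controllers can be sketched as a softmax over summed module contributions. This is an illustrative sketch, not the authors' model; `beta`, `bias`, and `kappa` values are placeholders:

```python
import math

def choice_probs(q_values, prev_choice, beta=3.0, bias=(0.2, 0.0), kappa=0.5):
    """Softmax policy mixing three modules: learned values (scaled by beta),
    a static per-action bias, and hysteresis kappa (kappa > 0 = repetition
    bias, kappa < 0 = alternation bias). Parameter values are illustrative."""
    logits = []
    for a, q in enumerate(q_values):
        stick = kappa if a == prev_choice else 0.0
        logits.append(beta * q + bias[a] + stick)
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

With equal learned values, a positive `kappa` tilts choice toward repeating the previous action, reproducing hysteresis that is independent of reward history.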
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- John P. O’Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
5. Le T, Oba T, Couch L, McInerney L, Li CS. Deficits in proactive avoidance and neural responses to drinking motives in problem drinkers. Res Sq [Preprint] 2024: rs.3.rs-3924584. PMID: 38405986; PMCID: PMC10889056; DOI: 10.21203/rs.3.rs-3924584/v1.
Abstract
Physical pain and negative emotions represent two distinct drinking motives that contribute to harmful alcohol use. Proactive avoidance, which can reduce problem drinking in response to these motives, appears to be impaired in problem drinkers. However, proactive avoidance and its underlying neural deficits have not been assessed experimentally. How these deficits interrelate with drinking motives to influence alcohol use also remains unclear. The current study leveraged neuroimaging data collected in forty-one problem and forty-one social drinkers who performed a probabilistic learning go/no-go task that involved proactive avoidance of painful outcomes. We characterized the regional brain responses to proactive avoidance and identified the neural correlates of drinking to avoid physical pain and negative emotions. Behavioral results confirmed problem drinkers' deficits in proactive avoidance learning rate and performance accuracy, both of which were associated with greater alcohol use. Imaging findings in problem drinkers showed that negative emotions as a drinking motive predicted attenuated right insula activation during proactive avoidance, whereas the physical pain motive predicted reduced right putamen response. These regions' activations, as well as their functional connectivity with the somatomotor cortex, also demonstrated a negative relationship with drinking severity and a positive relationship with proactive avoidance performance. Path modeling further delineated the pathways through which physical pain and negative emotions, along with alcohol use severity, influenced the neural and behavioral measures of proactive avoidance. Taken together, the current findings provide experimental evidence for proactive avoidance deficits in problem drinkers and establish the link between their neural underpinnings and alcohol misuse.
6. Zika O, Appel J, Klinge C, Shkreli L, Browning M, Wiech K, Reinecke A. Reduction of Aversive Learning Rates in Pavlovian Conditioning by Angiotensin II Antagonist Losartan: A Randomized Controlled Trial. Biol Psychiatry 2024: S0006-3223(24)00063-5. PMID: 38309320; DOI: 10.1016/j.biopsych.2024.01.020.
Abstract
BACKGROUND: Angiotensin receptor blockade has been linked to aspects of aversive learning and memory formation and to the prevention of posttraumatic stress disorder symptom development. METHODS: We investigated the influence of the angiotensin receptor blocker losartan on aversive Pavlovian conditioning using a probabilistic learning paradigm. In a double-blind, randomized, placebo-controlled design, we tested 45 (18 female) healthy volunteers during a baseline session, after application of losartan or placebo (drug session), and during a follow-up session. During each session, participants engaged in a task in which they had to predict the probability of an electrical stimulation on every trial while the true shock contingencies switched repeatedly between phases of high and low shock threat. Computational reinforcement learning models were used to investigate learning dynamics. RESULTS: Acute administration of losartan significantly reduced participants' adjustment during both low-to-high and high-to-low threat changes. This was driven by reduced aversive learning rates in the losartan group during the drug session compared with baseline. The 50-mg drug dose did not induce reduction of blood pressure or change in reaction times, ruling out a general reduction in attention and engagement. Decreased adjustment of aversive expectations was maintained at a follow-up session 24 hours later. CONCLUSIONS: This study shows that losartan acutely reduces Pavlovian learning in aversive environments, thereby highlighting a potential role of the renin-angiotensin system in anxiety development.
Affiliation(s)
- Ondrej Zika
- Max Planck Institute for Human Development, Berlin, Germany
- Judith Appel
- Behavioural Science Institute, Radboud University Nijmegen, Nijmegen, the Netherlands
- Corinna Klinge
- Department of Psychiatry, University of Oxford, Oxford, United Kingdom
- Lorika Shkreli
- Department of Psychiatry, University of Oxford, Oxford, United Kingdom
- Michael Browning
- Department of Psychiatry, University of Oxford, Oxford, United Kingdom; Oxford Health NHS Trust, Warneford Hospital, Oxford, United Kingdom
- Katja Wiech
- Wellcome Centre for Integrative Functional Neuroimaging, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Andrea Reinecke
- Department of Psychiatry, University of Oxford, Oxford, United Kingdom; Oxford Health NHS Trust, Warneford Hospital, Oxford, United Kingdom
7. Le TM, Oba T, Couch L, McInerney L, Li CSR. The Neural Correlates of Individual Differences in Reinforcement Learning during Pain Avoidance and Reward Seeking. eNeuro 2024; 11:ENEURO.0437-23.2024. PMID: 38365840; PMCID: PMC10901196; DOI: 10.1523/eneuro.0437-23.2024.
Abstract
Organisms learn to gain reward and avoid punishment through action-outcome associations. Reinforcement learning (RL) offers a critical framework to understand individual differences in this associative learning by assessing learning rate, action bias, the Pavlovian factor (i.e., the extent to which action values are influenced by stimulus values), and the subjective impact of outcomes (i.e., motivation to seek reward and avoid punishment). Nevertheless, how these individual-level metrics are represented in the brain remains unclear. The current study leveraged fMRI in healthy humans and a probabilistic learning go/no-go task to characterize the neural correlates involved in learning to seek reward and avoid pain. Behaviorally, participants showed a higher learning rate during pain avoidance relative to reward seeking. Additionally, the subjective impact of outcomes was greater for reward trials and associated with lower response randomness. Our imaging findings showed that individual differences in learning rate and performance accuracy during avoidance learning were positively associated with activities of the dorsal anterior cingulate cortex, midcingulate cortex, and postcentral gyrus. In contrast, the Pavlovian factor was represented in the precentral gyrus and superior frontal gyrus (SFG) during pain avoidance and reward seeking, respectively. Individual variation in the subjective impact of outcomes was positively predicted by activation of the left posterior cingulate cortex. Finally, action bias was represented by the supplementary motor area (SMA) and pre-SMA, whereas the SFG played a role in restraining this action tendency. Together, these findings highlight for the first time the neural substrates of individual differences in the computational processes during RL.
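The Pavlovian factor and action bias described above are commonly formalized in go/no-go RL models as additive terms in the 'go' action weight. A hedged sketch (parameter names follow that model family; the values are illustrative, not the study's fits):

```python
import math

def p_go(q_go, q_nogo, stimulus_value, beta=2.0, go_bias=0.3, pi=0.4):
    """Probability of emitting 'go': instrumental values plus a static action
    bias and a Pavlovian term (pi) coupling stimulus value to 'go' propensity,
    so appetitive cues invigorate action and aversive cues suppress it."""
    w_go = beta * q_go + go_bias + pi * stimulus_value
    w_nogo = beta * q_nogo
    return 1.0 / (1.0 + math.exp(-(w_go - w_nogo)))
```

Holding the instrumental values fixed, an appetitive cue (`stimulus_value > 0`) raises the probability of responding and an aversive cue lowers it, which is exactly the coupling the Pavlovian factor quantifies.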
Affiliation(s)
- Thang M Le
- Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut 06519
- Takeyuki Oba
- Human Informatics and Interaction Research Institute, the National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba 305-8560, Japan
- Luke Couch
- Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut 06519
- Lauren McInerney
- Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut 06519
- Chiang-Shan R Li
- Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut 06519
- Department of Neuroscience, Yale University School of Medicine, New Haven, Connecticut 06520
- Interdepartmental Neuroscience Program, Yale University School of Medicine, New Haven, Connecticut 06520
- Wu Tsai Institute, Yale University, New Haven, Connecticut 06510
8. Schaaf JV, Weidinger L, Molleman L, van den Bos W. Test-retest reliability of reinforcement learning parameters. Behav Res Methods 2023. PMID: 37684495; DOI: 10.3758/s13428-023-02203-4.
Abstract
It has recently been suggested that parameter estimates of computational models can be used to understand individual differences at the process level. One area of research in which this approach, called computational phenotyping, has taken hold is computational psychiatry. One requirement for successful computational phenotyping is that behavior and parameters are stable over time. Surprisingly, the test-retest reliability of behavior and model parameters remains unknown for most experimental tasks and models. The present study seeks to close this gap by investigating the test-retest reliability of canonical reinforcement learning models in the context of two often-used learning paradigms: a two-armed bandit and a reversal learning task. We tested independent cohorts for the two tasks (N = 69 and N = 47) via an online testing platform with a between-test interval of five weeks. Whereas reliability was high for personality and cognitive measures (with ICCs ranging from .67 to .93), it was generally poor for the parameter estimates of the reinforcement learning models (with ICCs ranging from .02 to .52 for the bandit task and from .01 to .71 for the reversal learning task). Given that simulations indicated that our procedures could detect high test-retest reliability, this suggests that a significant proportion of the variability must be ascribed to the participants themselves. In support of that hypothesis, we show that mood (stress and happiness) can partly explain within-participant variability. Taken together, these results are critical for current practices in computational phenotyping and suggest that individual variability should be taken into account in the future development of the field.
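Test-retest reliability of this kind is usually quantified with an intraclass correlation. A self-contained two-session ICC(2,1) (the two-way random-effects, absolute-agreement, single-measure form commonly reported) can be computed as:

```python
def icc_a1(session1, session2):
    """Two-session ICC(2,1) from the standard two-way ANOVA decomposition:
    (MSR - MSE) / (MSR + (k-1)MSE + k(MSC - MSE)/n), with k = 2 sessions."""
    n, k = len(session1), 2
    grand = (sum(session1) + sum(session2)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(session1, session2)]
    col_means = [sum(session1) / n, sum(session2) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in session1 + session2)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```

Identical sessions yield 1.0; values near zero (or below) indicate that between-session noise swamps the between-participant differences, which is the pattern the abstract reports for most RL parameter estimates.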
Affiliation(s)
- Jessica V Schaaf
- Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands
- Cognitive Neuroscience Department, Radboud University Medical Centre, Nijmegen, the Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, the Netherlands
- Laura Weidinger
- DeepMind, London, United Kingdom
- Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany
- Lucas Molleman
- Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands
- Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany
- Wouter van den Bos
- Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands
- Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany
9. Garrett N, Sharot T. There is no belief update bias for neutral events: failure to replicate Burton et al. (2022). J Cogn Psychol 2023; 35:876-886. PMID: 38013976; PMCID: PMC10591604; DOI: 10.1080/20445911.2023.2245112.
Abstract
In a recent paper, Burton et al. claim that individuals update beliefs to a greater extent when learning that an event is less likely, compared to more likely, than expected. Here, we investigate Burton et al.'s findings. First, we show how Burton et al.'s data do not in fact support a belief update bias for neutral events. Next, in an attempt to replicate their findings, we collect a new data set employing the original belief update task design, but with neutral events. A belief update bias for neutral events is not observed. Finally, we highlight the statistical errors and confounds in Burton et al.'s design and analysis. These include mis-specifying a reinforcement learning approach to model the data and failing to follow standard sanity checks for computational model fitting, such as parameter recovery, model comparison, and out-of-sample prediction. Together, the results provide little evidence for biased updating for neutral events.
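The sanity check emphasized here, parameter recovery, means simulating data with known parameters and verifying that the fitting procedure gets them back. A minimal sketch with a Rescorla-Wagner/softmax agent on a reversal bandit; all task settings (trial counts, reward probabilities, reversal schedule) are illustrative:

```python
import math
import random

def simulate(alpha, beta, n_trials=400, seed=7):
    """Generate choices from a Rescorla-Wagner/softmax agent on a two-armed
    bandit whose good arm reverses every 50 trials (reversals help make the
    learning rate identifiable)."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    choices, rewards = [], []
    for t in range(n_trials):
        good = (t // 50) % 2
        p_reward = [0.2, 0.2]
        p_reward[good] = 0.8
        p1 = 1 / (1 + math.exp(-beta * (q[1] - q[0])))
        c = 1 if rng.random() < p1 else 0
        r = 1.0 if rng.random() < p_reward[c] else 0.0
        choices.append(c)
        rewards.append(r)
        q[c] += alpha * (r - q[c])
    return choices, rewards

def neg_log_lik(alpha, beta, choices, rewards):
    """Negative log-likelihood of the observed choices under the model."""
    q = [0.0, 0.0]
    nll = 0.0
    for c, r in zip(choices, rewards):
        p1 = 1 / (1 + math.exp(-beta * (q[1] - q[0])))
        nll -= math.log(max(p1 if c == 1 else 1 - p1, 1e-12))
        q[c] += alpha * (r - q[c])
    return nll

# Recovery check: simulate with a known learning rate, refit by grid search
# (beta held fixed here for brevity; a full check recovers both jointly).
choices, rewards = simulate(alpha=0.3, beta=5.0)
grid = [i / 100 for i in range(1, 100)]
alpha_hat = min(grid, key=lambda a: neg_log_lik(a, 5.0, choices, rewards))
```

If `alpha_hat` lands far from the generating value across many simulated agents, the task-model combination cannot support parameter-based conclusions, which is the class of failure the authors flag.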
Affiliation(s)
- Neil Garrett
- School of Psychology, University of East Anglia, Norwich, UK
- Tali Sharot
- Affective Brain Lab, Department of Experimental Psychology, University College London, London, UK
- The Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
10. Vandendriessche H, Demmou A, Bavard S, Yadak J, Lemogne C, Mauras T, Palminteri S. Contextual influence of reinforcement learning performance of depression: evidence for a negativity bias? Psychol Med 2023; 53:4696-4706. PMID: 35726513; DOI: 10.1017/s0033291722001593.
Abstract
BACKGROUND: Value-based decision-making impairment in depression is a complex phenomenon: while some studies did find evidence of blunted reward learning and reward-related signals in the brain, others indicate no effect. Here we test whether such reward sensitivity deficits depend on the overall value of the decision problem. METHODS: We used a two-armed bandit task with two contexts: one 'rich' and one 'poor', in which both options were associated with an overall positive or negative expected value, respectively. We tested patients (N = 30) undergoing a major depressive episode and age-, gender- and socio-economically matched controls (N = 26). Learning performance and a subsequent transfer phase without feedback were analyzed to disentangle decision mechanisms from value-update mechanisms. Finally, we used computational model simulation and fitting to link behavioral patterns to learning biases. RESULTS: Control subjects showed similar learning performance in the 'rich' and the 'poor' contexts, while patients displayed reduced learning in the 'poor' context. Analysis of the transfer phase showed that the context-dependent impairment in patients generalized, suggesting that the effect of depression has to be traced to outcome encoding. Computational model-based results showed that patients displayed a higher learning rate for negative compared with positive outcomes (the opposite was true in controls). CONCLUSIONS: Our results illustrate that reinforcement learning performance in depression depends on the value of the context. We show that depressed patients have particular difficulty in contexts with an overall negative state value, which in our task is consistent with a negativity bias at the level of learning rates.
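The negativity bias at the level of learning rates can be illustrated with a Rescorla-Wagner rule that weights worse-than-expected outcomes more heavily: with `alpha_neg > alpha_pos`, value estimates are dragged downward. The values below are illustrative, not the fitted patient parameters:

```python
def run_context(outcomes, alpha_pos=0.1, alpha_neg=0.4):
    """Rescorla-Wagner updating with separate learning rates for better- and
    worse-than-expected outcomes; alpha_neg > alpha_pos yields a negativity
    bias that pulls the final value estimate toward the worst outcomes."""
    v = 0.0
    for r in outcomes:
        pe = r - v
        v += (alpha_pos if pe > 0 else alpha_neg) * pe
    return v
```

On an alternating +1/-1 outcome stream, a biased learner settles on a markedly more negative value than a symmetric learner, mirroring how a negativity bias would weigh most heavily in an overall 'poor' context.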
Affiliation(s)
- Henri Vandendriessche
- Laboratoire de Neurosciences Cognitives Computationnelles, INSERM U960, Paris, France
- Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL Research University, Paris, France
- Amel Demmou
- Unité Psychiatrie Adultes, Hôpital Cochin Port Royal, Paris, France
- Sophie Bavard
- Laboratoire de Neurosciences Cognitives Computationnelles, INSERM U960, Paris, France
- Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL Research University, Paris, France
- Department of Psychology, University of Hamburg, Hamburg, Germany
- Julien Yadak
- Unité Psychiatrie Adultes, Hôpital Cochin Port Royal, Paris, France
- Cédric Lemogne
- Université Paris Cité, INSERM U1266, Institut de Psychiatrie et Neurosciences de Paris, Paris, France
- Service de Psychiatrie de l'adulte, AP-HP, Hôpital Hôtel-Dieu, Paris, France
- Thomas Mauras
- Groupe Hospitalier Universitaire, GHU Paris Psychiatrie Neurosciences, Paris, France
- Stefano Palminteri
- Laboratoire de Neurosciences Cognitives Computationnelles, INSERM U960, Paris, France
- Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL Research University, Paris, France
11. Harada T. Exploring the effects of risk-taking, exploitation, and exploration on divergent thinking under group dynamics. Front Psychol 2023; 13:1063525. PMID: 36743628; PMCID: PMC9890061; DOI: 10.3389/fpsyg.2022.1063525.
Abstract
This study examined the effects of risk-taking and the exploitation/exploration trade-off on divergent thinking in individuals, dyads, and triads. We adopted a simple Q-learning model to estimate risk-attitude, exploitation, and exploration parameters. The results showed that risk-taking, exploitation, and exploration did not affect divergent thinking in dyads; instead, loss aversion was negatively related to divergent thinking. In contrast, risk attitudes and the inverse temperature, the ratio between exploitation and exploration, were significant but had contrasting effects in individuals and triads. For individuals, risk-taking, exploitation, and loss aversion played a critical role in divergent thinking. For triads, risk aversion and exploration were significantly related to divergent thinking. However, the results also indicated that balancing risk with exploitation/exploration and loss aversion is critical to enhancing divergent thinking in individuals and triads when learning coherence emerges. These results can be interpreted consistently with related literature, such as odd- vs. even-numbered group dynamics, knowledge diversity in group creativity, and representational change theory in insight problem-solving.
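The estimated quantities, inverse temperature (exploitation vs. exploration) and loss aversion, can be sketched with a standard Q-learning update and softmax choice rule. Parameter values are illustrative, not the study's estimates:

```python
import math

def p_choose_b(q_a, q_b, inv_temp):
    """Softmax choice between two options: a high inverse temperature means
    exploitation (sticking with the higher-valued option), a low one means
    exploration (near-random choice)."""
    return 1 / (1 + math.exp(-inv_temp * (q_b - q_a)))

def q_update(q, outcome, alpha=0.2, loss_aversion=2.0):
    """Q-learning step in which losses are weighted more heavily than
    equivalent gains (loss aversion). Parameter values are illustrative."""
    utility = outcome if outcome >= 0 else loss_aversion * outcome
    return q + alpha * (utility - q)
```

A loss-averse learner's values react roughly twice as strongly to a -1 outcome as to a +1 outcome, and the inverse temperature then determines how sharply those value differences translate into choices.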
Collapse
|
12
|
Villano WJ, Kraus NI, Reneau TR, Jaso BA, Otto AR, Heller AS. Individual differences in naturalistic learning link negative emotionality to the development of anxiety. SCIENCE ADVANCES 2023; 9:eadd2976. [PMID: 36598977 PMCID: PMC9812386 DOI: 10.1126/sciadv.add2976] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/02/2022] [Accepted: 11/30/2022] [Indexed: 06/17/2023]
Abstract
Organisms learn from prediction errors (PEs) to predict the future. Laboratory studies using small financial outcomes find that humans use PEs to update expectations and link individual differences in PE-based learning to internalizing disorders. Because of the low-stakes outcomes in most tasks, it is unclear whether PE learning emerges in naturalistic, high-stakes contexts and whether individual differences in PE learning predict psychopathology risk. Using experience sampling to assess 625 college students' expected exam grades, we found evidence of PE-based learning and a general tendency to discount negative PEs, an "optimism bias." However, individuals with elevated negative emotionality, a personality trait linked to the development of anxiety disorders, displayed a global pessimism and learning differences that impeded accurate expectations and predicted future anxiety symptoms. A sensitivity to PEs combined with an aversion to negative PEs may result in a pessimistic and inaccurate model of the world, leading to anxiety.
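PE-based updating with an "optimism bias" of the kind described above is commonly modeled with valence-dependent learning rates; a minimal sketch (the numbers are hypothetical, not from the study):

```python
def update_expectation(expectation, outcome, alpha_pos, alpha_neg):
    """Update an expectation from a prediction error (PE), with separate
    learning rates for better- and worse-than-expected outcomes."""
    pe = outcome - expectation
    alpha = alpha_pos if pe > 0 else alpha_neg
    return expectation + alpha * pe

# An "optimistic" learner discounts negative PEs (alpha_neg < alpha_pos):
e = update_expectation(80.0, 70.0, alpha_pos=0.6, alpha_neg=0.2)
# a grade 10 points below expectation lowers it only to 78.0,
# versus 74.0 for an unbiased learner using alpha = 0.6 throughout
```

In this framing, elevated negative emotionality would correspond to a shift in these learning rates (or in baseline expectations) toward pessimism.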
Collapse
Affiliation(s)
| | - Noah I. Kraus
- Department of Psychology, University of Miami, Coral Gables, FL, USA
| | - Travis R. Reneau
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, MO, USA
| | - Brittany A. Jaso
- Center for Anxiety and Related Disorders, Boston University, Boston, MA, USA
| | - A. Ross Otto
- Department of Psychology, McGill University, Montreal, Canada
| | - Aaron S. Heller
- Department of Psychology, University of Miami, Coral Gables, FL, USA
| |
Collapse
|
13
|
Colas JT, Dundon NM, Gerraty RT, Saragosa‐Harris NM, Szymula KP, Tanwisuth K, Tyszka JM, van Geen C, Ju H, Toga AW, Gold JI, Bassett DS, Hartley CA, Shohamy D, Grafton ST, O'Doherty JP. Reinforcement learning with associative or discriminative generalization across states and actions: fMRI at 3 T and 7 T. Hum Brain Mapp 2022; 43:4750-4790. [PMID: 35860954 PMCID: PMC9491297 DOI: 10.1002/hbm.25988] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2022] [Revised: 05/20/2022] [Accepted: 06/10/2022] [Indexed: 11/12/2022] Open
Abstract
The model-free algorithms of "reinforcement learning" (RL) have gained clout across disciplines, but so too have model-based alternatives. The present study emphasizes other dimensions of this model space in consideration of associative or discriminative generalization across states and actions. This "generalized reinforcement learning" (GRL) model, a frugal extension of RL, parsimoniously retains the single reward-prediction error (RPE), but the scope of learning goes beyond the experienced state and action. Instead, the generalized RPE is efficiently relayed for bidirectional counterfactual updating of value estimates for other representations. Aided by structural information but as an implicit rather than explicit cognitive map, GRL provided the most precise account of human behavior and individual differences in a reversal-learning task with hierarchical structure that encouraged inverse generalization across both states and actions. Reflecting inference that could be true, false (i.e., overgeneralization), or absent (i.e., undergeneralization), state generalization distinguished those who learned well more so than action generalization. With high-resolution high-field fMRI targeting the dopaminergic midbrain, the GRL model's RPE signals (alongside value and decision signals) were localized within not only the striatum but also the substantia nigra and the ventral tegmental area, including specific effects of generalization that also extend to the hippocampus. Factoring in generalization as a multidimensional process in value-based learning, these findings shed light on complexities that, while challenging classic RL, can still be resolved within the bounds of its core computations.
Collapse
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
| | - Neil M. Dundon
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- Department of Child and Adolescent Psychiatry, Psychotherapy, and Psychosomatics, University of Freiburg, Freiburg im Breisgau, Germany
| | - Raphael T. Gerraty
- Department of Psychology, Columbia University, New York, New York, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Center for Science and Society, Columbia University, New York, New York, USA
| | - Natalie M. Saragosa‐Harris
- Department of Psychology, New York University, New York, New York, USA
- Department of Psychology, University of California, Los Angeles, California, USA
| | - Karol P. Szymula
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Koranis Tanwisuth
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Department of Psychology, University of California, Berkeley, California, USA
| | - J. Michael Tyszka
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
| | - Camilla van Geen
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Harang Ju
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Arthur W. Toga
- Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, USA
| | - Joshua I. Gold
- Department of Neuroscience, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Dani S. Bassett
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Neurology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Santa Fe Institute, Santa Fe, New Mexico, USA
| | - Catherine A. Hartley
- Department of Psychology, New York University, New York, New York, USA
- Center for Neural Science, New York University, New York, New York, USA
| | - Daphna Shohamy
- Department of Psychology, Columbia University, New York, New York, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Kavli Institute for Brain Science, Columbia University, New York, New York, USA
| | - Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
| | - John P. O'Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
| |
Collapse
|
14
|
Zamfir E, Dayan P. Interactions between attributions and beliefs at trial-by-trial level: Evidence from a novel computer game task. PLoS Comput Biol 2022; 18:e1009920. [PMID: 36155635 PMCID: PMC9536582 DOI: 10.1371/journal.pcbi.1009920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2022] [Revised: 10/06/2022] [Accepted: 08/28/2022] [Indexed: 11/19/2022] Open
Abstract
Inferring causes of the good and bad events that we experience is part of the process of building models of our own capabilities and of the world around us. Making such inferences can be difficult because of complex reciprocal relationships between attributions of the causes of particular events, and beliefs about the capabilities and skills that influence our role in bringing them about. Abnormal causal attributions have long been studied in connection with psychiatric disorders, notably depression and paranoia; however, the mechanisms behind attributional inferences and the way they can go awry are not fully understood. We administered a novel, challenging, game of skill to a substantial population of healthy online participants, and collected trial-by-trial time series of both their beliefs about skill and attributions about the causes of the success and failure of real experienced outcomes. We found reciprocal relationships that provide empirical confirmation of the attribution-self representation cycle theory. This highlights the dynamic nature of the processes involved in attribution, and validates a framework for developing and testing computational accounts of attribution-belief interactions. As part of interpreting our experiences, we spontaneously make causal attributions and use them to update our beliefs about the world, ourselves and others. This has long been a topic of interest, particularly within psychiatry. Some theories assume that people have stable “attributional styles”, others focus on the changing nature of attribution-making and on the relationships between attributions and one’s beliefs about the self, suggesting that the two are mutually connected. In this area of research, people have traditionally been asked to imagine themselves experiencing various significant life events and report on how they would interpret those, or have been exposed to artificial and highly simplified situations in the lab. 
In this work, we introduce a new task to study relationships between causal attributions and beliefs: repeatedly playing an engaging and relatively complex game of skill. We show that we can detect mutual influences between attributions and beliefs at the level of individual wins and losses. This has implications for how everyday successes and failures impact our beliefs about ourselves and our well-being. It also could help understand how our interpretations of negative experiences can spiral out of control, affecting our mental health.
Collapse
Affiliation(s)
- Elena Zamfir
- Department of Education, University of Oxford, Oxford, United Kingdom
| | - Peter Dayan
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany
| |
Collapse
|
15
|
Banaie Boroujeni K, Sigona MK, Treuting RL, Manuel TJ, Caskey CF, Womelsdorf T. Anterior cingulate cortex causally supports flexible learning under motivationally challenging and cognitively demanding conditions. PLoS Biol 2022; 20:e3001785. [PMID: 36067198 PMCID: PMC9481162 DOI: 10.1371/journal.pbio.3001785] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2022] [Revised: 09/16/2022] [Accepted: 08/09/2022] [Indexed: 12/02/2022] Open
Abstract
Anterior cingulate cortex (ACC) and striatum (STR) contain neurons encoding not only the expected values of actions, but also the value of stimulus features irrespective of actions. Values about stimulus features in ACC or STR might contribute to adaptive behavior by guiding fixational information sampling and biasing choices toward relevant objects, but they might also have indirect motivational functions by enabling subjects to estimate the value of putting effort into choosing objects. Here, we tested these possibilities by modulating neuronal activity in ACC and STR of nonhuman primates using transcranial ultrasound stimulation while subjects learned the relevance of objects in situations with varying motivational and cognitive demands. Motivational demand was indexed by varying gains and losses during learning, while cognitive demand was varied by increasing the uncertainty about which object features could be relevant during learning. We found that ultrasound stimulation of the ACC, but not the STR, reduced learning efficiency and prolonged information sampling when the task required averting losses and motivational demands were high. Reduced learning efficiency was particularly evident at higher cognitive demands and when subjects experienced loss of already attained tokens. These results suggest that the ACC supports flexible learning of feature values when loss experiences impose a motivational challenge and when uncertainty about the relevance of objects is high. Taken together, these findings provide causal evidence that the ACC facilitates resource allocation and improves visual information sampling during adaptive behavior.
Collapse
Affiliation(s)
- Kianoush Banaie Boroujeni
- Department of Psychology, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Michelle K. Sigona
- Vanderbilt University Institute of Imaging Science, Nashville, Tennessee, United States of America
- Department of Biomedical Engineering, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Robert Louie Treuting
- Department of Biomedical Engineering, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Thomas J. Manuel
- Vanderbilt University Institute of Imaging Science, Nashville, Tennessee, United States of America
- Department of Biomedical Engineering, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Charles F. Caskey
- Vanderbilt University Institute of Imaging Science, Nashville, Tennessee, United States of America
- Department of Biomedical Engineering, Vanderbilt University, Nashville, Tennessee, United States of America
- Vanderbilt University Medical Center Department of Radiology and Radiological Sciences, Nashville, Tennessee, United States of America
| | - Thilo Womelsdorf
- Department of Psychology, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Biomedical Engineering, Vanderbilt University, Nashville, Tennessee, United States of America
| |
Collapse
|
16
|
Banaie Boroujeni K, Watson M, Womelsdorf T. Gains and Losses Affect Learning Differentially at Low and High Attentional Load. J Cogn Neurosci 2022; 34:1952-1971. [PMID: 35802604 PMCID: PMC9830784 DOI: 10.1162/jocn_a_01885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
Abstract
Prospective gains and losses influence cognitive processing, but it is unresolved how they modulate flexible learning in changing environments. The prospect of gains might enhance flexible learning through prioritized processing of reward-predicting stimuli, but it is unclear how far this learning benefit extends when task demands increase. Similarly, experiencing losses might facilitate learning when they trigger attentional reorienting away from loss-inducing stimuli, but losses may also impair learning by increasing motivational costs or when negative outcomes are overgeneralized. To clarify these divergent views, we tested how varying magnitudes of gains and losses affect the flexible learning of feature values in environments that varied attentional load by increasing the number of interfering object features. With this task design, we found that larger prospective gains improved learning efficacy and learning speed, but only when attentional load was low. In contrast, expecting losses impaired learning efficacy, and this impairment was larger at higher attentional load. These findings functionally dissociate the contributions of gains and losses on flexible learning, suggesting they operate via separate control mechanisms. One mechanism is triggered by experiencing loss and reduces the ability to reduce distractor interference, impairs assigning credit to specific loss-inducing features, and decreases efficient exploration during learning. The second mechanism is triggered by experiencing gains, which enhances prioritizing reward-predicting stimulus features as long as the interference of distracting features is limited. Taken together, these results support a rational theory of cognitive control during learning, suggesting that experiencing losses and experiencing distractor interference impose costs for learning.
Collapse
Affiliation(s)
| | - Marcus Watson
- Department of Biology, Centre for Vision Research, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3, Canada
| | - Thilo Womelsdorf
- Department of Psychology, Vanderbilt University, Nashville, TN 37240; Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37240
| |
Collapse
|
17
|
Nussenbaum K, Velez JA, Washington BT, Hamling HE, Hartley CA. Flexibility in valenced reinforcement learning computations across development. Child Dev 2022; 93:1601-1615. [PMID: 35596654 PMCID: PMC9831067 DOI: 10.1111/cdev.13791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
Abstract
Optimal integration of positive and negative outcomes during learning varies depending on an environment's reward statistics. The present study investigated the extent to which children, adolescents, and adults (N = 142 8-25 year-olds, 55% female, 42% White, 31% Asian, 17% mixed race, and 8% Black; data collected in 2021) adapt their weighting of better-than-expected and worse-than-expected outcomes when learning from reinforcement. Participants made choices across two contexts: one in which weighting positive outcomes more heavily than negative outcomes led to better performance, and one in which the reverse was true. Reinforcement learning modeling revealed that across age, participants shifted their valence biases in accordance with environmental structure. Exploratory analyses revealed strengthening of context-dependent flexibility with increasing age.
Collapse
Affiliation(s)
| | | | | | | | - Catherine A. Hartley
- Corresponding Author: Catherine A. Hartley, Department of Psychology, New York University, 6 Washington Place, Room 871A, New York, NY, 10003.
| |
Collapse
|
18
|
Michely J, Eldar E, Erdman A, Martin IM, Dolan RJ. Serotonin modulates asymmetric learning from reward and punishment in healthy human volunteers. Commun Biol 2022; 5:812. [PMID: 35962142 PMCID: PMC9374781 DOI: 10.1038/s42003-022-03690-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2020] [Accepted: 07/08/2022] [Indexed: 11/15/2022] Open
Abstract
Instrumental learning is driven by a history of outcome success and failure. Here, we examined the impact of serotonin on learning from positive and negative outcomes. Healthy human volunteers were assessed twice, once after acute (single-dose), and once after prolonged (week-long) daily administration of the SSRI citalopram or placebo. Using computational modelling, we show that prolonged boosting of serotonin enhances learning from punishment and reduces learning from reward. This valence-dependent learning asymmetry increases subjects’ tendency to avoid actions as a function of cumulative failure without leading to detrimental, or advantageous, outcomes. By contrast, no significant modulation of learning was observed following acute SSRI administration. However, differences between the effects of acute and prolonged administration were not significant. Overall, these findings may help explain how serotonergic agents impact on mood disorders. Two factors can drive learning: punishment of failures and reward of successes. Serotonin induces a valence-dependent learning asymmetry, as revealed by prolonged administering of SSRIs to healthy participants in a gambling task.
Collapse
Affiliation(s)
- Jochen Michely
- Department of Psychiatry and Neurosciences, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, BIH Charité Clinician Scientist Program, Berlin, Germany; Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK; Wellcome Centre for Human Neuroimaging, University College London, London, UK
| | - Eran Eldar
- Psychology and Cognitive Sciences Departments, Hebrew University of Jerusalem, Jerusalem, Israel
| | - Alon Erdman
- Psychology and Cognitive Sciences Departments, Hebrew University of Jerusalem, Jerusalem, Israel
| | - Ingrid M Martin
- Wellcome Centre for Human Neuroimaging, University College London, London, UK; Institute of Cognitive Neuroscience, University College London, London, UK
| | - Raymond J Dolan
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK; Wellcome Centre for Human Neuroimaging, University College London, London, UK
| |
Collapse
|
19
|
Louie K. Asymmetric and adaptive reward coding via normalized reinforcement learning. PLoS Comput Biol 2022; 18:e1010350. [PMID: 35862443 PMCID: PMC9345478 DOI: 10.1371/journal.pcbi.1010350] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2022] [Revised: 08/02/2022] [Accepted: 07/01/2022] [Indexed: 11/18/2022] Open
Abstract
Learning is widely modeled in psychology, neuroscience, and computer science by prediction error-guided reinforcement learning (RL) algorithms. While standard RL assumes linear reward functions, reward-related neural activity is a saturating, nonlinear function of reward; however, the computational and behavioral implications of nonlinear RL are unknown. Here, we show that nonlinear RL incorporating the canonical divisive normalization computation introduces an intrinsic and tunable asymmetry in prediction error coding. At the behavioral level, this asymmetry explains empirical variability in risk preferences typically attributed to asymmetric learning rates. At the neural level, diversity in asymmetries provides a computational mechanism for recently proposed theories of distributional RL, allowing the brain to learn the full probability distribution of future rewards. This behavioral and computational flexibility argues for an incorporation of biologically valid value functions in computational models of learning and decision-making. Reinforcement learning models are widely used to characterize reward-driven learning in biological and computational agents. Standard reinforcement learning models use linear value functions, despite strong empirical evidence that biological value representations are nonlinear functions of external rewards. Here, we examine the properties of a biologically-based nonlinear reinforcement learning algorithm employing the canonical divisive normalization function, a neural computation commonly found in sensory, cognitive, and reward coding. We show that this normalized reinforcement learning algorithm implements a simple but powerful control of how reward learning reflects relative gains and losses. This property explains diverse behavioral and neural phenomena, and suggests the importance of using biologically valid value functions in computational models of learning and decision-making.
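The key idea above, that passing reward through a divisive normalization before the RL update creates an intrinsic, tunable asymmetry in prediction-error coding, can be sketched as follows. This is an illustration of the general mechanism under assumed parameter values; the paper's exact parameterization may differ.

```python
def normalized_utility(reward, sigma):
    """Divisive normalization: a saturating, nonlinear utility function.
    sigma sets the curvature, and hence the PE asymmetry."""
    return reward / (sigma + reward)

def normalized_rl_update(value, reward, alpha, sigma):
    """Standard RL update, but on normalized rather than linear reward."""
    pe = normalized_utility(reward, sigma) - value
    return value + alpha * pe

# Symmetric objective changes produce asymmetric subjective PEs:
v = normalized_utility(10.0, sigma=10.0)        # baseline utility = 0.5
gain_pe = normalized_utility(20.0, 10.0) - v    # ~ +0.167
loss_pe = normalized_utility(0.0, 10.0) - v     # -0.5
```

Because the utility curve saturates, a reward increase of 10 moves the value estimate less than an equal-sized decrease, mimicking the behavioral asymmetry usually attributed to separate learning rates for gains and losses.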
Collapse
Affiliation(s)
- Kenway Louie
- Center for Neural Science, New York University, New York, United States of America
- Neuroscience Institute, New York University Grossman School of Medicine, New York, United States of America
| |
Collapse
|
20
|
Dahal R, MacLellan K, Vavrek D, Dyson BJ. Assessing behavioural profiles following neutral, positive and negative feedback. PLoS One 2022; 17:e0270475. [PMID: 35788745 PMCID: PMC9255737 DOI: 10.1371/journal.pone.0270475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2022] [Accepted: 06/10/2022] [Indexed: 12/02/2022] Open
Abstract
Previous data suggest zero-value, neutral outcomes (draw) are subjectively assigned negative rather than positive valence. The combined observations of faster rather than slower reaction times, subsequent actions defined by shift rather than stay behaviour, reduced flexibility, and, larger rather than smaller deviations from optimal performance following draws all align with the consequences of explicitly negative outcomes such as losses. We further tested the relationships between neutral, positive and negative outcomes by manipulating value salience and observing their behavioural profiles. Despite speeded reaction times and a non-significant bias towards shift behaviour similar to losses when draws were assigned the value of 0 (Experiment 1), the degree of shift behaviour approached an approximation of optimal performance when the draw value was explicitly positive (+1). This was in contrast to when the draw value was explicitly negative (-1), which led to a significant increase in the degree of shift behaviour (Experiment 2). Similar modifications were absent when the same value manipulations were applied to win or lose trials (Experiment 3). Rather than viewing draws as neutral and valence-free outcomes, the processing cascade generated by draws produces a complex behavioural profile containing elements found in response to both explicitly positive and explicitly negative results.
Collapse
Affiliation(s)
| | | | | | - Benjamin James Dyson
- University of Alberta, Edmonton, Canada
- University of Sussex, Brighton, United Kingdom
- Toronto Metropolitan University, Toronto, Canada
| |
Collapse
|
21
|
Palminteri S, Lebreton M. The computational roots of positivity and confirmation biases in reinforcement learning. Trends Cogn Sci 2022; 26:607-621. [PMID: 35662490 DOI: 10.1016/j.tics.2022.04.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2021] [Revised: 04/13/2022] [Accepted: 04/18/2022] [Indexed: 12/16/2022]
Abstract
Humans do not integrate new information objectively: outcomes carrying a positive affective value and evidence confirming one's own prior belief are overweighed. Until recently, theoretical and empirical accounts of the positivity and confirmation biases assumed them to be specific to 'high-level' belief updates. We present evidence against this account. Learning rates in reinforcement learning (RL) tasks, estimated across different contexts and species, generally present the same characteristic asymmetry, suggesting that belief and value updating processes share key computational principles and distortions. This bias generates over-optimistic expectations about the probability of making the right choices and, consequently, generates over-optimistic reward expectations. We discuss the normative and neurobiological roots of these RL biases and their position within the greater picture of behavioral decision-making theories.
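The characteristic learning-rate asymmetry the abstract refers to is often formalized as a confirmation bias: prediction errors that confirm the choice made (positive PEs for the chosen option, negative PEs for the forgone one) are weighted more heavily. A minimal sketch of such an update, with hypothetical learning rates:

```python
def confirmatory_update(q, chosen, unchosen, r_chosen, r_unchosen,
                        alpha_conf, alpha_disconf):
    """Asymmetric value update: confirmatory PEs use alpha_conf,
    disconfirmatory PEs use the (smaller) alpha_disconf."""
    pe_c = r_chosen - q[chosen]
    pe_u = r_unchosen - q[unchosen]
    # positive PE on the chosen option confirms the choice
    q[chosen] += (alpha_conf if pe_c > 0 else alpha_disconf) * pe_c
    # positive PE on the unchosen option disconfirms it
    q[unchosen] += (alpha_disconf if pe_u > 0 else alpha_conf) * pe_u
    return q

q = confirmatory_update([0.0, 0.0], chosen=0, unchosen=1,
                        r_chosen=1.0, r_unchosen=1.0,
                        alpha_conf=0.6, alpha_disconf=0.2)
# identical outcomes, but the chosen option's value rises to 0.6
# while the forgone option's rises only to 0.2
```

Over trials, this asymmetry inflates the value of whatever is chosen, producing the over-optimistic reward expectations discussed in the abstract.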
Collapse
Affiliation(s)
- Stefano Palminteri
- Laboratoire de Neurosciences Cognitives et Computationnelles, Institut National de la Santé et de la Recherche Médicale, Paris, France; Département d'Études Cognitives, Ecole Normale Supérieure, Paris, France; Université de Recherche Paris Sciences et Lettres, Paris, France.
| | - Maël Lebreton
- Paris School of Economics, Paris, France; LabNIC, Department of Fundamental Neurosciences, University of Geneva, Geneva, Switzerland; Swiss Center for Affective Science, Geneva, Switzerland.
| |
Collapse
|
22
|
Dennison JB, Sazhin D, Smith DV. Decision neuroscience and neuroeconomics: Recent progress and ongoing challenges. WILEY INTERDISCIPLINARY REVIEWS. COGNITIVE SCIENCE 2022; 13:e1589. [PMID: 35137549 PMCID: PMC9124684 DOI: 10.1002/wcs.1589] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/05/2020] [Revised: 11/28/2021] [Accepted: 12/21/2021] [Indexed: 01/10/2023]
Abstract
In the past decade, decision neuroscience and neuroeconomics have developed many new insights in the study of decision making. This review provides an overarching update on how the field has advanced in this time period. Although our initial review a decade ago outlined several theoretical, conceptual, methodological, empirical, and practical challenges, there has only been limited progress in resolving these challenges. We summarize significant trends in decision neuroscience through the lens of the challenges outlined for the field and review examples where the field has had significant, direct, and applicable impacts across economics and psychology. First, we review progress on topics including reward learning, explore-exploit decisions, risk and ambiguity, intertemporal choice, and valuation. Next, we assess the impacts of emotion, social rewards, and social context on decision making. Then, we follow up with how individual differences impact choices and new exciting developments in the prediction and neuroforecasting of future decisions. Finally, we consider how trends in decision-neuroscience research reflect progress toward resolving past challenges, discuss new and exciting applications of recent research, and identify new challenges for the field. This article is categorized under: Psychology > Reasoning and Decision Making; Psychology > Emotion and Motivation.
Collapse
Affiliation(s)
- Jeffrey B Dennison
- Department of Psychology, Temple University, Philadelphia, Pennsylvania, USA
| | - Daniel Sazhin
- Department of Psychology, Temple University, Philadelphia, Pennsylvania, USA
| | - David V Smith
- Department of Psychology, Temple University, Philadelphia, Pennsylvania, USA
| |
Collapse
|
23
|
Eckstein MK, Master SL, Dahl RE, Wilbrecht L, Collins AG. Reinforcement learning and bayesian inference provide complementary models for the unique advantage of adolescents in stochastic reversal. Dev Cogn Neurosci 2022; 55:101106. [PMID: 35537273 PMCID: PMC9108470 DOI: 10.1016/j.dcn.2022.101106] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2021] [Revised: 03/01/2022] [Accepted: 03/25/2022] [Indexed: 12/02/2022] Open
Abstract
During adolescence, youth venture out, explore the wider world, and are challenged to learn how to navigate novel and uncertain environments. We investigated how performance changes across adolescent development in a stochastic, volatile reversal-learning task that uniquely taxes the balance of persistence and flexibility. In a sample of 291 participants aged 8–30, we found that in the mid-teen years, adolescents outperformed both younger and older participants. We developed two independent cognitive models, based on reinforcement learning (RL) and Bayesian inference (BI). The RL parameter for learning from negative outcomes and the BI parameters specifying participants’ mental models were closest to optimal in mid-teen adolescents, suggesting a central role in adolescent cognitive processing. By contrast, persistence and noise parameters improved monotonically with age. We distilled the insights of RL and BI using principal component analysis and found that three shared components interacted to form the adolescent performance peak: adult-like behavioral quality, child-like time scales, and developmentally unique processing of positive feedback. This research highlights adolescence as a neurodevelopmental window that can create performance advantages in volatile and uncertain environments. It also shows how detailed insights can be gleaned by using cognitive models in new ways.
|
24
|
Pupil Correlates of Decision Variables in Mice Playing a Competitive Mixed-Strategy Game. eNeuro 2022; 9:ENEURO.0457-21.2022. [PMID: 35168951 PMCID: PMC8925722 DOI: 10.1523/eneuro.0457-21.2022] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2021] [Revised: 12/21/2021] [Accepted: 01/02/2022] [Indexed: 01/29/2023] Open
Abstract
In a competitive game involving an animal and an opponent, the outcome is contingent on the choices of both players. To succeed, the animal must continually adapt to competitive pressure, or else risk being exploited and lose out on rewards. In this study, we demonstrate that head-fixed male mice can be trained to play the iterative competitive game "matching pennies" against a virtual computer opponent. We find that the animals' performance is well described by a hybrid computational model that includes Q-learning and choice kernels. Comparing between matching pennies and a non-competitive two-armed bandit task, we show that the tasks encourage animals to operate at different regimes of reinforcement learning. To understand the involvement of neuromodulatory mechanisms, we measure fluctuations in pupil size and use multiple linear regression to relate the trial-by-trial transient pupil responses to decision-related variables. The analysis reveals that pupil responses are modulated by observable variables, including choice and outcome, as well as latent variables for value updating, but not action selection. Collectively, these results establish a paradigm for studying competitive decision-making in head-fixed mice and provide insights into the role of arousal-linked neuromodulation in the decision process.
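A minimal sketch of the model class this abstract describes (Q-learning combined with choice kernels, softmax action selection). This is an editorial illustration with placeholder learning rates and inverse temperatures, not the authors' implementation:

```python
import math
import random

def hybrid_update(q, ck, action, reward, alpha_q=0.3, alpha_ck=0.2):
    """One trial of a hybrid Q-learning + choice-kernel update.

    q and ck are per-action lists (modified in place). The choice kernel
    moves toward 1 for the chosen action and toward 0 for the others,
    capturing outcome-independent choice perseveration.
    """
    q[action] += alpha_q * (reward - q[action])
    for a in range(len(ck)):
        ck[a] += alpha_ck * ((1.0 if a == action else 0.0) - ck[a])
    return q, ck

def choose(q, ck, beta_q=3.0, beta_ck=1.0):
    """Softmax over a weighted sum of value and choice-kernel terms."""
    logits = [beta_q * qa + beta_ck * cka for qa, cka in zip(q, ck)]
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    r, cum = random.random() * sum(weights), 0.0
    for a, w in enumerate(weights):
        cum += w
        if r < cum:
            return a
    return len(weights) - 1
```

Fitting beta_q and beta_ck separately is what lets such a model separate value-driven choice from habit-like repetition, the contrast exploited when comparing matching pennies with a non-competitive bandit task.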
|
25
|
Rosenbaum GM, Grassie HL, Hartley CA. Valence biases in reinforcement learning shift across adolescence and modulate subsequent memory. eLife 2022; 11:e64620. [PMID: 35072624 PMCID: PMC8786311 DOI: 10.7554/elife.64620] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2020] [Accepted: 12/24/2021] [Indexed: 12/12/2022] Open
Abstract
As individuals learn through trial and error, some are more influenced by good outcomes, while others weight bad outcomes more heavily. Such valence biases may also influence memory for past experiences. Here, we examined whether valence asymmetries in reinforcement learning change across adolescence, and whether individual learning asymmetries bias the content of subsequent memory. Participants ages 8-27 learned the values of 'point machines,' after which their memory for trial-unique images presented with choice outcomes was assessed. Relative to children and adults, adolescents overweighted worse-than-expected outcomes during learning. Individuals' valence biases modulated incidental memory, such that those who prioritized worse- (or better-) than-expected outcomes during learning were also more likely to remember images paired with these outcomes, an effect reproduced in an independent dataset. Collectively, these results highlight age-related changes in the computation of subjective value and demonstrate that a valence-asymmetric valuation process influences how information is prioritized in episodic memory.
Affiliation(s)
- Gail M Rosenbaum
- Department of Psychology, New York UniversityNew YorkUnited States
| | - Hannah L Grassie
- Department of Psychology, New York UniversityNew YorkUnited States
| | - Catherine A Hartley
- Department of Psychology, New York UniversityNew YorkUnited States
- Center for Neural Science, New York UniversityNew YorkUnited States
| |
|
26
|
Eckstein MK, Master SL, Xia L, Dahl RE, Wilbrecht L, Collins AGE. The interpretation of computational model parameters depends on the context. eLife 2022; 11:e75474. [PMID: 36331872 PMCID: PMC9635876 DOI: 10.7554/elife.75474] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2021] [Accepted: 09/09/2022] [Indexed: 11/06/2022] Open
Abstract
Reinforcement Learning (RL) models have revolutionized the cognitive and brain sciences, promising to explain behavior from simple conditioning to complex problem solving, to shed light on developmental and individual differences, and to anchor cognitive processes in specific brain mechanisms. However, the RL literature increasingly reveals contradictory results, which might cast doubt on these claims. We hypothesized that many contradictions arise from two commonly held assumptions about computational model parameters that are actually often invalid: that parameters generalize between contexts (e.g. tasks, models) and that they capture interpretable (i.e. unique, distinctive) neurocognitive processes. To test this, we asked 291 participants aged 8–30 years to complete three learning tasks in one experimental session, and fitted RL models to each. We found that some parameters (exploration/decision noise) showed significant generalization: they followed similar developmental trajectories, and were reciprocally predictive between tasks. Still, generalization was significantly below the methodological ceiling. Furthermore, other parameters (learning rates, forgetting) did not show evidence of generalization, and sometimes even showed opposite developmental trajectories. Interpretability was low for all parameters. We conclude that the systematic study of context factors (e.g. reward stochasticity; task volatility) will be necessary to enhance the generalizability and interpretability of computational cognitive models.
Affiliation(s)
| | - Sarah L Master
- Department of Psychology, University of California, BerkeleyBerkeleyUnited States,Department of Psychology, New York UniversityNew YorkUnited States
| | - Liyu Xia
- Department of Psychology, University of California, BerkeleyBerkeleyUnited States,Department of Mathematics, University of California, BerkeleyBerkeleyUnited States
| | - Ronald E Dahl
- Institute of Human Development, University of California, BerkeleyBerkeleyUnited States
| | - Linda Wilbrecht
- Department of Psychology, University of California, BerkeleyBerkeleyUnited States,Helen Wills Neuroscience Institute, University of California, BerkeleyBerkeleyUnited States
| | - Anne GE Collins
- Department of Psychology, University of California, BerkeleyBerkeleyUnited States,Helen Wills Neuroscience Institute, University of California, BerkeleyBerkeleyUnited States
| |
|
27
|
Lefebvre G, Summerfield C, Bogacz R. A Normative Account of Confirmation Bias During Reinforcement Learning. Neural Comput 2021; 34:307-337. [PMID: 34758486 DOI: 10.1162/neco_a_01455] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2021] [Accepted: 07/26/2021] [Indexed: 11/04/2022]
Abstract
Reinforcement learning involves updating estimates of the value of states and actions on the basis of experience. Previous work has shown that in humans, reinforcement learning exhibits a confirmatory bias: when the value of a chosen option is being updated, estimates are revised more radically following positive than negative reward prediction errors, but the converse is observed when updating the unchosen option value estimate. Here, we simulate performance on a multi-armed bandit task to examine the consequences of a confirmatory bias for reward harvesting. We report a paradoxical finding: that confirmatory biases allow the agent to maximize reward relative to an unbiased updating rule. This principle holds over a wide range of experimental settings and is most influential when decisions are corrupted by noise. We show that this occurs because on average, confirmatory biases lead to overestimating the value of more valuable bandits and underestimating the value of less valuable bandits, rendering decisions overall more robust in the face of noise. Our results show how apparently suboptimal learning rules can in fact be reward maximizing if decisions are made with finite computational precision.
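The confirmatory update rule at issue can be sketched as follows, assuming full (counterfactual) feedback on both arms; the two learning rates are illustrative placeholders, not the paper's simulation settings:

```python
def confirmatory_update(q, chosen, rewards, alpha_conf=0.3, alpha_disconf=0.1):
    """Confirmation-biased update on a two-armed bandit with full feedback.

    The chosen option is updated with the high (confirmatory) rate after
    positive prediction errors and the low rate after negative ones; the
    unchosen option gets the mirror-image asymmetry.
    """
    for a in (0, 1):
        pe = rewards[a] - q[a]
        if a == chosen:
            rate = alpha_conf if pe > 0 else alpha_disconf
        else:
            rate = alpha_disconf if pe > 0 else alpha_conf
        q[a] += rate * pe
    return q
```

Averaged over noisy choices, this rule inflates the value gap between good and bad arms, which is why it can out-harvest unbiased updating when decisions are corrupted by noise.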
Affiliation(s)
- Germain Lefebvre
- MRC Brain Network Dynamics Unit, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DU, U.K.
| | | | - Rafal Bogacz
- MRC Brain Network Dynamics Unit, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DU, U.K.
| |
|
28
|
Oba T, Katahira K, Ohira H. A learning mechanism shaping risk preferences and a preliminary test of its relationship with psychopathic traits. Sci Rep 2021; 11:20853. [PMID: 34675294 PMCID: PMC8531311 DOI: 10.1038/s41598-021-00358-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2021] [Accepted: 10/07/2021] [Indexed: 11/09/2022] Open
Abstract
People tend to avoid risk in the domain of gains but take risks in the domain of losses; this is called the reflection effect. Formal theories of decision-making have provided important perspectives on risk preferences, but how individuals acquire risk preferences through experiences remains unknown. In the present study, we used reinforcement learning (RL) models to examine the learning processes that can shape attitudes toward risk in both domains. In addition, relationships between learning parameters and personality traits were investigated. Fifty-one participants performed a learning task, and we examined learning parameters and risk preference in each domain. Our results revealed that an RL model that included a nonlinear subjective utility parameter and differential learning rates for positive and negative prediction errors exhibited better fit than other models and that these parameters independently predicted risk preferences and the reflection effect. Regarding personality traits, although the sample size may be too small for a robust test of personality traits, increased primary psychopathy scores could be linked with decreased learning rates for positive prediction error in loss conditions among participants who had low anxiety traits. The present findings not only contribute to understanding how decision-making in risky conditions is influenced by past experiences but also provide insights into certain psychiatric problems.
Affiliation(s)
- Takeyuki Oba
- Department of Psychology, Graduate School of Environmental Studies, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8601, Japan.
| | - Kentaro Katahira
- Department of Psychology, Graduate School of Informatics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8601, Japan
| | - Hideki Ohira
- Department of Psychology, Graduate School of Informatics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8601, Japan
| |
|
29
|
Stewardson HJ, Sambrook TD. Reward prediction error in the ERP following unconditioned aversive stimuli. Sci Rep 2021; 11:19912. [PMID: 34620955 PMCID: PMC8497484 DOI: 10.1038/s41598-021-99408-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2021] [Accepted: 09/16/2021] [Indexed: 11/15/2022] Open
Abstract
Reinforcement learning in humans and other animals is driven by reward prediction errors: deviations between the amount of reward or punishment initially expected and that which is obtained. Temporal difference methods of reinforcement learning generate this reward prediction error at the earliest time at which a revision in reward or punishment likelihood is signalled, for example by a conditioned stimulus. Midbrain dopamine neurons, believed to compute reward prediction errors, generate this signal in response to both conditioned and unconditioned stimuli, as predicted by temporal difference learning. Electroencephalographic recordings of human participants have suggested that a component named the feedback-related negativity (FRN) is generated when this signal is carried to the cortex. If this is so, the FRN should be expected to respond equivalently to conditioned and unconditioned stimuli. However, very few studies have attempted to measure the FRN's response to unconditioned stimuli. The present study attempted to elicit the FRN in response to a primary aversive stimulus (electric shock) using a design that varied reward prediction error while holding physical intensity constant. The FRN was strongly elicited, but earlier and more transiently than typically seen, suggesting that it may incorporate other processes than the midbrain dopamine system.
Affiliation(s)
- Harry J Stewardson
- School of Psychology, University of East Anglia, Norwich Business Park, NR4 7TJ, UK.
| | - Thomas D Sambrook
- School of Psychology, University of East Anglia, Norwich Business Park, NR4 7TJ, UK
| |
|
30
|
Enkhtaivan E, Nishimura J, Ly C, Cochran AL. A Competition of Critics in Human Decision-Making. Computational Psychiatry (Cambridge, Mass.) 2021; 5:81-101. [PMID: 38773993 PMCID: PMC11104313 DOI: 10.5334/cpsy.64] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/24/2021] [Accepted: 07/19/2021] [Indexed: 11/20/2022]
Abstract
Recent experiments and theories of human decision-making suggest positive and negative errors are processed and encoded differently by serotonin and dopamine, with serotonin possibly serving to oppose dopamine and protect against risky decisions. We introduce a temporal difference (TD) model of human decision-making to account for these features. Our model involves two critics, an optimistic learning system and a pessimistic learning system, whose predictions are integrated in time to control how potential decisions compete to be selected. Our model predicts that human decision-making can be decomposed along two dimensions: the degree to which the individual is sensitive to (1) risk and (2) uncertainty. In addition, we demonstrate that the model can learn about the mean and standard deviation of rewards, and provide information about reaction time despite not modeling these variables directly. Lastly, we simulate a recent experiment to show how updates of the two learning systems could relate to dopamine and serotonin transients, thereby providing a mathematical formalism to serotonin's hypothesized role as an opponent to dopamine. This new model should be useful for future experiments on human decision-making.
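The two-critic idea can be illustrated with a pair of asymmetric delta rules for a single option. The rates below are placeholders, and the full model integrates the critics in time rather than simply comparing them, so this is only a sketch of the core asymmetry:

```python
def two_critic_update(v_opt, v_pes, reward, alpha_plus=0.4, alpha_minus=0.1):
    """One update of an optimistic and a pessimistic critic for one option.

    Each critic computes its own prediction error; the optimistic critic
    weights positive errors more heavily, and the pessimistic critic the
    reverse. Combining the two estimates yields a risk-sensitive value.
    """
    pe_o = reward - v_opt
    v_opt += (alpha_plus if pe_o > 0 else alpha_minus) * pe_o
    pe_p = reward - v_pes
    v_pes += (alpha_minus if pe_p > 0 else alpha_plus) * pe_p
    return v_opt, v_pes
```

Run on stochastic rewards, the optimistic critic settles above the mean payoff and the pessimistic critic below it, so the gap between the two tracks reward variability and can implement sensitivity to risk.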
Affiliation(s)
| | - Joel Nishimura
- School of Mathematical and Natural Sciences, Arizona State University, Glendale, AZ, US
| | - Cheng Ly
- Department of Statistical Sciences and Operations Research, Virginia Commonwealth University, Richmond, VA, US
| | - Amy L. Cochran
- Department of Mathematics, University of Wisconsin, Madison, WI, US
- Department of Population Health Sciences, University of Wisconsin, Madison, WI, US
| |
|
31
|
Xia L, Master SL, Eckstein MK, Baribault B, Dahl RE, Wilbrecht L, Collins AGE. Modeling changes in probabilistic reinforcement learning during adolescence. PLoS Comput Biol 2021; 17:e1008524. [PMID: 34197447 PMCID: PMC8279421 DOI: 10.1371/journal.pcbi.1008524] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2020] [Revised: 07/14/2021] [Accepted: 05/26/2021] [Indexed: 01/17/2023] Open
Abstract
In the real world, many relationships between events are uncertain and probabilistic. Uncertainty is also likely to be a more common feature of daily experience for youth because they have less experience to draw from than adults. Some studies suggest probabilistic learning may be inefficient in youths compared to adults, while others suggest it may be more efficient in youths in mid adolescence. Here we used a probabilistic reinforcement learning task to test how youths aged 8-17 (N = 187) and adults aged 18-30 (N = 110) learn about stable probabilistic contingencies. Performance increased with age through the early twenties, then stabilized. Using hierarchical Bayesian methods to fit computational reinforcement learning models, we show that all participants' performance was better explained by models in which negative outcomes had minimal to no impact on learning. The performance increase over age was driven by (1) an increase in learning rate (i.e., a decrease in integration time scale) and (2) a decrease in noisy/exploratory choices. In mid-adolescence (age 13-15), salivary testosterone and learning rate were positively related. We discuss our findings in the context of other studies and hypotheses about adolescent brain development.
Affiliation(s)
- Liyu Xia
- Department of Mathematics, University of California Berkeley, Berkeley, California, United States of America
| | - Sarah L. Master
- Department of Psychology, New York University, New York, New York, United States of America
| | - Maria K. Eckstein
- Department of Psychology, University of California Berkeley, Berkeley, California, United States of America
| | - Beth Baribault
- Department of Psychology, University of California Berkeley, Berkeley, California, United States of America
| | - Ronald E. Dahl
- School of Public Health, University of California Berkeley, Berkeley, California, United States of America
| | - Linda Wilbrecht
- Department of Psychology, University of California Berkeley, Berkeley, California, United States of America
- Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, California, United States of America
| | - Anne Gabrielle Eva Collins
- Department of Psychology, University of California Berkeley, Berkeley, California, United States of America
- Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, California, United States of America
| |
|
32
|
Harada T. Three heads are better than two: Comparing learning properties and performances across individuals, dyads, and triads through a computational approach. PLoS One 2021; 16:e0252122. [PMID: 34138907 PMCID: PMC8211165 DOI: 10.1371/journal.pone.0252122] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2020] [Accepted: 05/10/2021] [Indexed: 11/28/2022] Open
Abstract
Although it is considered that two heads are better than one, related studies argued that groups rarely outperform their best members. This study examined not only whether two heads are better than one but also whether three heads are better than two or one in the context of two-armed bandit problems where learning plays an instrumental role in achieving high performance. This research revealed that a U-shaped correlation exists between performance and group size. The performance was highest for either individuals or triads, but the lowest for dyads. Moreover, this study estimated learning properties and determined that high inverse temperature (exploitation) accounted for high performance. In particular, it was shown that group effects regarding the inverse temperatures in dyads did not generate values high enough to surpass the averages of the two group members. In contrast, triads gave rise to higher values of the inverse temperatures than the averages of their individual group members. These results were consistent with our proposed hypothesis that learning coherence is likely to emerge in individuals and triads, but not in dyads, which in turn leads to higher performance. This hypothesis is based on the classical argument by Simmel stating that while dyads are likely to involve more emotion and generate greater variability, triads are the smallest structure which tends to constrain emotions, reduce individuality, and generate behavioral convergences or uniformity because of the "two against one" social pressures. As a result, three heads or one head were better than two in our study.
Affiliation(s)
- Tsutomu Harada
- Graduate School of Business Administration, Kobe University, Kobe, Japan
| |
|
33
|
Ohta H, Satori K, Takarada Y, Arake M, Ishizuka T, Morimoto Y, Takahashi T. The asymmetric learning rates of murine exploratory behavior in sparse reward environments. Neural Netw 2021; 143:218-229. [PMID: 34157646 DOI: 10.1016/j.neunet.2021.05.030] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2021] [Revised: 04/16/2021] [Accepted: 05/26/2021] [Indexed: 11/29/2022]
Abstract
Goal-oriented behaviors of animals can be modeled by reinforcement learning algorithms. Such algorithms predict future outcomes of selected actions utilizing action values and updating those values in response to the positive and negative outcomes. In many models of animal behavior, the action values are updated symmetrically based on a common learning rate, that is, in the same way for both positive and negative outcomes. However, animals in environments with scarce rewards may have uneven learning rates. To investigate the asymmetry between learning rates for reward and non-reward, we analyzed the exploration behavior of mice in five-armed bandit tasks using a Q-learning model with differential learning rates for positive and negative outcomes. The positive learning rate was significantly higher in a scarce reward environment than in a rich reward environment, and conversely, the negative learning rate was significantly lower in the scarce environment. The positive-to-negative learning rate ratio was about 10 in the scarce environment and about 2 in the rich environment. This result suggests that when the reward probability was low, the mice tended to ignore failures and exploit the rare rewards. Computational modeling analysis revealed that the increased learning rate ratio could cause an overestimation of and perseveration on rare-rewarding events, increasing total reward acquisition in the scarce environment but disadvantaging impartial exploration.
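The core update fitted in this kind of analysis is a Q-learning rule with valence-dependent learning rates. The 10:1 ratio below mirrors the order of magnitude reported for the scarce-reward environment, though the exact values here are illustrative:

```python
def asymmetric_q_update(q, action, reward, alpha_pos=0.5, alpha_neg=0.05):
    """Q-learning step with separate learning rates for positive and
    negative prediction errors. alpha_pos/alpha_neg = 10 echoes the
    scarce-reward regime described in the abstract (placeholder values).
    """
    pe = reward - q[action]
    q[action] += (alpha_pos if pe > 0 else alpha_neg) * pe
    return q
```

With this ratio, a single rare reward lifts an arm's value sharply while a run of failures erodes it only slowly, producing the perseveration on rare-rewarding arms described above.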
Affiliation(s)
- Hiroyuki Ohta
- Department of Pharmacology, National Defense Medical College, Saitama, 359-8513, Japan.
| | | | - Yu Takarada
- Tokyo Denki University, Saitama, 350-0394, Japan
| | - Masashi Arake
- Department of Physiology, National Defense Medical College, Saitama, 359-8513, Japan
| | - Toshiaki Ishizuka
- Department of Pharmacology, National Defense Medical College, Saitama, 359-8513, Japan
| | - Yuji Morimoto
- Department of Physiology, National Defense Medical College, Saitama, 359-8513, Japan
| | | |
|
34
|
Gu Y, Liu T, Zhang X, Long Q, Hu N, Zhang Y, Chen A. The Event-Related Potentials Responding to Outcome Valence and Expectancy Violation during Feedback Processing. Cereb Cortex 2021; 31:1060-1076. [PMID: 32995836 DOI: 10.1093/cercor/bhaa274] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2019] [Revised: 08/25/2020] [Accepted: 08/25/2020] [Indexed: 11/15/2022] Open
Abstract
Feedback-related negativity (FRN) is believed to encode reward prediction error (RPE), a term describing whether the outcome is better or worse than expected. However, some studies suggest that it may reflect unsigned prediction error (UPE) instead. Some disagreement remains as to whether FRN is sensitive to the interaction of outcome valence and prediction error (PE) or merely responsive to the absolute size of PE. Moreover, few studies have compared FRN in appetitive and aversive domains to clarify the valence effect or examine PE's quantitative modulation. To investigate the impact of valence and parametric PE on FRN, we varied the prediction and feedback magnitudes within a probabilistic learning task in valence (gain and loss domains, Experiment 1) and non-valence contexts (pure digits, Experiment 2). Experiment 3 was identical to Experiment 1 except that some blocks emphasized outcome valence, while others highlighted predictive accuracy. Experiments 1 and 2 revealed a UPE encoder; Experiment 3 found an RPE encoder when valence was emphasized and a UPE encoder when predictive accuracy was highlighted. In this investigation, we demonstrate that FRN is sensitive to outcome valence and expectancy violation, exhibiting a preferential response depending on the dimension that is emphasized.
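The RPE/UPE distinction tested here reduces to the sign versus the magnitude of the same quantity:

```python
def prediction_errors(expected, outcome):
    """Signed reward prediction error (RPE) vs unsigned prediction error (UPE).

    An RPE encoder distinguishes better- from worse-than-expected outcomes;
    a UPE encoder responds to the size of the expectancy violation
    regardless of its sign.
    """
    rpe = outcome - expected
    return rpe, abs(rpe)
```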
Affiliation(s)
- Yan Gu
- Key Laboratory of Cognition and Personality of Ministry of Education, National Demonstration Center for Experimental Psychology Education (Southwest University), Faculty of Psychology, Southwest University, Chongqing 400715, China
| | - Tianliang Liu
- Key Laboratory of Cognition and Personality of Ministry of Education, National Demonstration Center for Experimental Psychology Education (Southwest University), Faculty of Psychology, Southwest University, Chongqing 400715, China
| | - Xuemeng Zhang
- Key Laboratory of Cognition and Personality of Ministry of Education, National Demonstration Center for Experimental Psychology Education (Southwest University), Faculty of Psychology, Southwest University, Chongqing 400715, China
| | - Quanshan Long
- Key Laboratory of Cognition and Personality of Ministry of Education, National Demonstration Center for Experimental Psychology Education (Southwest University), Faculty of Psychology, Southwest University, Chongqing 400715, China
| | - Na Hu
- Key Laboratory of Cognition and Personality of Ministry of Education, National Demonstration Center for Experimental Psychology Education (Southwest University), Faculty of Psychology, Southwest University, Chongqing 400715, China
| | - Yi Zhang
- Center for Brain Imaging, School of Life Science and Technology, Xidian University, Xi'an, Shaanxi 710126, China
| | - Antao Chen
- Key Laboratory of Cognition and Personality of Ministry of Education, National Demonstration Center for Experimental Psychology Education (Southwest University), Faculty of Psychology, Southwest University, Chongqing 400715, China
| |
|
35
|
Information about action outcomes differentially affects learning from self-determined versus imposed choices. Nat Hum Behav 2020; 4:1067-1079. [PMID: 32747804 DOI: 10.1038/s41562-020-0919-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2019] [Accepted: 06/26/2020] [Indexed: 11/08/2022]
Abstract
The valence of new information influences learning rates in humans: good news tends to receive more weight than bad news. We investigated this learning bias in four experiments, by systematically manipulating the source of required action (free versus forced choices), outcome contingencies (low versus high reward) and motor requirements (go versus no-go choices). Analysis of model-estimated learning rates showed that the confirmation bias in learning rates was specific to free choices, but was independent of outcome contingencies. The bias was also unaffected by the motor requirements, thus suggesting that it operates in the representational space of decisions, rather than motoric actions. Finally, model simulations revealed that learning rates estimated from the choice-confirmation model had the effect of maximizing performance across low- and high-reward environments. We therefore suggest that choice-confirmation bias may be adaptive for efficient learning of action-outcome contingencies, above and beyond fostering person-level dispositions such as self-esteem.
|
36
|
Harada T. Learning From Success or Failure? - Positivity Biases Revisited. Front Psychol 2020; 11:1627. [PMID: 32848998 PMCID: PMC7396482 DOI: 10.3389/fpsyg.2020.01627] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2020] [Accepted: 06/16/2020] [Indexed: 11/18/2022] Open
Abstract
The purpose of this study was to reexamine positivity learning biases through a Q learning computation model and relate them to behavioral characteristics of exploitation and exploration. It was found that while the positivity learning biases existed in the simple asymmetric Q learning model, they completely disappeared once the time-varying nature of learning rates was incorporated. In the time-varying model, learning rates depended on the magnitudes of success and failure. The corresponding positive and negative learning rates were related to high and low performance, respectively, indicating that successes and failures were accounted for by positive and negative learning rates. Moreover, these learning rates were related to both exploitation and exploration in somewhat balanced ways. In contrast, under the constant learning parameter model, positivity biases were associated only with exploration. Therefore, the results in the time-varying model are more intuitively appealing than the simple asymmetric model. However, the statistical tests indicated that participants eclectically selected between the asymmetric learning model and its time-varying version, the frequency of which differed across participants.
Affiliation(s)
- Tsutomu Harada
- Graduate School of Business Administration, Kobe University, Kobe, Japan
| |
|
37
|
Biased belief updating and suboptimal choice in foraging decisions. Nat Commun 2020; 11:3417. [PMID: 32647271 PMCID: PMC7347922 DOI: 10.1038/s41467-020-16964-5] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2019] [Accepted: 05/27/2020] [Indexed: 11/08/2022] Open
Abstract
Deciding which options to engage, and which to forego, requires developing accurate beliefs about the overall distribution of prospects. Here we adapt a classic prey selection task from foraging theory to examine how individuals keep track of an environment’s reward rate and adjust choices in response to its fluctuations. Preference shifts were most pronounced when the environment improved compared to when it deteriorated. This is best explained by a trial-by-trial learning model in which participants estimate the reward rate with upward vs. downward changes controlled by separate learning rates. A failure to adjust expectations sufficiently when an environment becomes worse leads to suboptimal choices: options that are valuable given the environmental conditions are rejected in the false expectation that better options will materialize. These findings offer a previously unappreciated parallel, in the serial-choice setting, to observations of asymmetric updating and the resulting biased (often overoptimistic) estimates in other domains. In some types of decision-making, people must accept or forego an option without knowing what prospects might later be available. Here, the authors reveal how a key bias (asymmetric learning from negative versus positive outcomes) emerges in this type of decision.
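A sketch of an asymmetric reward-rate tracker combined with the classic prey-model acceptance rule from foraging theory. Function names and learning rates are illustrative, not the authors' fitted model:

```python
def update_rate_estimate(rho, observed, alpha_up=0.3, alpha_down=0.1):
    """Delta-rule tracking of the environment's reward rate, with a larger
    learning rate for upward than for downward revisions (the asymmetry
    reported in the abstract; the rates themselves are placeholders)."""
    delta = observed - rho
    return rho + (alpha_up if delta > 0 else alpha_down) * delta

def engage(option_value, handling_time, rho):
    """Prey-model acceptance rule: pursue an option only if its
    profitability (value per unit handling time) beats the estimated
    background reward rate rho."""
    return option_value / handling_time > rho
```

Because downward revisions are slower, rho stays too high after the environment deteriorates, so options worth engaging under the true rate get rejected, which is the suboptimality the abstract describes.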
Collapse
|
38
|
Metha JA, Brian ML, Oberrauch S, Barnes SA, Featherby TJ, Bossaerts P, Murawski C, Hoyer D, Jacobson LH. Separating Probability and Reversal Learning in a Novel Probabilistic Reversal Learning Task for Mice. Front Behav Neurosci 2020; 13:270. [PMID: 31998088 PMCID: PMC6962304 DOI: 10.3389/fnbeh.2019.00270] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2019] [Accepted: 11/27/2019] [Indexed: 11/13/2022] Open
Abstract
The exploration/exploitation tradeoff – pursuing a known reward vs. sampling from lesser known options in the hope of finding a better payoff – is a fundamental aspect of learning and decision making. In humans, this has been studied using multi-armed bandit tasks. The same processes have also been studied using simplified probabilistic reversal learning (PRL) tasks with binary choices. Our investigations suggest that protocols previously used to explore PRL in mice may prove to be beyond their cognitive capacities, with animals performing at a no-better-than-chance level. We sought a novel probabilistic learning task to improve behavioral responding in mice, whilst allowing the investigation of the exploration/exploitation tradeoff in decision making. To achieve this, we developed a two-lever operant chamber task with levers corresponding to different probabilities (high/low) of receiving a saccharin reward, reversing the reward contingencies associated with levers once animals reached a threshold of 80% responding at the high rewarding lever. We found that, unlike in existing PRL tasks, mice are able to learn and behave near optimally with 80% high/20% low reward probabilities. Altering the reward contingencies towards equality showed that some mice displayed preference for the high rewarding lever with probabilities as close as 60% high/40% low. Additionally, we show that animal choice behavior can be effectively modelled using reinforcement learning (RL) models incorporating learning rates for positive and negative prediction error, a perseveration parameter, and a noise parameter. This new decision task, coupled with RL analyses, advances access to investigate the neuroscience of the exploration/exploitation tradeoff in decision making.
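The described RL model (dual learning rates for positive and negative prediction errors, a perseveration bonus, and a lapse-style noise parameter) can be sketched as a simulated agent on the two-lever task. Parameter names and values below are illustrative assumptions, not fitted estimates from the paper.

```python
import math
import random

def simulate_two_lever(n_trials=300, p_high=0.8, p_low=0.2,
                       alpha_pos=0.4, alpha_neg=0.2, beta=5.0,
                       persev=0.3, lapse=0.1, seed=1):
    """Simulate an RL agent on a two-lever probabilistic task: softmax
    choice over Q-values plus a perseveration bonus for repeating the
    previous lever, mixed with occasional random (lapse) choices."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    last = None
    choices = []
    for _ in range(n_trials):
        logits = [beta * q[a] + (persev if a == last else 0.0) for a in (0, 1)]
        m = max(logits)
        p1 = math.exp(logits[1] - m) / (math.exp(logits[0] - m) + math.exp(logits[1] - m))
        p1 = (1 - lapse) * p1 + lapse * 0.5  # noise parameter
        a = 1 if rng.random() < p1 else 0
        reward = 1.0 if rng.random() < (p_high if a == 1 else p_low) else 0.0
        delta = reward - q[a]
        q[a] += (alpha_pos if delta > 0 else alpha_neg) * delta
        last = a
        choices.append(a)
    return choices, q
```

Under this parameterization the agent ends up choosing the high-probability lever on most late trials, mirroring the near-optimal responding the mice showed at 80%/20% contingencies; swapping `p_high` and `p_low` mid-simulation would give the reversal phase.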
Collapse
Affiliation(s)
- Jeremy A Metha
- Sleep and Cognition, The Florey Institute of Neuroscience and Mental Health, Parkville, VIC, Australia.,Translational Neuroscience, Department of Pharmacology and Therapeutics, School of Biomedical Sciences, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, VIC, Australia.,Brain, Mind and Markets Laboratory, Department of Finance, Faculty of Business and Economics, The University of Melbourne, Parkville, VIC, Australia
| | - Maddison L Brian
- Sleep and Cognition, The Florey Institute of Neuroscience and Mental Health, Parkville, VIC, Australia.,Translational Neuroscience, Department of Pharmacology and Therapeutics, School of Biomedical Sciences, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, VIC, Australia
| | - Sara Oberrauch
- Sleep and Cognition, The Florey Institute of Neuroscience and Mental Health, Parkville, VIC, Australia.,Translational Neuroscience, Department of Pharmacology and Therapeutics, School of Biomedical Sciences, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, VIC, Australia
| | - Samuel A Barnes
- Department of Psychiatry, School of Medicine, University of California, San Diego, La Jolla, CA, United States
| | - Travis J Featherby
- Behavioral Core, The Florey Institute of Neuroscience and Mental Health, Parkville, VIC, Australia
| | - Peter Bossaerts
- Brain, Mind and Markets Laboratory, Department of Finance, Faculty of Business and Economics, The University of Melbourne, Parkville, VIC, Australia
| | - Carsten Murawski
- Brain, Mind and Markets Laboratory, Department of Finance, Faculty of Business and Economics, The University of Melbourne, Parkville, VIC, Australia
| | - Daniel Hoyer
- Sleep and Cognition, The Florey Institute of Neuroscience and Mental Health, Parkville, VIC, Australia.,Translational Neuroscience, Department of Pharmacology and Therapeutics, School of Biomedical Sciences, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, VIC, Australia.,Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, United States
| | - Laura H Jacobson
- Sleep and Cognition, The Florey Institute of Neuroscience and Mental Health, Parkville, VIC, Australia.,Translational Neuroscience, Department of Pharmacology and Therapeutics, School of Biomedical Sciences, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, VIC, Australia
| |
Collapse
|
39
|
Nussenbaum K, Hartley CA. Reinforcement learning across development: What insights can we draw from a decade of research? Dev Cogn Neurosci 2019; 40:100733. [PMID: 31770715 PMCID: PMC6974916 DOI: 10.1016/j.dcn.2019.100733] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2019] [Revised: 10/24/2019] [Accepted: 11/04/2019] [Indexed: 01/02/2023] Open
Abstract
The past decade has seen the emergence of the use of reinforcement learning models to study developmental change in value-based learning. It is unclear, however, whether these computational modeling studies, which have employed a wide variety of tasks and model variants, have reached convergent conclusions. In this review, we examine whether the tuning of model parameters that govern different aspects of learning and decision-making processes vary consistently as a function of age, and what neurocognitive developmental changes may account for differences in these parameter estimates across development. We explore whether patterns of developmental change in these estimates are better described by differences in the extent to which individuals adapt their learning processes to the statistics of different environments, or by more static learning biases that emerge across varied contexts. We focus specifically on learning rates and inverse temperature parameter estimates, and find evidence that from childhood to adulthood, individuals become better at optimally weighting recent outcomes during learning across diverse contexts and less exploratory in their value-based decision-making. We provide recommendations for how these two possibilities - and potential alternative accounts - can be tested more directly to build a cohesive body of research that yields greater insight into the development of core learning processes.
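The two parameters this review focuses on have simple computational roles: the learning rate sets how strongly recent outcomes are weighted, and the inverse temperature sets how deterministically values drive choice. A minimal softmax sketch (illustrative only):

```python
import math

def softmax_probs(q_values, beta):
    """Softmax choice rule; beta is the inverse temperature.
    Low beta yields near-uniform, exploratory choice; high beta
    yields near-greedy, exploitative choice."""
    m = max(q_values)
    exps = [math.exp(beta * (q - m)) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]
```

On this reading, the finding that inverse temperature rises from childhood to adulthood corresponds to choice probabilities sharpening around the best option with age.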
Collapse
|
40
|
Dyson BJ, Musgrave C, Rowe C, Sandhur R. Behavioural and neural interactions between objective and subjective performance in a Matching Pennies game. Int J Psychophysiol 2019; 147:128-136. [PMID: 31730790 DOI: 10.1016/j.ijpsycho.2019.11.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2019] [Revised: 11/05/2019] [Accepted: 11/07/2019] [Indexed: 02/06/2023]
Abstract
To examine the behavioural and neural interactions between objective and subjective performance during competitive decision-making, participants completed a Matching Pennies game where win-rates were fixed within three conditions (win > lose, win = lose, win < lose) and outcomes were predicted at each trial. Using random behaviour as the hallmark of optimal performance, we observed item (heads), contingency (win-stay, lose-shift) and combinatorial (HH, HT, TH, TT) biases across all conditions. Higher-quality behaviour, represented by a reduction in combinatorial bias, was observed during high win-rate exposure. In contrast, over-optimism biases were observed only in conditions where win rates were equal to, or less than, loss rates. At a group level, a neural measure of outcome evaluation (feedback-related negativity; FRN) indexed the binary distinction between positive and negative outcome. At an individual level, increased belief in successful performance accentuated FRN amplitude differences between wins and losses. Taken together, the data suggest that objective experiences of, or subjective beliefs in, the predominance of positive outcomes may be mutual attempts to self-regulate performance during competition. In this way, increased exposure to positive outcomes (real or imagined) may help to weight the output of the more diligent and analytic System 2, relative to the impulsive and intuitive System 1.
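The contingency biases measured here (win-stay, lose-shift) have a straightforward operationalization. A sketch, with a function name of our choosing, assuming binary choice and outcome sequences:

```python
def wsls_rates(choices, outcomes):
    """Win-stay and lose-shift frequencies from choice/outcome sequences
    (outcomes[t] truthy = win). For a perfectly random player both
    frequencies should hover near 0.5."""
    win_stay = lose_shift = wins = losses = 0
    for t in range(1, len(choices)):
        if outcomes[t - 1]:
            wins += 1
            win_stay += choices[t] == choices[t - 1]
        else:
            losses += 1
            lose_shift += choices[t] != choices[t - 1]
    return (win_stay / wins if wins else float('nan'),
            lose_shift / losses if losses else float('nan'))
```

Departures of either frequency from 0.5 are exploitable by an opponent in Matching Pennies, which is why random behaviour serves as the hallmark of optimal performance in this task.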
Collapse
Affiliation(s)
- Benjamin James Dyson
- University of Alberta, Canada; University of Sussex, UK; Ryerson University, Canada.
| | | | | | | |
Collapse
|
41
|
Oba T, Katahira K, Ohira H. The Effect of Reduced Learning Ability on Avoidance in Psychopathy: A Computational Approach. Front Psychol 2019; 10:2432. [PMID: 31736830 PMCID: PMC6838140 DOI: 10.3389/fpsyg.2019.02432] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2019] [Accepted: 10/14/2019] [Indexed: 02/01/2023] Open
Abstract
Individuals with psychopathy often show deficits in learning, which can have negative consequences. Several theories have been proposed to explain psychopathic behaviors, but the learning mechanisms in psychopathy are still unclear. To clarify the learning anomalies in psychopathy, we fitted reinforcement learning (RL) models to behavioral data. We conducted two experiments to examine the effect of psychopathy as a group difference (Experiment 1) and as a continuum (Experiment 2). Forty-three undergraduates (in Experiment 1) and fifty-five undergraduate and graduate students (in Experiment 2) performed a go/no-go based learning task with accompanying rewards or punishments. Although we observed no differences in learning performance among the levels of psychopathic traits, the learning rate for the positive prediction error in the loss domain was lower for those with high psychopathic traits than for those with low psychopathic traits. This finding indicates that individuals with high psychopathic traits update an action value less when they avoid a negative outcome. Our model can represent previous theories under a computational framework and provide a new perspective on impaired learning in psychopathy.
Collapse
Affiliation(s)
- Takeyuki Oba
- Department of Psychology, Graduate School of Environmental Studies, Nagoya University, Nagoya, Japan
| | - Kentaro Katahira
- Department of Psychology, Graduate School of Informatics, Nagoya University, Nagoya, Japan
| | - Hideki Ohira
- Department of Psychology, Graduate School of Informatics, Nagoya University, Nagoya, Japan
| |
Collapse
|
42
|
Howlett JR, Huang H, Hysek CM, Paulus MP. The effect of single-dose methylphenidate on the rate of error-driven learning in healthy males: a randomized controlled trial. Psychopharmacology (Berl) 2017; 234:3353-3360. [PMID: 28864865 PMCID: PMC5886350 DOI: 10.1007/s00213-017-4723-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/04/2017] [Accepted: 08/14/2017] [Indexed: 12/30/2022]
Abstract
RATIONALE AND OBJECTIVES Norepinephrine mediates the adjustment of error-driven learning to match the rate of change of the environment, while phasic dopamine signals prediction errors. We tested the hypothesis that pharmacologic manipulation may modulate this process. METHODS We administered a single dose of methylphenidate, a norepinephrine/dopamine reuptake inhibitor, or placebo in double-blind randomized fashion to 20 healthy human males, who then performed a probabilistic learning task. Each subject was tested in two sessions, receiving methylphenidate in one session and placebo in the other, in randomized order. Task performance was quantified by the percentage of trials on which subjects chose the most likely option, while learning rate was measured using a computational model-based parameter as well as with a behavioral analogue of this parameter. RESULTS There was a substance-by-session interaction effect on behavioral learning rate and model-based learning rate, such that subjects receiving methylphenidate exhibited higher learning rates than those receiving placebo in session 1, with no difference observed in session 2, suggesting that subjects retained the increased learning rate across sessions. Higher behavioral learning rate was associated with both higher task performance and with the model-based learning rate. Higher learning rates were advantageous given the high rate of change on the task. Subjects receiving methylphenidate and placebo began the task in session 1 with a similar behavioral learning rate, but those receiving methylphenidate rapidly increased learning rate toward the optimal value, suggesting that methylphenidate accelerated the adaptation of learning rate based on the environment. CONCLUSIONS The results suggest that methylphenidate may improve disrupted probabilistic learning in disorders involving noradrenergic or dopaminergic dysfunction.
Collapse
Affiliation(s)
- Jonathon R. Howlett
- Department of Psychiatry, University of California San Diego, La Jolla, CA, 92093, USA
| | - He Huang
- Laureate Institute for Brain Research, Tulsa, OK, 74136, USA
| | - Cédric M. Hysek
- Department of Psychiatry, University of California San Diego, La Jolla, CA, 92093, USA
| | | |
Collapse
|
43
|
Heijne A, Rossi F, Sanfey AG. Why we stay with our social partners: Neural mechanisms of stay/leave decision-making. Soc Neurosci 2017; 13:667-679. [PMID: 28820016 DOI: 10.1080/17470919.2017.1370010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
How do we decide to keep interacting (e.g., stay) with a social partner or to switch (e.g., leave) to another? This paper investigated the neural mechanisms of stay/leave decision-making. We hypothesized that these decisions fit within a framework of value-based decision-making, and explored four potential mechanisms underlying a hypothesized bias to stay. Twenty-six participants underwent functional Magnetic Resonance Imaging (fMRI) while completing social and nonsocial versions of a stay/leave decision-making task. On each trial, participants chose between four alternative options, after which they received a monetary reward. Crucially, in the social condition, reward magnitude was ostensibly determined by the generosity of social partners, whereas in the nonsocial condition, reward amounts were ostensibly determined in a pre-programmed manner. Results demonstrated that participants were more likely to stay with options of relatively high expected value, with these values updated through Reinforcement Learning mechanisms and represented neurally within ventromedial prefrontal cortex. Moreover, we demonstrated that greater brain activity in ventromedial prefrontal cortex, caudate nucleus, and septo-hypothalamic regions for social versus nonsocial decisions to stay may underlie a bias towards staying with social partners in particular. These findings complement existing social psychological theories by investigating the neural mechanisms of actual stay/leave decisions.
Collapse
Affiliation(s)
- Amber Heijne
- Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, the Netherlands.,Department of Cognitive Science and Education, University of Trento, Rovereto, Italy
| | - Filippo Rossi
- Institute for Neural Computation, University of California, San Diego, CA, USA
| | - Alan G Sanfey
- Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, the Netherlands
| |
Collapse
|
44
|
Palminteri S, Lefebvre G, Kilford EJ, Blakemore SJ. Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing. PLoS Comput Biol 2017; 13:e1005684. [PMID: 28800597 PMCID: PMC5568446 DOI: 10.1371/journal.pcbi.1005684] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2017] [Revised: 08/23/2017] [Accepted: 07/14/2017] [Indexed: 11/18/2022] Open
Abstract
Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice. While the investigation of decision-making biases has a long history in economics and psychology, learning biases have been much less systematically investigated. This is surprising as most of the choices we deal with in everyday life are recurrent, thus allowing learning to occur and therefore influencing future decision-making. Combining behavioural testing and computational modeling, here we show that the valence of an outcome biases both factual and counterfactual learning. 
Increasing our understanding of learning biases will enable the refinement of existing models of value-based decision-making.
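The confirmation-bias pattern the two experiments converge on can be written as one update rule: for the chosen option, positive prediction errors get the larger learning rate; for the unchosen option (counterfactual feedback), negative prediction errors do. A sketch with illustrative parameter names:

```python
def confirmation_update(q_chosen, q_unchosen, r_chosen, r_unchosen,
                        alpha_conf, alpha_disconf):
    """Update both options from complete feedback, weighting
    choice-confirming prediction errors (positive for the chosen option,
    negative for the unchosen one) by the larger rate alpha_conf."""
    d_c = r_chosen - q_chosen
    d_u = r_unchosen - q_unchosen
    q_chosen += (alpha_conf if d_c > 0 else alpha_disconf) * d_c
    q_unchosen += (alpha_disconf if d_u > 0 else alpha_conf) * d_u
    return q_chosen, q_unchosen
```

With `alpha_conf > alpha_disconf`, this single scheme reproduces both reported biases: an apparent positivity bias in factual learning and an apparent negativity bias in counterfactual learning.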
Collapse
Affiliation(s)
- Stefano Palminteri
- Institute of Cognitive Neuroscience, University College London, London, United Kingdom
- Laboratoire de Neurosciences Cognitives, Institut National de la Santé et de la Recherche Médicale, Paris, France
- Département d’Études Cognitives, École Normale Supérieure, Paris, France
- Institut d’Études de la Cognition, Université de Recherche Paris Sciences et Lettres, Paris, France
| | - Germain Lefebvre
- Laboratoire de Neurosciences Cognitives, Institut National de la Santé et de la Recherche Médicale, Paris, France
- Département d’Études Cognitives, École Normale Supérieure, Paris, France
- Laboratoire d’Économie Mathématique et de Microéconomie Appliquée, Université Panthéon-Assas, Paris, France
| | - Emma J. Kilford
- Institute of Cognitive Neuroscience, University College London, London, United Kingdom
| | - Sarah-Jayne Blakemore
- Institute of Cognitive Neuroscience, University College London, London, United Kingdom
| |
Collapse
|
45
|
|
46
|
Abstract
Studies of reinforcement learning have shown that humans learn differently in response to positive and negative reward prediction errors, a phenomenon that can be captured computationally by positing asymmetric learning rates. This asymmetry, motivated by neurobiological and cognitive considerations, has been invoked to explain learning differences across the lifespan as well as a range of psychiatric disorders. Recent theoretical work, motivated by normative considerations, has hypothesized that the learning rate asymmetry should be modulated by the distribution of rewards across the available options. In particular, the learning rate for negative prediction errors should be higher than the learning rate for positive prediction errors when the average reward rate is high, and this relationship should reverse when the reward rate is low. We tested this hypothesis in a series of experiments. Contrary to the theoretical predictions, we found that the asymmetry was largely insensitive to the average reward rate; instead, the dominant pattern was a higher learning rate for negative than for positive prediction errors, possibly reflecting risk aversion.
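One simple way to express the hypothesized normative modulation is as learning rates that depend on the environment's average reward rate. The functional form and parameters below are purely illustrative, not taken from the paper:

```python
def rate_modulated_alphas(avg_reward_rate, base=0.3, gain=0.4):
    """Hypothesized modulation: in rich environments (high average
    reward rate) the learning rate for negative prediction errors
    should exceed the positive one, and vice versa in poor
    environments."""
    alpha_neg = base + gain * avg_reward_rate
    alpha_pos = base + gain * (1.0 - avg_reward_rate)
    return alpha_pos, alpha_neg
```

The reported data instead showed `alpha_neg > alpha_pos` across reward rates, i.e. behaviour resembling a fixed pessimistic (possibly risk-averse) asymmetry rather than this reward-rate-dependent modulation.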
Collapse
|
47
|
Adolescent-specific patterns of behavior and neural activity during social reinforcement learning. COGNITIVE AFFECTIVE & BEHAVIORAL NEUROSCIENCE 2015; 14:683-97. [PMID: 24550063 DOI: 10.3758/s13415-014-0257-z] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling trial-by-trial learning, children and adults showed higher positive learning rates than did adolescents, suggesting that adolescents demonstrated less differentiation in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.
Collapse
|
48
|
Antipsychotic dose modulates behavioral and neural responses to feedback during reinforcement learning in schizophrenia. COGNITIVE AFFECTIVE & BEHAVIORAL NEUROSCIENCE 2014; 14:189-201. [DOI: 10.3758/s13415-014-0261-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]
|