1. Fang Z, Sims CR. Humans learn generalizable representations through efficient coding. Nat Commun 2025;16:3989. PMID: 40295498; PMCID: PMC12037794; DOI: 10.1038/s41467-025-58848-6.
Abstract
Reinforcement learning theory explains human behavior as driven by the goal of maximizing reward. Conventional approaches, however, offer limited insights into how people generalize from past experiences to new situations. Here, we propose refining the classical reinforcement learning framework by incorporating an efficient coding principle, which emphasizes maximizing reward using the simplest necessary representations. This refined framework predicts that intelligent agents, constrained by simpler representations, will inevitably: 1) distill environmental stimuli into fewer, abstract internal states, and 2) detect and utilize rewarding environmental features. Consequently, complex stimuli are mapped to compact representations, forming the foundation for generalization. We tested this idea in two experiments that examined human generalization. Our findings reveal that while conventional models fall short in generalization, models incorporating efficient coding achieve human-level performance. We argue that the classical RL objective, augmented with efficient coding, represents a more comprehensive computational framework for understanding human behavior in both learning and generalization.
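The efficient-coding idea in this abstract (compress stimuli into the fewest internal states needed to earn reward) can be illustrated with a toy sketch. The task, features, and reward values below are invented for illustration and are not taken from the paper:

```python
import random

random.seed(0)

# Toy environment: stimuli are (color, shape) pairs, but ONLY color predicts
# reward. An efficient coder can compress each stimulus to its color alone.
REWARD = {"red": 1.0, "blue": 0.0}  # invented reward structure

def encode(stimulus, compressed):
    """Internal representation: the full pair, or just the rewarding feature."""
    color, shape = stimulus
    return color if compressed else (color, shape)

def train(compressed, n_trials=300, alpha=0.1):
    V = {}  # value table over internal states
    train_stimuli = [("red", "circle"), ("blue", "circle"), ("red", "square")]
    for _ in range(n_trials):
        s = random.choice(train_stimuli)
        z = encode(s, compressed)
        r = REWARD[s[0]]
        V[z] = V.get(z, 0.0) + alpha * (r - V.get(z, 0.0))  # delta rule
    return V

V_compressed = train(compressed=True)
V_full = train(compressed=False)

# Generalization probe: a never-seen stimulus sharing the rewarding feature.
novel = ("red", "star")
gen_value = V_compressed.get(encode(novel, True), 0.0)    # value transfers
no_gen_value = V_full.get(encode(novel, False), 0.0)      # no transfer
```

The compressed agent maintains fewer internal states yet immediately values the novel stimulus, which is the sense in which compact representations form a foundation for generalization.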
Affiliation(s)
- Zeming Fang
- Brain Health Institute, National Center for Mental Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine and School of Psychology, Shanghai, 200030, China.
- Key Laboratory of Brain-Machine Intelligence for Information Behavior-Ministry of Education, Shanghai International Studies University, Shanghai, China.
- Chris R Sims
- Department of Cognitive Science, Rensselaer Polytechnic Institute, Troy, NY, USA
2. Bein O, Niv Y. Schemas, reinforcement learning and the medial prefrontal cortex. Nat Rev Neurosci 2025;26:141-157. PMID: 39775183; DOI: 10.1038/s41583-024-00893-z.
Abstract
Schemas are rich and complex knowledge structures about the typical unfolding of events in a context; for example, a schema of a dinner at a restaurant. In this Perspective, we suggest that reinforcement learning (RL), a computational theory of learning the structure of the world and relevant goal-oriented behaviour, underlies schema learning. We synthesize the literature on schemas and RL to propose that three RL principles might govern the learning of schemas: learning via prediction errors, constructing hierarchical knowledge using hierarchical RL, and dimensionality reduction through learning a simplified and abstract representation of the world. We then suggest that the orbitomedial prefrontal cortex is involved in both schemas and RL due to its involvement in dimensionality reduction and in guiding memory reactivation through interactions with posterior brain regions. Last, we hypothesize that the amount of dimensionality reduction might underlie gradients of involvement along the ventral-dorsal and posterior-anterior axes of the orbitomedial prefrontal cortex: more specific and detailed representations might engage the ventral and posterior parts, whereas abstraction might shift representations towards the dorsal and anterior parts of the medial prefrontal cortex.
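The first principle above, learning via prediction errors, is standardly formalized as a temporal-difference update, V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)). A minimal sketch with an invented restaurant-schema episode (illustrative states and values, not from the paper):

```python
# TD(0) value learning over an invented event sequence; reward arrives only
# at the final event, and prediction errors propagate value backwards.
alpha, gamma = 0.1, 0.9
V = {"enter_restaurant": 0.0, "order": 0.0, "eat": 0.0}

episode = [("enter_restaurant", 0.0, "order"),
           ("order", 0.0, "eat"),
           ("eat", 1.0, None)]  # (state, reward, next state)

for _ in range(100):  # repeated experiences of the same event sequence
    for s, r, s_next in episode:
        target = r + gamma * (V[s_next] if s_next else 0.0)
        delta = target - V[s]   # the prediction error
        V[s] += alpha * delta
```

After repeated experience, value spreads backwards from the rewarded event to earlier events in the sequence, one simple sense in which a schema-like expectation of "how dinner unfolds" can be learned from prediction errors alone.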
Affiliation(s)
- Oded Bein
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA.
- Weill Cornell Institute of Geriatric Psychiatry, Department of Psychiatry, Weill Cornell Medicine, New York, NY, USA.
- Yael Niv
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Psychology Department, Princeton University, Princeton, NJ, USA
3. Wurm F, van der Ham IJM, Schomaker J. The ins and outs of unpacking the black box: Understanding motivation using a multi-level approach. Behav Brain Sci 2025;48:e49. PMID: 39886896; DOI: 10.1017/s0140525x24000566.
Abstract
Although higher-level constructs often fail to explain the mechanisms underlying motivation, we argue that purely mechanistic approaches have limitations. Lower-level neural data help us identify "biologically plausible" mechanisms, while higher-level constructs are critical to formulate measurable behavioral outcomes when constructing computational models. Therefore, we propose that a multi-level, multi-measure approach is required to fully unpack the black box of motivated behavior.
Affiliation(s)
- F Wurm
- Health, Medical & Neuropsychology, Leiden University, Leiden, The Netherlands
- Leiden Institute for Brain and Cognition, Leiden, The Netherlands
- I J M van der Ham
- Health, Medical & Neuropsychology, Leiden University, Leiden, The Netherlands
- Leiden Institute for Brain and Cognition, Leiden, The Netherlands
- J Schomaker
- Health, Medical & Neuropsychology, Leiden University, Leiden, The Netherlands
- Leiden Institute for Brain and Cognition, Leiden, The Netherlands
4. Lamba A, Frank MJ, FeldmanHall O. Keeping an Eye Out for Change: Anxiety Disrupts Adaptive Resolution of Policy Uncertainty. Biol Psychiatry Cogn Neurosci Neuroimaging 2024;9:1188-1198. PMID: 39069235; DOI: 10.1016/j.bpsc.2024.07.015.
Abstract
BACKGROUND: Human learning unfolds under uncertainty. Uncertainty is heterogeneous, with different forms exerting distinct influences on learning. One can be uncertain about what to do to maximize rewarding outcomes, known as policy uncertainty, but also about general world knowledge, known as epistemic uncertainty (EU). In complex and naturalistic environments such as the social world, adaptive learning may hinge on striking a balance between attending to and resolving each type of uncertainty. Prior work illustrates that people with anxiety, those with increased threat and uncertainty sensitivity, learn less from aversive outcomes, particularly as outcomes become more uncertain. How does a learner adaptively trade off between attending to these distinct sources of uncertainty to successfully learn about their social environment?
METHODS: We developed a novel eye-tracking method to capture highly granular estimates of policy uncertainty and EU based on gaze patterns and pupil diameter (a physiological estimate of arousal).
RESULTS: These empirically derived uncertainty measures revealed that humans (N = 94) flexibly switched between resolving policy uncertainty and EU to adaptively learn which individuals can be trusted and which should be avoided. However, those with increased anxiety (n = 49) did not flexibly switch between resolving policy uncertainty and EU and instead expressed less uncertainty overall.
CONCLUSIONS: Combining modeling and eye-tracking techniques, we show that altered learning in people with anxiety emerged from an insensitivity to policy uncertainty and rigid choice policies, leading to maladaptive behaviors with untrustworthy people.
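This paper derives its uncertainty estimates from gaze and pupil data, but in value-based models policy uncertainty is commonly operationalized as the entropy of the softmax choice policy. A minimal sketch with invented action values:

```python
import math

# Policy uncertainty as the entropy (in bits) of a softmax choice policy
# over action values; the action values below are invented for illustration.
def softmax(qs, beta=1.0):
    exps = [math.exp(beta * q) for q in qs]
    total = sum(exps)
    return [e / total for e in exps]

def policy_entropy(qs, beta=1.0):
    return -sum(p * math.log2(p) for p in softmax(qs, beta) if p > 0)

uncertain = policy_entropy([0.5, 0.5])   # equal values: no idea whom to trust
confident = policy_entropy([3.0, -3.0])  # well-separated values: clear choice
```

High entropy marks states where the learner does not yet know what to do (the "policy uncertainty" the paper tracks); the rigid choice policies reported in anxious participants correspond to entropy staying low even when it should not.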
Affiliation(s)
- Amrita Lamba
- Department of Cognitive and Psychological Sciences, Brown University, Providence, Rhode Island; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts
- Michael J Frank
- Department of Cognitive and Psychological Sciences, Brown University, Providence, Rhode Island; Carney Institute of Brain Sciences, Brown University, Providence, Rhode Island
- Oriel FeldmanHall
- Department of Cognitive and Psychological Sciences, Brown University, Providence, Rhode Island; Carney Institute of Brain Sciences, Brown University, Providence, Rhode Island.
5. Brown VM, Lee J, Wang J, Casas B, Chiu PH. Reinforcement-Learning-Informed Queries Guide Behavioral Change. Clin Psychol Sci 2024;12:1146-1161. PMID: 39635456; PMCID: PMC11617014; DOI: 10.1177/21677026231213368.
Abstract
Algorithmically defined aspects of reinforcement learning correlate with psychopathology symptoms and change with symptom improvement following cognitive-behavioral therapy (CBT). Separate work in nonclinical samples has shown that varying the structure and statistics of task environments can change learning. Here, we combine these literatures, drawing on CBT-based guided restructuring of thought processes and computationally defined mechanistic targets identified by reinforcement-learning models in depression, to test whether and how verbal queries affect learning processes. Using a parallel-arm design, we tested 1,299 online participants completing a probabilistic reward-learning task while receiving repeated queries about the task environment (11 learning-query arms and one active control arm). Querying participants about reinforcement-learning-related task components altered computational-model-defined learning parameters in directions specific to the target of the query. These effects on learning parameters were consistent across depression-symptom severity, suggesting new learning-based strategies and therapeutic targets for evoking symptom change in mood psychopathology.
Affiliation(s)
- Vanessa M. Brown
- Fralin Biomedical Research Institute at VTC, Virginia Tech
- Department of Psychology, Virginia Tech
- Department of Psychiatry, University of Pittsburgh
- Department of Psychology, Emory University
- Jacob Lee
- Fralin Biomedical Research Institute at VTC, Virginia Tech
- John Wang
- Fralin Biomedical Research Institute at VTC, Virginia Tech
- Department of Psychology, Virginia Tech
- Brooks Casas
- Fralin Biomedical Research Institute at VTC, Virginia Tech
- Department of Psychology, Virginia Tech
- Pearl H. Chiu
- Fralin Biomedical Research Institute at VTC, Virginia Tech
- Department of Psychology, Virginia Tech
6. Heijnen S, Sleutels J, de Kleijn R. Model Virtues in Computational Cognitive Neuroscience. J Cogn Neurosci 2024;36:1683-1694. PMID: 38739562; DOI: 10.1162/jocn_a_02183.
Abstract
There is an abundance of computational models in cognitive neuroscience. A framework for what is desirable in a model, what justifies the introduction of a new one, or what makes one better than another is lacking, however. In this article, we examine key qualities ("virtues") that are desirable in computational models, and how these are interrelated. To keep the scope of the article manageable, we focus on the field of cognitive control, where we identified six "model virtues": empirical accuracy, empirical scope, functional analysis, causal detail, biological plausibility, and psychological plausibility. We first illustrate their use in published work on Stroop modeling and then discuss what expert modelers in the field of cognitive control said about them in a series of qualitative interviews. We found that virtues are interrelated and that their value depends on the modeler's goals, in ways that are not typically acknowledged in the literature. We recommend that researchers make the reasons for their modeling choices more explicit in published work. Our work is meant as a first step. Although our focus here is on cognitive control, we hope that our findings will spark discussion of virtues in other fields as well.
7. Wärnberg E, Kumar A. Feasibility of dopamine as a vector-valued feedback signal in the basal ganglia. Proc Natl Acad Sci U S A 2023;120:e2221994120. PMID: 37527344; PMCID: PMC10410740; DOI: 10.1073/pnas.2221994120.
Abstract
It is well established that midbrain dopaminergic neurons support reinforcement learning (RL) in the basal ganglia by transmitting a reward prediction error (RPE) to the striatum. In particular, different computational models and experiments have shown that a striatum-wide RPE signal can support RL over a small discrete set of actions (e.g., go/no-go, choose left/right). However, there is accumulating evidence that the basal ganglia functions not as a selector between predefined actions but rather as a dynamical system with graded, continuous outputs. To reconcile this view with RL, there is a need to explain how dopamine could support learning of continuous outputs, rather than discrete action values. Inspired by the recent observations that besides RPE, the firing rates of midbrain dopaminergic neurons correlate with motor and cognitive variables, we propose a model in which the dopamine signal in the striatum carries a vector-valued error feedback signal (a loss gradient) instead of a homogeneous scalar error (a loss). We implement a local, "three-factor" corticostriatal plasticity rule involving the presynaptic firing rate, a postsynaptic factor, and the unique dopamine concentration perceived by each striatal neuron. With this learning rule, we show that such a vector-valued feedback signal results in an increased capacity to learn a multidimensional series of real-valued outputs. Crucially, we demonstrate that this plasticity rule does not require precise nigrostriatal synapses but remains compatible with experimental observations of random placement of varicosities and diffuse volume transmission of dopamine.
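A three-factor rule of the general form described above can be sketched as dw[i][j] = eta * pre[j] * post[i] * DA[i], with DA[i] the dopamine concentration perceived by striatal neuron i. The delta-rule error used as DA below is an illustrative stand-in, and the network sizes and targets are invented; this is not the paper's exact model:

```python
# Three-factor corticostriatal update with a VECTOR-valued dopamine signal:
# each striatal neuron i receives its own error DA[i] (a loss gradient),
# rather than one striatum-wide scalar RPE.
eta = 0.5
n_pre, n_post = 4, 3
w = [[0.0] * n_pre for _ in range(n_post)]

pre = [1.0, 0.5, 0.0, 1.0]    # presynaptic cortical firing rates
post = [1.0, 1.0, 1.0]        # postsynaptic factor (neuron eligible to learn)
target = [0.8, -0.2, 0.5]     # desired graded striatal outputs (invented)

for _ in range(200):
    out = [sum(w[i][j] * pre[j] for j in range(n_pre)) for i in range(n_post)]
    DA = [target[i] - out[i] for i in range(n_post)]  # per-neuron error
    for i in range(n_post):
        for j in range(n_pre):
            w[i][j] += eta * pre[j] * post[i] * DA[i]  # local three-factor rule

final = [sum(w[i][j] * pre[j] for j in range(n_pre)) for i in range(n_post)]
```

With one shared scalar RPE, all neurons would be pushed in the same direction at once; the per-neuron signal lets the network learn a multidimensional real-valued output, which is the feasibility point the paper argues for.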
Affiliation(s)
- Emil Wärnberg
- Department of Neuroscience, Karolinska Institutet, 171 77 Stockholm, Sweden
- Division of Computational Science and Technology, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, 114 28 Stockholm, Sweden
- Arvind Kumar
- Division of Computational Science and Technology, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, 114 28 Stockholm, Sweden
8. Rosenblau G, Frolichs K, Korn CW. A neuro-computational social learning framework to facilitate transdiagnostic classification and treatment across psychiatric disorders. Neurosci Biobehav Rev 2023;149:105181. PMID: 37062494; PMCID: PMC10236440; DOI: 10.1016/j.neubiorev.2023.105181.
Abstract
Social deficits are among the core and most striking psychiatric symptoms, present in most psychiatric disorders. Here, we introduce a novel social learning framework, which consists of neuro-computational models that combine reinforcement learning with various types of social knowledge structures. We outline how this social learning framework can help specify and quantify social psychopathology across disorders and provide an overview of the brain regions that may be involved in this type of social learning. We highlight how this framework can specify commonalities and differences in the social psychopathology of individuals with autism spectrum disorder (ASD), personality disorders (PD), and major depressive disorder (MDD) and improve treatments on an individual basis. We conjecture that individuals with psychiatric disorders rely on rigid social knowledge representations when learning about others, although the nature of their rigidity and its behavioral consequences can differ greatly. While non-clinical cohorts tend to efficiently adapt social knowledge representations to relevant environmental constraints, psychiatric cohorts may rigidly stick to their preconceived notions or overly coarse knowledge representations during learning.
Affiliation(s)
- Gabriela Rosenblau
- Department of Psychological and Brain Sciences, George Washington University, Washington DC, USA; Autism and Neurodevelopmental Disorders Institute, George Washington University, Washington DC, USA.
- Koen Frolichs
- Section Social Neuroscience, Department of General Psychiatry, University of Heidelberg, Heidelberg, Germany; Institute for Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Christoph W Korn
- Section Social Neuroscience, Department of General Psychiatry, University of Heidelberg, Heidelberg, Germany; Institute for Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
9. Sherif MA, Fotros A, Greenberg BD, McLaughlin NCR. Understanding cingulotomy's therapeutic effect in OCD through computer models. Front Integr Neurosci 2023;16:889831. PMID: 36704759; PMCID: PMC9871832; DOI: 10.3389/fnint.2022.889831.
Abstract
Cingulotomy is therapeutic in OCD, but what are the possible mechanisms? Computer models that formalize cortical OCD abnormalities and anterior cingulate cortex (ACC) function can help answer this. At the neural dynamics level, cortical dynamics in OCD have been modeled using attractor networks, where activity patterns resistant to change denote the inability to switch to new patterns, which can reflect inflexible thinking patterns or behaviors. From that perspective, cingulotomy might reduce the influence of difficult-to-escape ACC attractor dynamics on other cortical areas. At the functional level, computer formulations based on model-free reinforcement learning (RL) have been used to describe the multitude of phenomena ACC is involved in, such as tracking the timing of expected outcomes and estimating the cost of exerting cognitive control and effort. Different elements of model-free RL models of ACC could be affected by the inflexible cortical dynamics, making it challenging to update their values. An agent can also use a world model, a representation of how the states of the world change, to plan its actions, through model-based RL. OCD has been hypothesized to be driven by reduced certainty of how the brain's world model describes changes. Cingulotomy might improve such uncertainties about the world and one's actions, making it possible to trust the outcomes of these actions more and thus reduce the urge to collect more sensory information in the form of compulsions. Connecting the neural dynamics models with the functional formulations can provide new ways of understanding the role of ACC in OCD, with potential therapeutic insights.
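Attractor dynamics of the kind referenced above can be sketched with a toy Hopfield network, in which a perturbed activity pattern is pulled back to a stored pattern, illustrating activity that "resists change." This is a generic illustration, not the specific cortical model discussed in the paper:

```python
# Toy Hopfield-style attractor: a single stored pattern acts as a fixed point
# that corrupted activity states fall back into.
def sign(x):
    return 1 if x >= 0 else -1

pattern = [1, -1, 1, -1, 1, -1, 1, -1]
n = len(pattern)

# Hebbian weights storing the pattern (no self-connections).
W = [[0 if i == j else pattern[i] * pattern[j] / n for j in range(n)]
     for i in range(n)]

# Start from a corrupted version of the pattern (two bits flipped).
state = list(pattern)
state[0], state[3] = -state[0], -state[3]

for _ in range(5):  # synchronous updates; the state settles into the attractor
    state = [sign(sum(W[i][j] * state[j] for j in range(n)))
             for i in range(n)]
```

The same property that makes attractors useful for robust memory makes them a candidate formalization of inflexibility: once in the basin, the network returns to the stored pattern even when perturbed.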
Affiliation(s)
- Mohamed A. Sherif
- Department of Psychiatry, Brown University, Providence, RI, United States
- Carney Institute for Brain Science, Brown University, Providence, RI, United States
- Department of Psychiatry, Lifespan Health System, Providence, RI, United States
- Aryandokht Fotros
- Department of Psychiatry, Brown University, Providence, RI, United States
- Department of Psychiatry, Lifespan Health System, Providence, RI, United States
- Benjamin D. Greenberg
- Department of Psychiatry, Brown University, Providence, RI, United States
- Carney Institute for Brain Science, Brown University, Providence, RI, United States
- Butler Hospital, Providence, RI, United States
- United States Department of Veterans Affairs, Providence VA Medical Center, Providence, RI, United States
- Nicole C. R. McLaughlin
- Department of Psychiatry, Brown University, Providence, RI, United States
- Carney Institute for Brain Science, Brown University, Providence, RI, United States
- Butler Hospital, Providence, RI, United States
10. Incorporating social knowledge structures into computational models. Nat Commun 2022;13:6205. PMID: 36266284; PMCID: PMC9584930; DOI: 10.1038/s41467-022-33418-2.
Abstract
To navigate social interactions successfully, humans need to continuously learn about the personality traits of other people (e.g., how helpful or aggressive is the other person?). However, formal models that capture the complexities of social learning processes are currently lacking. In this study, we specify and test potential strategies that humans can employ for learning about others. Standard Rescorla-Wagner (RW) learning models only capture parts of the learning process because they neglect inherent knowledge structures and omit previously acquired knowledge. We therefore formalize two social knowledge structures and implement them in hybrid RW models to test their usefulness across multiple social learning tasks. We name these concepts granularity (knowledge structures about personality traits that can be utilized at different levels of detail during learning) and reference points (previous knowledge formalized into representations of average people within a social group). In five behavioural experiments, results from model comparisons and statistical analyses indicate that participants efficiently combine the concepts of granularity and reference points, with the specific combinations in models depending on the people and traits that participants learned about. Overall, our experiments demonstrate that variants of RW algorithms, which incorporate social knowledge structures, describe crucial aspects of the dynamics at play when people interact with each other.
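The reference-point concept can be sketched as a Rescorla-Wagner learner whose initial trait estimate is a prior about an average group member rather than zero. The numbers below are illustrative, not fitted parameters from the study:

```python
# Rescorla-Wagner learning about one person's trait (e.g., helpfulness),
# initialized at a "reference point": the learner's prior about an average
# member of that person's social group.
def rw_learn(outcomes, alpha=0.2, reference_point=0.5):
    v = reference_point           # start from prior knowledge, not from zero
    trajectory = [v]
    for r in outcomes:
        v += alpha * (r - v)      # standard delta-rule update
        trajectory.append(v)
    return trajectory

# A consistently helpful person (outcome 1 = helpful act observed):
with_prior = rw_learn([1, 1, 1, 1], reference_point=0.5)
no_prior = rw_learn([1, 1, 1, 1], reference_point=0.0)
```

Starting from a sensible group-level prior, the learner's estimate is closer to the truth after the same few observations, which is the efficiency benefit the hybrid RW models formalize.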
11. Efficient coding of cognitive variables underlies dopamine response and choice behavior. Nat Neurosci 2022;25:738-748. PMID: 35668173; DOI: 10.1038/s41593-022-01085-7.
Abstract
Reward expectations based on internal knowledge of the external environment are a core component of adaptive behavior. However, internal knowledge may be inaccurate or incomplete due to errors in sensory measurements. Some features of the environment may also be encoded inaccurately to minimize representational costs associated with their processing. In this study, we investigated how reward expectations are affected by features of internal representations by studying behavior and dopaminergic activity while mice make time-based decisions. We show that several possible representations allow a reinforcement learning agent to model animals' overall performance during the task. However, only a small subset of highly compressed representations simultaneously reproduced the co-variability in animals' choice behavior and dopaminergic activity. Strikingly, these representations predict an unusual distribution of response times that closely match animals' behavior. These results inform how constraints of representational efficiency may be expressed in encoding representations of dynamic cognitive variables used for reward-based computations.
12. Dennison JB, Sazhin D, Smith DV. Decision neuroscience and neuroeconomics: Recent progress and ongoing challenges. Wiley Interdiscip Rev Cogn Sci 2022;13:e1589. PMID: 35137549; PMCID: PMC9124684; DOI: 10.1002/wcs.1589.
Abstract
In the past decade, decision neuroscience and neuroeconomics have developed many new insights in the study of decision making. This review provides an overarching update on how the field has advanced in this time period. Although our initial review a decade ago outlined several theoretical, conceptual, methodological, empirical, and practical challenges, there has only been limited progress in resolving these challenges. We summarize significant trends in decision neuroscience through the lens of the challenges outlined for the field and review examples where the field has had significant, direct, and applicable impacts across economics and psychology. First, we review progress on topics including reward learning, explore-exploit decisions, risk and ambiguity, intertemporal choice, and valuation. Next, we assess the impacts of emotion, social rewards, and social context on decision making. Then, we follow up with how individual differences impact choices and new exciting developments in the prediction and neuroforecasting of future decisions. Finally, we consider how trends in decision-neuroscience research reflect progress toward resolving past challenges, discuss new and exciting applications of recent research, and identify new challenges for the field. This article is categorized under: Psychology > Reasoning and Decision Making; Psychology > Emotion and Motivation.
Affiliation(s)
- Jeffrey B Dennison
- Department of Psychology, Temple University, Philadelphia, Pennsylvania, USA
- Daniel Sazhin
- Department of Psychology, Temple University, Philadelphia, Pennsylvania, USA
- David V Smith
- Department of Psychology, Temple University, Philadelphia, Pennsylvania, USA
13. Collins AGE, Shenhav A. Advances in modeling learning and decision-making in neuroscience. Neuropsychopharmacology 2022;47:104-118. PMID: 34453117; PMCID: PMC8617262; DOI: 10.1038/s41386-021-01126-y.
Abstract
An organism's survival depends on its ability to learn about its environment and to make adaptive decisions in the service of achieving the best possible outcomes in that environment. To study the neural circuits that support these functions, researchers have increasingly relied on models that formalize the computations required to carry them out. Here, we review the recent history of computational modeling of learning and decision-making, and how these models have been used to advance understanding of prefrontal cortex function. We discuss how such models have advanced from their origins in basic algorithms of updating and action selection to increasingly account for complexities in the cognitive processes required for learning and decision-making, and the representations over which they operate. We further discuss how a deeper understanding of the real-world complexities in these computations has shed light on the fundamental constraints on optimal behavior, and on the complex interactions between corticostriatal pathways to determine such behavior. The continuing and rapid development of these models holds great promise for understanding the mechanisms by which animals adapt to their environments, and what leads to maladaptive forms of learning and decision-making within clinical populations.
Affiliation(s)
- Anne G E Collins
- Department of Psychology and Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA.
- Amitai Shenhav
- Department of Cognitive, Linguistic, & Psychological Sciences and Carney Institute for Brain Science, Brown University, Providence, RI, USA.
14. [Negative valence systems in the system of research domain criteria: Empirical results and new developments]. Nervenarzt 2021;92:868-877. PMID: 34351434; DOI: 10.1007/s00115-021-01166-1.
Abstract
BACKGROUND: The research domain criteria (RDoC) domain of negative valence systems subsumes both long-established and recently developed research approaches, which build on theoretical knowledge and clinical practice across various psychiatric disorders.
OBJECTIVE: This article outlines how the five constructs within the RDoC domain of negative valence systems can contribute to integrating empirical studies into a coherent and differentiated biopsychosocial model.
MATERIAL AND METHODS: This qualitative review article summarizes empirical results and discusses new developments on the basis of exemplary studies and selected reviews.
RESULTS AND DISCUSSION: Three constructs of the negative valence systems domain differentiate the time horizon in which persons need to react adequately to (1) acute, (2) potential, and (3) sustained threats elicited by negative stimuli or situations. These three constructs can be outlined relatively well with specific experimental paradigms and neuronal circuits. Two further constructs focus on the negative consequences of (4) losses and (5) frustrative non-rewards. The former currently seems relatively diffusely defined, whereas the latter is clearly circumscribed by its relation to specific forms of aggression. Behavioral, physiological, and neuronal reactions to acute and potential threats can be compared well between humans and animals and can be specified with the help of mathematical models. These models can contribute to a better understanding of how healthy and diseased persons process negative stimuli or situations.
15. Xia L, Collins AGE. Temporal and state abstractions for efficient learning, transfer, and composition in humans. Psychol Rev 2021;128:643-666. PMID: 34014709; PMCID: PMC8485577; DOI: 10.1037/rev0000295.
Abstract
Humans use prior knowledge to efficiently solve novel tasks, but how they structure past knowledge during learning to enable such fast generalization is not well understood. We recently proposed that hierarchical state abstraction enables generalization of simple one-step rules by inferring context clusters for each rule. However, humans' daily tasks are often temporally extended and necessitate more complex, multi-step, hierarchically structured strategies. The options framework in hierarchical reinforcement learning provides a theoretical framework for representing such transferable strategies. Options are abstract multi-step policies, assembled from simpler one-step actions or other options, that can represent meaningful reusable strategies as temporal abstractions. We developed a novel sequential decision-making protocol to test whether humans learn and transfer multi-step options. In a series of four experiments, we found transfer effects at multiple hierarchical levels of abstraction that could not be explained by flat reinforcement learning models or by hierarchical models lacking temporal abstractions. We extended the options framework to develop a quantitative model that blends temporal and state abstractions. Our model captures the transfer effects observed in human participants. Our results provide evidence that humans create and compose hierarchical options, and use them to explore in novel contexts, consequently transferring past knowledge and speeding up learning.
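As a purely illustrative sketch of the options framework described in this abstract (names and structure are mine, not the authors'), an option can be treated as a multi-step policy assembled from primitive actions or from other options, and executed as a single temporally abstract choice:

```python
# Hypothetical sketch of a temporally abstract "option": it is assembled
# from one-step primitive actions or from other options, and executing it
# unrolls the whole multi-step strategy as a single choice.

def execute(choice, log):
    if isinstance(choice, str):     # a primitive one-step action
        log.append(choice)
    else:                           # an option: a sequence of sub-choices
        for sub in choice:
            execute(sub, log)

go_corner = ["left", "up"]                   # option built from primitives
go_corner_then_right = [go_corner, "right"]  # option built from an option

log = []
execute(go_corner_then_right, log)
print(log)   # ['left', 'up', 'right']
```

Because options sit in the same choice set as primitives, a learner can reuse a whole strategy in a new context the way it would reuse a single action.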
Affiliation(s)
- Liyu Xia
- Department of Mathematics, University of California, Berkeley
- Anne G E Collins
- Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley
16
Xu HA, Modirshanechi A, Lehmann MP, Gerstner W, Herzog MH. Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making. PLoS Comput Biol 2021; 17:e1009070. [PMID: 34081705 PMCID: PMC8205159 DOI: 10.1371/journal.pcbi.1009070] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2021] [Revised: 06/15/2021] [Accepted: 05/12/2021] [Indexed: 11/19/2022] Open
Abstract
Classic reinforcement learning (RL) theories cannot explain human behavior in the absence of external reward or when the environment changes. Here, we employ a deep sequential decision-making paradigm with sparse reward and abrupt environmental changes. To explain the behavior of human participants in these environments, we show that RL theories need to include surprise and novelty, each with a distinct role. While novelty drives exploration before the first encounter of a reward, surprise increases the rate of learning of a world-model as well as of model-free action-values. Even though the world-model is available for model-based RL, we find that human decisions are dominated by model-free action choices. The world-model is only marginally used for planning, but it is important to detect surprising events. Our theory predicts human action choices with high probability and allows us to dissociate surprise, novelty, and reward in EEG signals.
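One common way to formalize the distinction this abstract draws (my formulation, not necessarily the paper's exact definitions) is that novelty depends only on visitation counts, while surprise depends on a world model's probability for what was just observed:

```python
import math
from collections import Counter

# Hypothetical formalization: novelty needs no world model, only counts;
# surprise needs a world model but no counts.

visits = Counter()

def novelty(state):
    """High for rarely visited states; independent of any model."""
    return 1.0 / (1.0 + visits[state])

def surprise(observation, model):
    """How improbable the observation is under the current world model."""
    return -math.log(model.get(observation, 1e-6))

model = {"A": 0.99, "B": 0.01}       # the model strongly expects 'A'
visits.update(["s1", "s1", "s1"])    # s1 visited often, s2 never

print(novelty("s2"), novelty("s1"))                  # 1.0 0.25
print(surprise("B", model) > surprise("A", model))   # True
```

An unvisited state is maximally novel even if the model predicts it perfectly, and a frequently visited state can still be surprising if the model assigns it low probability, which is why the two signals can play the distinct roles the paper describes.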
Affiliation(s)
- He A. Xu
- Laboratory of Psychophysics, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Alireza Modirshanechi
- Brain-Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Marco P. Lehmann
- Brain-Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Wulfram Gerstner
- Brain-Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Michael H. Herzog
- Laboratory of Psychophysics, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Brain-Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
17
Zhang Z, Wang S, Good M, Hristova S, Kayser AS, Hsu M. Retrieval-constrained valuation: Toward prediction of open-ended decisions. Proc Natl Acad Sci U S A 2021; 118:e2022685118. [PMID: 33990466 PMCID: PMC8157967 DOI: 10.1073/pnas.2022685118] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Real-world decisions are often open ended, with goals, choice options, or evaluation criteria conceived by decision-makers themselves. Critically, the quality of decisions may heavily rely on the generation of options, as failure to generate promising options limits, or even eliminates, the opportunity for choosing them. This core aspect of problem structuring, however, is largely absent from classical models of decision-making, thereby restricting their predictive scope. Here, we take a step toward addressing this issue by developing a neurally inspired cognitive model of a class of ill-structured decisions in which choice options must be self-generated. Specifically, using a model in which semantic memory retrieval is assumed to constrain the set of options available during valuation, we generate highly accurate out-of-sample predictions of choices across multiple categories of goods. Our model significantly and substantially outperforms models that only account for valuation or retrieval in isolation or those that make alternative mechanistic assumptions regarding their interaction. Furthermore, using neuroimaging, we confirm our core assumption regarding the engagement of, and interaction between, semantic memory retrieval and valuation processes. Together, these results provide a neurally grounded and mechanistic account of decisions with self-generated options, representing a step toward unraveling cognitive mechanisms underlying adaptive decision-making in the real world.
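The core assumption of this paper can be sketched in a few lines (names, numbers, and the threshold mechanism here are mine, chosen only to illustrate the idea): memory retrieval fixes the option set before valuation, so a highly valued option that fails to come to mind can never be chosen.

```python
# Hypothetical sketch of retrieval-constrained valuation: retrieval
# gates the option set, then valuation picks among what was retrieved.

memory_strength = {"cola": 0.9, "water": 0.7, "kombucha": 0.1}
value = {"cola": 0.4, "water": 0.6, "kombucha": 0.95}

def choose(threshold=0.3):
    retrieved = [x for x, s in memory_strength.items() if s >= threshold]
    return max(retrieved, key=value.get)   # valuation over retrieved set only

print(choose())   # 'water', although kombucha has the highest value
```

This is the sense in which option generation, not just valuation, limits decision quality: the unretrieved best option is simply absent from the comparison.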
Affiliation(s)
- Zhihao Zhang
- Haas School of Business, University of California, Berkeley, CA 94720
- Social Science Matrix, University of California, Berkeley, CA 94720
- Shichun Wang
- Haas School of Business, University of California, Berkeley, CA 94720
- Maxwell Good
- Haas School of Business, University of California, Berkeley, CA 94720
- Department of Neurology, University of California, San Francisco, CA 94158
- Department of Veterans Affairs Northern California Health Care System, Martinez, CA 94553
- Siyana Hristova
- Haas School of Business, University of California, Berkeley, CA 94720
- Andrew S Kayser
- Department of Neurology, University of California, San Francisco, CA 94158
- Department of Veterans Affairs Northern California Health Care System, Martinez, CA 94553
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720
- Ming Hsu
- Haas School of Business, University of California, Berkeley, CA 94720
- Social Science Matrix, University of California, Berkeley, CA 94720
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94720
18
Vélez N, Gweon H. Learning from other minds: an optimistic critique of reinforcement learning models of social learning. Curr Opin Behav Sci 2021; 38:110-115. [DOI: 10.1016/j.cobeha.2021.01.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/26/2023]
19
Cross L, Cockburn J, Yue Y, O'Doherty JP. Using deep reinforcement learning to reveal how the brain encodes abstract state-space representations in high-dimensional environments. Neuron 2021; 109:724-738.e7. [PMID: 33326755 PMCID: PMC7897245 DOI: 10.1016/j.neuron.2020.11.021] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2020] [Revised: 10/15/2020] [Accepted: 11/17/2020] [Indexed: 11/21/2022]
Abstract
Humans possess an exceptional aptitude to efficiently make decisions from high-dimensional sensory observations. However, it is unknown how the brain compactly represents the current state of the environment to guide this process. The deep Q-network (DQN) achieves this by capturing highly nonlinear mappings from multivariate inputs to the values of potential actions. We deployed DQN as a model of brain activity and behavior in participants playing three Atari video games during fMRI. Hidden layers of DQN exhibited a striking resemblance to voxel activity in a distributed sensorimotor network, extending throughout the dorsal visual pathway into posterior parietal cortex. Neural state-space representations emerged from nonlinear transformations of the pixel space bridging perception to action and reward. These transformations reshape axes to reflect relevant high-level features and strip away information about task-irrelevant sensory features. Our findings shed light on the neural encoding of task representations for decision-making in real-world situations.
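The transformation this abstract describes, in which task-irrelevant sensory features are stripped away, can be illustrated schematically (this is my hand-set toy, not the DQN itself, and the weights are not learned from pixels):

```python
# Schematic sketch: a nonlinear mapping from a high-dimensional input to
# action values can discard a task-irrelevant input dimension entirely.

def relu(x):
    return max(0.0, x)

def q_values(pixels):
    relevant = relu(pixels[0] - pixels[1])    # a high-level learned feature
    # pixels[2] is task-irrelevant: it receives zero downstream weight
    return [2.0 * relevant, 1.0 - relevant]   # values for two actions

# Changing the irrelevant dimension leaves the state representation intact.
print(q_values([0.9, 0.1, 0.5]) == q_values([0.9, 0.1, -3.0]))   # True
```

In the actual DQN the analogous invariances emerge across many nonlinear layers trained by reward, which is what let the authors compare hidden-layer activity with voxel activity along the dorsal visual pathway.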
Affiliation(s)
- Logan Cross
- Computation and Neural Systems, California Institute of Technology, Pasadena, CA 91125, USA
- Jeff Cockburn
- Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA 91125, USA
- Yisong Yue
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA
- John P O'Doherty
- Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA 91125, USA
20
Alexander WH, Womelsdorf T. Interactions of Medial and Lateral Prefrontal Cortex in Hierarchical Predictive Coding. Front Comput Neurosci 2021; 15:605271. [PMID: 33613221 PMCID: PMC7888340 DOI: 10.3389/fncom.2021.605271] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2020] [Accepted: 01/08/2021] [Indexed: 11/13/2022] Open
Abstract
Cognitive control and decision-making rely on the interplay of medial and lateral prefrontal cortex (mPFC/lPFC), particularly for circumstances in which correct behavior requires integrating and selecting among multiple sources of interrelated information. While the interaction between mPFC and lPFC is generally acknowledged as a crucial circuit in adaptive behavior, the nature of this interaction remains open to debate, with various proposals suggesting complementary roles in (i) signaling the need for and implementing control, (ii) identifying and selecting appropriate behavioral policies from a candidate set, and (iii) constructing behavioral schemata for performance of structured tasks. Although these proposed roles capture salient aspects of conjoint mPFC/lPFC function, none are sufficiently well-specified to provide a detailed account of the continuous interaction of the two regions during ongoing behavior. A recent computational model of mPFC and lPFC, the Hierarchical Error Representation (HER) model, places the regions within the framework of hierarchical predictive coding, and suggests how they interact during behavioral periods preceding and following salient events. In this manuscript, we extend the HER model to incorporate real-time temporal dynamics and demonstrate how the extended model is able to capture single-unit neurophysiological, behavioral, and network effects previously reported in the literature. Our results add to the wide range of results that can be accounted for by the HER model, and provide further evidence for predictive coding as a unifying framework for understanding PFC function and organization.
Affiliation(s)
- William H. Alexander
- Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, FL, United States
- Thilo Womelsdorf
- Department of Psychology, Vanderbilt University, Nashville, TN, United States
21
Abstract
Imagine that you meet someone new. You may wonder what they like, for example how much do they like baseball? You then get their feedback, which helps you to predict how much they like something similar, like basketball. We tested how teens and adults decide what others like and dislike and how they learn about others through feedback. This learning process can be described with mathematical models that calculate prediction errors—the difference between how much you think someone likes baseball and their actual preference for it. Teens and adults differed in how quickly they learned about others using this measure. Teens also tended to use a different brain region than adults when learning about the preferences of other people. This study helps us to understand how social learning develops over teenage years.
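The prediction-error mechanism described in this summary is the standard delta rule; a minimal sketch (numbers made up for illustration) shows how an estimate of someone's preference moves toward their feedback by a fraction of the prediction error:

```python
# Illustrative delta-rule sketch: the predicted preference moves toward
# observed feedback in proportion to the prediction error.

def update(prediction, outcome, learning_rate=0.3):
    prediction_error = outcome - prediction   # the model's key quantity
    return prediction + learning_rate * prediction_error

estimate = 0.5                # prior guess: how much they like baseball
for _ in range(10):           # ten rounds of feedback
    estimate = update(estimate, outcome=0.9)
print(round(estimate, 3))     # 0.889: converging on their true preference
</n```

In models of this family, a group difference such as the teen-adult difference reported here can be expressed simply as a difference in the fitted learning rate.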
22
Lockwood PL, Apps MAJ, Chang SWC. Is There a 'Social' Brain? Implementations and Algorithms. Trends Cogn Sci 2020; 24:802-813. [PMID: 32736965 DOI: 10.1016/j.tics.2020.06.011] [Citation(s) in RCA: 116] [Impact Index Per Article: 23.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2020] [Revised: 06/29/2020] [Accepted: 06/30/2020] [Indexed: 12/21/2022]
Abstract
A fundamental question in psychology and neuroscience is the extent to which cognitive and neural processes are specialised for social behaviour, or are shared with other 'non-social' cognitive, perceptual, and motor faculties. Here we apply the influential framework of Marr (1982) across research in humans, monkeys, and rodents to propose that information processing can be understood as 'social' or 'non-social' at different levels. We argue that processes can be socially specialised at the implementational and/or the algorithmic level, and that changing the goal of social behaviour can also change social specificity. This framework could provide important new insights into the nature of social behaviour across species, facilitate greater integration, and inspire novel theoretical and empirical approaches.
Affiliation(s)
- Patricia L Lockwood
- Department of Experimental Psychology, University of Oxford, Oxford, UK; Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK; Centre for Human Brain Health, School of Psychology, University of Birmingham, Birmingham, UK
- Matthew A J Apps
- Department of Experimental Psychology, University of Oxford, Oxford, UK; Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK; Centre for Human Brain Health, School of Psychology, University of Birmingham, Birmingham, UK
- Steve W C Chang
- Department of Psychology, Yale University, New Haven, CT, USA; Department of Neuroscience, Yale University School of Medicine, New Haven, CT, USA; Kavli Institute for Neuroscience, Yale University School of Medicine, New Haven, CT, USA
23
A reinforcement-learning approach to efficient communication. PLoS One 2020; 15:e0234894. [PMID: 32667959 PMCID: PMC7363069 DOI: 10.1371/journal.pone.0234894] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2018] [Accepted: 06/04/2020] [Indexed: 11/19/2022] Open
Abstract
We present a multi-agent computational approach to partitioning semantic spaces using reinforcement-learning (RL). Two agents communicate using a finite linguistic vocabulary in order to convey a concept. This is tested in the color domain, and a natural reinforcement learning mechanism is shown to converge to a scheme that achieves a near-optimal trade-off of simplicity versus communication efficiency. Results are presented both on the communication efficiency as well as on analyses of the resulting partitions of the color space. The effect of varying environmental factors such as noise is also studied. These results suggest that RL offers a powerful and flexible computational framework that can contribute to the development of communication schemes for color names that are near-optimal in an information-theoretic sense and may shape color-naming systems across languages. Our approach is not specific to color and can be used to explore cross-language variation in other semantic domains.
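A minimal version of the mechanism this abstract describes can be sketched as a reinforcement naming game (this is my simplification, not the paper's model): a color-word pairing is reinforced whenever the listener's guess lands close enough, so shared partitions of the color space can emerge from reinforcement alone.

```python
import random

# Toy naming game: weights over (color, word) pairs are reinforced on
# successful communication; a partition of the color space tends to emerge.

random.seed(0)
COLORS = list(range(8))                 # a toy one-dimensional color space
WORDS = ["A", "B"]
weights = {(c, w): 1.0 for c in COLORS for w in WORDS}

def speak(c):
    return random.choices(WORDS, [weights[(c, w)] for w in WORDS])[0]

def listen(w):
    return random.choices(COLORS, [weights[(c, w)] for c in COLORS])[0]

for _ in range(5000):
    c = random.choice(COLORS)
    w = speak(c)
    if abs(listen(w) - c) <= 2:         # communication close enough: reward
        weights[(c, w)] += 1.0

# Which word dominates each color after learning; nearby colors tend to
# share a word, approximating a partition of the space.
print([max(WORDS, key=lambda w: weights[(c, w)]) for c in COLORS])
```

The tolerance window plays the role of perceptual similarity in the color domain; the paper's information-theoretic analysis of simplicity versus efficiency is not reproduced here.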
24
Abstract
Arguably, the most difficult part of learning is deciding what to learn about. Should I associate the positive outcome of safely completing a street-crossing with the situation 'the car approaching the crosswalk was red' or with 'the approaching car was slowing down'? In this Perspective, we summarize our recent research into the computational and neural underpinnings of 'representation learning'-how humans (and other animals) construct task representations that allow efficient learning and decision-making. We first discuss the problem of learning what to ignore when confronted with too much information, so that experience can properly generalize across situations. We then turn to the problem of augmenting perceptual information with inferred latent causes that embody unobservable task-relevant information, such as contextual knowledge. Finally, we discuss recent findings regarding the neural substrates of task representations that suggest the orbitofrontal cortex represents 'task states', deploying them for decision-making and learning elsewhere in the brain.
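The street-crossing example above can be given a toy formalization (the per-feature delta rule here is my illustration, not the authors' model): when value is learned per candidate feature, a feature that does not predict the outcome hovers near the base rate, while a predictive feature climbs toward the true outcome.

```python
import random

# Sketch of "learning what to ignore": per-feature value learning lets the
# predictive feature ('slowing') dominate the irrelevant one (car color).

ALPHA = 0.2
values = {"red": 0.0, "blue": 0.0, "slowing": 0.0}

def learn(features, reward):
    for f in features:
        values[f] += ALPHA * (reward - values[f])   # per-feature delta rule

random.seed(0)
for _ in range(200):
    color = random.choice(["red", "blue"])      # varies independently
    slowing = random.random() < 0.5             # actually predicts safety
    reward = 1.0 if slowing else 0.0
    learn([color] + (["slowing"] if slowing else []), reward)

print(values["slowing"] > max(values["red"], values["blue"]))
```

A learner that then attends to the high-value feature generalizes across car colors, which is the sense in which representation learning decides what experience is "about".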
25
Petter EA, Gershman SJ, Meck WH. Integrating Models of Interval Timing and Reinforcement Learning. Trends Cogn Sci 2019; 22:911-922. [PMID: 30266150 DOI: 10.1016/j.tics.2018.08.004] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2018] [Revised: 07/23/2018] [Accepted: 08/13/2018] [Indexed: 10/28/2022]
Abstract
We present an integrated view of interval timing and reinforcement learning (RL) in the brain. The computational goal of RL is to maximize future rewards, and this depends crucially on a representation of time. Different RL systems in the brain process time in distinct ways. A model-based system learns 'what happens when', employing this internal model to generate action plans, while a model-free system learns to predict reward directly from a set of temporal basis functions. We describe how these systems are subserved by a computational division of labor between several brain regions, with a focus on the basal ganglia and the hippocampus, as well as how these regions are influenced by the neuromodulator dopamine.
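The temporal-basis-function idea mentioned above can be sketched concretely (my illustration; the Gaussian bumps and constants are arbitrary): a model-free system learns when reward arrives by adjusting weights over basis functions that tile the delay period.

```python
import math

# Sketch: reward prediction at a delay as a weighted sum of Gaussian
# temporal basis functions, trained with a delta rule.

CENTERS = [0.5, 1.0, 1.5, 2.0, 2.5]          # basis-function peaks (s)

def basis(t):
    return [math.exp(-((t - c) ** 2) / 0.1) for c in CENTERS]

weights = [0.0] * len(CENTERS)

def predict(t):
    return sum(w * x for w, x in zip(weights, basis(t)))

# Reward of 1 arrives 1.5 s after the cue; delta-rule training on that time.
for _ in range(200):
    error = 1.0 - predict(1.5)
    weights = [w + 0.1 * error * x for w, x in zip(weights, basis(1.5))]

print(round(predict(1.5), 2), round(predict(0.5), 2))   # 1.0 0.01
```

After training, the prediction peaks at the trained delay and stays near zero elsewhere, so reward timing is represented without an explicit model of "what happens when".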
Affiliation(s)
- Elijah A Petter
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, USA
- Warren H Meck
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
26
A Computational Model of Dual Competition between the Basal Ganglia and the Cortex. eNeuro 2019; 5:eN-TNC-0339-17. [PMID: 30627653 PMCID: PMC6325557 DOI: 10.1523/eneuro.0339-17.2018] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2017] [Revised: 11/15/2018] [Accepted: 11/16/2018] [Indexed: 01/16/2023] Open
Abstract
We propose a model that includes interactions between the cortex, the basal ganglia (BG), and the thalamus based on a dual competition. We hypothesize that the striatum, the subthalamic nucleus (STN), the internal globus pallidus (GPi), the thalamus, and the cortex are involved in closed feedback loops through the hyperdirect and direct pathways. These loops support a competition process that results in the ability of BG to make a cognitive decision followed by a motor one. Considering lateral cortical interactions, another competition takes place inside the cortex allowing the latter to make a cognitive and a motor decision. We show how this dual competition endows the model with two regimes. One is driven by reinforcement learning and the other by Hebbian learning. The final decision is made according to a combination of these two mechanisms with a gradual transfer from the former to the latter. We confirmed these theoretical results on primates (Macaca mulatta) using a novel paradigm predicted by the model.
27
Sun Q, Zhang M, Mujumdar AS. Recent developments of artificial intelligence in drying of fresh food: A review. Crit Rev Food Sci Nutr 2018; 59:2258-2275. [PMID: 29493285 DOI: 10.1080/10408398.2018.1446900] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/17/2022]
Abstract
Intelligent control is an important direction in the development of drying, and artificial intelligence (AI) technologies have been widely used to solve problems of nonlinear function approximation, pattern detection, data interpretation, optimization, simulation, diagnosis, control, data sorting, clustering, and noise reduction across food drying technologies, owing to their self-learning and adaptive abilities, strong fault tolerance, and robustness in mapping the nonlinear structure of arbitrarily complex and dynamic phenomena. This article presents a comprehensive review of intelligent drying technologies and their applications. The paper starts with an introduction to the basic theory of artificial neural networks (ANN), fuzzy logic, and expert systems. We then summarize AI applications for modeling, predicting, and optimizing heat and mass transfer, thermodynamic performance parameters, and quality indicators, as well as physicochemical properties of dried products, in artificial biomimetic technologies (electronic nose, computer vision) and in conventional drying technologies. Finally, opportunities and limitations of AI techniques in drying are outlined to provide ideas for researchers in this area.
Affiliation(s)
- Qing Sun
- State Key Laboratory of Food Science and Technology, Jiangnan University, Jiangsu, China
- International Joint Laboratory on Food Safety, Jiangnan University, Jiangsu, China
- Min Zhang
- State Key Laboratory of Food Science and Technology, Jiangnan University, Jiangsu, China
- Jiangsu Province Key Laboratory of Advanced Food Manufacturing Equipment and Technology, Jiangnan University, Wuxi, China
- Arun S Mujumdar
- Department of Bioresource Engineering, Macdonald Campus, McGill University, Ste. Anne de Bellevue, Quebec, Canada
28
Song HF, Yang GR, Wang XJ. Reward-based training of recurrent neural networks for cognitive and value-based tasks. eLife 2017; 6:e21492. [PMID: 28084991 PMCID: PMC5293493 DOI: 10.7554/elife.21492] [Citation(s) in RCA: 85] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2016] [Accepted: 01/12/2017] [Indexed: 01/27/2023] Open
Abstract
Trained neural network models, which exhibit features of neural activity recorded from behaving animals, may provide insights into the circuit mechanisms of cognitive functions through systematic analysis of network activity and connectivity. However, in contrast to the graded error signals commonly used to train networks through supervised learning, animals learn from reward feedback on definite actions through reinforcement learning. Reward maximization is particularly relevant when optimal behavior depends on an animal's internal judgment of confidence or subjective preferences. Here, we implement reward-based training of recurrent neural networks in which a value network guides learning by using the activity of the decision network to predict future reward. We show that such models capture behavioral and electrophysiological findings from well-known experimental paradigms. Our work provides a unified framework for investigating diverse cognitive and value-based computations, and predicts a role for value representation that is essential for learning, but not executing, a task.
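A toy version in the spirit of this framework (my own minimal sketch, not the authors' code) trains a policy from reward feedback on definite actions, with a learned value estimate serving as a baseline that guides learning but is not needed to execute the task:

```python
import math
import random

# Minimal policy-gradient learner with a learned value baseline on a
# two-action task with hypothetical reward levels.

random.seed(1)
theta = [0.0, 0.0]        # policy preferences over two actions
baseline = 0.0            # stand-in for the value network's prediction
REWARD = [0.2, 0.8]       # reward delivered for each action (hypothetical)

def softmax(prefs):
    exps = [math.exp(p) for p in prefs]
    return [e / sum(exps) for e in exps]

for _ in range(3000):
    probs = softmax(theta)
    action = random.choices([0, 1], probs)[0]
    reward = REWARD[action]
    advantage = reward - baseline             # value estimate guides learning
    for i in range(2):                        # policy-gradient (REINFORCE) step
        grad = (1.0 if i == action else 0.0) - probs[i]
        theta[i] += 0.1 * advantage * grad
    baseline += 0.05 * (reward - baseline)    # train the value estimate

print(softmax(theta)[1] > softmax(theta)[0])  # better action now preferred
```

Once training ends, actions can be taken from the policy alone; the value estimate was essential for learning but plays no role in execution, mirroring the paper's predicted dissociation.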
Affiliation(s)
- H Francis Song
- Center for Neural Science, New York University, New York, United States
- Guangyu R Yang
- Center for Neural Science, New York University, New York, United States
- Xiao-Jing Wang
- Center for Neural Science, New York University, New York, United States
- NYU-ECNU Institute of Brain and Cognitive Science, NYU Shanghai, Shanghai, China
29
Kato A, Morita K. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation. PLoS Comput Biol 2016; 12:e1005145. [PMID: 27736881 PMCID: PMC5063413 DOI: 10.1371/journal.pcbi.1005145] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2016] [Accepted: 09/14/2016] [Indexed: 12/12/2022] Open
Abstract
It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of ‘Go’ or ‘No-Go’ selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of ‘Go’ values towards a goal, and (2) value-contrasts between ‘Go’ and ‘No-Go’ are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. 
Our results also suggest that when biological systems for value-learning are active even though learning has apparently converged, the systems might be in a state of dynamic equilibrium, where learning and forgetting are balanced. Dopamine (DA) has been suggested to have two reward-related roles: (1) representing reward-prediction-error (RPE), and (2) providing motivational drive. Role(1) is based on the physiological results that DA responds to unpredicted but not predicted reward, whereas role(2) is supported by the pharmacological results that blockade of DA signaling causes motivational impairments such as slowdown of self-paced behavior. So far, these two roles are considered to be played by two different temporal patterns of DA signals: role(1) by phasic signals and role(2) by tonic/sustained signals. However, recent studies have found sustained DA signals with features indicative of both roles (1) and (2), complicating this picture. Meanwhile, whereas synaptic/circuit mechanisms for role(1), i.e., how RPE is calculated in the upstream of DA neurons and how RPE-dependent update of learned-values occurs through DA-dependent synaptic plasticity, have now become clarified, mechanisms for role(2) remain unclear. In this work, we modeled self-paced behavior by a series of ‘Go’ or ‘No-Go’ selections in the framework of reinforcement-learning assuming DA's role(1), and demonstrated that incorporation of decay/forgetting of learned-values, which is presumably implemented as decay of synaptic strengths storing learned-values, provides a potential unified mechanistic account for the DA's two roles, together with its various temporal patterns.
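The core mechanism of the model described above can be reproduced in a few lines (a minimal toy version, with arbitrary constants): the learned value is updated toward reward each trial but also decays, so the reward-prediction error settles at a nonzero value instead of vanishing.

```python
# Toy "reinforcement learning with forgetting": value decay keeps the
# reward-prediction error (the putative sustained DA signal) from vanishing.

ALPHA, DECAY, REWARD = 0.3, 0.02, 1.0

q = 0.0                      # learned value of the fully predictable reward
errors = []
for _ in range(500):
    q *= 1.0 - DECAY         # forgetting: the stored value decays
    error = REWARD - q       # reward-prediction error
    q += ALPHA * error       # standard value update
    errors.append(error)

print(round(errors[-1], 3))  # 0.064: learning and forgetting in balance
```

Setting DECAY to zero makes the same loop drive the error to zero, recovering the classic picture in which the prediction-error signal disappears for fully predicted reward; with decay, the system sits in the dynamic equilibrium the authors describe.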
Affiliation(s)
- Ayaka Kato
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan