1
Barnett WH, Kuznetsov A, Lapish CC. Distinct cortico-striatal compartments drive competition between adaptive and automatized behavior. PLoS One 2023; 18:e0279841. PMID: 36943842; PMCID: PMC10030038; DOI: 10.1371/journal.pone.0279841.
Abstract
Cortical and basal ganglia circuits play a crucial role in the formation of goal-directed and habitual behaviors. In this study, we investigate the cortico-striatal circuitry involved in learning and its role in the emergence of inflexible behaviors such as those observed in addiction. Specifically, we develop a computational model of cortico-striatal interactions that performs concurrent goal-directed and habit learning. The model accomplishes this by distinguishing learning processes in the dorsomedial striatum (DMS), which rely on reward prediction error signals, from those in the dorsolateral striatum (DLS), where learning is supported by salience signals. These striatal subregions each operate on unique cortical input: the DMS receives input from the prefrontal cortex (PFC), which represents outcomes, and the DLS receives input from the premotor cortex, which determines action selection. After initial learning of a two-alternative forced-choice task, we subjected the model to reversal learning, reward devaluation, and learning of a punished outcome. Behavior driven by stimulus-response associations in the DLS resisted goal-directed learning of new reward feedback rules despite devaluation or punishment, indicating the expression of habit. We repeated these simulations after impairing executive control, implemented as poor outcome representation in the PFC. The degraded executive control reduced the efficacy of goal-directed learning, and stimulus-response associations in the DLS became even more resistant to the learning of new reward feedback rules. In summary, this model describes how circuits of the dorsal striatum are dynamically engaged to control behavior and how impaired executive control by the PFC enhances inflexible behavior.
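The competition described here can be illustrated with a minimal toy sketch (not the authors' published implementation; all parameter values, and the use of the unsigned prediction error as a salience proxy, are our assumptions): a DMS-like learner updated by signed reward prediction errors competes with a DLS-like stimulus-response learner whose weights only grow, so its habit survives a mid-session reversal.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 2
q_dms = np.zeros(n_actions)   # goal-directed values, driven by reward prediction errors
w_dls = np.zeros(n_actions)   # stimulus-response strengths, driven by salience
alpha, beta, salience_rate = 0.1, 3.0, 0.01
reward_prob = np.array([0.8, 0.2])   # action 0 is initially better

def act(q, w):
    # softmax over the summed control signals from both compartments
    logits = beta * (q + w)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(p), p=p)

for trial in range(2000):
    if trial == 1000:                    # reversal: reward contingencies flip
        reward_prob = reward_prob[::-1]
    a = act(q_dms, w_dls)
    r = float(rng.random() < reward_prob[a])
    rpe = r - q_dms[a]                   # DMS learns from signed prediction errors
    q_dms[a] += alpha * rpe
    w_dls[a] += salience_rate * abs(rpe) # DLS strengthens S-R links via salience,
                                         # regardless of reward sign

print("post-reversal Q (DMS):", q_dms, " S-R weights (DLS):", w_dls)
```

Because `w_dls` never decreases, the action favored before the reversal keeps a residual advantage even after the DMS values adapt, which is the qualitative signature of habit that the abstract describes.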
Affiliation(s)
- William H. Barnett
- Department of Psychology, Indiana University—Purdue University Indianapolis, Indianapolis, Indiana, United States of America
- Alexey Kuznetsov
- Department of Mathematics, Indiana University—Purdue University Indianapolis, Indianapolis, Indiana, United States of America
- Christopher C. Lapish
- Department of Psychology, Indiana University—Purdue University Indianapolis, Indianapolis, Indiana, United States of America
- Stark Neurosciences Research Institute, Indiana University—Purdue University Indianapolis, Indianapolis, Indiana, United States of America
2
Oguchi M, Li Y, Matsumoto Y, Kiyonari T, Yamamoto K, Sugiura S, Sakagami M. Proselfs depend more on model-based than model-free learning in a non-social probabilistic state-transition task. Sci Rep 2023; 13:1419. PMID: 36697448; PMCID: PMC9876908; DOI: 10.1038/s41598-023-27609-0.
Abstract
Humans form complex societies in which we routinely engage in social decision-making regarding the allocation of resources among ourselves and others. One dimension that characterizes social decision-making in particular is whether to prioritize self-interest or respect for others, i.e., whether one is proself or prosocial. What causes this individual difference in social value orientation? Recent developments in social dual-process theory argue that social decision-making is characterized by its underlying domain-general learning systems: the model-free and model-based systems. In line with this "learning" approach, we propose and experimentally test the hypothesis that differences in social preferences stem from which learning system is dominant in an individual. Here, we used a non-social state-transition task that allowed us to assess the balance between model-free and model-based learning and to investigate its relation to social value orientation. The results showed that proselfs depended more on model-based learning, whereas prosocials depended more on model-free learning. Reward amount and reaction time analyses showed that proselfs learned the task structure earlier in the session than prosocials, reflecting their difference in dependence on model-based versus model-free learning. These findings support the learning hypothesis of what drives differences in social preferences and have implications for understanding the mechanisms of prosocial behavior.
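The balance between the two systems is commonly formalized, following Daw-style hybrid models, as a weighted mixture of model-based and model-free action values. Below is a minimal sketch of such an agent on a two-stage probabilistic state-transition task; the task probabilities, the weight w, and all other settings are hypothetical, and this is not the authors' analysis code.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, w = 0.2, 0.6        # learning rate; w=1 pure model-based, w=0 pure model-free
T = np.array([[0.7, 0.3],  # P(second-stage state | first-stage action), assumed known
              [0.3, 0.7]])
q_mf = np.zeros(2)         # model-free values of the first-stage actions
v_stage2 = np.zeros(2)     # learned values of the two second-stage states
p_reward = np.array([0.8, 0.2])

for trial in range(1000):
    q_mb = T @ v_stage2                  # model-based: expected second-stage value
    q = w * q_mb + (1 - w) * q_mf        # weighted hybrid of the two systems
    p = np.exp(3 * q) / np.exp(3 * q).sum()
    a = rng.choice(2, p=p)
    s2 = rng.choice(2, p=T[a])           # probabilistic state transition
    r = float(rng.random() < p_reward[s2])
    v_stage2[s2] += alpha * (r - v_stage2[s2])
    q_mf[a] += alpha * (r - q_mf[a])     # MF update ignores the transition structure

print("hybrid first-stage values:", w * (T @ v_stage2) + (1 - w) * q_mf)
```

In this framing, the paper's proself/prosocial difference corresponds to fitting a larger or smaller w per individual.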
Affiliation(s)
- Mineki Oguchi
- Brain Science Institute, Tamagawa University, 6-1-1, Tamagawagakuen, Machida, Tokyo, Japan
- Yang Li
- Brain Science Institute, Tamagawa University, 6-1-1, Tamagawagakuen, Machida, Tokyo, Japan; Graduate School of Informatics, Nagoya University, Nagoya, Japan
- Yoshie Matsumoto
- Brain Science Institute, Tamagawa University, 6-1-1, Tamagawagakuen, Machida, Tokyo, Japan; Department of Psychology, Faculty of Human Sciences, Seinan Gakuin University, Fukuoka, Japan
- Toko Kiyonari
- School of Social Informatics, Aoyama Gakuin University, Kanagawa, Japan
- Masamichi Sakagami
- Brain Science Institute, Tamagawa University, 6-1-1, Tamagawagakuen, Machida, Tokyo, Japan
3
Sheynikhovich D, Otani S, Bai J, Arleo A. Long-term memory, synaptic plasticity and dopamine in rodent medial prefrontal cortex: Role in executive functions. Front Behav Neurosci 2023; 16:1068271. PMID: 36710953; PMCID: PMC9875091; DOI: 10.3389/fnbeh.2022.1068271.
Abstract
Mnemonic functions supporting rodent behavior in complex tasks include both long-term and (short-term) working memory components. While working memory is thought to rely on persistent activity states in an active neural network, long-term memory and synaptic plasticity contribute to the formation of the underlying synaptic structure, determining the range of possible states. Whereas the implication of working memory in executive functions, mediated by the prefrontal cortex (PFC) in primates and rodents, has been extensively studied, the contribution of the long-term memory component to these tasks has received little attention. This review summarizes available experimental data and theoretical work concerning cellular mechanisms of synaptic plasticity in the medial region of rodent PFC and the link between plasticity, memory, and behavior in PFC-dependent tasks. Special attention is devoted to the unique properties of dopaminergic modulation of prefrontal synaptic plasticity and its contribution to executive functions.
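Dopaminergic modulation of Hebbian plasticity, a recurring theme of this review, is often summarized computationally as a three-factor learning rule in which a dopamine signal gates potentiation or depression of co-active synapses. A minimal sketch under that assumption follows; the rule and all constants are illustrative, not a model taken from the review.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pre, n_post = 20, 10
w = rng.normal(0, 0.1, size=(n_post, n_pre))
eta, decay = 0.05, 0.001

def three_factor_update(w, pre, post, dopamine):
    """Hebbian co-activity gated by a dopamine 'third factor'.

    dopamine > 0 (e.g., a reward prediction error) converts the Hebbian
    eligibility into LTP; dopamine < 0 converts it into LTD.
    """
    hebb = np.outer(post, pre)           # eligibility: pre/post co-activation
    return w + eta * dopamine * hebb - decay * w

pre = rng.random(n_pre)
post = np.tanh(w @ pre)
w = three_factor_update(w, pre, post, dopamine=+1.0)   # rewarded episode -> LTP
w = three_factor_update(w, pre, post, dopamine=-0.5)   # reward omission -> partial LTD
```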
Affiliation(s)
- Denis Sheynikhovich
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- Satoru Otani
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- Jing Bai
- Institute of Psychiatry and Neuroscience of Paris, INSERM U1266, Paris, France
- Angelo Arleo
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
4
Reducing Computational Cost During Robot Navigation and Human–Robot Interaction with a Human-Inspired Reinforcement Learning Architecture. Int J Soc Robot 2022. DOI: 10.1007/s12369-022-00942-6.
5
Massi E, Barthélemy J, Mailly J, Dromnelle R, Canitrot J, Poniatowski E, Girard B, Khamassi M. Model-Based and Model-Free Replay Mechanisms for Reinforcement Learning in Neurorobotics. Front Neurorobot 2022; 16:864380. PMID: 35812782; PMCID: PMC9263850; DOI: 10.3389/fnbot.2022.864380.
Abstract
Experience replay is widely used in AI to bootstrap reinforcement learning (RL) by enabling an agent to remember and reuse past experiences. Classical techniques include shuffled, reverse-ordered, and prioritized memory buffers, which have different properties and advantages depending on the nature of the data and problem. Interestingly, recent computational neuroscience work has shown that these techniques are relevant for modeling hippocampal reactivations recorded during rodent navigation. Nevertheless, the brain mechanisms for orchestrating hippocampal replay are still unclear. In this paper, we present recent neurorobotics research aiming to endow a navigating robot with a neuro-inspired RL architecture (including different learning strategies, such as model-based (MB) and model-free (MF), and different replay techniques). We illustrate through a series of numerical simulations how the specificities of robotic experimentation (e.g., autonomous state decomposition by the robot, noisy perception, state transition uncertainty, non-stationarity) can shed new light on which replay techniques turn out to be more efficient in different situations. Finally, we close the loop by raising new hypotheses for neuroscience from such robotic models of hippocampal replay.
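The three classical buffer orderings named above differ only in how stored transitions are sequenced before being replayed into the value function. A compact illustration with tabular Q-learning on a toy chain (our own sketch; the parameters and the one-shot priority computation are simplifying assumptions):

```python
import random
import numpy as np

def replay(q, buffer, order="shuffled", alpha=0.1, gamma=0.9):
    """Replay stored (s, a, r, s2) transitions into a tabular Q function."""
    if order == "shuffled":
        batch = random.sample(buffer, len(buffer))
    elif order == "reversed":
        batch = list(reversed(buffer))   # reward info propagates backwards quickly
    elif order == "prioritized":
        # largest absolute TD error first, recomputed once up front
        batch = sorted(buffer, key=lambda t: -abs(
            t[2] + gamma * q[t[3]].max() - q[t[0], t[1]]))
    for s, a, r, s2 in batch:
        q[s, a] += alpha * (r + gamma * q[s2].max() - q[s, a])
    return q

# toy 5-state chain with a single action, reward on reaching the last state
buffer = [(s, 0, float(s + 1 == 4), s + 1) for s in range(4)]
for order in ("shuffled", "reversed", "prioritized"):
    q = replay(np.zeros((5, 1)), buffer, order)
    print(order, np.round(q.ravel(), 3))
```

Running this shows why ordering matters: a reversed sweep propagates the terminal reward along the whole chain in a single pass, while a shuffled pass usually does not.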
6
Abstract
Humans and other animals use multiple strategies for making decisions. Reinforcement-learning theory distinguishes between stimulus-response (model-free; MF) learning and deliberative (model-based; MB) planning. The spatial-navigation literature presents a parallel dichotomy between navigation strategies. In "response learning," associated with the dorsolateral striatum (DLS), decisions are anchored to an egocentric reference frame. In "place learning," associated with the hippocampus, decisions are anchored to an allocentric reference frame. Emerging evidence suggests that the contribution of hippocampus to place learning may also underlie its contribution to MB learning by representing relational structure in a cognitive map. Here, we introduce a computational model in which hippocampus subserves place and MB learning by learning a "successor representation" of relational structure between states; DLS implements model-free response learning by learning associations between actions and egocentric representations of landmarks; and action values from either system are weighted by the reliability of its predictions. We show that this model reproduces a range of seemingly disparate behavioral findings in spatial and nonspatial decision tasks and explains the effects of lesions to DLS and hippocampus on these tasks. Furthermore, modeling place cells as driven by boundaries explains the observation that, unlike navigation guided by landmarks, navigation guided by boundaries is robust to "blocking" by prior state-reward associations due to learned associations between place cells. Our model, originally shaped by detailed constraints in the spatial literature, successfully characterizes the hippocampal-striatal system as a general system for decision making via adaptive combination of stimulus-response learning and the use of a cognitive map.
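The hippocampal component of this account, the successor representation (SR), can be written in a few lines: a matrix M of expected discounted future state occupancies is learned by temporal-difference updates, and values are read out as V = MR, so revaluing the reward vector R instantly revalues all states. A minimal sketch (the reliability-weighted arbitration with the DLS learner is omitted; all settings are hypothetical):

```python
import numpy as np

n_states, alpha, gamma = 6, 0.1, 0.95
M = np.eye(n_states)          # successor matrix: discounted future state occupancy
R = np.zeros(n_states); R[5] = 1.0

def sr_td_update(M, s, s2):
    # TD update of the successor representation after a transition s -> s2
    onehot = np.eye(n_states)[s]
    M[s] += alpha * (onehot + gamma * M[s2] - M[s])
    return M

rng = np.random.default_rng(3)
for _ in range(3000):                       # random walk on a 6-state ring
    s = rng.integers(n_states)
    s2 = (s + rng.choice([-1, 1])) % n_states
    M = sr_td_update(M, s, s2)

V = M @ R                                   # values recomposed from map + reward
print(np.round(V, 2))                       # changing R re-values all states at once
```

The separation of M (relational structure) from R (reward) is what lets an SR agent adapt to reward devaluation faster than a purely model-free one, as the abstract argues for the hippocampal system.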
7
Mao J, Hu X, Zhang L, He X, Milford M. A Bio-Inspired Goal-Directed Visual Navigation Model for Aerial Mobile Robots. J Intell Robot Syst 2020. DOI: 10.1007/s10846-020-01190-4.
8
Hangl S, Dunjko V, Briegel HJ, Piater J. Skill Learning by Autonomous Robotic Playing Using Active Learning and Exploratory Behavior Composition. Front Robot AI 2020; 7:42. PMID: 33501210; PMCID: PMC7806109; DOI: 10.3389/frobt.2020.00042.
Abstract
We consider the problem of autonomous acquisition of manipulation skills where problem-solving strategies are initially available only for a narrow range of situations. We propose to extend the range of solvable situations by autonomous play with the object. By applying previously trained skills and behaviors, the robot learns how to prepare situations for which a successful strategy is already known. The information gathered during autonomous play is additionally used to train an environment model. This model is exploited for active learning and for generating novel compositions of preparatory behaviors. We apply our approach to a wide range of different manipulation tasks, e.g., book grasping, grasping of objects of different sizes by selecting different grasping strategies, placement on shelves, and tower disassembly. We show that the composite behavior generation mechanism enables the robot to solve previously unsolvable tasks, e.g., tower disassembly. We use success statistics gained during real-world experiments to simulate the convergence behavior of our system. Simulation experiments show that the learning speed can be improved by around 30% by using active learning.
Affiliation(s)
- Simon Hangl
- Intelligent and Interactive Systems, Department of Informatics, University of Innsbruck, Innsbruck, Austria
- Hans J. Briegel
- Institute for Theoretical Physics, University of Innsbruck, Innsbruck, Austria
- Justus Piater
- Intelligent and Interactive Systems, Department of Informatics, University of Innsbruck, Innsbruck, Austria
9
Khamassi M, Girard B. Modeling awake hippocampal reactivations with model-based bidirectional search. Biol Cybern 2020; 114:231-248. PMID: 32065253; DOI: 10.1007/s00422-020-00817-x.
Abstract
Hippocampal offline reactivations during reward-based learning, usually categorized as replay events, have been found to be important for performance improvement over time and for memory consolidation. Recent computational work has linked these phenomena to the need to transform reward information into state-action values for decision making and to propagate it to all relevant states of the environment. Nevertheless, it is still unclear whether an integrated reinforcement learning mechanism could account for the variety of awake hippocampal reactivations, including variety in order (forward and reverse reactivated trajectories) and variety in the location where they occur (reward site or decision point). Here, we present a model-based bidirectional search model which accounts for a variety of hippocampal reactivations. The model combines forward trajectory sampling from the current position with backward sampling, through prioritized sweeping, from states associated with large reward prediction errors, until the two trajectories connect. This is repeated until stabilization of state-action values (convergence), which could explain why hippocampal reactivations drastically diminish when the animal's performance stabilizes. Simulations in a multiple T-maze task show that forward reactivations are prominently found at decision points while backward reactivations are exclusively generated at reward sites. Finally, the model can generate imaginary trajectories that the agent is not allowed to take during task performance. We raise some experimental predictions and implications for future studies of the role of the hippocampo-prefronto-striatal network in learning.
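The backward half of this bidirectional scheme is essentially prioritized sweeping: states with large reward prediction errors seed a priority queue, and updates are propagated to their predecessors through a learned world model. A minimal sketch of that ingredient alone (the forward-sampling half and the trajectory-connection test are omitted; all names and parameters are ours):

```python
import heapq
import numpy as np

n_states, n_actions, gamma, alpha = 8, 2, 0.9, 0.5
Q = np.zeros((n_states, n_actions))
model = {}          # (s, a) -> (r, s2): learned deterministic world model
predecessors = {}   # s2 -> set of (s, a) pairs leading into it

def remember(s, a, r, s2):
    model[(s, a)] = (r, s2)
    predecessors.setdefault(s2, set()).add((s, a))

def prioritized_sweep(Q, n_updates=20, theta=1e-3):
    pq = []   # seed the queue with the largest reward prediction errors
    for (s, a), (r, s2) in model.items():
        p = abs(r + gamma * Q[s2].max() - Q[s, a])
        if p > theta:
            heapq.heappush(pq, (-p, s, a))
    for _ in range(n_updates):
        if not pq:
            break
        _, s, a = heapq.heappop(pq)      # backward sweep from surprising states
        r, s2 = model[(s, a)]
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        for (ps, pa) in predecessors.get(s, ()):   # propagate to predecessors
            pr, _ = model[(ps, pa)]
            p = abs(pr + gamma * Q[s].max() - Q[ps, pa])
            if p > theta:
                heapq.heappush(pq, (-p, ps, pa))
    return Q

# linear track: action 1 moves right, reward at the far end
for s in range(n_states - 1):
    remember(s, 1, float(s + 2 == n_states), s + 1)
Q = prioritized_sweep(Q)
print(np.round(Q[:, 1], 3))   # values propagate backwards from the reward site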
Affiliation(s)
- Mehdi Khamassi
- Institute of Intelligent Systems and Robotics (ISIR), Sorbonne Université and CNRS (Centre National de la Recherche Scientifique), 75005, Paris, France.
- Benoît Girard
- Institute of Intelligent Systems and Robotics (ISIR), Sorbonne Université and CNRS (Centre National de la Recherche Scientifique), 75005, Paris, France
10
Cazin N, Scleidorovich P, Weitzenfeld A, Dominey PF. Real-time sensory-motor integration of hippocampal place cell replay and prefrontal sequence learning in simulated and physical rat robots for novel path optimization. Biol Cybern 2020; 114:249-268. PMID: 32095878; DOI: 10.1007/s00422-020-00820-2.
Abstract
An open problem in the cognitive dimensions of navigation concerns how previous exploratory experience is reorganized in order to allow the creation of novel efficient navigation trajectories. This behavior is revealed in the "traveling salesrat problem" (TSP) when rats discover the shortest path linking baited food wells after a few exploratory traversals. We have recently published a model of navigation sequence learning, where sharp wave ripple replay of hippocampal place cells transmit "snippets" of the recent trajectories that the animal has explored to the prefrontal cortex (PFC) (Cazin et al. in PLoS Comput Biol 15:e1006624, 2019). PFC is modeled as a recurrent reservoir network that is able to assemble these snippets into the efficient sequence (trajectory of spatial locations coded by place cell activation). The model of hippocampal replay generates a distribution of snippets as a function of their proximity to a reward, thus implementing a form of spatial credit assignment that solves the TSP task. The integrative PFC reservoir reconstructs the efficient TSP sequence based on exposure to this distribution of snippets that favors paths that are most proximal to rewards. While this demonstrates the theoretical feasibility of the PFC-HIPP interaction, the integration of such a dynamic system into a real-time sensory-motor system remains a challenge. In the current research, we test the hypothesis that the PFC reservoir model can operate in a real-time sensory-motor loop. Thus, the main goal of the paper is to validate the model in simulated and real robot scenarios. Place cell activation encoding the current position of the simulated and physical rat robot feeds the PFC reservoir which generates the successor place cell activation that represents the next step in the reproduced sequence in the readout. This is input to the robot, which advances to the coded location and then generates de novo the current place cell activation. This allows demonstration of the crucial role of embodiment. If the spatial code readout from PFC is played back directly into PFC, error can accumulate, and the system can diverge from desired trajectories. This required a spatial filter to decode the PFC code to a location and then recode a new place cell code for that location. In the robot, the place cell vector output of PFC is used to physically displace the robot and then generate a new place cell coded input to the PFC, replacing part of the software recoding procedure that was required otherwise. We demonstrate how this integrated sensory-motor system can learn simple navigation sequences and then, importantly, how it can synthesize novel efficient sequences based on prior experience, as previously demonstrated (Cazin et al. 2019). This contributes to the understanding of hippocampal replay in novel navigation sequence formation and the important role of embodiment.
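The PFC component here is a reservoir network trained to emit the next place-cell activation in a sequence. The sketch below captures only that input-output idea, using an echo-state-style reservoir with a ridge-regression readout in place of the authors' training procedure; the network sizes and the Gaussian place-cell code are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n_place, n_res = 10, 200

# place-cell sequence: Gaussian bumps sweeping along a linear track
track = np.linspace(0, 1, 50)
centers = np.linspace(0, 1, n_place)
seq = np.exp(-((track[:, None] - centers[None, :]) ** 2) / 0.01)

W_in = rng.normal(0, 0.5, (n_res, n_place))
W = rng.normal(0, 1, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1 (echo state)

def run_reservoir(inputs):
    x, states = np.zeros(n_res), []
    for u in inputs:
        x = np.tanh(W @ x + W_in @ u)   # fixed recurrent dynamics
        states.append(x.copy())
    return np.array(states)

X = run_reservoir(seq[:-1])             # reservoir states
Y = seq[1:]                             # targets: the next place-cell vector
W_out = np.linalg.solve(X.T @ X + 1e-3 * np.eye(n_res), X.T @ Y)  # ridge readout

pred = run_reservoir(seq[:-1]) @ W_out
print("next-step prediction error:", np.abs(pred - Y).mean())
```

In the embodied loop the abstract describes, this predicted vector would drive the robot's next displacement, and the resulting physical position would regenerate a clean place-cell input, preventing the error accumulation seen when the readout is fed straight back into the network.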
Affiliation(s)
- Nicolas Cazin
- INSERM UMR1093-CAPS, Université Bourgogne Franche-Comté, UFR des Sciences du Sport, 21000, Dijon, France
- Robot Cognition Laboratory, Institut Marey, INSERM U1093 CAPS, UBFC, Dijon, France
- Peter Ford Dominey
- INSERM UMR1093-CAPS, Université Bourgogne Franche-Comté, UFR des Sciences du Sport, 21000, Dijon, France.
- Robot Cognition Laboratory, Institut Marey, INSERM U1093 CAPS, UBFC, Dijon, France.
11
Edvardsen V, Bicanski A, Burgess N. Navigating with grid and place cells in cluttered environments. Hippocampus 2019; 30:220-232. PMID: 31408264; PMCID: PMC8641373; DOI: 10.1002/hipo.23147.
Abstract
The hippocampal formation contains several classes of neurons thought to be involved in navigational processes, in particular place cells and grid cells. Place cells have been associated with a topological strategy for navigation, while grid cells have been suggested to support metric vector navigation. Grid cell-based vector navigation can support novel shortcuts across unexplored territory by providing the direction toward the goal. However, this strategy is insufficient in natural environments cluttered with obstacles. Here, we show how navigation in complex environments can be supported by integrating a grid cell-based vector navigation mechanism with local obstacle avoidance mediated by border cells and place cells whose interconnections form an experience-dependent topological graph of the environment. When vector navigation and obstacle avoidance fail (i.e., the agent gets stuck), place cell replay events set closer subgoals for vector navigation. We demonstrate that this combined navigation model can successfully traverse environments cluttered by obstacles and is particularly useful where the environment is underexplored. Finally, we show that the model enables the simulated agent to successfully navigate experimental maze environments from the animal literature on cognitive mapping. The proposed model is sufficiently flexible to support navigation in different environments, and may inform the design of experiments to relate different navigational abilities to place, grid, and border cell firing.
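The control logic, grid-cell vector navigation with a place-graph fallback when the agent gets stuck, can be caricatured in a few lines. The geometry, thresholds, and the breadth-first subgoal search below are our stand-ins for the model's neural mechanisms (border-cell obstacle avoidance and replay-based subgoal selection):

```python
from collections import deque
import numpy as np

def vector_step(pos, goal, obstacles, step=1.0):
    """One grid-cell-style step along the goal vector, unless blocked."""
    d = goal - pos
    nxt = pos + step * d / (np.linalg.norm(d) + 1e-9)
    return None if any(np.linalg.norm(nxt - o) < 1.0 for o in obstacles) else nxt

def subgoal_via_place_graph(graph, nodes, current, goal_node):
    """BFS through the experience-dependent place graph; return the next node."""
    q, parent = deque([current]), {current: None}
    while q:
        n = q.popleft()
        if n == goal_node:
            while parent[n] != current:   # walk back to the first step on the path
                n = parent[n]
            return nodes[n]
        for m in graph.get(n, ()):
            if m not in parent:
                parent[m] = n
                q.append(m)
    return None

# toy environment: an obstacle blocks the direct route, forcing a detour via node 1
nodes = {0: np.array([0., 0.]), 1: np.array([0., 5.]), 2: np.array([10., 0.])}
graph = {0: [1], 1: [0, 2], 2: [1]}
obstacles = [np.array([1., 0.])]

pos, target = nodes[0], nodes[2]
nxt = vector_step(pos, target, obstacles)
if nxt is None:                           # direct route blocked: recruit the graph
    subgoal = subgoal_via_place_graph(graph, nodes, current=0, goal_node=2)
    nxt = vector_step(pos, subgoal, obstacles)
print("next waypoint:", nxt)
```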
Affiliation(s)
- Vegard Edvardsen
- Department of Computer Science, NTNU-Norwegian University of Science and Technology, Trondheim, Norway
- Andrej Bicanski
- Institute of Cognitive Neuroscience, University College London, Alexandra House, 17 Queen Square, WC1N 3AZ London, UK
- Neil Burgess
- Institute of Cognitive Neuroscience, University College London, Alexandra House, 17 Queen Square, WC1N 3AZ London, UK
12
Cazé R, Khamassi M, Aubin L, Girard B. Hippocampal replays under the scrutiny of reinforcement learning models. J Neurophysiol 2018; 120:2877-2896. DOI: 10.1152/jn.00145.2018.
Abstract
Multiple in vivo studies have shown that place cells from the hippocampus replay previously experienced trajectories. These replays are commonly considered to mainly reflect memory consolidation processes. Some data, however, have highlighted a functional link between replays and reinforcement learning (RL). This theory, extensively used in machine learning, has introduced efficient algorithms and can explain various behavioral and physiological measures from different brain regions. RL algorithms could constitute a mechanistic description of replays and explain how replays can reduce the number of iterations required to explore the environment during learning. We review the main findings concerning the different hippocampal replay types and the possible associated RL models (model-based, model-free, or hybrid). We conclude by tying these frameworks together and illustrate the link between data and RL through a series of model simulations. This review, at the frontier between informatics and biology, paves the way for future work on replays.
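The simplest algorithmic bridge between replay and RL in this literature is Dyna-style learning, in which offline updates drawn from a learned world model are interleaved with online model-free updates. A minimal Dyna-Q sketch on a linear track (illustrative only; not one of the specific models simulated in the review):

```python
import numpy as np

rng = np.random.default_rng(5)
n_states, n_actions = 10, 2
Q = np.zeros((n_states, n_actions))
model = {}                                # (s, a) -> (r, s2): learned world model
alpha, gamma, n_replays = 0.1, 0.95, 10

def step(s, a):                           # linear track: a=1 right, a=0 left
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return float(s2 == n_states - 1), s2

s = 0
for t in range(500):
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q[s].argmax())
    r, s2 = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])   # online MF update
    model[(s, a)] = (r, s2)
    for _ in range(n_replays):            # offline 'replays' from the model
        ps, pa = list(model)[rng.integers(len(model))]
        pr, ps2 = model[(ps, pa)]
        Q[ps, pa] += alpha * (pr + gamma * Q[ps2].max() - Q[ps, pa])
    s = 0 if r > 0 else s2                # return to start after reward

print(np.round(Q[:, 1], 2))
```

The `n_replays` knob is exactly the quantity the review discusses: more offline updates mean fewer physical iterations are needed before the value function converges.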
Affiliation(s)
- Romain Cazé
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Mehdi Khamassi
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Lise Aubin
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Benoît Girard
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
13
Chatila R, Renaudo E, Andries M, Chavez-Garcia RO, Luce-Vayrac P, Gottstein R, Alami R, Clodic A, Devin S, Girard B, Khamassi M. Toward Self-Aware Robots. Front Robot AI 2018; 5:88. PMID: 33500967; PMCID: PMC7805649; DOI: 10.3389/frobt.2018.00088.
Abstract
Despite major progress in Robotics and AI, robots are still basically "zombies", repeatedly executing actions and tasks without understanding what they are doing. Deep-learning AI programs classify tremendous amounts of data without grasping the meaning of their inputs or outputs. We still lack a genuine theory of the underlying principles and methods that would enable robots to understand their environment, to be cognizant of what they do, to take appropriate and timely initiatives, to learn from their own experience, and to show that they know that they have learned and how. The rationale of this paper is that an agent's understanding of its environment (including the agent itself and its effects on the environment) requires self-awareness, which itself emerges as a result of this understanding and of the distinction the agent is able to make between its own mind-body and its environment. The paper develops along five issues: agent perception and interaction with the environment; learning actions; agent interaction with other agents, specifically humans; decision-making; and the cognitive architecture integrating these capacities.
Affiliation(s)
- Raja Chatila
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Erwan Renaudo
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Intelligent and Interactive Systems, Department of Computer Science, University of Innsbruck, Innsbruck, Austria
- Mihai Andries
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Institute for Systems and Robotics, Instituto Superior Técnico, Lisbon, Portugal
- Ricardo-Omar Chavez-Garcia
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA), Università della Svizzera Italiana - Scuola universitaria professionale della Svizzera italiana (USI-SUPSI), Lugano, Switzerland
- Pierre Luce-Vayrac
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Raphael Gottstein
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Rachid Alami
- Intelligent and Interactive Systems, Department of Computer Science, University of Innsbruck, Innsbruck, Austria
- Aurélie Clodic
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
- Sandra Devin
- Intelligent and Interactive Systems, Department of Computer Science, University of Innsbruck, Innsbruck, Austria
- Benoît Girard
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
- Mehdi Khamassi
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, Paris, France
14
Bio-Inspired Robotics: A Spatial Cognition Model integrating Place Cells, Grid Cells and Head Direction Cells. J Intell Robot Syst 2018. DOI: 10.1007/s10846-018-0852-2.
15
Dollé L, Chavarriaga R, Guillot A, Khamassi M. Interactions of spatial strategies producing generalization gradient and blocking: A computational approach. PLoS Comput Biol 2018; 14:e1006092. PMID: 29630600; PMCID: PMC5908205; DOI: 10.1371/journal.pcbi.1006092.
Abstract
We present a computational model of spatial navigation comprising different learning mechanisms in mammals, i.e., associative learning, cognitive mapping, and parallel systems. This model is able to reproduce a large number of experimental results in different variants of the Morris water maze task, including standard associative phenomena (spatial generalization gradient and blocking), as well as navigation based on cognitive mapping. Furthermore, we show that competitive and cooperative patterns between different navigation strategies in the model allow us to explain previous, apparently contradictory results supporting either associative or cognitive mechanisms for spatial learning. The key computational mechanism for reconciling experimental results showing different influences of distal and proximal cues on behavior, different learning times, and different abilities of individuals to alternately perform spatial and response strategies relies on the dynamic coordination of navigation strategies, whose performance is evaluated online with a common currency through a modular approach. We provide a set of concrete experimental predictions to further test the computational model. Overall, this computational work sheds new light on inter-individual differences in navigation learning, and provides a formal and mechanistic approach to test various theories of spatial cognition in mammals.
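The "common currency" coordination can be sketched as a gating mechanism that tracks each strategy's recent reward rate and selects among strategies with a softmax. This is a toy rendering of the idea, not the published model; the class name and all constants are invented.

```python
import numpy as np

rng = np.random.default_rng(6)

class Gate:
    """Select among navigation 'experts' by their recent reward rate."""
    def __init__(self, n_experts, lr=0.1, beta=5.0):
        self.value = np.zeros(n_experts)   # common currency: reward estimates
        self.lr, self.beta = lr, beta

    def choose(self):
        p = np.exp(self.beta * self.value)
        p /= p.sum()
        return rng.choice(len(p), p=p)

    def update(self, k, reward):
        self.value[k] += self.lr * (reward - self.value[k])

# expert 0: 'place' strategy, expert 1: 'response' strategy
success = [0.9, 0.4]                       # the place strategy works better here
gate = Gate(2)
for _ in range(300):
    k = gate.choose()
    gate.update(k, float(rng.random() < success[k]))
print("selection values:", np.round(gate.value, 2))   # the gate favors expert 0
```

Because the gate only sees each strategy's payoff, the same mechanism can produce either competition or cooperation depending on which strategy happens to perform well where, which is how the model reconciles the associative and cognitive accounts.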
Affiliation(s)
- Laurent Dollé
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, F-75005 Paris, France
- Ricardo Chavarriaga
- Defitech Chair in Brain-Machine Interface, Center for Neuroprosthetics, Institute of Bioengineering and School of Engineering, EPFL, Geneva, Switzerland
- Agnès Guillot
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, F-75005 Paris, France
- Mehdi Khamassi
- Institute of Intelligent Systems and Robotics, Sorbonne Université, CNRS, F-75005 Paris, France
16
Chersi F, Burgess N. The Cognitive Architecture of Spatial Navigation: Hippocampal and Striatal Contributions. Neuron 2015; 88:64-77. PMID: 26447573; DOI: 10.1016/j.neuron.2015.09.021.
Abstract
Spatial navigation can serve as a model system in cognitive neuroscience, in which specific neural representations, learning rules, and control strategies can be inferred from the vast experimental literature that exists across many species, including humans. Here, we review this literature, focusing on the contributions of hippocampal and striatal systems, and attempt to outline a minimal cognitive architecture that is consistent with the experimental literature and that synthesizes previous related computational modeling. The resulting architecture includes striatal reinforcement learning based on egocentric representations of sensory states and actions, incidental Hebbian association of sensory information with allocentric state representations in the hippocampus, and arbitration of the outputs of both systems based on confidence/uncertainty in medial prefrontal cortex. We discuss the relationship between this architecture and learning in model-free and model-based systems, episodic memory, imagery, and planning, including some open questions and directions for further experiments.
Affiliation(s)
- Fabian Chersi
- Institute of Cognitive Neuroscience & Institute of Neurology, University College London, 17 Queen Square, London, WC1N 3AZ, UK.
- Neil Burgess
- Institute of Cognitive Neuroscience & Institute of Neurology, University College London, 17 Queen Square, London, WC1N 3AZ, UK.
17
Llofriu M, Tejera G, Contreras M, Pelc T, Fellous J, Weitzenfeld A. Goal-oriented robot navigation learning using a multi-scale space representation. Neural Netw 2015; 72:62-74. DOI: 10.1016/j.neunet.2015.09.006.
18
Viejo G, Khamassi M, Brovelli A, Girard B. Modeling choice and reaction time during arbitrary visuomotor learning through the coordination of adaptive working memory and reinforcement learning. Front Behav Neurosci 2015; 9:225. PMID: 26379518; PMCID: PMC4549628; DOI: 10.3389/fnbeh.2015.00225.
Abstract
Current learning theory provides a comprehensive description of how humans and other animals learn, and places behavioral flexibility and automaticity at the heart of adaptive behaviors. However, the computations supporting the interactions between goal-directed and habitual decision-making systems are still poorly understood. Previous functional magnetic resonance imaging (fMRI) results suggest that the brain hosts complementary computations that may differentially support goal-directed and habitual processes in the form of a dynamical interplay rather than a serial recruitment of strategies. To better elucidate the computations underlying flexible behavior, we develop a dual-system computational model that can predict both performance (i.e., participants' choices) and modulations in reaction times during learning of a stimulus-response association task. The habitual system is modeled with a simple Q-Learning algorithm (QL). For the goal-directed system, we propose a new Bayesian Working Memory (BWM) model that searches for information in the history of previous trials in order to minimize Shannon entropy. We propose a model for QL and BWM coordination such that the expensive memory manipulation is under the control of, among other factors, the level of convergence of habitual learning. We test the ability of QL or BWM alone to explain human behavior, and compare them with combinations of the two systems, to highlight the need for such combinations to explain behavior. Two of the tested combinations are derived from the literature; the third is our new proposal. In conclusion, all subjects were better explained by model combinations, and the majority of them by our new coordination proposal.
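The coordination criterion can be illustrated by comparing the Shannon entropy of each system's action distribution and letting the less uncertain system drive choice. The actual BWM machinery (Bayesian inference over trial history, and control of memory manipulation by QL convergence) is much richer than this sketch, whose numbers are invented.

```python
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log2(p)).sum()

def coordinate(p_wm, p_ql):
    """Pick the action distribution of the less uncertain system.

    p_wm: action probabilities from the (Bayesian) working-memory system
    p_ql: softmax of Q-Learning values (the habitual system)
    """
    return p_wm if entropy(p_wm) < entropy(p_ql) else p_ql

# early in learning: working memory is sharp after one observation, QL is flat
p_wm = np.array([0.85, 0.05, 0.05, 0.05])
p_ql = np.array([0.25, 0.25, 0.25, 0.25])
print(coordinate(p_wm, p_ql))   # WM drives behavior while QL converges
```

As QL values sharpen over trials, its entropy drops below that of the costly memory search and control shifts to the habitual system, reproducing the speed-up in reaction times that the model is fit against.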
Affiliation(s)
- Guillaume Viejo
- Sorbonne Université, Université Pierre et Marie Curie, Univ Paris 06, UMR 7222, Institut des Systèmes Intelligents et de Robotique, Paris, France; Centre National de la Recherche Scientifique, UMR 7222, ISIR, Paris, France
- Mehdi Khamassi
- Sorbonne Université, Université Pierre et Marie Curie, Univ Paris 06, UMR 7222, Institut des Systèmes Intelligents et de Robotique, Paris, France; Centre National de la Recherche Scientifique, UMR 7222, ISIR, Paris, France
- Andrea Brovelli
- Institut de Neurosciences de la Timone, UMR 7289, Centre National de la Recherche Scientifique - Aix Marseille Université, Marseille, France
- Benoît Girard
- Sorbonne Université, Université Pierre et Marie Curie, Univ Paris 06, UMR 7222, Institut des Systèmes Intelligents et de Robotique, Paris, France; Centre National de la Recherche Scientifique, UMR 7222, ISIR, Paris, France
19
Barrera A, Tejera G, Llofriu M, Weitzenfeld A. Learning Spatial Localization: From Rat Studies to Computational Models of the Hippocampus. Spatial Cognition and Computation 2014. DOI: 10.1080/13875868.2014.961602.
20
Renaudo E, Girard B, Chatila R, Khamassi M. Design of a Control Architecture for Habit Learning in Robots. Biomimetic and Biohybrid Systems 2014. DOI: 10.1007/978-3-319-09435-9_22.
21
Khamassi M, Humphries MD. Integrating cortico-limbic-basal ganglia architectures for learning model-based and model-free navigation strategies. Front Behav Neurosci 2012; 6:79. PMID: 23205006; PMCID: PMC3506961; DOI: 10.3389/fnbeh.2012.00079.
Abstract
Behavior in spatial navigation is often organized into map-based (place-driven) vs. map-free (cue-driven) strategies; behavior in operant conditioning research is often organized into goal-directed vs. habitual strategies. Here we attempt to unify the two. We review one powerful theory for distinct forms of learning during instrumental conditioning, namely model-based (maintaining a representation of the world) and model-free (reacting to immediate stimuli) learning algorithms. We extend these lines of argument to propose an alternative taxonomy for spatial navigation, showing how various previously identified strategies can be distinguished as “model-based” or “model-free” depending on the usage of information and not on the type of information (e.g., cue vs. place). We argue that identifying “model-free” learning with dorsolateral striatum and “model-based” learning with dorsomedial striatum could reconcile numerous conflicting results in the spatial navigation literature. From this perspective, we further propose that the ventral striatum plays key roles in the model-building process. We propose that the core of the ventral striatum is positioned to learn the probability of action selection for every transition between states of the world. We further review suggestions that the ventral striatal core and shell are positioned to act as “critics” contributing to the computation of a reward prediction error for model-free and model-based systems, respectively.
Affiliation(s)
- Mehdi Khamassi
- Institut des Systèmes Intelligents et de Robotique, Université Pierre et Marie Curie, Paris, France; Centre National de la Recherche Scientifique, UMR 7222, Paris, France
22
Sukumar D, Rengaswamy M, Chakravarthy VS. Modeling the contributions of Basal ganglia and Hippocampus to spatial navigation using reinforcement learning. PLoS One 2012; 7:e47467. PMID: 23110073; PMCID: PMC3482225; DOI: 10.1371/journal.pone.0047467.
Abstract
A computational neural model that describes the competing roles of the basal ganglia and hippocampus in spatial navigation is presented. Model performance is evaluated on a simulated Morris water maze explored by a model rat. Cue-based and place-based navigational strategies, thought to be subserved by the basal ganglia and hippocampus respectively, are described. In cue-based navigation, the model rat learns to head directly towards a visible target, while in place-based navigation the target position is represented in terms of spatial context provided by an array of poles placed around the pool. Learning is formulated within the framework of reinforcement learning, with the nigrostriatal dopamine signal playing the role of the temporal difference error. Navigation inherently involves two apparently contradictory movements: goal-oriented movements vs. random, wandering movements. The model hypothesizes that while goal-directedness is determined by the gradient in the value function, randomness is driven by the complex activity of the subthalamic nucleus (STN)-globus pallidus externa (GPe) system. Each navigational system is associated with a critic prescribing actions that maximize value gradients for the corresponding system. In the integrated system, which incorporates both cue-based and place-based forms of navigation, navigation at a given position is determined by the system whose value function is greater at that position. The proposed model describes the experimental results of [1], a lesion study that investigates the competition between cue-based and place-based navigational systems. The present study also examines impaired navigational performance under Parkinsonian-like conditions. The integrated navigational system, operated under dopamine-deficient conditions, exhibits increased escape latency, as was observed in the experimental literature describing MPTP model rats navigating a water maze.
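The arbitration and movement rules translate into a short loop: at each position the system with the greater value function wins, the agent climbs that value gradient, and an exploratory noise term, standing in for the STN-GPe dynamics, shrinks as value grows. A toy sketch with both value functions targeting the same hidden platform (all geometry and constants invented; in the full model the two values differ because they are built from different cues):

```python
import numpy as np

rng = np.random.default_rng(7)
platform = np.array([8., 8.])

def value_cue(pos):     # cue-based value: proximity to the visible target
    return -np.linalg.norm(pos - platform)

def value_place(pos):   # place-based value, here a weaker copy for illustration
    return -1.2 * np.linalg.norm(pos - platform)

def gradient(v, pos, eps=1e-3):
    g = np.zeros(2)
    for i in range(2):
        d = np.zeros(2); d[i] = eps
        g[i] = (v(pos + d) - v(pos - d)) / (2 * eps)
    return g

pos = np.array([0., 0.])
for _ in range(200):
    v = max(value_cue, value_place, key=lambda f: f(pos))  # greater value wins
    noise = min(1.0, -v(pos) / 10.0)     # wandering term, large when value is low
    step = gradient(v, pos) + noise * rng.normal(0, 0.5, 2)
    pos = pos + 0.5 * step
print("final position:", np.round(pos, 1))   # approaches the platform at (8, 8)
```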
Affiliation(s)
- Maithreye Rengaswamy
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai, India
23
Martinet LE, Sheynikhovich D, Benchenane K, Arleo A. Spatial learning and action planning in a prefrontal cortical network model. PLoS Comput Biol 2011; 7:e1002045. PMID: 21625569; PMCID: PMC3098199; DOI: 10.1371/journal.pcbi.1002045.
Abstract
The interplay between hippocampus and prefrontal cortex (PFC) is fundamental to spatial cognition. Complementing hippocampal place coding, prefrontal representations provide more abstract and hierarchically organized memories suitable for decision making. We model a prefrontal network mediating distributed information processing for spatial learning and action planning. Specific connectivity and synaptic adaptation principles shape the recurrent dynamics of the network arranged in cortical minicolumns. We show how the PFC columnar organization is suitable for learning sparse topological-metrical representations from redundant hippocampal inputs. The recurrent nature of the network supports multilevel spatial processing, allowing structural features of the environment to be encoded. An activation diffusion mechanism spreads the neural activity through the column population leading to trajectory planning. The model provides a functional framework for interpreting the activity of PFC neurons recorded during navigation tasks. We illustrate the link from single unit activity to behavioral responses. The results suggest plausible neural mechanisms subserving the cognitive "insight" capability originally attributed to rodents by Tolman & Honzik. Our time course analysis of neural responses shows how the interaction between hippocampus and PFC can yield the encoding of manifold information pertinent to spatial planning, including prospective coding and distance-to-goal correlates.

We study spatial cognition, a high-level brain function based upon the ability to elaborate mental representations of the environment supporting goal-oriented navigation. Spatial cognition involves parallel information processing across a distributed network of interrelated brain regions. Depending on the complexity of the spatial navigation task, different neural circuits may be primarily involved, corresponding to different behavioral strategies. Navigation planning, one of the most flexible strategies, is based on the ability to prospectively evaluate alternative sequences of actions in order to infer optimal trajectories to a goal. The hippocampal formation and the prefrontal cortex are two neural substrates likely involved in navigation planning. We adopt a computational modeling approach to show how the interactions between these two brain areas may lead to learning of topological representations suitable to mediate action planning. Our model suggests plausible neural mechanisms subserving the cognitive spatial capabilities attributed to rodents. We provide a functional framework for interpreting the activity of prefrontal and hippocampal neurons recorded during navigation tasks. Akin to integrative neuroscience approaches, we illustrate the link from single unit activity to behavioral responses while solving spatial learning tasks.
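The activation-diffusion planning step can be sketched on a small graph: activity injected at the goal column spreads through the learned topology with decay, and a trajectory is planned by climbing the resulting activation gradient from the start column. The graph and constants below are invented for illustration and are not the paper's network equations.

```python
import numpy as np

# adjacency of a learned topological map (cortical columns as nodes)
A = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
], dtype=float)

def diffuse(A, goal, decay=0.7, n_iter=50):
    """Spread activation from the goal column through the graph."""
    act = np.zeros(len(A)); act[goal] = 1.0
    for _ in range(n_iter):
        act = np.maximum(act, decay * (A @ act) / np.maximum(A.sum(1), 1))
        act[goal] = 1.0                    # the goal column is clamped on
    return act

def plan(A, start, goal):
    act, path, cur = diffuse(A, goal), [start], start
    while cur != goal:
        neighbors = np.flatnonzero(A[cur])
        cur = int(neighbors[np.argmax(act[neighbors])])  # climb the gradient
        path.append(cur)
    return path

print(plan(A, start=0, goal=4))   # e.g. [0, 1, 2, 4]
```

Because activation decays with each hop, it decreases monotonically with graph distance from the goal, so gradient climbing always terminates at the goal, which is the essence of the planning mechanism described above.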
Affiliation(s)
- Louis-Emmanuel Martinet
- Laboratory of Neurobiology of Adaptive Processes, UMR 7102, CNRS - UPMC Univ P6, Paris, France
- Denis Sheynikhovich
- Laboratory of Neurobiology of Adaptive Processes, UMR 7102, CNRS - UPMC Univ P6, Paris, France
- Karim Benchenane
- Laboratory of Neurobiology of Adaptive Processes, UMR 7102, CNRS - UPMC Univ P6, Paris, France
- Angelo Arleo
- Laboratory of Neurobiology of Adaptive Processes, UMR 7102, CNRS - UPMC Univ P6, Paris, France
24
Sheynikhovich D, Arleo A. A reinforcement learning approach to model interactions between landmarks and geometric cues during spatial learning. Brain Res 2010; 1365:35-47. DOI: 10.1016/j.brainres.2010.09.091.