1
Antonov G, Dayan P. Exploring replay. Nat Commun 2025; 16:1657. [PMID: 39955280 PMCID: PMC11829958 DOI: 10.1038/s41467-025-56731-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 01/29/2025] [Indexed: 02/17/2025] Open
Abstract
Animals face uncertainty about their environments due to initial ignorance or subsequent changes. They therefore need to explore. However, the algorithmic structure of exploratory choices in the brain still remains largely elusive. Artificial agents face the same problem, and a venerable idea in reinforcement learning is that they can plan appropriate exploratory choices offline, during the equivalent of quiet wakefulness or sleep. Although offline processing in humans and other animals, in the form of hippocampal replay and preplay, has recently been the subject of highly informative modelling, existing methods only apply to known environments. Thus, they cannot predict exploratory replay choices during learning and/or behaviour in the face of uncertainty. Here, we extend an influential theory of hippocampal replay and examine its potential role in approximately optimal exploration, deriving testable predictions for the patterns of exploratory replay choices in a paradigmatic spatial navigation task. Our modelling provides a normative interpretation of the available experimental data suggestive of exploratory replay. Furthermore, we highlight the importance of sequence replay, and license a range of new experimental paradigms that should further our understanding of offline processing.
Affiliation(s)
- Georgy Antonov
  - Max Planck Institute for Biological Cybernetics, Tübingen, Germany
  - Graduate Training Centre of Neuroscience, International Max Planck Research School, University of Tübingen, Tübingen, Germany
- Peter Dayan
  - Max Planck Institute for Biological Cybernetics, Tübingen, Germany
  - University of Tübingen, Tübingen, Germany
2
Tashjian SM, Cussen J, Deng W, Zhang B, Mobbs D. Subregions in the ventromedial prefrontal cortex integrate threat and protective information to meta-represent safety. PLoS Biol 2025; 23:e3002986. [PMID: 39804855 PMCID: PMC11730396 DOI: 10.1371/journal.pbio.3002986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Accepted: 12/16/2024] [Indexed: 01/16/2025] Open
Abstract
Pivotal to self-preservation is the ability to identify when we are safe and when we are in danger. Previous studies have focused on safety estimations based on the features of external threats and do not consider how the brain integrates other key factors, including estimates about our ability to protect ourselves. Here, we examine the neural systems underlying the online dynamic encoding of safety. The current preregistered study used 2 novel tasks to test 4 facets of safety estimation: Safety Prediction, Meta-representation, Recognition, and Value Updating. We experimentally manipulated safety estimation by changing the levels of both external threats and self-protection. Data were collected in 2 independent samples (behavioral N = 100; MRI N = 30). We found consistent evidence of subjective changes in the sensitivity to safety conferred through protection. Neural responses in the ventromedial prefrontal cortex (vmPFC) tracked increases in safety during all safety estimation facets, with specific tuning to protection. Further, informational connectivity analyses revealed distinct hubs of safety coding in the posterior and anterior vmPFC for external threats and protection, respectively. These findings reveal a central role of the vmPFC for coding safety.
Affiliation(s)
- Sarah M. Tashjian
  - School of Psychological Sciences, University of Melbourne, Parkville, Australia
  - Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Joseph Cussen
  - School of Psychological Sciences, University of Melbourne, Parkville, Australia
- Wenning Deng
  - Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Bo Zhang
  - Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
- Dean Mobbs
  - Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
  - Computation and Neural Systems, California Institute of Technology, Pasadena, California, United States of America
3
Göktepe-Kavis P, Aellen FM, Cortese A, Castegnetti G, de Martino B, Tzovara A. Context changes retrieval of prospective outcomes during decision deliberation. Cereb Cortex 2024; 34:bhae483. [PMID: 39710609 DOI: 10.1093/cercor/bhae483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Revised: 11/18/2024] [Accepted: 12/06/2024] [Indexed: 12/24/2024] Open
Abstract
Foreseeing future outcomes is the art of decision-making. Substantial evidence shows that, during choice deliberation, the brain can retrieve prospective decision outcomes. However, decisions are seldom made in a vacuum. Context carries information that can radically affect the outcomes of a choice. Nevertheless, most investigations of retrieval processes have examined decisions in isolation, disregarding the context in which they occur. Here, we studied how context shapes prospective outcome retrieval during deliberation. We designed a decision-making task where participants were presented with object-context pairs and made decisions which led to a certain outcome. We show that, during deliberation, likely outcomes were retrieved in transient patterns of neural activity, as early as 3 s before participants decided. The strength of prospective outcome retrieval explains participants' behavioral efficiency, but only when context affects the decision outcome. Our results suggest that context imparts strong constraints on retrieval processes and on how neural representations are shaped during decision-making.
Affiliation(s)
- Pinar Göktepe-Kavis
  - Institute of Computer Science, University of Bern, 3012 Bern, Switzerland
  - Center for Experimental Neurology - Sleep Wake Epilepsy Center - NeuroTec, Department of Neurology, Inselspital Bern, University Hospital, University of Bern, 3010 Bern, Switzerland
- Florence M Aellen
  - Institute of Computer Science, University of Bern, 3012 Bern, Switzerland
  - Center for Experimental Neurology - Sleep Wake Epilepsy Center - NeuroTec, Department of Neurology, Inselspital Bern, University Hospital, University of Bern, 3010 Bern, Switzerland
- Aurelio Cortese
  - Computational Neuroscience Laboratories, Advanced Telecommunications Research Institute International, 619-0288 Kyoto, Japan
- Giuseppe Castegnetti
  - Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, United Kingdom
- Benedetto de Martino
  - Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, United Kingdom
- Athina Tzovara
  - Institute of Computer Science, University of Bern, 3012 Bern, Switzerland
  - Center for Experimental Neurology - Sleep Wake Epilepsy Center - NeuroTec, Department of Neurology, Inselspital Bern, University Hospital, University of Bern, 3010 Bern, Switzerland
4
Sharp PB, Eldar E. Humans adaptively deploy forward and backward prediction. Nat Hum Behav 2024; 8:1726-1737. [PMID: 39014069 PMCID: PMC11878374 DOI: 10.1038/s41562-024-01930-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Accepted: 06/17/2024] [Indexed: 07/18/2024]
Abstract
The formation of predictions is essential to our ability to build models of the world and use them for intelligent decision-making. Here we challenge the dominant assumption that humans form only forward predictions, which specify what future events are likely to follow a given present event. We demonstrate that in some environments, it is more efficient to use backward prediction, which specifies what present events are likely to precede a given future event. This is particularly the case in diverging environments, where possible future events outnumber possible present events. Correspondingly, in six preregistered experiments (n = 1,299) involving both simple decision-making and more challenging planning tasks, we find that humans engage in backward prediction in divergent environments and use forward prediction in convergent environments. We thus establish that humans adaptively deploy forward and backward prediction in the service of efficient decision-making.
Affiliation(s)
- Paul B Sharp
  - Department of Psychology, Hebrew University of Jerusalem, Jerusalem, Israel
  - Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Jerusalem, Israel
  - Department of Psychology, Yale University, New Haven, CT, USA
- Eran Eldar
  - Department of Psychology, Hebrew University of Jerusalem, Jerusalem, Israel
  - Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Jerusalem, Israel
5
Kern S, Nagel J, Gerchen MF, Gürsoy Ç, Meyer-Lindenberg A, Kirsch P, Dolan RJ, Gais S, Feld GB. Reactivation strength during cued recall is modulated by graph distance within cognitive maps. eLife 2024; 12:RP93357. [PMID: 38810249 PMCID: PMC11136493 DOI: 10.7554/elife.93357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2024] Open
Abstract
Declarative memory retrieval is thought to involve reinstatement of neuronal activity patterns elicited and encoded during a prior learning episode. Furthermore, it is suggested that two mechanisms operate during reinstatement, dependent on task demands: individual memory items can be reactivated simultaneously as a clustered occurrence or, alternatively, replayed sequentially as temporally separate instances. In the current study, participants learned associations between images that were embedded in a directed graph network and retained this information over a brief 8 min consolidation period. During a subsequent cued recall session, participants retrieved the learned information while undergoing magnetoencephalographic recording. Using a trained stimulus decoder, we found evidence for clustered reactivation of learned material. Reactivation strength of individual items during clustered reactivation decreased as a function of increasing graph distance, an ordering present solely for successful retrieval but not for retrieval failure. In line with previous research, we found evidence that sequential replay was dependent on retrieval performance and was most evident in low performers. The results provide evidence for distinct performance-dependent retrieval mechanisms, with graded clustered reactivation emerging as a plausible mechanism to search within abstract cognitive maps.
Affiliation(s)
- Simon Kern
  - Clinical Psychology, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
  - Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
  - Addiction Behavior and Addiction Medicine, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Juliane Nagel
  - Clinical Psychology, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
  - Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
  - Addiction Behavior and Addiction Medicine, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Martin F Gerchen
  - Clinical Psychology, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
  - Department of Psychology, Ruprecht Karl University of Heidelberg, Heidelberg, Germany
  - Bernstein Center for Computational Neuroscience Heidelberg/Mannheim, Mannheim, Germany
- Çağatay Gürsoy
  - Clinical Psychology, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
  - Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
  - Addiction Behavior and Addiction Medicine, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Andreas Meyer-Lindenberg
  - Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
  - Bernstein Center for Computational Neuroscience Heidelberg/Mannheim, Mannheim, Germany
- Peter Kirsch
  - Clinical Psychology, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
  - Department of Psychology, Ruprecht Karl University of Heidelberg, Heidelberg, Germany
  - Bernstein Center for Computational Neuroscience Heidelberg/Mannheim, Mannheim, Germany
- Raymond J Dolan
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
  - Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
- Steffen Gais
  - Institute of Medical Psychology and Behavioral Neurobiology, Eberhard-Karls-University Tübingen, Tübingen, Germany
- Gordon B Feld
  - Clinical Psychology, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
  - Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
  - Addiction Behavior and Addiction Medicine, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
  - Department of Psychology, Ruprecht Karl University of Heidelberg, Heidelberg, Germany
6
Chen HT, van der Meer MAA. Paradoxical replay can protect contextual task representations from destructive interference when experience is unbalanced. bioRxiv 2024:2024.05.09.593332. [PMID: 38766204 PMCID: PMC11100794 DOI: 10.1101/2024.05.09.593332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Experience replay is a powerful mechanism to learn efficiently from limited experience. Despite several decades of compelling experimental results, the factors that determine which experiences are selected for replay remain unclear. A particular challenge for current theories is that on tasks that feature unbalanced experience, rats paradoxically replay the less-experienced trajectory. To understand why, we simulated a feedforward neural network with two regimes: rich learning (structured representations tailored to task demands) and lazy learning (unstructured, task-agnostic representations). Rich, but not lazy, representations degraded following unbalanced experience, an effect that could be reversed with paradoxical replay. To test if this computational principle can account for the experimental data, we examined the relationship between paradoxical replay and learned task representations in the rat hippocampus. Strikingly, we found a strong association between the richness of learned task representations and the paradoxicality of replay. Taken together, these results suggest that paradoxical replay specifically serves to protect rich representations from the destructive effects of unbalanced experience, and more generally demonstrate a novel interaction between the nature of task representations and the function of replay in artificial and biological systems.
Affiliation(s)
- Hung-Tu Chen
  - Department of Psychological & Brain Sciences, Dartmouth College, Hanover, NH 03755
7
Haridi S, Wu CM, Dasgupta I, Schulz E. The scaling of mental computation in a sorting task. Cognition 2023; 241:105605. [PMID: 37748248 DOI: 10.1016/j.cognition.2023.105605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Revised: 08/17/2023] [Accepted: 08/24/2023] [Indexed: 09/27/2023]
Abstract
Many cognitive models provide valuable insights into human behavior. Yet the algorithmic complexity of candidate models can fail to capture how human reaction times scale with increasing input complexity. In the current work, we investigate the algorithms underlying human cognitive processes. Computer science characterizes algorithms by their time and space complexity scaling with problem size. We propose to use participants' reaction times to study how human computations scale with increasing input complexity. We tested this approach in a task where participants had to sort sequences of rectangles by their size. Our results showed that reaction times scaled close to linearly with sequence length and that participants learned and actively used latent structure whenever it was provided. This behavior was in line with a computational model that used the observed sequences to form hypotheses about the latent structures, searching through candidate hypotheses in a directed fashion. These results enrich our understanding of plausible cognitive models for efficient mental sorting and pave the way for future studies using reaction times to investigate the scaling of mental computations across psychological domains.
Affiliation(s)
- Susanne Haridi
  - Max Planck Institute for Biological Cybernetics, Germany
  - Max Planck School of Cognition, Germany
- Ishita Dasgupta
  - Princeton University, Department of Computer Science, United States of America
- Eric Schulz
  - Max Planck Institute for Biological Cybernetics, Germany
8
Crivelli-Decker J, Clarke A, Park SA, Huffman DJ, Boorman ED, Ranganath C. Goal-oriented representations in the human hippocampus during planning and navigation. Nat Commun 2023; 14:2946. [PMID: 37221176 PMCID: PMC10206082 DOI: 10.1038/s41467-023-35967-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2021] [Accepted: 01/10/2023] [Indexed: 05/25/2023] Open
Abstract
Recent work in cognitive and systems neuroscience has suggested that the hippocampus might support planning, imagination, and navigation by forming cognitive maps that capture the abstract structure of physical spaces, tasks, and situations. Navigation involves disambiguating similar contexts, and the planning and execution of a sequence of decisions to reach a goal. Here, we examine hippocampal activity patterns in humans during a goal-directed navigation task to investigate how contextual and goal information are incorporated in the construction and execution of navigational plans. During planning, hippocampal pattern similarity is enhanced across routes that share a context and a goal. During navigation, we observe prospective activation in the hippocampus that reflects the retrieval of pattern information related to a key-decision point. These results suggest that, rather than simply representing overlapping associations or state transitions, hippocampal activity patterns are shaped by context and goals.
Affiliation(s)
- Jordan Crivelli-Decker
  - Center for Neuroscience, University of California, Davis, CA, USA
  - Department of Psychology, University of California, Davis, CA, USA
- Alex Clarke
  - Department of Psychology, University of Cambridge, Cambridge, UK
- Seongmin A Park
  - Center for Neuroscience, University of California, Davis, CA, USA
  - Center for Mind and Brain, University of California, Davis, CA, USA
- Derek J Huffman
  - Center for Neuroscience, University of California, Davis, CA, USA
  - Department of Psychology, Colby College, Waterville, ME, USA
- Erie D Boorman
  - Center for Neuroscience, University of California, Davis, CA, USA
  - Department of Psychology, University of Cambridge, Cambridge, UK
- Charan Ranganath
  - Center for Neuroscience, University of California, Davis, CA, USA
  - Department of Psychology, University of California, Davis, CA, USA
9
McFadyen J, Dolan RJ. Spatiotemporal Precision of Neuroimaging in Psychiatry. Biol Psychiatry 2023; 93:671-680. [PMID: 36376110 DOI: 10.1016/j.biopsych.2022.08.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 07/20/2022] [Accepted: 08/12/2022] [Indexed: 12/23/2022]
Abstract
Aberrant patterns of cognition, perception, and behavior seen in psychiatric disorders are thought to be driven by a complex interplay of neural processes that evolve at a rapid temporal scale. Understanding these dynamic processes in vivo in humans has been hampered by a trade-off between spatial and temporal resolutions inherent to current neuroimaging technology. A recent trend in psychiatric research has been the use of high temporal resolution imaging, particularly magnetoencephalography, often in conjunction with sophisticated machine learning decoding techniques. Developments here promise novel insights into the spatiotemporal dynamics of cognitive phenomena, including domains relevant to psychiatric illnesses such as reward and avoidance learning, memory, and planning. This review considers recent advances afforded by exploiting this increased spatiotemporal precision, with specific reference to applications that seek to drive a mechanistic understanding of psychopathology and the realization of preclinical translation.
Affiliation(s)
- Jessica McFadyen
  - UCL Max Planck Centre for Computational Psychiatry and Ageing Research and Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
  - State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
- Raymond J Dolan
  - State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
10
McFadyen J, Liu Y, Dolan RJ. Differential replay of reward and punishment paths predicts approach and avoidance. Nat Neurosci 2023; 26:627-637. [PMID: 37020116 DOI: 10.1038/s41593-023-01287-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2021] [Accepted: 02/16/2023] [Indexed: 04/07/2023]
Abstract
Neural replay is implicated in planning, where states relevant to a task goal are rapidly reactivated in sequence. It remains unclear whether, during planning, replay relates to an actual prospective choice. Here, using magnetoencephalography (MEG), we studied replay in human participants while they planned to either approach or avoid an uncertain environment containing paths leading to reward or punishment. We find evidence for forward sequential replay during planning, with rapid state-to-state transitions from 20 to 90 ms. Replay of rewarding paths was boosted, relative to aversive paths, before a decision to avoid and attenuated before a decision to approach. A trial-by-trial bias toward replaying prospective punishing paths predicted irrational decisions to approach riskier environments, an effect more pronounced in participants with higher trait anxiety. The findings indicate a coupling of replay with planned behavior, where replay prioritizes an online representation of a worst-case scenario for approaching or avoiding.
Affiliation(s)
- Jessica McFadyen
  - The UCL Max Planck Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
  - Wellcome Centre for Human Neuroimaging, University College London, London, UK
- Yunzhe Liu
  - State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
  - Chinese Institute for Brain Research, Beijing, China
- Raymond J Dolan
  - The UCL Max Planck Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
  - Wellcome Centre for Human Neuroimaging, University College London, London, UK
11
Sharp PB, Dolan RJ, Eldar E. Disrupted state transition learning as a computational marker of compulsivity. Psychol Med 2023; 53:2095-2105. [PMID: 37310326 PMCID: PMC10106291 DOI: 10.1017/s0033291721003846] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/17/2020] [Revised: 08/28/2021] [Accepted: 09/02/2021] [Indexed: 11/07/2022]
Abstract
BACKGROUND: Disorders involving compulsivity, fear, and anxiety are linked to beliefs that the world is less predictable. We lack a mechanistic explanation for how such beliefs arise. Here, we test a hypothesis that in people with compulsivity, fear, and anxiety, learning a probabilistic mapping between actions and environmental states is compromised. METHODS: In Study 1 (n = 174), we designed a novel online task that isolated state transition learning from other facets of learning and planning. To determine whether this impairment is due to learning that is too fast or too slow, we estimated state transition learning rates by fitting computational models to two independent datasets, which tested learning in environments in which state transitions were either stable (Study 2: n = 1413) or changing (Study 3: n = 192). RESULTS: Study 1 established that individuals with higher levels of compulsivity are more likely to demonstrate an impairment in state transition learning. Preliminary evidence here linked this impairment to a common factor comprising compulsivity and fear. Studies 2 and 3 showed that compulsivity is associated with learning that is too fast when it should be slow (i.e. when state transitions are stable) and too slow when it should be fast (i.e. when state transitions change). CONCLUSIONS: Together, these findings indicate that compulsivity is associated with a dysregulation of state transition learning, wherein the rate of learning is not well adapted to the task environment. Thus, dysregulated state transition learning might provide a key target for therapeutic intervention in compulsivity.
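The notion of a state transition learning rate used in this abstract can be illustrated with a simple delta-rule update. This is a minimal sketch under stated assumptions, not the computational model fitted in the study: the function name `update_transition`, the array layout `T[s, a, s']`, and the parameter `alpha` are hypothetical choices made for illustration only.

```python
import numpy as np

def update_transition(T, s, a, s_next, alpha):
    """Move the estimate of P(s' | s, a) toward the observed next state.

    T is a hypothetical array of shape (n_states, n_actions, n_states);
    alpha in [0, 1] is the assumed state transition learning rate.
    """
    target = np.zeros(T.shape[2])
    target[s_next] = 1.0                    # one-hot code of the observed outcome
    T[s, a] += alpha * (target - T[s, a])   # delta rule; a normalized row stays normalized
    return T

# One state, one action, two possible next states, starting from a uniform belief.
T = np.full((1, 1, 2), 0.5)
update_transition(T, s=0, a=0, s_next=1, alpha=0.5)
print(T[0, 0])  # [0.25 0.75]
```

In this sketch a larger `alpha` makes the estimate track recent observations closely, which suits changing transitions, whereas a smaller `alpha` averages over more trials, which suits stable ones.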
Affiliation(s)
- Paul B. Sharp
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
  - Wellcome Centre for Human Neuroimaging, University College London, London, UK
  - The Hebrew University of Jerusalem, Jerusalem, Israel
- Raymond J. Dolan
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
  - Wellcome Centre for Human Neuroimaging, University College London, London, UK
- Eran Eldar
  - The Hebrew University of Jerusalem, Jerusalem, Israel
12
Kurth-Nelson Z, Behrens T, Wayne G, Miller K, Luettgau L, Dolan R, Liu Y, Schwartenbeck P. Replay and compositional computation. Neuron 2023; 111:454-469. [PMID: 36640765 DOI: 10.1016/j.neuron.2022.12.028] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 04/29/2022] [Revised: 08/11/2022] [Accepted: 12/18/2022] [Indexed: 01/15/2023]
Abstract
Replay in the brain has been viewed as rehearsal or, more recently, as sampling from a transition model. Here, we propose a new hypothesis: that replay is able to implement a form of compositional computation where entities are assembled into relationally bound structures to derive qualitatively new knowledge. This idea builds on recent advances in neuroscience, which indicate that the hippocampus flexibly binds objects to generalizable roles and that replay strings these role-bound objects into compound statements. We suggest experiments to test our hypothesis, and we end by noting the implications for AI systems which lack the human ability to radically generalize past experience to solve new problems.
Affiliation(s)
- Zeb Kurth-Nelson
  - DeepMind, London, UK
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, UK
- Timothy Behrens
  - Wellcome Centre for Human Neuroimaging, University College London, London, UK
  - Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK
- Kevin Miller
  - DeepMind, London, UK
  - Institute of Ophthalmology, University College London, London, UK
- Lennart Luettgau
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, UK
- Ray Dolan
  - Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, UK
  - Wellcome Centre for Human Neuroimaging, University College London, London, UK
- Yunzhe Liu
  - State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
  - Chinese Institute for Brain Research, Beijing, China
- Philipp Schwartenbeck
  - Max Planck Institute for Biological Cybernetics, Tübingen, Germany
  - University of Tübingen, Tübingen, Germany
13
Wimmer GE, Liu Y, McNamee DC, Dolan RJ. Distinct replay signatures for prospective decision-making and memory preservation. Proc Natl Acad Sci U S A 2023; 120:e2205211120. [PMID: 36719914 PMCID: PMC9963918 DOI: 10.1073/pnas.2205211120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 12/05/2022] [Indexed: 02/01/2023] Open
Abstract
Theories of neural replay propose that it supports a range of functions, most prominently planning and memory consolidation. Here, we test the hypothesis that distinct signatures of replay in the same task are related to model-based decision-making ("planning") and memory preservation. We designed a reward learning task wherein participants utilized structure knowledge for model-based evaluation, while at the same time having to maintain knowledge of two independent and randomly alternating task environments. Using magnetoencephalography and multivariate analysis, we first identified temporally compressed sequential reactivation, or replay, both prior to choice and following reward feedback. Before choice, prospective replay strength was enhanced for the current task-relevant environment when a model-based planning strategy was beneficial. Following reward receipt, and consistent with a memory preservation role, replay for the alternative distal task environment was enhanced as a function of decreasing recency of experience with that environment. Critically, these planning and memory preservation relationships were selective to pre-choice and post-feedback periods, respectively. Our results provide support for key theoretical proposals regarding the functional role of replay and demonstrate that the relative strength of planning and memory-related signals is modulated by ongoing computational and task demands.
Affiliation(s)
- G. Elliott Wimmer
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London WC1B 5EH, UK
- Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, UK
| | - Yunzhe Liu
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
- Chinese Institute for Brain Research, Beijing 100875, China
| | - Daniel C. McNamee
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London WC1B 5EH, UK
- Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, UK
- Neuroscience Programme, Champalimaud Research, Lisbon 1400-038, Portugal
| | - Raymond J. Dolan
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London WC1B 5EH, UK
- Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3BG, UK
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
| |
|
14
|
Emanuel A, Eldar E. Emotions as computations. Neurosci Biobehav Rev 2023; 144:104977. [PMID: 36435390 PMCID: PMC9805532 DOI: 10.1016/j.neubiorev.2022.104977] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/14/2022] [Revised: 10/26/2022] [Accepted: 11/22/2022] [Indexed: 11/26/2022]
Abstract
Emotions ubiquitously impact action, learning, and perception, yet their essence and role remain widely debated. Computational accounts of emotion aspire to answer these questions with greater conceptual precision informed by normative principles and neurobiological data. We examine recent progress in this regard and find that emotions may implement three classes of computations, which serve to evaluate states, actions, and uncertain prospects. For each of these, we use the formalism of reinforcement learning to offer a new formulation that better accounts for existing evidence. We then consider how these distinct computations may map onto distinct emotions and moods. Integrating extensive research on the causes and consequences of different emotions suggests a parsimonious one-to-one mapping, according to which emotions are integral to how we evaluate outcomes (pleasure & pain), learn to predict them (happiness & sadness), use them to inform our (frustration & content) and others' (anger & gratitude) actions, and plan in order to realize (desire & hope) or avoid (fear & anxiety) uncertain outcomes.
Affiliation(s)
- Aviv Emanuel
- Department of Psychology, Hebrew University of Jerusalem, Jerusalem 9190501, Israel; Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Jerusalem 9190501, Israel.
| | - Eran Eldar
- Department of Psychology, Hebrew University of Jerusalem, Jerusalem 9190501, Israel; Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Jerusalem 9190501, Israel.
| |
|
15
|
Training diversity promotes absolute-value-guided choice. PLoS Comput Biol 2022; 18:e1010664. [DOI: 10.1371/journal.pcbi.1010664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2022] [Revised: 11/21/2022] [Accepted: 10/18/2022] [Indexed: 11/06/2022] Open
Abstract
Many decision-making studies have demonstrated that humans learn either expected values or relative preferences among choice options, yet little is known about what environmental conditions promote one strategy over the other. Here, we test the novel hypothesis that humans adapt the degree to which they form absolute values to the diversity of the learning environment. Since absolute values generalize better to new sets of options, we predicted that the more options a person learns about the more likely they would be to form absolute values. To test this, we designed a multi-day learning experiment comprising twenty learning sessions in which subjects chose among pairs of images each associated with a different probability of reward. We assessed the degree to which subjects formed absolute values and relative preferences by asking them to choose between images they learned about in separate sessions. We found that concurrently learning about more images within a session enhanced absolute-value, and suppressed relative-preference, learning. Conversely, cumulatively pitting each image against a larger number of other images across multiple sessions did not impact the form of learning. These results show that the way humans encode preferences is adapted to the diversity of experiences offered by the immediate learning context.
|
16
|
Colas JT, Dundon NM, Gerraty RT, Saragosa‐Harris NM, Szymula KP, Tanwisuth K, Tyszka JM, van Geen C, Ju H, Toga AW, Gold JI, Bassett DS, Hartley CA, Shohamy D, Grafton ST, O'Doherty JP. Reinforcement learning with associative or discriminative generalization across states and actions: fMRI at 3 T and 7 T. Hum Brain Mapp 2022; 43:4750-4790. [PMID: 35860954 PMCID: PMC9491297 DOI: 10.1002/hbm.25988] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/19/2022] [Revised: 05/20/2022] [Accepted: 06/10/2022] [Indexed: 11/12/2022] Open
Abstract
The model-free algorithms of "reinforcement learning" (RL) have gained clout across disciplines, but so too have model-based alternatives. The present study emphasizes other dimensions of this model space in consideration of associative or discriminative generalization across states and actions. This "generalized reinforcement learning" (GRL) model, a frugal extension of RL, parsimoniously retains the single reward-prediction error (RPE), but the scope of learning goes beyond the experienced state and action. Instead, the generalized RPE is efficiently relayed for bidirectional counterfactual updating of value estimates for other representations. Aided by structural information but as an implicit rather than explicit cognitive map, GRL provided the most precise account of human behavior and individual differences in a reversal-learning task with hierarchical structure that encouraged inverse generalization across both states and actions. Reflecting inference that could be true, false (i.e., overgeneralization), or absent (i.e., undergeneralization), state generalization distinguished those who learned well more so than action generalization. With high-resolution high-field fMRI targeting the dopaminergic midbrain, the GRL model's RPE signals (alongside value and decision signals) were localized within not only the striatum but also the substantia nigra and the ventral tegmental area, including specific effects of generalization that also extend to the hippocampus. Factoring in generalization as a multidimensional process in value-based learning, these findings shed light on complexities that, while challenging classic RL, can still be resolved within the bounds of its core computations.
Affiliation(s)
- Jaron T. Colas
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
| | - Neil M. Dundon
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
- Department of Child and Adolescent Psychiatry, Psychotherapy, and Psychosomatics, University of Freiburg, Freiburg im Breisgau, Germany
| | - Raphael T. Gerraty
- Department of Psychology, Columbia University, New York, New York, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Center for Science and Society, Columbia University, New York, New York, USA
| | - Natalie M. Saragosa‐Harris
- Department of Psychology, New York University, New York, New York, USA
- Department of Psychology, University of California, Los Angeles, California, USA
| | - Karol P. Szymula
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Koranis Tanwisuth
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Department of Psychology, University of California, Berkeley, California, USA
| | - J. Michael Tyszka
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
| | - Camilla van Geen
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Harang Ju
- Neuroscience Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Arthur W. Toga
- Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, USA
| | - Joshua I. Gold
- Department of Neuroscience, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Dani S. Bassett
- Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Neurology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Santa Fe Institute, Santa Fe, New Mexico, USA
| | - Catherine A. Hartley
- Department of Psychology, New York University, New York, New York, USA
- Center for Neural Science, New York University, New York, New York, USA
| | - Daphna Shohamy
- Department of Psychology, Columbia University, New York, New York, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, USA
- Kavli Institute for Brain Science, Columbia University, New York, New York, USA
| | - Scott T. Grafton
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
| | - John P. O'Doherty
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, USA
- Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, USA
| |
|
17
|
Rajalingham R, Piccato A, Jazayeri M. Recurrent neural networks with explicit representation of dynamic latent variables can mimic behavioral patterns in a physical inference task. Nat Commun 2022; 13:5865. [PMID: 36195614 PMCID: PMC9532407 DOI: 10.1038/s41467-022-33581-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/01/2021] [Accepted: 09/22/2022] [Indexed: 11/09/2022] Open
Abstract
Primates can richly parse sensory inputs to infer latent information. This ability is hypothesized to rely on establishing mental models of the external world and running mental simulations of those models. However, evidence supporting this hypothesis is limited to behavioral models that do not emulate neural computations. Here, we test this hypothesis by directly comparing the behavior of primates (humans and monkeys) in a ball interception task to that of a large set of recurrent neural network (RNN) models with or without the capacity to dynamically track the underlying latent variables. Humans and monkeys exhibit similar behavioral patterns. This primate behavioral pattern is best captured by RNNs endowed with dynamic inference, consistent with the hypothesis that the primate brain uses dynamic inferences to support flexible physical predictions. Moreover, our work highlights a general strategy for using model neural systems to test computational hypotheses of higher brain function.
Affiliation(s)
- Rishi Rajalingham
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Building 46, 43 Vassar St., Cambridge, MA, 02139, USA
| | - Aída Piccato
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Building 46, 43 Vassar St., Cambridge, MA, 02139, USA; Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Building 46, 43 Vassar St., Cambridge, MA, 02139-4307, USA
| | - Mehrdad Jazayeri
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Building 46, 43 Vassar St., Cambridge, MA, 02139, USA; Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Building 46, 43 Vassar St., Cambridge, MA, 02139-4307, USA.
| |
|
18
|
Zhu S, Lakshminarasimhan KJ, Arfaei N, Angelaki DE. Eye movements reveal spatiotemporal dynamics of visually-informed planning in navigation. eLife 2022; 11:e73097. [PMID: 35503099 PMCID: PMC9135400 DOI: 10.7554/elife.73097] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/16/2021] [Accepted: 05/01/2022] [Indexed: 11/28/2022] Open
Abstract
Goal-oriented navigation is widely understood to depend upon internal maps. Although this may be the case in many settings, humans tend to rely on vision in complex, unfamiliar environments. To study the nature of gaze during visually-guided navigation, we tasked humans to navigate to transiently visible goals in virtual mazes of varying levels of difficulty, observing that they took near-optimal trajectories in all arenas. By analyzing participants' eye movements, we gained insights into how they performed visually-informed planning. The spatial distribution of gaze revealed that environmental complexity mediated a striking trade-off in the extent to which attention was directed towards two complementary aspects of the world model: the reward location and task-relevant transitions. The temporal evolution of gaze revealed rapid, sequential prospection of the future path, evocative of neural replay. These findings suggest that the spatiotemporal characteristics of gaze during navigation are significantly shaped by the unique cognitive computations underlying real-world, sequential decision making.
Affiliation(s)
- Seren Zhu
- Center for Neural Science, New York University, New York, United States
| | | | - Nastaran Arfaei
- Department of Psychology, New York University, New York, United States
| | - Dora E Angelaki
- Center for Neural Science, New York University, New York, United States
- Department of Mechanical and Aerospace Engineering, New York University, New York, United States
| |
|
19
|
Abstract
In human neuroscience, studies of cognition are rarely grounded in non-task-evoked, 'spontaneous' neural activity. Indeed, studies of spontaneous activity tend to focus predominantly on intrinsic neural patterns (for example, resting-state networks). Taking a 'representation-rich' approach bridges the gap between cognition and resting-state communities: this approach relies on decoding task-related representations from spontaneous neural activity, allowing quantification of the representational content and rich dynamics of such activity. For example, if we know the neural representation of an episodic memory, we can decode its subsequent replay during rest. We argue that such an approach advances cognitive research beyond a focus on immediate task demand and provides insight into the functional relevance of the intrinsic neural pattern (for example, the default mode network). This in turn enables a greater integration between human and animal neuroscience, facilitating experimental testing of theoretical accounts of intrinsic activity, and opening new avenues of research in psychiatry.
|
20
|
Widloski J, Foster DJ. Flexible rerouting of hippocampal replay sequences around changing barriers in the absence of global place field remapping. Neuron 2022; 110:1547-1558.e8. [PMID: 35180390 PMCID: PMC9473153 DOI: 10.1016/j.neuron.2022.02.002] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 07/28/2021] [Revised: 11/30/2021] [Accepted: 02/01/2022] [Indexed: 01/12/2023]
Abstract
Flexibility is a hallmark of memories that depend on the hippocampus. For navigating animals, flexibility is necessitated by environmental changes such as blocked paths and extinguished food sources. To better understand the neural basis of this flexibility, we recorded hippocampal replays in a spatial memory task where barriers as well as goals were moved between sessions to see whether replays could adapt to new spatial and reward contingencies. Strikingly, replays consistently depicted new goal-directed trajectories around each new barrier configuration and largely avoided barrier violations. Barrier-respecting replays were learned rapidly and did not rely on place cell remapping. These data distinguish sharply between place field responses, which were largely stable and remained tied to sensory cues, and replays, which changed flexibly to reflect the learned contingencies in the environment and suggest sequenced activations such as replay to be an important link between the hippocampus and flexible memory.
Affiliation(s)
- John Widloski
- Helen Wills Neuroscience Institute and Department of Psychology, University of California, Berkeley, CA 94720, USA
| | - David J Foster
- Helen Wills Neuroscience Institute and Department of Psychology, University of California, Berkeley, CA 94720, USA.
| |
|
21
|
Model-based learning retrospectively updates model-free values. Sci Rep 2022; 12:2358. [PMID: 35149713 PMCID: PMC8837618 DOI: 10.1038/s41598-022-05567-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/08/2021] [Accepted: 12/16/2021] [Indexed: 12/02/2022] Open
Abstract
Reinforcement learning (RL) is widely regarded as divisible into two distinct computational strategies. Model-free learning is a simple RL process in which a value is associated with actions, whereas model-based learning relies on the formation of internal models of the environment to maximise reward. Recently, theoretical and animal work has suggested that such models might be used to train model-free behaviour, reducing the burden of costly forward planning. Here we devised a way to probe this possibility in human behaviour. We adapted a two-stage decision task and found evidence that model-based processes at the time of learning can alter model-free valuation in healthy individuals. We asked people to rate subjective value of an irrelevant feature that was seen at the time a model-based decision would have been made. These irrelevant feature value ratings were updated by rewards, but in a way that accounted for whether the selected action retrospectively ought to have been taken. This model-based influence on model-free value ratings was best accounted for by a reward prediction error that was calculated relative to the decision path that would most likely have led to the reward. This effect occurred independently of attention and was not present when participants were not explicitly told about the structure of the environment. These findings suggest that current conceptions of model-based and model-free learning require updating in favour of a more integrated approach. Our task provides an empirical handle for further study of the dialogue between these two learning systems in the future.
|
22
|
Optimism and pessimism in optimised replay. PLoS Comput Biol 2022; 18:e1009634. [PMID: 35020718 PMCID: PMC8809607 DOI: 10.1371/journal.pcbi.1009634] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/06/2021] [Revised: 02/02/2022] [Accepted: 11/12/2021] [Indexed: 11/24/2022] Open
Abstract
The replay of task-relevant trajectories is known to contribute to memory consolidation and improved task performance. A wide variety of experimental data show that the content of replayed sequences is highly specific and can be modulated by reward as well as other prominent task variables. However, the rules governing the choice of sequences to be replayed still remain poorly understood. One recent theoretical suggestion is that the prioritization of replay experiences in decision-making problems is based on their effect on the choice of action. We show that this implies that subjects should replay sub-optimal actions that they dysfunctionally choose rather than optimal ones, when, by being forgetful, they experience large amounts of uncertainty in their internal models of the world. We use this to account for recent experimental data demonstrating exactly pessimal replay, fitting model parameters to the individual subjects’ choices. When animals are asleep or restfully awake, populations of neurons in their brains recapitulate activity associated with extended behaviourally-relevant experiences. This process is called replay, and it has been established for a long time in rodents, and very recently in humans, to be important for good performance in decision-making tasks. The specific experiences which are replayed during those epochs follow highly ordered patterns, but the mechanisms which establish their priority are still not fully understood. One promising theoretical suggestion is that each replay experience is chosen in such a way that the learning that ensues is most helpful for the subsequent performance of the animal. A very recent study reported a surprising result that humans who achieved high performance in a planning task tended to replay actions they found to be sub-optimal, and that this was associated with a useful deprecation of those actions in subsequent performance. 
In this study, we examine the nature of this pessimized form of replay and show that it is exactly appropriate for forgetful agents. We analyse the role of forgetting for replay choices of our model, and verify our predictions using human subject data.
|
23
|
Surget A, Belzung C. Adult hippocampal neurogenesis shapes adaptation and improves stress response: a mechanistic and integrative perspective. Mol Psychiatry 2022; 27:403-421. [PMID: 33990771 PMCID: PMC8960391 DOI: 10.1038/s41380-021-01136-8] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 08/26/2020] [Revised: 04/09/2021] [Accepted: 04/19/2021] [Indexed: 02/03/2023]
Abstract
Adult hippocampal neurogenesis (AHN) represents a remarkable form of neuroplasticity that has increasingly been linked to the stress response in recent years. However, the hippocampus does not itself support the expression of the different dimensions of the stress response. Moreover, the main hippocampal functions are essentially preserved under AHN depletion and adult-born immature neurons (abGNs) have no extrahippocampal projections, which questions the mechanisms by which abGNs influence functions supported by brain areas far from the hippocampus. Within this framework, we propose that through its computational influences AHN is pivotal in shaping adaption to environmental demands, underlying its role in stress response. The hippocampus with its high input convergence and output divergence represents a computational hub, ideally positioned in the brain (1) to detect cues and contexts linked to past, current and predicted stressful experiences, and (2) to supervise the expression of the stress response at the cognitive, affective, behavioral, and physiological levels. AHN appears to bias hippocampal computations toward enhanced conjunctive encoding and pattern separation, promoting contextual discrimination and cognitive flexibility, reducing proactive interference and generalization of stressful experiences to safe contexts. These effects result in gating downstream brain areas with more accurate and contextualized information, enabling the different dimensions of the stress response to be more appropriately set with specific contexts. Here, we first provide an integrative perspective of the functional involvement of AHN in the hippocampus and a phenomenological overview of the stress response. We then examine the mechanistic underpinning of the role of AHN in the stress response and describe its potential implications in the different dimensions accompanying this response.
Affiliation(s)
- A Surget
- UMR 1253, iBrain, Université de Tours, Inserm, Tours, France.
| | - C Belzung
- UMR 1253, iBrain, Université de Tours, Inserm, Tours, France.
| |
|
24
|
Deserno L, Moran R, Michely J, Lee Y, Dayan P, Dolan RJ. Dopamine enhances model-free credit assignment through boosting of retrospective model-based inference. eLife 2021; 10:e67778. [PMID: 34882092 PMCID: PMC8758138 DOI: 10.7554/elife.67778] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/22/2021] [Accepted: 12/08/2021] [Indexed: 11/13/2022] Open
Abstract
Dopamine is implicated in representing model-free (MF) reward prediction errors as well as influencing model-based (MB) credit assignment and choice. Putative cooperative interactions between MB and MF systems include a guidance of MF credit assignment by MB inference. Here, we used a double-blind, placebo-controlled, within-subjects design to test an hypothesis that enhancing dopamine levels boosts the guidance of MF credit assignment by MB inference. In line with this, we found that levodopa enhanced guidance of MF credit assignment by MB inference, without impacting MF and MB influences directly. This drug effect correlated negatively with a dopamine-dependent change in purely MB credit assignment, possibly reflecting a trade-off between these two MB components of behavioural control. Our findings of a dopamine boost in MB inference guidance of MF learning highlight a novel DA influence on MB-MF cooperative interactions.
Affiliation(s)
- Lorenz Deserno
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
- Department of Child and Adolescent Psychiatry, Psychotherapy and Psychosomatics, University of Würzburg, Würzburg, Germany
- Department of Psychiatry and Psychotherapy, Technische Universität Dresden, Dresden, Germany
| | - Rani Moran
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
| | - Jochen Michely
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
- Department of Psychiatry and Psychotherapy, Charité Universitätsmedizin Berlin, Berlin, Germany
| | - Ying Lee
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
- Department of Psychiatry and Psychotherapy, Technische Universität Dresden, Dresden, Germany
| | - Peter Dayan
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany
- University of Tübingen, Tübingen, Germany
| | - Raymond J Dolan
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
- The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, United Kingdom
| |
|
25
|
Yang AI, Dikecligil GN, Jiang H, Das SR, Stein JM, Schuele SU, Rosenow JM, Davis KA, Lucas TH, Gottfried JA. The what and when of olfactory working memory in humans. Curr Biol 2021; 31:4499-4511.e8. [PMID: 34450088 DOI: 10.1016/j.cub.2021.08.004] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/13/2020] [Revised: 06/15/2021] [Accepted: 08/02/2021] [Indexed: 12/31/2022]
Abstract
Encoding and retaining novel sequences of sensory stimuli in working memory is crucial for adaptive behavior. A fundamental challenge for the central nervous system is to maintain each sequence item in an active and discriminable state, while also preserving their temporal context. Nested neural oscillations have been postulated to disambiguate the "what" and "when" of sequences, but the mechanisms by which these multiple streams of information are coordinated in the human brain remain unclear. Drawing from foundational animal studies, we recorded local field potentials from the human piriform cortex and hippocampus during a working memory task in which subjects experienced sequences of three distinct odors. Our data revealed a unique organization of odor memories across multiple timescales of the theta rhythm. During encoding, odors elicited greater gamma at distinct theta phases in both regions, time stamping their positions in the sequence, whereby the robustness of this effect was predictive of temporal order memory. During maintenance, stimulus-driven patterns of theta-coupled gamma were spontaneously reinstated in piriform cortex, recapitulating the order of the initial sequence. Replay events were time compressed across contiguous theta cycles, coinciding with periods of enhanced piriform-hippocampal theta-phase synchrony, and their prevalence forecasted subsequent recall accuracy on a trial-by-trial basis. Our data provide a novel link between endogenous replay orchestrated by the theta rhythm and short-term retention of sequential memories in the human brain.
Affiliation(s)
- Andrew I Yang
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
| | - Gulce N Dikecligil
- Department of Psychology, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Heidi Jiang
- Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Sandhitsu R Das
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Joel M Stein
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Stephan U Schuele
- Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Joshua M Rosenow
- Department of Neurosurgery, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Kathryn A Davis
- Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Timothy H Lucas
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Jay A Gottfried
- Department of Psychology, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
|
26
|
Wang S, Feng SF, Bornstein AM. Mixing memory and desire: How memory reactivation supports deliberative decision-making. Wiley Interdiscip Rev Cogn Sci 2021; 13:e1581. [PMID: 34665529 DOI: 10.1002/wcs.1581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2020] [Revised: 08/24/2021] [Accepted: 09/16/2021] [Indexed: 11/09/2022]
Abstract
Memories affect nearly every aspect of our mental life. They allow us to both resolve uncertainty in the present and to construct plans for the future. Recently, renewed interest in the role memory plays in adaptive behavior has led to new theoretical advances and empirical observations. We review key findings, with particular emphasis on how the retrieval of many kinds of memories affects deliberative action selection. These results are interpreted in a sequential inference framework, in which reinstatements from memory serve as "samples" of potential action outcomes. The resulting model suggests a central role for the dynamics of memory reactivation in determining the influence of different kinds of memory in decisions. We propose that representation-specific dynamics can implement a bottom-up "product of experts" rule that integrates multiple sets of action-outcome predictions weighted based on their uncertainty. We close by reviewing related findings and identifying areas for further research. This article is categorized under: Psychology > Reasoning and Decision Making; Neuroscience > Cognition; Neuroscience > Computation.
Affiliation(s)
- Shaoming Wang
- Department of Psychology, New York University, New York, New York, USA
| | - Samuel F Feng
- Department of Mathematics, Khalifa University of Science and Technology, Abu Dhabi, UAE; Khalifa University Centre for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, UAE
| | - Aaron M Bornstein
- Department of Cognitive Sciences, University of California-Irvine, Irvine, California, USA; Center for the Neurobiology of Learning & Memory, University of California-Irvine, Irvine, California, USA; Institute for Mathematical Behavioral Sciences, University of California-Irvine, Irvine, California, USA
|
27
|
Wittkuhn L, Chien S, Hall-McMaster S, Schuck NW. Replay in minds and machines. Neurosci Biobehav Rev 2021; 129:367-388. [PMID: 34371078 DOI: 10.1016/j.neubiorev.2021.08.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/22/2021] [Revised: 07/19/2021] [Accepted: 08/01/2021] [Indexed: 11/19/2022]
Abstract
Experience-related brain activity patterns reactivate during sleep, wakeful rest, and brief pauses from active behavior. In parallel, machine learning research has found that experience replay can lead to substantial performance improvements in artificial agents. Together, these lines of research suggest replay has a variety of computational benefits for decision-making and learning. Here, we provide an overview of putative computational functions of replay as suggested by machine learning and neuroscientific research. We show that replay can lead to faster learning, less forgetting, reorganization or augmentation of experiences, and support planning and generalization. In addition, we highlight the benefits of reactivating abstracted internal representations rather than veridical memories, and discuss how replay could provide a mechanism to build internal representations that improve learning and decision-making.
Affiliation(s)
- Lennart Wittkuhn
- Max Planck Research Group NeuroCode, Max Planck Institute for Human Development, Lentzeallee 94, D-14195 Berlin, Germany; Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Lentzeallee 94, D-14195 Berlin, Germany.
| | - Samson Chien
- Max Planck Research Group NeuroCode, Max Planck Institute for Human Development, Lentzeallee 94, D-14195 Berlin, Germany; Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Lentzeallee 94, D-14195 Berlin, Germany
| | - Sam Hall-McMaster
- Max Planck Research Group NeuroCode, Max Planck Institute for Human Development, Lentzeallee 94, D-14195 Berlin, Germany; Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Lentzeallee 94, D-14195 Berlin, Germany
| | - Nicolas W Schuck
- Max Planck Research Group NeuroCode, Max Planck Institute for Human Development, Lentzeallee 94, D-14195 Berlin, Germany; Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Lentzeallee 94, D-14195 Berlin, Germany.
|
28
|
Wise T, Liu Y, Chowdhury F, Dolan RJ. Model-based aversive learning in humans is supported by preferential task state reactivation. Sci Adv 2021; 7:eabf9616. [PMID: 34321205 PMCID: PMC8318377 DOI: 10.1126/sciadv.abf9616] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/01/2020] [Accepted: 06/10/2021] [Indexed: 06/13/2023]
Abstract
Harm avoidance is critical for survival, yet little is known regarding the neural mechanisms supporting avoidance in the absence of trial-and-error experience. Flexible avoidance may be supported by a mental model (i.e., model-based), a process for which neural reactivation and sequential replay have emerged as candidate mechanisms. During an aversive learning task, combined with magnetoencephalography, we show prospective and retrospective reactivation during planning and learning, respectively, coupled to evidence for sequential replay. Specifically, when individuals plan in an aversive context, we find preferential reactivation of subsequently chosen goal states. Stronger reactivation is associated with greater hippocampal theta power. At outcome receipt, unchosen goal states are reactivated regardless of outcome valence. Replay of paths leading to goal states was modulated by outcome valence, with aversive outcomes associated with stronger reverse replay than safe outcomes. Our findings are suggestive of avoidance involving simulation of unexperienced states through hippocampally mediated reactivation and replay.
Affiliation(s)
- Toby Wise
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK.
- Wellcome Centre for Human Neuroimaging, University College London, London, UK
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
| | - Yunzhe Liu
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
- Chinese Institute for Brain Research, Beijing, China
| | - Fatima Chowdhury
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
- Wellcome Centre for Human Neuroimaging, University College London, London, UK
- Queen Square MS Centre, Department of Neuroinflammation, UCL Queen Square Institute of Neurology, London, UK
| | - Raymond J Dolan
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
- Wellcome Centre for Human Neuroimaging, University College London, London, UK
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
|
29
|
Liu Y, Dolan RJ, Higgins C, Penagos H, Woolrich MW, Ólafsdóttir HF, Barry C, Kurth-Nelson Z, Behrens TE. Temporally delayed linear modelling (TDLM) measures replay in both animals and humans. eLife 2021; 10:e66917. [PMID: 34096501 PMCID: PMC8318595 DOI: 10.7554/elife.66917] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/26/2021] [Accepted: 06/06/2021] [Indexed: 12/25/2022]
Abstract
There are rich structures in off-task neural activity which are hypothesized to reflect fundamental computations across a broad spectrum of cognitive functions. Here, we develop an analysis toolkit - temporal delayed linear modelling (TDLM) - for analysing such activity. TDLM is a domain-general method for finding neural sequences that respect a pre-specified transition graph. It combines nonlinear classification and linear temporal modelling to test for statistical regularities in sequences of task-related reactivations. TDLM is developed on the non-invasive neuroimaging data and is designed to take care of confounds and maximize sequence detection ability. Notably, as a linear framework, TDLM can be easily extended, without loss of generality, to capture rodent replay in electrophysiology, including in continuous spaces, as well as addressing second-order inference questions, for example, its temporal and spatial varying pattern. We hope TDLM will advance a deeper understanding of neural computation and promote a richer convergence between animal and human neuroscience.
Affiliation(s)
- Yunzhe Liu
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
- Chinese Institute for Brain Research, Beijing, China
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
| | - Raymond J Dolan
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
| | - Cameron Higgins
- Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, United Kingdom
| | - Hector Penagos
- Center for Brains, Minds and Machines, Picower Institute for Learning and Memory, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, United States
| | - Mark W Woolrich
- Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, United Kingdom
| | - H Freyja Ólafsdóttir
- Donders Institute for Brain Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
| | - Caswell Barry
- Research Department of Cell and Developmental Biology, University College London, London, United Kingdom
| | - Zeb Kurth-Nelson
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London, United Kingdom
- DeepMind, London, United Kingdom
| | - Timothy E Behrens
- Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
- Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, United Kingdom
|
30
|
Buch ER, Claudino L, Quentin R, Bönstrup M, Cohen LG. Consolidation of human skill linked to waking hippocampo-neocortical replay. Cell Rep 2021; 35:109193. [PMID: 34107255 PMCID: PMC8259719 DOI: 10.1016/j.celrep.2021.109193] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 12/21/2020] [Revised: 03/10/2021] [Accepted: 05/09/2021] [Indexed: 01/05/2023]
Abstract
The introduction of rest intervals interspersed with practice strengthens wakeful consolidation of skill. The mechanisms by which the brain binds discrete action representations into consolidated, highly temporally resolved skill sequences during waking rest are not known. To address this question, we recorded magnetoencephalography (MEG) during acquisition and rapid consolidation of a sequential motor skill. We report the presence of prominent, fast waking neural replay during the same rest periods in which rapid consolidation occurs. The observed replay is temporally compressed by approximately 20-fold relative to the acquired skill, is selective for the trained sequence, and predicts the magnitude of skill consolidation. Replay representations extend beyond the hippocampus and entorhinal cortex to the contralateral sensorimotor cortex. These results document the presence of robust hippocampo-neocortical replay supporting rapid wakeful consolidation of skill.
Affiliation(s)
- Ethan R Buch
- Human Cortical Physiology and Neurorehabilitation Section, NINDS, NIH, Bethesda, MD, USA.
| | - Leonardo Claudino
- Human Cortical Physiology and Neurorehabilitation Section, NINDS, NIH, Bethesda, MD, USA
| | - Romain Quentin
- Human Cortical Physiology and Neurorehabilitation Section, NINDS, NIH, Bethesda, MD, USA
| | - Marlene Bönstrup
- Human Cortical Physiology and Neurorehabilitation Section, NINDS, NIH, Bethesda, MD, USA
| | - Leonardo G Cohen
- Human Cortical Physiology and Neurorehabilitation Section, NINDS, NIH, Bethesda, MD, USA.
|
31
|
Abstract
We use neural reinforcement learning concepts including Pavlovian versus instrumental control, liking versus wanting, model-based versus model-free control, online versus offline learning and planning, and internal versus external actions and control to reflect on putative conflicts between short-term temptations and long-term goals.
|
32
|
Abstract
Experiments have implicated dopamine in model-based reinforcement learning (RL). These findings are unexpected as dopamine is thought to encode a reward prediction error (RPE), which is the key teaching signal in model-free RL. Here we examine two possible accounts for dopamine's involvement in model-based RL: the first that dopamine neurons carry a prediction error used to update a type of predictive state representation called a successor representation, the second that two well established aspects of dopaminergic activity, RPEs and surprise signals, can together explain dopamine's involvement in model-based RL.
|
33
|
|
34
|
Abstract
Credit assignment (CA) to relevant actions poses a challenge because one is often flooded with reward feedback that is not easily causally attributed. We addressed this issue in a reinforcement learning framework wherein choice is mutually controlled by value-caching model-free (MF) and prospective, planning model-based (MB) systems. We find knowledge, stored in a cognitive map, filters exuberant reward feedback to guide CA in both systems but based on different attribute dimensions. In MF, CA is boosted for outcomes that are relevant (causally related) to one’s choice, whereas in MB, CA is enhanced for outcomes that attract greater attention during the deliberation process that preceded a choice. We consider normative and mechanistic accounts, including how these processes are instrumental to adaptation.
An influential reinforcement learning framework proposes that behavior is jointly governed by model-free (MF) and model-based (MB) controllers. The former learns the values of actions directly from past encounters, and the latter exploits a cognitive map of the task to calculate these prospectively. Considerable attention has been paid to how these systems interact during choice, but how and whether knowledge of a cognitive map contributes to the way MF and MB controllers assign credit (i.e., to how they revaluate actions and states following the receipt of an outcome) remains underexplored. Here, we examine such sophisticated credit assignment using a dual-outcome bandit task. We provide evidence that knowledge of a cognitive map influences credit assignment in both MF and MB systems, mediating subtly different aspects of apparent relevance. Specifically, we show MF credit assignment is enhanced for those rewards that are related to a choice, and this contrasted with choice-unrelated rewards that reinforced subsequent choices negatively. This modulation is only possible based on knowledge of task structure. On the other hand, MB credit assignment was boosted for outcomes that impacted on differences in values between offered bandits. We consider mechanistic accounts and the normative status of these findings. We suggest the findings extend the scope and sophistication of cognitive map-based credit assignment during reinforcement learning, with implications for understanding behavioral control.
|