1
Yang Q, Zhu Z, Si R, Li Y, Zhang J, Yang T. A language model of problem solving in humans and macaque monkeys. Curr Biol 2025; 35:11-20.e10. [PMID: 39631400 DOI: 10.1016/j.cub.2024.10.074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 09/30/2024] [Accepted: 10/29/2024] [Indexed: 12/07/2024]
Abstract
Human intelligence is characterized by the remarkable ability to solve complex problems by planning a sequence of actions that takes us from an initial state to a desired goal state. Quantifying and comparing problem-solving capabilities across species and finding their evolutionary roots are critical for understanding how the brain carries out this intricate process. We introduce the Language of Problem Solving (LoPS) model as a novel quantitative framework that investigates the structure of problem-solving behavior through a language model. We applied the model to an adapted classic Pac-Man game as a cross-species behavioral paradigm to test both humans and macaque monkeys. The LoPS model extracted the latent structure, or grammar, embedded in the agents' gameplay, revealing the non-Markovian temporal dependency structure of their problem-solving behavior and the hierarchical structures of problem solving in both species. The complexity of LoPS grammar correlated with individuals' game performance and reflected the difference in problem-solving capacity between humans and monkeys. Both species evolved their LoPS grammars during learning, progressing from simpler to more complex ones, suggesting that the structure of problem solving is not fixed but evolves to support more sophisticated and efficient problem solving. Our study provides insights into how humans and monkeys break down problem solving into compositional units and navigate complex tasks, deepening our understanding of human intelligence and its evolution and establishing a foundation for future investigations of the neural mechanisms of problem solving.
Affiliation(s)
- Qianli Yang
- Institute of Neuroscience, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China
- Zhihua Zhu
- Institute of Neuroscience, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China
- Ruoguang Si
- Institute of Neuroscience, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China; Cardiff University Brain Research Imaging Centre, School of Psychology, Cardiff University, Maindy Road, Cardiff CF24 4HQ, UK
- Yunwei Li
- Institute of Neuroscience, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China; University of Chinese Academy of Sciences, Beijing 101408, China
- Jiaxiang Zhang
- School of Mathematics and Computer Science, Swansea University, Swansea SA1 8DD, UK
- Tianming Yang
- Institute of Neuroscience, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China
2
Schiewer R, Subramoney A, Wiskott L. Exploring the limits of hierarchical world models in reinforcement learning. Sci Rep 2024; 14:26856. [PMID: 39500969 PMCID: PMC11538428 DOI: 10.1038/s41598-024-76719-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2024] [Accepted: 10/16/2024] [Indexed: 11/08/2024] Open
Abstract
Hierarchical model-based reinforcement learning (HMBRL) aims to combine the sample efficiency of model-based reinforcement learning with the abstraction capability of hierarchical reinforcement learning. While HMBRL has great potential, the structural and conceptual complexities of current approaches make it challenging to extract general principles, hindering understanding and adaptation to new use cases, and thereby impeding the overall progress of the field. In this work we describe a novel HMBRL framework and evaluate it thoroughly. We construct hierarchical world models that simulate the environment at various levels of temporal abstraction. These models are used to train a stack of agents that communicate top-down by proposing goals to their subordinate agents. A significant focus of this study is the exploration of a static and environment-agnostic temporal abstraction, which allows concurrent training of models and agents throughout the hierarchy. Unlike most goal-conditioned H(MB)RL approaches, it also leads to comparatively low-dimensional abstract actions. Although our HMBRL approach did not outperform traditional methods in terms of final episode returns, it successfully facilitated decision-making across two levels of abstraction. A central challenge in enhancing our method's performance, as uncovered through comprehensive experimentation, is model exploitation on the abstract level of our world model stack. We provide an in-depth examination of this issue, discussing its implications and suggesting directions for future research to overcome this challenge. By sharing these findings, we aim to contribute to the broader discourse on refining HMBRL methodologies.
Affiliation(s)
- Robin Schiewer
- Department of Computer Science, Institute for Neural Computation, Ruhr-University Bochum, Bochum, 44787, Germany
- Anand Subramoney
- Department of Computer Science, Royal Holloway University of London, London, TW20 0EX, UK
- Laurenz Wiskott
- Department of Computer Science, Institute for Neural Computation, Ruhr-University Bochum, Bochum, 44787, Germany
3
Russin J, Pavlick E, Frank MJ. Curriculum effects and compositionality emerge with in-context learning in neural networks. arXiv 2024; arXiv:2402.08674v3. [PMID: 38410645 PMCID: PMC10896373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2024]
Abstract
Human learning embodies a striking duality: sometimes, we appear capable of following logical, compositional rules and benefit from structured curricula (e.g., in formal education), while other times, we rely on an incremental approach or trial-and-error, learning better from curricula that are unstructured or randomly interleaved. Influential psychological theories explain this seemingly disparate behavioral evidence by positing two qualitatively different learning systems: one for rapid, rule-based inferences and another for slow, incremental adaptation. It remains unclear how to reconcile such theories with neural networks, which learn via incremental weight updates and are thus a natural model for the latter type of learning, but are not obviously compatible with the former. However, recent evidence suggests that both metalearning neural networks and large language models are capable of "in-context learning" (ICL), the ability to flexibly grasp the structure of a new task from a few examples given at inference time. Here, we show that networks capable of ICL can reproduce human-like learning and compositional behavior on rule-governed tasks, while at the same time replicating human behavioral phenomena in tasks lacking rule-like structure via their usual in-weight learning (IWL). Our work shows how emergent ICL can equip neural networks with fundamentally different learning properties than those traditionally attributed to them, and that these can coexist with the properties of their native IWL, thus offering a novel perspective on dual-process theories and human cognitive flexibility.
Affiliation(s)
- Jacob Russin
- Department of Computer Science, Department of Cognitive and Psychological Sciences, Brown University
- Michael J Frank
- Department of Cognitive and Psychological Sciences, Carney Institute for Brain Science, Brown University
4
Wu CM, Dale R, Hawkins RD. Group Coordination Catalyzes Individual and Cultural Intelligence. Open Mind (Camb) 2024; 8:1037-1057. [PMID: 39229610 PMCID: PMC11370978 DOI: 10.1162/opmi_a_00155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Accepted: 06/17/2024] [Indexed: 09/05/2024] Open
Abstract
A large program of research has aimed to ground large-scale cultural phenomena in processes taking place within individual minds. For example, this research has investigated whether individual agents equipped with the right social learning strategies can enable cumulative cultural evolution given long enough time horizons. However, this approach often omits the critical group-level processes that mediate between individual agents and multi-generational societies. Here, we argue that interacting groups are a necessary and explanatory level of analysis, linking individual and collective intelligence through two characteristic feedback loops. In the first loop, more sophisticated individual-level social learning mechanisms based on Theory of Mind facilitate group-level complementarity, allowing distributed knowledge to be compositionally recombined in groups; these group-level innovations, in turn, ease the cognitive load on individuals. In the second loop, societal-level processes of cumulative culture provide groups with new cognitive technologies, including shared language and conceptual abstractions, which set in motion new group-level processes to further coordinate, recombine, and innovate. Taken together, these cycles establish group-level interaction as a dual engine of intelligence, catalyzing both individual cognition and cumulative culture.
Affiliation(s)
- Charley M. Wu
- Human and Machine Cognition Lab, University of Tübingen, Tübingen, Germany
- Rick Dale
- Department of Communication, University of California, Los Angeles, Los Angeles, CA, USA
- Robert D. Hawkins
- Department of Psychology, University of Wisconsin–Madison, Madison, WI, USA
5
Zhu X. Temporally extended successor feature neural episodic control. Sci Rep 2024; 14:15103. [PMID: 38956201 PMCID: PMC11219751 DOI: 10.1038/s41598-024-65687-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 06/24/2024] [Indexed: 07/04/2024] Open
Abstract
One of the long-term goals of reinforcement learning is to build intelligent agents capable of rapidly learning and flexibly transferring skills, similar to humans and animals. In this paper, we introduce an episodic control framework based on the temporal expansion of successor features to achieve these goals, which we refer to as Temporally Extended Successor Feature Neural Episodic Control (TESFNEC). This method has shown impressive results in significantly improving sample efficiency and elegantly reusing previously learned strategies. Crucially, this model enhances agent training by incorporating episodic memory, significantly reducing the number of iterations required to learn the optimal policy. Furthermore, we adopt the temporal expansion of successor features, a technique to capture the expected state transition dynamics of actions. This form of temporal abstraction does not entail learning a top-down hierarchy of task structures but focuses on the bottom-up combination of actions and action repetitions. Thus, our approach directly considers the temporal scope of sequences of temporally extended actions without requiring predefined or domain-specific options. Experimental results in the two-dimensional object collection environment demonstrate that the method proposed in this paper optimizes learning policies faster than baseline reinforcement learning approaches, leading to higher average returns.
Affiliation(s)
- Xianchao Zhu
- Key Laboratory of Grain Information Processing and Control (Henan University of Technology), Ministry of Education, Zhengzhou, 450001, China
- Henan Key Laboratory of Grain Photoelectric Detection and Control, Henan University of Technology, Zhengzhou, 450001, China
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou, 450001, China
6
Alejandro RJ, Holroyd CB. Hierarchical control over foraging behavior by anterior cingulate cortex. Neurosci Biobehav Rev 2024; 160:105623. [PMID: 38490499 DOI: 10.1016/j.neubiorev.2024.105623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 02/14/2024] [Accepted: 03/13/2024] [Indexed: 03/17/2024]
Abstract
Foraging is a natural behavior that involves making sequential decisions to maximize rewards while minimizing the costs incurred when doing so. The prevalence of foraging across species suggests that a common brain computation underlies its implementation. Although anterior cingulate cortex is believed to contribute to foraging behavior, its specific role has been contentious, with predominant theories arguing either that it encodes environmental value or choice difficulty. Additionally, recent attempts to characterize foraging have taken place within the reinforcement learning framework, with increasingly complex models scaling with task complexity. Here we review reinforcement learning foraging models, highlighting the hierarchical structure of many foraging problems. We extend this literature by proposing that ACC guides foraging according to principles of model-based hierarchical reinforcement learning. This idea holds that ACC function is organized hierarchically along a rostral-caudal gradient, with rostral structures monitoring the status and completion of high-level task goals (like finding food), and midcingulate structures overseeing the execution of task options (subgoals, like harvesting fruit) and lower-level actions (such as grabbing an apple).
Affiliation(s)
- Clay B Holroyd
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
7
Wenzl P, Schultheis H. Action Selection in Everyday Activities: The Opportunistic Planning Model. Cogn Sci 2024; 48:e13444. [PMID: 38659094 DOI: 10.1111/cogs.13444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 02/23/2024] [Accepted: 04/02/2024] [Indexed: 04/26/2024]
Abstract
While action selection strategies in well-defined domains have received considerable attention, little is yet known about how people choose what to do next in ill-defined tasks. In this contribution, we shed light on this issue by considering everyday tasks, which in many cases have a multitude of possible solutions (e.g., it does not matter in which order the items are brought to the table when setting a table) and are thus categorized as ill-defined problems. Even if there are no hard constraints on the ordering of subtasks in everyday activities, our research shows that people exhibit specific preferences. We propose that these preferences arise from bounded rationality, that is, people only have limited knowledge and processing power available, which results in a preference to minimize the overall physical and cognitive effort. In the context of everyday activities, this can be achieved by (a) taking properties of the spatial environment into account to use them to one's advantage, and (b) employing a stepwise-optimal action selection strategy. We present the Opportunistic Planning Model as an explanatory cognitive model, which instantiates these assumptions, and show that the model is able to generalize to new everyday tasks, outperforming machine learning models such as neural networks during generalization.
Affiliation(s)
- Petra Wenzl
- Institute for Artificial Intelligence, University of Bremen
8
Wientjes S, Holroyd CB. The successor representation subserves hierarchical abstraction for goal-directed behavior. PLoS Comput Biol 2024; 20:e1011312. [PMID: 38377074 PMCID: PMC10906840 DOI: 10.1371/journal.pcbi.1011312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 03/01/2024] [Accepted: 02/05/2024] [Indexed: 02/22/2024] Open
Abstract
Humans have the ability to craft abstract, temporally extended and hierarchically organized plans. For instance, when considering how to make spaghetti for dinner, we typically concern ourselves with useful "subgoals" in the task, such as cutting onions, boiling pasta, and cooking a sauce, rather than particulars such as how many cuts to make to the onion, or exactly which muscles to contract. A core question is how such decomposition of a more abstract task into logical subtasks happens in the first place. Previous research has shown that humans are sensitive to a form of higher-order statistical learning named "community structure". Community structure is a common feature of abstract tasks characterized by a logical ordering of subtasks. This structure can be captured by a model where humans learn predictions of upcoming events multiple steps into the future, discounting predictions of events further away in time. One such model is the "successor representation", which has been argued to be useful for hierarchical abstraction. As of yet, no study has convincingly shown that this hierarchical abstraction can be put to use for goal-directed behavior. Here, we investigate whether participants utilize learned community structure to craft hierarchically informed action plans for goal-directed behavior. Participants were asked to search for paintings in a virtual museum, where the paintings were grouped together in "wings" representing community structure in the museum. We find that participants' choices accord with the hierarchical structure of the museum and that their response times are best predicted by a successor representation. The degree to which the response times reflect the community structure of the museum correlates with several measures of performance, including the ability to craft temporally abstract action plans. These results suggest that successor representation learning subserves hierarchical abstractions relevant for goal-directed behavior.
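The multi-step, discounted prediction described in this abstract can be illustrated with the standard closed form of the successor representation, M = (I - γT)^(-1), where T is a state-transition matrix and γ a discount factor. The sketch below is not the authors' model or data: the six-node, two-community graph, the random-walk transition matrix, the value γ = 0.8, and all function names are assumptions chosen only for illustration.

```python
import numpy as np


def successor_representation(T, gamma):
    """Closed-form SR: M = sum_k (gamma * T)^k = (I - gamma * T)^-1."""
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)


def two_community_graph():
    """Hypothetical graph: two 3-node cliques (0-2 and 3-5) joined by edge 2-3."""
    A = np.zeros((6, 6))
    for group in ([0, 1, 2], [3, 4, 5]):
        for i in group:
            for j in group:
                if i != j:
                    A[i, j] = 1.0
    A[2, 3] = A[3, 2] = 1.0  # single bridge between the communities
    return A


A = two_community_graph()
T = A / A.sum(axis=1, keepdims=True)  # random-walk transition probabilities
M = successor_representation(T, gamma=0.8)

within = M[0, 1]  # discounted expected visits to a same-community node
across = M[0, 4]  # discounted expected visits to an other-community node
```

Under these assumptions, `within` exceeds `across`: states in the same community predict one another more strongly than states in different communities, which is the sense in which such a representation can reflect community structure.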
Affiliation(s)
- Sven Wientjes
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Clay B. Holroyd
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
9
McCarthy WP, Kirsh D, Fan JE. Consistency and Variation in Reasoning About Physical Assembly. Cogn Sci 2023; 47:e13397. [PMID: 38146204 DOI: 10.1111/cogs.13397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 10/27/2023] [Accepted: 12/06/2023] [Indexed: 12/27/2023]
Abstract
The ability to reason about how things were made is a pervasive aspect of how humans make sense of physical objects. Such reasoning is useful for a range of everyday tasks, from assembling a piece of furniture to making a sandwich and knitting a sweater. What enables people to reason in this way even about novel objects, and how do people draw upon prior experience with an object to continually refine their understanding of how to create it? To explore these questions, we developed a virtual task environment to investigate how people come up with step-by-step procedures for recreating block towers whose composition was not readily apparent, and analyzed how the procedures they used to build them changed across repeated attempts. Specifically, participants (N = 105) viewed 2D silhouettes of eight unique block towers in a virtual environment simulating rigid-body physics, and aimed to reconstruct each one in less than 60 s. We found that people built each tower more accurately and quickly across repeated attempts, and that this improvement reflected both group-level convergence upon a tiny fraction of all possible viable procedures, as well as error-dependent updating across successive attempts by the same individual. Taken together, our study presents a scalable approach to measuring consistency and variation in how people infer solutions to physical assembly problems.
Affiliation(s)
- David Kirsh
- Department of Cognitive Science, University of California San Diego
- Judith E Fan
- Department of Psychology, University of California San Diego
- Department of Psychology, Stanford University
10
Stolz C, Pickering AD, Mueller EM. Dissociable feedback valence effects on frontal midline theta during reward gain versus threat avoidance learning. Psychophysiology 2022; 60:e14235. [PMID: 36529988 DOI: 10.1111/psyp.14235] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/28/2022] [Revised: 10/17/2022] [Accepted: 11/17/2022] [Indexed: 12/23/2022]
Abstract
While frontal midline theta (FMθ) has been associated with threat processing, with cognitive control in the context of anxiety, and with reinforcement learning, most reinforcement learning studies on FMθ have used reward rather than threat-related stimuli as reinforcers. Accordingly, the role of FMθ in threat-related reinforcement learning is largely unknown. Here, n = 23 human participants underwent one reward-based and one punishment-based reversal learning task, which differed only with regard to the kind of reinforcers that feedback was tied to (i.e., monetary gain vs. loud noise burst, respectively). In addition to single-trial EEG, we assessed single-trial feedback expectations based on both a reinforcement learning computational model and trial-by-trial subjective feedback expectation ratings. While participants' performance and feedback expectations were comparable between the reward and punishment tasks, FMθ was more reliably amplified to negative vs. positive feedback in the reward vs. punishment task. Regressions with feedback valence, computationally derived, and self-reported expectations as predictors and FMθ as criterion further revealed that trial-by-trial variations in FMθ specifically relate to reward-related feedback-valence and not to threat-related feedback or to violated expectations/prediction errors. These findings suggest that FMθ as measured in reinforcement learning tasks may be less sensitive to the processing of events with direct relevance for fear and anxiety.
Affiliation(s)
- Christopher Stolz
- Department of Psychology, University of Marburg, Marburg, Germany
- Leibniz Institute for Neurobiology (LIN), Magdeburg, Germany
- Department of Psychology, Goldsmiths, University of London, London, UK
- Erik M. Mueller
- Department of Psychology, University of Marburg, Marburg, Germany
11
Scleidorovich P, Fellous JM, Weitzenfeld A. Adapting hippocampus multi-scale place field distributions in cluttered environments optimizes spatial navigation and learning. Front Comput Neurosci 2022; 16:1039822. [PMID: 36578316 PMCID: PMC9792172 DOI: 10.3389/fncom.2022.1039822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Accepted: 11/21/2022] [Indexed: 12/14/2022] Open
Abstract
Extensive studies in rodents show that place cells in the hippocampus have firing patterns that are highly correlated with the animal's location in the environment and are organized in layers of increasing field sizes or scales along its dorsoventral axis. In this study, we use a spatial cognition model to show that different field sizes could be exploited to adapt the place cell representation to different environments according to their size and complexity. Specifically, we provide an in-depth analysis of how to distribute place cell fields according to the obstacles in cluttered environments to optimize learning time and path optimality during goal-oriented spatial navigation tasks. The analysis uses a reinforcement learning (RL) model that assumes that place cells allow encoding the state. While previous studies have suggested exploiting different field sizes to represent areas requiring different spatial resolutions, our work analyzes specific distributions that adapt the representation to the environment, activating larger fields in open areas and smaller fields near goals and subgoals (e.g., obstacle corners). In addition to assessing how the multi-scale representation may be exploited in spatial navigation tasks, our analysis and results suggest place cell representations that can impact the robotics field by reducing the total number of cells for path planning without compromising the quality of the paths learned.
Affiliation(s)
- Pablo Scleidorovich
- Department of Computer Science and Engineering, University of South Florida, Tampa, FL, United States
- Jean-Marc Fellous
- Department of Psychology and Biomedical Engineering, University of Arizona, Tucson, AZ, United States
- Alfredo Weitzenfeld
- Department of Computer Science and Engineering, University of South Florida, Tampa, FL, United States
12
Janssen M, LeWarne C, Burk D, Averbeck BB. Hierarchical Reinforcement Learning, Sequential Behavior, and the Dorsal Frontostriatal System. J Cogn Neurosci 2022; 34:1307-1325. [PMID: 35579977 PMCID: PMC9274316 DOI: 10.1162/jocn_a_01869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
Abstract
To effectively behave within ever-changing environments, biological agents must learn and act at varying hierarchical levels such that a complex task may be broken down into more tractable subtasks. Hierarchical reinforcement learning (HRL) is a computational framework that provides an understanding of this process by combining sequential actions into one temporally extended unit called an option. However, there are still open questions within the HRL framework, including how options are formed and how HRL mechanisms might be realized within the brain. In this review, we propose that the existing human motor sequence literature can aid in understanding both of these questions. We give specific emphasis to visuomotor sequence learning tasks such as the discrete sequence production task and the M × N (M steps × N sets) task to understand how hierarchical learning and behavior manifest across sequential action tasks as well as how the dorsal cortical-subcortical circuitry could support this kind of behavior. This review highlights how motor chunks within a motor sequence can function as HRL options. Furthermore, we aim to merge findings from motor sequence literature with reinforcement learning perspectives to inform experimental design in each respective subfield.
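The idea of an option as a single temporally extended unit built from sequential actions can be sketched in the style of the standard options framework: an initiation set, an internal policy, and a termination condition. This is an illustrative sketch, not code from the review. The 1-D corridor environment, the `Option` class, and `run_option` are hypothetical, and termination is deterministic here for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, Set, Tuple


@dataclass
class Option:
    initiation_set: Set[int]           # states in which the option may start
    policy: Callable[[int], int]       # maps a state to a primitive action
    terminates: Callable[[int], bool]  # True once the option should stop


def step(state: int, action: int) -> int:
    """Hypothetical corridor with states 0..5; actions are -1 or +1."""
    return min(5, max(0, state + action))


def run_option(option: Option, state: int, max_steps: int = 100) -> Tuple[int, int]:
    """Execute primitive actions until the option terminates.

    Returns the final state and the number of primitive steps taken.
    """
    if state not in option.initiation_set:
        raise ValueError("option cannot be initiated in this state")
    n = 0
    while not option.terminates(state) and n < max_steps:
        state = step(state, option.policy(state))
        n += 1
    return state, n


# One "chunk": keep moving right until the end of the corridor is reached.
go_right = Option(
    initiation_set={0, 1, 2, 3, 4},
    policy=lambda s: +1,
    terminates=lambda s: s == 5,
)

final_state, n_primitive = run_option(go_right, 0)
```

From the perspective of a higher-level controller, selecting `go_right` is a single decision even though it unfolds over several primitive steps, which is the sense in which a motor chunk can play the role of an option.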
Affiliation(s)
- Diana Burk
- National Institute of Mental Health, Bethesda, MD
13
Li JJ, Xia L, Dong F, Collins AGE. Credit assignment in hierarchical option transfer. CogSci ... Annual Conference of the Cognitive Science Society. Cognitive Science Society (U.S.). Conference 2022; 44:948-954. [PMID: 36534042 PMCID: PMC9751259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Humans have the exceptional ability to efficiently structure past knowledge during learning to enable fast generalization. Xia and Collins (2021) evaluated this ability in a hierarchically structured, sequential decision-making task, where participants could build "options" (strategy "chunks") at multiple levels of temporal and state abstraction. A quantitative model, the Option Model, captured the transfer effects observed in human participants, suggesting that humans create and compose hierarchical options and use them to explore novel contexts. However, it is not well understood how learning in a new context is attributed to new and old options (i.e., the credit assignment problem). In a new context with new contingencies, where participants can recompose some aspects of previously learned options, do they reliably create new options or overwrite existing ones? Does the credit assignment depend on how similar the new option is to an old one? In our experiment, two groups of participants (n=124 and n=104) learned hierarchically structured options, experienced different amounts of negative transfer in a new option context, and were subsequently tested on the previously learned options. Behavioral analysis showed that old options were successfully reused without interference, and new options were appropriately created and credited. This credit assignment did not depend on how similar the new option was to the old option, showing great flexibility and precision in human hierarchical learning. These behavioral results were captured by the Option Model, providing further evidence for option learning and transfer in humans.
Affiliation(s)
- Jing-Jing Li
- Helen Wills Neuroscience Institute, University of California, Berkeley
- Liyu Xia
- Department of Mathematics, University of California, Berkeley
- Flora Dong
- Department of Psychology, University of California, Berkeley
- Anne G E Collins
- Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley
14
Li Y, McClelland JL. A weighted constraint satisfaction approach to human goal-directed decision making. PLoS Comput Biol 2022; 18:e1009553. [PMID: 35709299 PMCID: PMC9255770 DOI: 10.1371/journal.pcbi.1009553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Revised: 07/05/2022] [Accepted: 05/19/2022] [Indexed: 11/29/2022] Open
Abstract
When we plan for long-range goals, proximal information cannot be exploited in a blindly myopic way, as relevant future information must also be considered. But when a subgoal must be resolved first, irrelevant future information should not interfere with the processing of more proximal, subgoal-relevant information. We explore the idea that decision making in both situations relies on the flexible modulation of the degree to which different pieces of information under consideration are weighted, rather than explicitly decomposing a problem into smaller parts and solving each part independently. We asked participants to find the shortest goal-reaching paths in mazes and modeled their initial path choices as a noisy, weighted information integration process. In a base task where choosing the optimal initial path required weighting starting-point and goal-proximal factors equally, participants did take both constraints into account, with participants who made more accurate choices tending to exhibit more balanced weighting. The base task was then embedded as an initial subtask in a larger maze, where the same two factors constrained the optimal path to a subgoal, and the final goal position was irrelevant to the initial path choice. In this more complex task, participants’ choices reflected predominant consideration of the subgoal-relevant constraints, but also some influence of the initially-irrelevant final goal. More accurate participants placed much less weight on the optimality-irrelevant goal and again tended to weight the two initially-relevant constraints more equally. These findings suggest that humans may rely on a graded, task-sensitive weighting of multiple constraints to generate approximately optimal decision outcomes in both hierarchical and non-hierarchical goal-directed tasks.
Different problems require the consideration of different information sources, including often useful long-range, future information that may impact our immediate decisions. However, when future information is irrelevant to a key subgoal, it can be desirable to focus on achieving the subgoal first. We suggest that humans rely on appropriately weighting relevant information over irrelevant information to generate decision outcomes in both types of situations. We conducted behavioral experiments and fitted models of decision processes to understand to what extent people considered various task factors in choosing the initial path in different mazes, both when a simple maze occurred alone and when it was embedded as an initial part in a larger maze. Our results show that people approximate the optimal decision outcomes in both tasks by modulating the weighting of different factors during planning, and that people who made more accurate initial path choices modulated these weightings more successfully than those who made less accurate choices.
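The idea of a noisy, weighted information integration process can be illustrated with a minimal sketch. This is not the authors' fitted model: the softmax choice rule, the `noise` parameter, the feature values, and the function name `choice_probabilities` are all assumptions made for illustration. Each candidate initial path is described by hypothetical cost factors, and the weights determine how strongly each factor influences the choice.

```python
import math

def choice_probabilities(path_features, weights, noise=1.0):
    """Return a softmax choice probability for each candidate path.

    Each path's cost is a weighted sum of its factors (assumed form);
    lower cost means higher choice probability. `noise` acts as a
    softmax temperature: larger values make choices more random.
    """
    costs = [sum(w * f for w, f in zip(weights, feats)) for feats in path_features]
    scaled = [-c / noise for c in costs]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical factors per path: (start-proximal cost, goal-proximal cost, final-goal cost)
paths = [(2.0, 3.0, 1.0), (3.0, 1.0, 4.0)]

# Balanced weighting of the two relevant factors, zero weight on the irrelevant one.
balanced = choice_probabilities(paths, weights=(1.0, 1.0, 0.0))

# Myopic weighting: only the start-proximal factor is considered.
myopic = choice_probabilities(paths, weights=(1.0, 0.0, 0.0))
```

Under these made-up numbers, balanced weighting favors the second path (total relevant cost 4 versus 5), while myopic weighting favors the first, which shows how a change in weights alone can shift the decision without any explicit problem decomposition.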
Affiliation(s)
- Yuxuan Li
- Department of Psychology, Stanford University, Stanford, California, United States of America
- * E-mail: (YL); (JLM)
- James L. McClelland
- Department of Psychology, Stanford University, Stanford, California, United States of America
- * E-mail: (YL); (JLM)
|
15
|
Kok A. Cognitive control, motivation and fatigue: A cognitive neuroscience perspective. Brain Cogn 2022; 160:105880. [PMID: 35617813 DOI: 10.1016/j.bandc.2022.105880] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2021] [Revised: 04/07/2022] [Accepted: 05/02/2022] [Indexed: 01/22/2023]
Abstract
The present article provides a unified systematic account of the role of cognitive control, motivation and dopamine pathways in relation to the development of fatigue. Since cognitive fatigue is considered to be one aspect of the general control system that manages goal activity in the service of motivational requirements (Hockey, 2011), our focus is also broader than fatigue itself. The paper shall therefore first focus on the motivation-control interactions at the level of networks of the brain. A motivational control network is argued to play a critical role in shaping goal-directed behavior, in conjunction with dopamine systems that energize the network. Furthermore, motivation-control interactions as implemented in networks of the brain provide an important element to elucidate how decision making weighs both the anticipated benefits and costs of control operations, in optimal and suboptimal conditions such as mental fatigue. The paper further sketches how fatigue affects the connectivity of large-scale networks in the brain during effortful exertion, in particular the high-cost long striatal-cortical pathways, leading to a global reduction of integration in the brain's network architecture. The resulting neural state within these networks then enters as interoceptive information to systems in the brain that perform cost-benefit calculations. Based on these notions we propose a unifying cost-benefit model, inspired by influential insights from the current neuroscience literature, of how fatigue changes the motivation to perform. The model specifies how the reward value, effort costs and fatigue aspects of task performance converge in the medial prefrontal cortex to calculate the net motivation value of stimuli and select the appropriate actions.
Affiliation(s)
- Albert Kok
- Emeritus Professor Physiological Psychology, Brain and Cognition Group, Psychology Department, University of Amsterdam, the Netherlands.
|
16
|
Yang Q, Lin Z, Zhang W, Li J, Chen X, Zhang J, Yang T. Monkey plays Pac-Man with compositional strategies and hierarchical decision-making. eLife 2022; 11:74500. [PMID: 35286255 PMCID: PMC8963886 DOI: 10.7554/elife.74500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2021] [Accepted: 03/13/2022] [Indexed: 11/18/2022] Open
Abstract
Humans can often handle daunting tasks with ease by developing a set of strategies to reduce decision-making into simpler problems. The ability to use heuristic strategies demands an advanced level of intelligence and has not been demonstrated in animals. Here, we trained macaque monkeys to play the classic video game Pac-Man. The monkeys’ decision-making may be described with a strategy-based hierarchical decision-making model with over 90% accuracy. The model reveals that the monkeys adopted the take-the-best heuristic by using one dominating strategy for their decision-making at a time and formed compound strategies by assembling the basis strategies to handle particular game situations. With the model, the computationally complex but fully quantifiable Pac-Man behavior paradigm provides a new approach to understanding animals’ advanced cognition.
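The take-the-best idea mentioned in the abstract above, where a single dominating strategy drives each decision, can be sketched as follows. This is an illustrative toy, not the paper's model: the strategy names, weights, utility values, and the function `take_the_best` are hypothetical.

```python
def take_the_best(strategy_weights, strategy_utilities):
    """Pick the action preferred by the single highest-weighted strategy.

    strategy_weights: dict mapping strategy name -> weight (assumed given)
    strategy_utilities: dict mapping strategy name -> {action: utility}
    """
    # The dominating strategy is the one with the largest weight.
    dominant = max(strategy_weights, key=strategy_weights.get)
    utilities = strategy_utilities[dominant]
    # Only the dominating strategy's utilities determine the action.
    action = max(utilities, key=utilities.get)
    return dominant, action

# Hypothetical basis strategies and their per-action utilities.
weights = {"local": 0.2, "evade": 0.7, "energizer": 0.1}
utilities = {
    "local": {"left": 1.0, "right": 0.0},
    "evade": {"left": -1.0, "right": 2.0},
    "energizer": {"left": 0.5, "right": 0.1},
}

dominant, action = take_the_best(weights, utilities)
```

Here the "evade" strategy has the largest weight, so its utilities alone select the action; the other strategies are ignored for this decision rather than averaged in.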
Affiliation(s)
- Qianli Yang
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
- Zhongqiao Lin
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
- Wenyi Zhang
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Jianshu Li
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
- Xiyuan Chen
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Tianming Yang
- Institute of Neuroscience, Key Laboratory of Primate Neurobiology, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
- Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, Shanghai, China
|
17
|
Yan Y, Zhuang N, Ni B, Zhang J, Xu M, Zhang Q, Zhang Z, Cheng S, Tian Q, Xu Y, Yang X, Zhang W. Fine-Grained Video Captioning via Graph-based Multi-Granularity Interaction Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:666-683. [PMID: 31613750 DOI: 10.1109/tpami.2019.2946823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Learning to generate continuous linguistic descriptions for multi-subject interactive videos in great detail has particular applications in team sports auto-narrative. In contrast to traditional video captioning, this task is more challenging as it requires simultaneous modeling of fine-grained individual actions, uncovering of spatio-temporal dependency structures of frequent group interactions, and then accurate mapping of these complex interaction details into long and detailed commentary. To explicitly address these challenges, we propose a novel framework, Graph-based Learning for Multi-Granularity Interaction Representation (GLMGIR), for the fine-grained team sports auto-narrative task. A multi-granular interaction modeling module is proposed to extract among-subjects' interactive actions in a progressive way for encoding both intra- and inter-team interactions. Based on the above multi-granular representations, a multi-granular attention module is developed to consider action/event descriptions of multiple spatio-temporal resolutions. Both modules are integrated seamlessly and work in a collaborative way to generate the final narrative. In the meantime, to facilitate reproducible research, we collect a new video dataset from YouTube.com called Sports Video Narrative dataset (SVN). It is a novel direction as it contains 6K team sports videos (i.e., NBA basketball games) with 10K ground-truth narratives (e.g., sentences). Furthermore, as previous metrics such as METEOR (i.e., used in the coarse-grained video caption task) do not cope well with the fine-grained sports narrative task, we hence develop a novel evaluation metric named Fine-grained Captioning Evaluation (FCE), which measures how accurately the generated linguistic description reflects fine-grained action details as well as the overall spatio-temporal interactional structure. Extensive experiments on our SVN dataset have demonstrated the effectiveness of the proposed framework for fine-grained team sports video auto-narrative.
|
18
|
|
19
|
Trinh TT, Kimura M. Cognitive prediction of obstacle's movement for reinforcement learning pedestrian interacting model. JOURNAL OF INTELLIGENT SYSTEMS 2022. [DOI: 10.1515/jisys-2022-0002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
Recent studies in pedestrian simulation have been able to construct a highly realistic navigation behaviour in many circumstances. However, when replicating the close interactions between pedestrians, the replicated behaviour is often unnatural and lacks human likeness. One of the possible reasons is that the current models often ignore the cognitive factors in the human thinking process. Another reason is that many models try to approach the problem by optimising certain objectives. On the other hand, in real life, humans do not always take the most optimised decisions, particularly when interacting with other people. To improve the navigation behaviour in this circumstance, we proposed a pedestrian interacting model using reinforcement learning. Additionally, a novel cognitive prediction model, inspired by the predictive system of human cognition, is also incorporated. This helps the pedestrian agent in our model to learn to interact and predict the movement in a manner similar to humans. In our experimental results, when compared to other models, the path taken by our model’s agent is not the most optimised in certain aspects like path lengths, time taken and collisions. However, our model is able to demonstrate a more natural and human-like navigation behaviour, particularly in complex interaction settings.
Affiliation(s)
- Thanh-Trung Trinh
- Graduate School of Engineering and Science, Shibaura Institute of Technology, Koto City, Tokyo 135-8548, Japan
- Masaomi Kimura
- Department of Computer Science and Engineering, Shibaura Institute of Technology, Koto City, Tokyo 135-8548, Japan
|
20
|
Stetter M, Lang EW. Learning Intuitive Physics and One-Shot Imitation Using State-Action-Prediction Self-Organizing Maps. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2021; 2021:5590445. [PMID: 34804145 PMCID: PMC8604601 DOI: 10.1155/2021/5590445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/18/2021] [Revised: 10/14/2021] [Accepted: 10/21/2021] [Indexed: 11/17/2022]
Abstract
Human learning and intelligence work differently from the supervised pattern recognition approach adopted in most deep learning architectures. Humans seem to learn rich representations by exploration and imitation, build causal models of the world, and use both to flexibly solve new tasks. We suggest a simple but effective unsupervised model which develops such characteristics. The agent learns to represent the dynamical physical properties of its environment by intrinsically motivated exploration and performs inference on this representation to reach goals. For this, a set of self-organizing maps which represent state-action pairs is combined with a causal model for sequence prediction. The proposed system is evaluated in the cartpole environment. After an initial phase of playful exploration, the agent can execute kinematic simulations of the environment's future and use those for action planning. We demonstrate its performance on a set of several related, but different one-shot imitation tasks, which the agent flexibly solves in an active inference style.
Affiliation(s)
- Martin Stetter
- Department of Bioengineering Sciences, Weihenstephan-Triesdorf University of Applied Sciences, Freising D-85354, Germany
- Elmar W. Lang
- Computational Intelligence and Machine Learning Group, Department of Biophysics, University of Regensburg, Regensburg D-93053, Germany
|
21
|
Chalita MA, Sedzielarz A. Beyond the frame problem: what (else) can Heidegger do for AI? AI & SOCIETY 2021. [DOI: 10.1007/s00146-021-01280-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
22
|
Röder F, Özdemir O, Nguyen PDH, Wermter S, Eppe M. The Embodied Crossmodal Self Forms Language and Interaction: A Computational Cognitive Review. Front Psychol 2021; 12:716671. [PMID: 34484079 PMCID: PMC8415221 DOI: 10.3389/fpsyg.2021.716671] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2021] [Accepted: 07/16/2021] [Indexed: 11/13/2022] Open
Abstract
Human language is inherently embodied and grounded in sensorimotor representations of the self and the world around it. This suggests that the body schema and ideomotor action-effect associations play an important role in language understanding, language generation, and verbal/physical interaction with others. There are computational models that focus purely on non-verbal interaction between humans and robots, and there are computational models for dialog systems that focus only on verbal interaction. However, there is a lack of research that integrates these approaches. We hypothesize that the development of computational models of the self is very appropriate for considering joint verbal and physical interaction. Therefore, they provide the substantial potential to foster the psychological and cognitive understanding of language grounding, and they have significant potential to improve human-robot interaction methods and applications. This review is a first step toward developing models of the self that integrate verbal and non-verbal communication. To this end, we first analyze the relevant findings and mechanisms for language grounding in the psychological and cognitive literature on ideomotor theory. Second, we identify the existing computational methods that implement physical decision-making and verbal interaction. As a result, we outline how the current computational methods can be used to create advanced computational interaction models that integrate language grounding with body schemas and self-representations.
Affiliation(s)
- Frank Röder
- Knowledge Technology, Department of Informatics, University of Hamburg, Hamburg, Germany
|
23
|
Stout D, Chaminade T, Apel J, Shafti A, Faisal AA. The measurement, evolution, and neural representation of action grammars of human behavior. Sci Rep 2021; 11:13720. [PMID: 34215758 PMCID: PMC8253764 DOI: 10.1038/s41598-021-92992-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2021] [Accepted: 06/18/2021] [Indexed: 02/06/2023] Open
Abstract
Human behaviors from toolmaking to language are thought to rely on a uniquely evolved capacity for hierarchical action sequencing. Testing this idea will require objective, generalizable methods for measuring the structural complexity of real-world behavior. Here we present a data-driven approach for extracting action grammars from basic ethograms, exemplified with respect to the evolutionarily relevant behavior of stone toolmaking. We analyzed sequences from the experimental replication of ~ 2.5 Mya Oldowan vs. ~ 0.5 Mya Acheulean tools, finding that, while using the same "alphabet" of elementary actions, Acheulean sequences are quantifiably more complex and Oldowan grammars are a subset of Acheulean grammars. We illustrate the utility of our complexity measures by re-analyzing data from an fMRI study of stone toolmaking to identify brain responses to structural complexity. Beyond specific implications regarding the co-evolution of language and technology, this exercise illustrates the general applicability of our method to investigate naturalistic human behavior and cognition.
Affiliation(s)
- Dietrich Stout
- Department of Anthropology, Emory University, Atlanta, GA, USA.
- Thierry Chaminade
- Institut de Neurosciences de La Timone, Aix Marseille Université, Marseille, France
- Jan Apel
- Department of Archaeology, Stockholm University, Stockholm, Sweden
- Ali Shafti
- Department of Bioengineering, Imperial College London, London, UK
- A Aldo Faisal
- Department of Bioengineering, Imperial College London, London, UK.
- Department of Computing, Imperial College London, London, UK.
- Integrative Biology, MRC London Institute of Medical Sciences, London, UK.
- Behaviour Analytics Lab, Data Science Institute, London, UK.
|
24
|
Eckstein MK, Collins AGE. How the Mind Creates Structure: Hierarchical Learning of Action Sequences. COGSCI ... ANNUAL CONFERENCE OF THE COGNITIVE SCIENCE SOCIETY. COGNITIVE SCIENCE SOCIETY (U.S.). CONFERENCE 2021; 43:618-624. [PMID: 34964045 PMCID: PMC8711273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Humans have the astonishing capacity to quickly adapt to varying environmental demands and reach complex goals in the absence of extrinsic rewards. Part of what underlies this capacity is the ability to flexibly reuse and recombine previous experiences, and to plan future courses of action in a psychological space that is shaped by these experiences. Decades of research have suggested that humans use hierarchical representations for efficient planning and flexibility, but the origin of these representations has remained elusive. This study investigates how 73 participants learned hierarchical representations through experience, in a task in which they had to perform complex action sequences to obtain rewards. Complex action sequences were composed of simpler action sequences, which were not rewarded, but whose completion was signaled to participants. We investigated the process by which participants learned to perform simpler action sequences and combined them into complex action sequences. After learning action sequences, participants completed a transfer phase in which either simple sequences or complex sequences were manipulated without notice. Relearning progressed more slowly when simple rather than complex sequences were changed, in accordance with hierarchical representations in which lower levels are quickly consolidated, potentially stabilizing exploration, while higher levels remain malleable, with benefits for flexible recombination.
Affiliation(s)
- Maria K Eckstein
- Department of Psychology, 2121 Berkeley Way West, Berkeley, California 94720, USA
- Anne G E Collins
- Department of Psychology, 2121 Berkeley Way West, Berkeley, California 94720, USA
|
25
|
Xia L, Collins AGE. Temporal and state abstractions for efficient learning, transfer, and composition in humans. Psychol Rev 2021; 128:643-666. [PMID: 34014709 PMCID: PMC8485577 DOI: 10.1037/rev0000295] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Humans use prior knowledge to efficiently solve novel tasks, but how they structure past knowledge during learning to enable such fast generalization is not well understood. We recently proposed that hierarchical state abstraction enabled generalization of simple one-step rules, by inferring context clusters for each rule. However, humans' daily tasks are often temporally extended, and necessitate more complex multi-step, hierarchically structured strategies. The options framework in hierarchical reinforcement learning provides a theoretical framework for representing such transferable strategies. Options are abstract multi-step policies, assembled from simpler one-step actions or other options, that can represent meaningful reusable strategies as temporal abstractions. We developed a novel sequential decision-making protocol to test if humans learn and transfer multi-step options. In a series of four experiments, we found transfer effects at multiple hierarchical levels of abstraction that could not be explained by flat reinforcement learning models or hierarchical models lacking temporal abstractions. We extended the options framework to develop a quantitative model that blends temporal and state abstractions. Our model captures the transfer effects observed in human participants. Our results provide evidence that humans create and compose hierarchical options, and use them to explore in novel contexts, consequently transferring past knowledge and speeding up learning. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
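The statement that options are multi-step policies assembled from one-step actions or other options can be illustrated with a small recursive sketch. This is a deliberate simplification and an assumption on my part: in the options framework an option is a policy with initiation and termination conditions, whereas here each option is reduced to a fixed sequence of steps. The option names and the helper `expand` are hypothetical.

```python
def expand(name, options):
    """Recursively expand an option into its primitive one-step actions.

    Any name that is not a key in `options` is treated as a primitive action.
    Assumes the option definitions contain no cycles.
    """
    if name not in options:
        return [name]
    actions = []
    for step in options[name]:
        # A step may itself be an option, so expand it recursively.
        actions.extend(expand(step, options))
    return actions

# Hypothetical option library: "leave_room" reuses the lower-level "open_door".
options = {
    "open_door": ["reach", "turn", "pull"],
    "leave_room": ["walk", "open_door", "walk"],
}

plan = expand("leave_room", options)
```

Because `leave_room` refers to `open_door` by name, changing the lower-level option changes every higher-level option that reuses it, which is the kind of compositional reuse the transfer experiments probe.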
Affiliation(s)
- Liyu Xia
- Department of Mathematics, University of California, Berkeley
- Anne G E Collins
- Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley
|
26
|
Sullivan B, Ludwig CJH, Damen D, Mayol-Cuevas W, Gilchrist ID. Look-ahead fixations during visuomotor behavior: Evidence from assembling a camping tent. J Vis 2021; 21:13. [PMID: 33688920 PMCID: PMC7961111 DOI: 10.1167/jov.21.3.13] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/04/2022] Open
Abstract
Eye movements can support ongoing manipulative actions, but a class of so-called look-ahead fixations (LAFs) are related to future tasks. We examined LAFs in a complex natural task—assembling a camping tent. Tent assembly is a relatively uncommon task and requires the completion of multiple subtasks in sequence over a 5- to 20-minute duration. Participants wore a head-mounted camera and eye tracker. Subtasks and LAFs were annotated. We document four novel aspects of LAFs. First, LAFs were not random and their frequency was biased to certain objects and subtasks. Second, latencies are larger than previously noted, with 35% of LAFs occurring within 10 seconds before motor manipulation and 75% within 100 seconds. Third, LAF behavior extends far into future subtasks, because only 47% of LAFs are made to objects relevant to the current subtask. Seventy-five percent of LAFs are to objects used within five upcoming steps. Last, LAFs are often directed repeatedly to the target before manipulation, suggesting memory volatility. LAFs with short fixation–action latencies have been hypothesized to benefit future visual search and/or motor manipulation. However, the diversity of LAFs suggests they may also reflect scene exploration and task relevance, as well as longer term problem solving and task planning.
Affiliation(s)
- Brian Sullivan
- School of Psychological Sciences, University of Bristol, Bristol, UK
- Dima Damen
- Department of Computer Science, University of Bristol, Bristol, UK
- Iain D Gilchrist
- School of Psychological Sciences, University of Bristol, Bristol, UK
|
27
|
Marković D, Goschke T, Kiebel SJ. Meta-control of the exploration-exploitation dilemma emerges from probabilistic inference over a hierarchy of time scales. COGNITIVE, AFFECTIVE & BEHAVIORAL NEUROSCIENCE 2021; 21:509-533. [PMID: 33372237 PMCID: PMC8208938 DOI: 10.3758/s13415-020-00837-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Accepted: 09/17/2020] [Indexed: 12/12/2022]
Abstract
Cognitive control is typically understood as a set of mechanisms that enable humans to reach goals that require integrating the consequences of actions over longer time scales. Importantly, using routine behaviour or making choices beneficial only at short time scales would prevent one from attaining these goals. During the past two decades, researchers have proposed various computational cognitive models that successfully account for behaviour related to cognitive control in a wide range of laboratory tasks. As humans operate in a dynamic and uncertain environment, making elaborate plans and integrating experience over multiple time scales is computationally expensive. Importantly, it remains poorly understood how uncertain consequences at different time scales are integrated into adaptive decisions. Here, we pursue the idea that cognitive control can be cast as active inference over a hierarchy of time scales, where inference, i.e., planning, at higher levels of the hierarchy controls inference at lower levels. We introduce the novel concept of meta-control states, which link higher-level beliefs with lower-level policy inference. Specifically, we conceptualize cognitive control as inference over these meta-control states, where solutions to cognitive control dilemmas emerge through surprisal minimisation at different hierarchy levels. We illustrate this concept using the exploration-exploitation dilemma based on a variant of a restless multi-armed bandit task. We demonstrate that beliefs about contexts and meta-control states at a higher level dynamically modulate the balance of exploration and exploitation at the lower level of a single action. Finally, we discuss the generalisation of this meta-control concept to other control dilemmas.
Affiliation(s)
- Dimitrije Marković
- Chair of Neuroimaging, Faculty of Psychology, Technische Universität Dresden, 01062, Dresden, Germany
- Thomas Goschke
- Chair of General Psychology, Faculty of Psychology, Technische Universität Dresden, 01062, Dresden, Germany
- Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technische Universität Dresden, 01062, Dresden, Germany
- Stefan J Kiebel
- Chair of Neuroimaging, Faculty of Psychology, Technische Universität Dresden, 01062, Dresden, Germany.
- Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technische Universität Dresden, 01062, Dresden, Germany.
|
28
|
Gumbsch C, Butz MV, Martius G. Autonomous Identification and Goal-Directed Invocation of Event-Predictive Behavioral Primitives. IEEE Trans Cogn Dev Syst 2021. [DOI: 10.1109/tcds.2019.2925890] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
|
29
|
Banerjee A, Rikhye RV, Marblestone A. Reinforcement-guided learning in frontal neocortex: emerging computational concepts. Curr Opin Behav Sci 2021. [DOI: 10.1016/j.cobeha.2021.02.019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
30
|
De Dreu CKW, Pliskin R, Rojek-Giffin M, Méder Z, Gross J. Political games of attack and defence. Philos Trans R Soc Lond B Biol Sci 2021; 376:20200135. [PMID: 33611990 PMCID: PMC7934902 DOI: 10.1098/rstb.2020.0135] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/16/2023] Open
Abstract
Political conflicts often revolve around changing versus defending a status quo. We propose to capture the dynamics between proponents and opponents of political change in terms of an asymmetric game of attack and defence with its equilibrium in mixed strategies. Formal analyses generate predictions about effort expended on revising and protecting the status quo, the form and function of false signalling and cheap talk, how power differences impact conflict intensity and the likelihood of status quo revision. Laboratory experiments on the neurocognitive and hormonal foundations of attack and defence reveal that out-of-equilibrium investments in attack emerge because of non-selfish preferences, limited capacity to compute costs and benefits and optimistic beliefs about the chances of winning from one's rival. We conclude with implications for the likelihood of political change and inertia, and discuss the role of ideology in political games of attack and defence. This article is part of the theme issue ‘The political brain: neurocognitive and computational mechanisms’.
Affiliation(s)
- Carsten K W De Dreu
- Social, Economic and Organizational Psychology, Leiden University, Leiden, The Netherlands
- Center for Experimental Economics and Political Decision Making, University of Amsterdam, Amsterdam, The Netherlands
- Ruthie Pliskin
- Social, Economic and Organizational Psychology, Leiden University, Leiden, The Netherlands
- Michael Rojek-Giffin
- Social, Economic and Organizational Psychology, Leiden University, Leiden, The Netherlands
- Zsombor Méder
- Social, Economic and Organizational Psychology, Leiden University, Leiden, The Netherlands
- Jörg Gross
- Social, Economic and Organizational Psychology, Leiden University, Leiden, The Netherlands
|
31
|
The Best Laid Plans: Computational Principles of Anterior Cingulate Cortex. Trends Cogn Sci 2021; 25:316-329. [PMID: 33593641 DOI: 10.1016/j.tics.2021.01.008] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2020] [Revised: 01/17/2021] [Accepted: 01/19/2021] [Indexed: 12/26/2022]
Abstract
Despite continual debate for the past 30 years about the function of anterior cingulate cortex (ACC), its key contribution to neurocognition remains unknown. However, recent computational modeling work has provided insight into this question. Here we review computational models that illustrate three core principles of ACC function, related to hierarchy, world models, and cost. We also discuss four constraints on the neural implementation of these principles, related to modularity, binding, encoding, and learning and regulation. These observations suggest a role for ACC in hierarchical model-based hierarchical reinforcement learning (HMB-HRL), which instantiates a mechanism motivating the execution of high-level plans.
|
32
|
Herd S, Krueger K, Nair A, Mollick J, O'Reilly R. Neural Mechanisms of Human Decision-Making. COGNITIVE, AFFECTIVE & BEHAVIORAL NEUROSCIENCE 2021; 21:35-57. [PMID: 33409958 DOI: 10.3758/s13415-020-00842-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Accepted: 09/28/2020] [Indexed: 11/08/2022]
Abstract
We present a theory and neural network model of the neural mechanisms underlying human decision-making. We propose a detailed model of the interaction between brain regions, under a proposer-predictor-actor-critic framework. This theory is based on detailed animal data and theories of action-selection. Those theories are adapted to serial operation to bridge levels of analysis and explain human decision-making. Task-relevant areas of cortex propose a candidate plan using fast, model-free, parallel neural computations. Other areas of cortex and medial temporal lobe can then predict likely outcomes of that plan in this situation. This optional prediction- (or model-) based computation can produce better accuracy and generalization, at the expense of speed. Next, linked regions of basal ganglia act to accept or reject the proposed plan based on its reward history in similar contexts. If that plan is rejected, the process repeats to consider a new option. The reward-prediction system acts as a critic to determine the value of the outcome relative to expectations and produce dopamine as a training signal for cortex and basal ganglia. By operating sequentially and hierarchically, the same mechanisms previously proposed for animal action-selection could explain the most complex human plans and decisions. We discuss explanations of model-based decisions, habitization, and risky behavior based on the computational model.
Affiliation(s)
- Seth Herd
- eCortex, Inc., Boulder, CO, USA.
- University of Colorado, Boulder, CO, USA.
| | - Kai Krueger
- eCortex, Inc., Boulder, CO, USA
- University of Colorado, Boulder, CO, USA
| | - Ananta Nair
- eCortex, Inc., Boulder, CO, USA
- University of Colorado, Boulder, CO, USA
| | - Jessica Mollick
- eCortex, Inc., Boulder, CO, USA
- University of Colorado, Boulder, CO, USA
- Yale University, New Haven, CT, USA
| | - Randall O'Reilly
- eCortex, Inc., Boulder, CO, USA
- University of Colorado, Boulder, CO, USA
- University of California, Davis, Davis, CA, USA
|
33
|
Goekoop R, de Kleijn R. How higher goals are constructed and collapse under stress: A hierarchical Bayesian control systems perspective. Neurosci Biobehav Rev 2021; 123:257-285. [PMID: 33497783 DOI: 10.1016/j.neubiorev.2020.12.021] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/28/2020] [Revised: 11/19/2020] [Accepted: 12/19/2020] [Indexed: 01/26/2023]
Abstract
In this paper, we show that organisms can be modeled as hierarchical Bayesian control systems with small world and information bottleneck (bow-tie) network structure. Such systems combine hierarchical perception with hierarchical goal setting and hierarchical action control. We argue that hierarchical Bayesian control systems produce deep hierarchies of goal states, from which it follows that organisms must have some form of 'highest goals'. For all organisms, these involve internal (self) models, external (social) models and overarching (normative) models. We show that goal hierarchies tend to decompose in a top-down manner under severe and prolonged levels of stress. This produces behavior that favors short-term and self-referential goals over long term, social and/or normative goals. The collapse of goal hierarchies is universally accompanied by an increase in entropy (disorder) in control systems that can serve as an early warning sign for tipping points (disease or death of the organism). In humans, learning goal hierarchies corresponds to personality development (maturation). The failure of goal hierarchies to mature properly corresponds to personality deficits. A top-down collapse of such hierarchies under stress is identified as a common factor in all forms of episodic mental disorders (psychopathology). The paper concludes by discussing ways of testing these hypotheses empirically.
Affiliation(s)
- Rutger Goekoop
- Parnassia Group, PsyQ, Department of Anxiety Disorders, Early Detection and Intervention Team (EDIT), Netherlands.
| | - Roy de Kleijn
- Cognitive Psychology Unit, Leiden University, Netherlands
|
34
|
Márton CD, Schultz SR, Averbeck BB. Learning to select actions shapes recurrent dynamics in the corticostriatal system. Neural Netw 2020; 132:375-393. [PMID: 32992244 PMCID: PMC7685243 DOI: 10.1016/j.neunet.2020.09.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/12/2019] [Revised: 09/03/2020] [Accepted: 09/11/2020] [Indexed: 01/03/2023]
Abstract
Learning to select appropriate actions based on their values is fundamental to adaptive behavior. This form of learning is supported by fronto-striatal systems. The dorsal-lateral prefrontal cortex (dlPFC) and the dorsal striatum (dSTR), which are strongly interconnected, are key nodes in this circuitry. Substantial experimental evidence, including neurophysiological recordings, has shown that neurons in these structures represent key aspects of learning. The computational mechanisms that shape the neurophysiological responses, however, are not clear. To examine this, we developed a recurrent neural network (RNN) model of the dlPFC-dSTR circuit and trained it on an oculomotor sequence learning task. We compared the activity generated by the model to activity recorded from monkey dlPFC and dSTR in the same task. This network consisted of a striatal component which encoded action values, and a prefrontal component which selected appropriate actions. After training, this system was able to autonomously represent and update action values and select actions, thus being able to closely approximate the representational structure in corticostriatal recordings. We found that learning to select the correct actions drove action-sequence representations further apart in activity space, both in the model and in the neural data. The model revealed that learning proceeds by increasing the distance between sequence-specific representations. This makes it more likely that the model will select the appropriate action sequence as learning develops. Our model thus supports the hypothesis that learning in networks drives the neural representations of actions further apart, increasing the probability that the network generates correct actions as learning proceeds. Altogether, this study advances our understanding of how neural circuit dynamics are involved in neural computation, revealing how dynamics in the corticostriatal system support task learning.
Affiliation(s)
- Christian D Márton
- Centre for Neurotechnology & Department of Bioengineering, Imperial College London, London, SW7 2AZ, UK; Laboratory of Neuropsychology, Section on Learning and Decision Making, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA.
| | - Simon R Schultz
- Centre for Neurotechnology & Department of Bioengineering, Imperial College London, London, SW7 2AZ, UK
| | - Bruno B Averbeck
- Laboratory of Neuropsychology, Section on Learning and Decision Making, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA
|
35
|
Mollick JA, Hazy TE, Krueger KA, Nair A, Mackie P, Herd SA, O'Reilly RC. A systems-neuroscience model of phasic dopamine. Psychol Rev 2020; 127:972-1021. [PMID: 32525345 PMCID: PMC8453660 DOI: 10.1037/rev0000199] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/01/2023]
Abstract
We describe a neurobiologically informed computational model of phasic dopamine signaling to account for a wide range of findings, including many considered inconsistent with the simple reward prediction error (RPE) formalism. The central feature of this PVLV framework is a distinction between a primary value (PV) system for anticipating primary rewards (Unconditioned Stimuli [USs]), and a learned value (LV) system for learning about stimuli associated with such rewards (CSs). The LV system represents the amygdala, which drives phasic bursting in midbrain dopamine areas, while the PV system represents the ventral striatum, which drives shunting inhibition of dopamine for expected USs (via direct inhibitory projections) and phasic pausing for expected USs (via the lateral habenula). Our model accounts for data supporting the separability of these systems, including individual differences in CS-based (sign-tracking) versus US-based learning (goal-tracking). Both systems use competing opponent-processing pathways representing evidence for and against specific USs, which can explain data dissociating the processes involved in acquisition versus extinction conditioning. Further, opponent processing proved critical in accounting for the full range of conditioned inhibition phenomena, and the closely related paradigm of second-order conditioning. Finally, we show how additional separable pathways representing aversive USs, largely mirroring those for appetitive USs, also have important differences from the positive valence case, allowing the model to account for several important phenomena in aversive conditioning. Overall, accounting for all of these phenomena strongly constrains the model, thus providing a well-validated framework for understanding phasic dopamine signaling. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
Affiliation(s)
- Jessica A Mollick
- Department of Psychology and Neuroscience, University of Colorado Boulder
| | - Thomas E Hazy
- Department of Psychology and Neuroscience, University of Colorado Boulder
| | - Kai A Krueger
- Department of Psychology and Neuroscience, University of Colorado Boulder
| | - Ananta Nair
- Department of Psychology and Neuroscience, University of Colorado Boulder
| | - Prescott Mackie
- Department of Psychology and Neuroscience, University of Colorado Boulder
| | - Seth A Herd
- Department of Psychology and Neuroscience, University of Colorado Boulder
| | - Randall C O'Reilly
- Department of Psychology and Neuroscience, University of Colorado Boulder
|
36
|
Daryanavard S, Porr B. Closed-Loop Deep Learning: Generating Forward Models With Backpropagation. Neural Comput 2020; 32:2122-2144. [PMID: 32946708 DOI: 10.1162/neco_a_01317] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/04/2022]
Abstract
A reflex is a simple closed-loop control approach that tries to minimize an error but fails to do so because it will always react too late. An adaptive algorithm can use this error to learn a forward model with the help of predictive cues. For example, a driver learns to improve steering by looking ahead to avoid steering at the last minute. In order to process complex cues such as the road ahead, deep learning is a natural choice. However, this is usually achieved only indirectly by employing deep reinforcement learning having a discrete state space. Here, we show how this can be directly achieved by embedding deep learning into a closed-loop system and preserving its continuous processing. We show in z-space specifically how error backpropagation can be achieved and in general how gradient-based approaches can be analyzed in such closed-loop scenarios. The performance of this learning paradigm is demonstrated using a line follower in simulation and on a real robot that shows very fast and continuous learning.
Affiliation(s)
- Sama Daryanavard
- Biomedical Engineering Division, School of Engineering, University of Glasgow, Glasgow G12 8QQ, U.K.
| | - Bernd Porr
- Biomedical Engineering Division, School of Engineering, University of Glasgow, Glasgow G12 8QQ, U.K.
|
37
|
Averbeck BB, Murray EA. Hypothalamic Interactions with Large-Scale Neural Circuits Underlying Reinforcement Learning and Motivated Behavior. Trends Neurosci 2020; 43:681-694. [PMID: 32762959 PMCID: PMC7483858 DOI: 10.1016/j.tins.2020.06.006] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 04/14/2020] [Revised: 06/02/2020] [Accepted: 06/19/2020] [Indexed: 02/02/2023]
Abstract
Biological agents adapt behavior to support the survival needs of the individual and the species. In this review we outline the anatomical, physiological, and computational processes that support reinforcement learning (RL). We describe two circuits in the primate brain that are linked to specific aspects of learning and goal-directed behavior. The ventral circuit, that includes the amygdala, ventral medial prefrontal cortex, and ventral striatum, has substantial connectivity with the hypothalamus. The dorsal circuit, that includes inferior parietal cortex, dorsal lateral prefrontal cortex, and the dorsal striatum, has minimal connectivity with the hypothalamus. The hypothalamic connectivity suggests distinct roles for these circuits. We propose that the ventral circuit defines behavioral goals, and the dorsal circuit orchestrates behavior to achieve those goals.
Affiliation(s)
- Bruno B Averbeck
- Laboratory of Neuropsychology, National Institute of Mental Health (NIMH), National Institutes of Health, Bethesda, MD 20892-4415, USA.
| | - Elisabeth A Murray
- Laboratory of Neuropsychology, National Institute of Mental Health (NIMH), National Institutes of Health, Bethesda, MD 20892-4415, USA
|
38
|
Jara-Ettinger J, Schulz LE, Tenenbaum JB. The Naïve Utility Calculus as a unified, quantitative framework for action understanding. Cogn Psychol 2020; 123:101334. [PMID: 32738590 DOI: 10.1016/j.cogpsych.2020.101334] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 07/06/2019] [Revised: 05/22/2020] [Accepted: 07/17/2020] [Indexed: 11/24/2022]
Abstract
The human ability to reason about the causes behind other people's behavior is critical for navigating the social world. Recent empirical research with both children and adults suggests that this ability is structured around an assumption that other agents act to maximize some notion of subjective utility. In this paper, we present a formal theory of this Naïve Utility Calculus as a probabilistic generative model, which highlights the role of cost and reward tradeoffs in a Bayesian framework for action-understanding. Our model predicts with quantitative accuracy how people infer agents' subjective costs and rewards based on their observable actions. By distinguishing between desires, goals, and intentions, the model extends to complex action scenarios unfolding over space and time in scenes with multiple objects and multiple action episodes. We contrast our account with simpler model variants and a set of special-case heuristics across a wide range of action-understanding tasks: inferring costs and rewards, making confidence judgments about relative costs and rewards, combining inferences from multiple events, predicting future behavior, inferring knowledge or ignorance, and reasoning about social goals. Our work sheds light on the basic representations and computations that structure our everyday ability to make sense of and navigate the social world.
Affiliation(s)
- Julian Jara-Ettinger
- Department of Psychology, Yale University, United States; Department of Computer Science, Yale University, United States.
| | - Laura E Schulz
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, United States; Center for Brains, Minds and Machines, Massachusetts Institute of Technology, United States
| | - Joshua B Tenenbaum
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, United States; Center for Brains, Minds and Machines, Massachusetts Institute of Technology, United States
|
39
|
Momennejad I. Learning Structures: Predictive Representations, Replay, and Generalization. Curr Opin Behav Sci 2020; 32:155-166. [DOI: 10.1016/j.cobeha.2020.02.017] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 12/26/2022]
|
40
|
Tschantz A, Seth AK, Buckley CL. Learning action-oriented models through active inference. PLoS Comput Biol 2020; 16:e1007805. [PMID: 32324758 PMCID: PMC7200021 DOI: 10.1371/journal.pcbi.1007805] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 09/25/2019] [Revised: 05/05/2020] [Accepted: 03/19/2020] [Indexed: 11/29/2022] Open
Abstract
Converging theories suggest that organisms learn and exploit probabilistic models of their environment. However, it remains unclear how such models can be learned in practice. The open-ended complexity of natural environments means that it is generally infeasible for organisms to model their environment comprehensively. Alternatively, action-oriented models attempt to encode a parsimonious representation of adaptive agent-environment interactions. One approach to learning action-oriented models is to learn online in the presence of goal-directed behaviours. This constrains an agent to behaviourally relevant trajectories, reducing the diversity of the data a model need account for. Unfortunately, this approach can cause models to prematurely converge to sub-optimal solutions, through a process we refer to as a bad-bootstrap. Here, we exploit the normative framework of active inference to show that efficient action-oriented models can be learned by balancing goal-oriented and epistemic (information-seeking) behaviours in a principled manner. We illustrate our approach using a simple agent-based model of bacterial chemotaxis. We first demonstrate that learning via goal-directed behaviour indeed constrains models to behaviorally relevant aspects of the environment, but that this approach is prone to sub-optimal convergence. We then demonstrate that epistemic behaviours facilitate the construction of accurate and comprehensive models, but that these models are not tailored to any specific behavioural niche and are therefore less efficient in their use of data. Finally, we show that active inference agents learn models that are parsimonious, tailored to action, and which avoid bad bootstraps and sub-optimal convergence. Critically, our results indicate that models learned through active inference can support adaptive behaviour in spite of, and indeed because of, their departure from veridical representations of the environment. 
Our approach provides a principled method for learning adaptive models from limited interactions with an environment, highlighting a route to sample-efficient learning algorithms.
Affiliation(s)
- Alexander Tschantz
- Sackler Centre for Consciousness Science, University of Sussex, Falmer, Brighton, United Kingdom
- Department of Informatics, University of Sussex, Brighton, United Kingdom
| | - Anil K. Seth
- Sackler Centre for Consciousness Science, University of Sussex, Falmer, Brighton, United Kingdom
- Department of Informatics, University of Sussex, Brighton, United Kingdom
- Canadian Institute for Advanced Research, Azrieli Programme on Brain, Mind, and Consciousness, Toronto, Ontario, Canada
| | - Christopher L. Buckley
- Department of Informatics, University of Sussex, Brighton, United Kingdom
- Evolutionary and Adaptive Systems Research Group, University of Sussex, Falmer, United Kingdom
|
41
|
Tomov MS, Yagati S, Kumar A, Yang W, Gershman SJ. Discovery of hierarchical representations for efficient planning. PLoS Comput Biol 2020; 16:e1007594. [PMID: 32251444 PMCID: PMC7162548 DOI: 10.1371/journal.pcbi.1007594] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 07/22/2019] [Revised: 04/16/2020] [Accepted: 12/10/2019] [Indexed: 12/12/2022] Open
Abstract
We propose that humans spontaneously organize environments into clusters of states that support hierarchical planning, enabling them to tackle challenging problems by breaking them down into sub-problems at various levels of abstraction. People constantly rely on such hierarchical representations to accomplish tasks big and small (from planning one's day, to organizing a wedding, to getting a PhD), often succeeding on the very first attempt. We formalize a Bayesian model of hierarchy discovery that explains how humans discover such useful abstractions. Building on principles developed in structure learning and robotics, the model predicts that hierarchy discovery should be sensitive to the topological structure, reward distribution, and distribution of tasks in the environment. In five simulations, we show that the model accounts for previously reported effects of environment structure on planning behavior, such as detection of bottleneck states and transitions. We then test the novel predictions of the model in eight behavioral experiments, demonstrating how the distribution of tasks and rewards can influence planning behavior via the discovered hierarchy, sometimes facilitating and sometimes hindering performance. We find evidence that the hierarchy discovery process unfolds incrementally across trials. Finally, we propose how hierarchy discovery and hierarchical planning might be implemented in the brain. Together, these findings present an important advance in our understanding of how the brain might use Bayesian inference to discover and exploit the hidden hierarchical structure of the environment.
Affiliation(s)
- Momchil S. Tomov
- Program in Neuroscience, Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
| | - Samyukta Yagati
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Agni Kumar
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Wanqian Yang
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
| | - Samuel J. Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, Massachusetts, United States of America
|
42
|
Kaplan R, Tauste Campo A, Bush D, King J, Principe A, Koster R, Ley Nacher M, Rocamora R, Friston KJ. Human hippocampal theta oscillations reflect sequential dependencies during spatial planning. Cogn Neurosci 2019; 11:122-131. [DOI: 10.1080/17588928.2019.1676711] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/29/2022]
Affiliation(s)
- Raphael Kaplan
- Wellcome Centre for Human Neuroimaging, UCL Institute of Neurology, University College London, London, UK
- Kavli Institute for Systems Neuroscience, Norwegian University of Science and Technology, Trondheim, Norway
| | - Adrià Tauste Campo
- Center for Brain and Cognition, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain
- Epilepsy Unit, Department of Neurology, Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
- Barcelonaβeta Brain Research Center, Pasqual Maragall Foundation, Barcelona, Spain
| | - Daniel Bush
- UCL Institute of Cognitive Neuroscience, University College London, London, UK
- UCL Queen Square Institute of Neurology, University College London, London, UK
| | - John King
- UCL Institute of Cognitive Neuroscience, University College London, London, UK
- Clinical, Education and Health Psychology, University College London, London, UK
| | - Alessandro Principe
- Epilepsy Unit, Department of Neurology, Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
| | - Raphael Koster
- Wellcome Centre for Human Neuroimaging, UCL Institute of Neurology, University College London, London, UK
- UCL Institute of Cognitive Neuroscience, University College London, London, UK
| | - Miguel Ley Nacher
- Epilepsy Unit, Department of Neurology, Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
| | - Rodrigo Rocamora
- Epilepsy Unit, Department of Neurology, Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
| | - Karl J. Friston
- Wellcome Centre for Human Neuroimaging, UCL Institute of Neurology, University College London, London, UK
|
43
|
|
44
|
Pezzulo G, Donnarumma F, Maisto D, Stoianov I. Planning at decision time and in the background during spatial navigation. Curr Opin Behav Sci 2019. [DOI: 10.1016/j.cobeha.2019.04.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/22/2022]
|
45
|
Pitti A, Quoy M, Lavandier C, Boucenna S. Gated spiking neural network using Iterative Free-Energy Optimization and rank-order coding for structure learning in memory sequences (INFERNO GATE). Neural Netw 2019; 121:242-258. [PMID: 31581065 DOI: 10.1016/j.neunet.2019.09.023] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 02/28/2019] [Revised: 09/16/2019] [Accepted: 09/17/2019] [Indexed: 11/16/2022]
Abstract
We present a framework based on iterative free-energy optimization with spiking neural networks for modeling the fronto-striatal system (PFC-BG) for the generation and recall of audio memory sequences. In line with neuroimaging studies carried out in the PFC, we propose a genuine coding strategy using the gain-modulation mechanism to represent abstract sequences based solely on the rank and location of items within them. Based on this mechanism, we show that we can construct a repertoire of neurons sensitive to the temporal structure in sequences from which we can represent any novel sequences. Free-energy optimization is then used to explore and to retrieve the missing indices of the items in the correct order for executive control and compositionality. We show that the gain-modulation mechanism permits the network to be robust to variabilities and to have long-term dependencies as it implements a gated recurrent neural network. This model, called Inferno Gate, is an extension of the neural architecture Inferno standing for Iterative Free-Energy Optimization of Recurrent Neural Networks with Gating or Gain-modulation. In experiments performed with an audio database of ten thousand MFCC vectors, Inferno Gate is capable of efficiently encoding and retrieving chunks fifty items in length. We then discuss the potential of our network to model the features of working memory in the PFC-BG loop for structural learning, goal-direction and hierarchical reinforcement learning.
Affiliation(s)
- Alexandre Pitti
- Laboratoire ETIS UMR 8051, Université Paris-Seine, Université de Cergy-Pontoise, ENSEA, CNRS, France.
| | - Mathias Quoy
- Laboratoire ETIS UMR 8051, Université Paris-Seine, Université de Cergy-Pontoise, ENSEA, CNRS, France.
| | - Catherine Lavandier
- Laboratoire ETIS UMR 8051, Université Paris-Seine, Université de Cergy-Pontoise, ENSEA, CNRS, France.
| | - Sofiane Boucenna
- Laboratoire ETIS UMR 8051, Université Paris-Seine, Université de Cergy-Pontoise, ENSEA, CNRS, France.
|
46
|
Nguyen ND, Nguyen T, Nahavandi S. Multi-agent behavioral control system using deep reinforcement learning. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.05.062] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 10/26/2022]
|
47
|
Budaev S, Jørgensen C, Mangel M, Eliassen S, Giske J. Decision-Making From the Animal Perspective: Bridging Ecology and Subjective Cognition. Front Ecol Evol 2019. [DOI: 10.3389/fevo.2019.00164] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Indexed: 11/13/2022] Open
|
48
|
|
49
|
Ramírez-Vizcaya S, Froese T. The Enactive Approach to Habits: New Concepts for the Cognitive Science of Bad Habits and Addiction. Front Psychol 2019; 10:301. [PMID: 30863334 PMCID: PMC6399396 DOI: 10.3389/fpsyg.2019.00301] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 10/24/2018] [Accepted: 01/30/2019] [Indexed: 11/13/2022] Open
Abstract
Habits are the topic of a venerable history of research that extends back to antiquity, yet they were originally disregarded by the cognitive sciences. They started to become the focus of interdisciplinary research in the 1990s, but since then there has been a stalemate between those who approach habits as a kind of bodily automatism and those who approach them as a kind of mindful action. This implicit mind-body dualism is ready to be overcome with the rise of interest in embodied, embedded, extended, and enactive (4E) cognition. We review the enactive approach and highlight how it moves beyond the traditional stalemate by integrating both autonomy and sense-making into its theory of agency. It defines a habit as an adaptive, precarious, and self-sustaining network of neural, bodily, and interactive processes that generate dynamical sensorimotor patterns. Habits constitute a central source of normativity for the agent. We identify a potential shortcoming of this enactive account with respect to bad habits, since self-maintenance of a habit would always be intrinsically good. Nevertheless, this is only a problem if, following the mainstream perspective on habits, we treat habits as isolated modules. The enactive approach replaces this atomism with a view of habits as constituting an interdependent whole on whose overall viability the individual habits depend. Accordingly, we propose to define a bad habit as one whose expression, while positive for itself, significantly impairs a person's well-being by overruling the expression of other situationally relevant habits. We conclude by considering implications of this concept of bad habit for psychological and psychiatric research, particularly with respect to addiction research.
Affiliation(s)
- Susana Ramírez-Vizcaya
- Philosophy of Science Graduate Program, National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Institute for Philosophical Research (IIF), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
| | - Tom Froese
- Institute for Applied Mathematics and Systems Research (IIMAS), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Center for the Sciences of Complexity (C3), UNAM, Mexico City, Mexico
|
50
|
Eckstein MK, Starr A, Bunge SA. How the inference of hierarchical rules unfolds over time. Cognition 2019; 185:151-162. [PMID: 30711815 DOI: 10.1016/j.cognition.2019.01.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/21/2017] [Revised: 01/08/2019] [Accepted: 01/09/2019] [Indexed: 01/20/2023]
Abstract
Inductive reasoning, which entails reaching conclusions that are based on but go beyond available evidence, has long been of interest in cognitive science. Nevertheless, knowledge is still lacking as to the specific cognitive processes that underlie inductive reasoning. Here, we shed light on these processes in two ways. First, we characterized the timecourse of inductive reasoning in a rule induction task, using pupil dilation as a moment-by-moment measure of cognitive load. Participants' patterns of behavior and pupillary responses indicated that they engaged in rule inference on-line, and were surprised when additional evidence violated their inferred rules. Second, we sought to gain insight into how participants represented rules on this task - specifically, whether they would structure the rules hierarchically when possible. We predicted the cognitive load imposed by hierarchical representations, as well as by non-hierarchical, flat ones. We used task-evoked pupil dilation as a metric of cognitive load to infer, based on these predictions, which participants represented rules with flat or hierarchical structures. Participants categorized as representing the rules hierarchically or flat differed in task performance and self-reports of strategy. Hierarchical rule representation was associated with more efficient performance and more pronounced pupillary responses to rule violations on trials that afford a higher-order regularity, but with less efficient performance on trials that do not. Thus, differences in rule representation can be inferred from a physiological measure of cognitive load, and are associated with differences in performance. These results illustrate how pupillometry can provide a window into reasoning as it unfolds over time.
Affiliation(s)
- Maria K Eckstein
- Department of Psychology, University of California, Berkeley, USA; Helen Wills Neuroscience Institute, University of California, Berkeley, USA; Graduate School of Systemic Neurosciences, Ludwig Maximilian University, Munich, Germany
- Ariel Starr
- Department of Psychology, University of California, Berkeley, USA; Helen Wills Neuroscience Institute, University of California, Berkeley, USA
- Silvia A Bunge
- Department of Psychology, University of California, Berkeley, USA; Helen Wills Neuroscience Institute, University of California, Berkeley, USA
|