1
Artificial neural networks for model identification and parameter estimation in computational cognitive models. PLoS Comput Biol 2024; 20:e1012119. [PMID: 38748770 DOI: 10.1371/journal.pcbi.1012119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Accepted: 04/27/2024] [Indexed: 05/28/2024]
Abstract
Computational cognitive models have been used extensively to formalize cognitive processes. Model parameters offer a simple way to quantify individual differences in how humans process information. Similarly, model comparison allows researchers to identify which theories, embedded in different models, provide the best accounts of the data. Cognitive modeling uses statistical tools to quantitatively relate models to data that often rely on computing/estimating the likelihood of the data under the model. However, this likelihood is computationally intractable for a substantial number of models. These relevant models may embody reasonable theories of cognition, but are often under-explored due to the limited range of tools available to relate them to data. We contribute to filling this gap in a simple way using artificial neural networks (ANNs) to map data directly onto model identity and parameters, bypassing the likelihood estimation. We test our instantiation of an ANN as a cognitive model fitting tool on classes of cognitive models with strong inter-trial dependencies (such as reinforcement learning models), which offer unique challenges to most methods. We show that we can adequately perform both parameter estimation and model identification using our ANN approach, including for models that cannot be fit using traditional likelihood-based methods. We further discuss our work in the context of the ongoing research leveraging simulation-based approaches to parameter estimation and model identification, and how these approaches broaden the class of cognitive models researchers can quantitatively investigate.
2
Artificial neural networks for model identification and parameter estimation in computational cognitive models. bioRxiv 2024:2023.09.14.557793. [PMID: 37767088 PMCID: PMC10521012 DOI: 10.1101/2023.09.14.557793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/29/2023]
3
The successor representation subserves hierarchical abstraction for goal-directed behavior. PLoS Comput Biol 2024; 20:e1011312. [PMID: 38377074 PMCID: PMC10906840 DOI: 10.1371/journal.pcbi.1011312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 03/01/2024] [Accepted: 02/05/2024] [Indexed: 02/22/2024]
Abstract
Humans have the ability to craft abstract, temporally extended and hierarchically organized plans. For instance, when considering how to make spaghetti for dinner, we typically concern ourselves with useful "subgoals" in the task, such as cutting onions, boiling pasta, and cooking a sauce, rather than particulars such as how many cuts to make to the onion, or exactly which muscles to contract. A core question is how such decomposition of a more abstract task into logical subtasks happens in the first place. Previous research has shown that humans are sensitive to a form of higher-order statistical learning named "community structure". Community structure is a common feature of abstract tasks characterized by a logical ordering of subtasks. This structure can be captured by a model where humans learn predictions of upcoming events multiple steps into the future, discounting predictions of events further away in time. One such model is the "successor representation", which has been argued to be useful for hierarchical abstraction. As of yet, no study has convincingly shown that this hierarchical abstraction can be put to use for goal-directed behavior. Here, we investigate whether participants utilize learned community structure to craft hierarchically informed action plans for goal-directed behavior. Participants were asked to search for paintings in a virtual museum, where the paintings were grouped together in "wings" representing community structure in the museum. We find that participants' choices accord with the hierarchical structure of the museum and that their response times are best predicted by a successor representation. The degree to which the response times reflect the community structure of the museum correlates with several measures of performance, including the ability to craft temporally abstract action plans. These results suggest that successor representation learning subserves hierarchical abstractions relevant for goal-directed behavior.
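The multi-step, temporally discounted prediction described in this abstract can be illustrated with a minimal sketch. It assumes a random walk on a toy graph with two three-node "wings" joined by one corridor; the graph, the discount factor of 0.9, and the function names are illustrative assumptions, not the study's museum layout or the authors' code.

```python
def random_walk_matrix(edges, n):
    """Row-stochastic transition matrix of a random walk on an undirected graph."""
    adj = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


def successor_representation(T, gamma=0.9, n_terms=200):
    """Approximate M = sum_k gamma^k T^k, the discounted expected future occupancy."""
    n = len(T)
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # k = 0 term
    power = [row[:] for row in M]  # holds gamma^k T^k for the current k
    for _ in range(n_terms):
        power = [
            [gamma * sum(power[i][k] * T[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        for i in range(n):
            for j in range(n):
                M[i][j] += power[i][j]
    return M


# Hypothetical layout: wings {0, 1, 2} and {3, 4, 5}, joined by the corridor 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
T = random_walk_matrix(edges, 6)
M = successor_representation(T, gamma=0.9)
```

In this toy graph, the row for node 0 assigns more discounted future occupancy to node 1 (same wing) than to node 4 (other wing), which is the sense in which such a representation can reflect community structure.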
4
Exploring the steps of learning: computational modeling of initiatory-actions among individuals with attention-deficit/hyperactivity disorder. Transl Psychiatry 2024; 14:10. [PMID: 38191535 PMCID: PMC10774270 DOI: 10.1038/s41398-023-02717-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Revised: 12/13/2023] [Accepted: 12/14/2023] [Indexed: 01/10/2024]
Abstract
Attention-deficit/hyperactivity disorder (ADHD) is characterized by difficulty in acting in a goal-directed manner. While most environments require a sequence of actions for goal attainment, ADHD has never been studied in the context of value-based sequence learning. Here, we made use of current advancements in hierarchical reinforcement-learning algorithms to track the internal value and choice policy of individuals with ADHD performing a three-stage sequence learning task. Specifically, 54 participants (28 ADHD, 26 controls) completed a value-based reinforcement-learning task that allowed us to estimate internal action values for each trial and stage using computational modeling. We found attenuated sensitivity to action values in ADHD compared to controls, both in choice and reaction-time variability estimates. Remarkably, this was found only for first-stage actions (i.e., initiatory actions), while for actions performed just before outcome delivery the two groups were strikingly indistinguishable. These results suggest a difficulty in following value estimation for initiatory actions in ADHD.
5
Mice identify subgoal locations through an action-driven mapping process. Neuron 2023; 111:1966-1978.e8. [PMID: 37119818 PMCID: PMC10636595 DOI: 10.1016/j.neuron.2023.03.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Revised: 10/12/2022] [Accepted: 03/27/2023] [Indexed: 05/01/2023]
Abstract
Mammals form mental maps of the environments by exploring their surroundings. Here, we investigate which elements of exploration are important for this process. We studied mouse escape behavior, in which mice are known to memorize subgoal locations (obstacle edges) to execute efficient escape routes to shelter. To test the role of exploratory actions, we developed closed-loop neural-stimulation protocols for interrupting various actions while mice explored. We found that blocking running movements directed at obstacle edges prevented subgoal learning; however, blocking several control movements had no effect. Reinforcement learning simulations and analysis of spatial data show that artificial agents can match these results if they have a region-level spatial representation and explore with object-directed movements. We conclude that mice employ an action-driven process for integrating subgoals into a hierarchical cognitive map. These findings broaden our understanding of the cognitive toolkit that mammals use to acquire spatial knowledge.
6
Strategy inference during learning via cognitive activity-based credit assignment models. Sci Rep 2023; 13:9408. [PMID: 37296163 PMCID: PMC10256696 DOI: 10.1038/s41598-023-33604-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Accepted: 04/15/2023] [Indexed: 06/12/2023]
Abstract
We develop a method for selecting meaningful learning strategies based solely on the behavioral data of a single individual in a learning experiment. We use simple Activity-Credit Assignment algorithms to model the different strategies and couple them with a novel hold-out statistical selection method. Application on rat behavioral data in a continuous T-maze task reveals a particular learning strategy that consists in chunking the paths used by the animal. Neuronal data collected in the dorsomedial striatum confirm this strategy.
7
Humans decompose tasks by trading off utility and computational cost. PLoS Comput Biol 2023; 19:e1011087. [PMID: 37262023 DOI: 10.1371/journal.pcbi.1011087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Accepted: 04/10/2023] [Indexed: 06/03/2023]
Abstract
Human behavior emerges from planning over elaborate decompositions of tasks into goals, subgoals, and low-level actions. How are these decompositions created and used? Here, we propose and evaluate a normative framework for task decomposition based on the simple idea that people decompose tasks to reduce the overall cost of planning while maintaining task performance. Analyzing 11,117 distinct graph-structured planning tasks, we find that our framework justifies several existing heuristics for task decomposition and makes predictions that can be distinguished from two alternative normative accounts. We report a behavioral study of task decomposition (N = 806) that uses 30 randomly sampled graphs, a larger and more diverse set than that of any previous behavioral study on this topic. We find that human responses are more consistent with our framework for task decomposition than alternative normative accounts and are most consistent with a heuristic (betweenness centrality) that is justified by our approach. Taken together, our results suggest the computational cost of planning is a key principle guiding the intelligent structuring of goal-directed behavior.
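Betweenness centrality, the heuristic named in this abstract, counts how many shortest paths between other nodes pass through a given node. The following is a minimal sketch using Brandes' algorithm on an unweighted, undirected toy graph; the graph and function name are illustrative assumptions and do not reproduce the study's task graphs or analysis code.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes' algorithm)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                        # nodes in order of non-decreasing distance from s
        preds = {v: [] for v in adj}      # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}       # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                      # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}     # dependency of s on each node
        while order:                      # accumulate from the farthest nodes back to s
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {v: c / 2 for v, c in bc.items()}


# Hypothetical graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
scores = betweenness_centrality(graph)
```

Here the two bridge nodes (2 and 3) receive the highest scores, which matches the intuition that bottleneck states are natural candidates for subgoals.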
8
Exploration patterns shape cognitive map learning. Cognition 2023; 233:105360. [PMID: 36549130 PMCID: PMC9983142 DOI: 10.1016/j.cognition.2022.105360] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/11/2022] [Revised: 12/08/2022] [Accepted: 12/11/2022] [Indexed: 12/24/2022]
Abstract
Spontaneous, volitional spatial exploration is crucial for building up a cognitive map of the environment. However, decades of research have primarily measured the fidelity of cognitive maps after discrete, controlled learning episodes. We know little about how cognitive maps are formed during naturalistic free exploration. Here, we investigated whether exploration trajectories predicted cognitive map accuracy, and how these patterns were shaped by environmental structure. In two experiments, participants freely explored a previously unfamiliar virtual environment. We related their exploration trajectories to a measure of how long they spent in areas with high global environmental connectivity (integration, as assessed by space syntax). In both experiments, we found that participants who spent more time on paths that offered opportunities for integration formed more accurate cognitive maps. Interestingly, we found no support for our pre-registered hypothesis that self-reported trait differences in navigation ability would mediate this relationship. Our findings suggest that exploration patterns predict cognitive map accuracy, even for people who self-report low ability, and highlight the importance of considering both environmental structure and individual variability in formal theory- and model-building.
9
Humans account for cognitive costs when finding shortcuts: An information-theoretic analysis of navigation. PLoS Comput Biol 2023; 19:e1010829. [PMID: 36608145 PMCID: PMC9851521 DOI: 10.1371/journal.pcbi.1010829] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/22/2022] [Revised: 01/19/2023] [Accepted: 12/19/2022] [Indexed: 01/09/2023]
Abstract
When faced with navigating back somewhere we have been before we might either retrace our steps or seek a shorter path. Both choices have costs. Here, we ask whether it is possible to characterize formally the choice of navigational plans as a bounded rational process that trades off the quality of the plan (e.g., its length) and the cognitive cost required to find and implement it. We analyze the navigation strategies of two groups of people who are first trained to follow a "default policy" taking a route in a virtual maze and then asked to navigate to various known goal destinations, either in the way they want ("Go To Goal") or by taking novel shortcuts ("Take Shortcut"). We address these wayfinding problems using InfoRL: an information-theoretic approach that formalizes the cognitive cost of devising a navigational plan as the informational cost to deviate from a well-learned route (the "default policy"). In InfoRL, optimality refers to finding the best trade-off between route length and the amount of control information required to find it. We report five main findings. First, the navigational strategies automatically identified by InfoRL correspond closely to different routes (optimal or suboptimal) in the virtual reality map, which were annotated by hand in previous research. Second, people deliberate more in places where the value of investing cognitive resources (i.e., relevant goal information) is greater. Third, compared to the group of people who receive the "Go To Goal" instruction, those who receive the "Take Shortcut" instruction find shorter but less optimal solutions, reflecting the intrinsic difficulty of finding optimal shortcuts. Fourth, those who receive the "Go To Goal" instruction flexibly modulate their cognitive resources, depending on the benefits of finding the shortcut. Finally, we found a surprising amount of variability in the choice of navigational strategies and resource investment across participants. Taken together, these results illustrate the benefits of using InfoRL to address navigational planning problems from a bounded rational perspective.
10
Hierarchical Reinforcement Learning, Sequential Behavior, and the Dorsal Frontostriatal System. J Cogn Neurosci 2022; 34:1307-1325. [PMID: 35579977 PMCID: PMC9274316 DOI: 10.1162/jocn_a_01869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
Abstract
To effectively behave within ever-changing environments, biological agents must learn and act at varying hierarchical levels such that a complex task may be broken down into more tractable subtasks. Hierarchical reinforcement learning (HRL) is a computational framework that provides an understanding of this process by combining sequential actions into one temporally extended unit called an option. However, there are still open questions within the HRL framework, including how options are formed and how HRL mechanisms might be realized within the brain. In this review, we propose that the existing human motor sequence literature can aid in understanding both of these questions. We give specific emphasis to visuomotor sequence learning tasks such as the discrete sequence production task and the M × N (M steps × N sets) task to understand how hierarchical learning and behavior manifest across sequential action tasks as well as how the dorsal cortical-subcortical circuitry could support this kind of behavior. This review highlights how motor chunks within a motor sequence can function as HRL options. Furthermore, we aim to merge findings from motor sequence literature with reinforcement learning perspectives to inform experimental design in each respective subfield.
11
Statistical Learning in Vision. Annu Rev Vis Sci 2022; 8:265-290. [PMID: 35727961 DOI: 10.1146/annurev-vision-100720-103343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Vision and learning have long been considered to be two areas of research linked only distantly. However, recent developments in vision research have changed the conceptual definition of vision from a signal-evaluating process to a goal-oriented interpreting process, and this shift binds learning, together with the resulting internal representations, intimately to vision. In this review, we consider various types of learning (perceptual, statistical, and rule/abstract) associated with vision in the past decades and argue that they represent differently specialized versions of the fundamental learning process, which must be captured in its entirety when applied to complex visual processes. We show why the generalized version of statistical learning can provide the appropriate setup for such a unified treatment of learning in vision, what computational framework best accommodates this kind of statistical learning, and what plausible neural scheme could feasibly implement this framework. Finally, we list the challenges that the field of statistical learning faces in fulfilling the promise of being the right vehicle for advancing our understanding of vision in its entirety.
12
A weighted constraint satisfaction approach to human goal-directed decision making. PLoS Comput Biol 2022; 18:e1009553. [PMID: 35709299 PMCID: PMC9255770 DOI: 10.1371/journal.pcbi.1009553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Revised: 07/05/2022] [Accepted: 05/19/2022] [Indexed: 11/29/2022]
Abstract
When we plan for long-range goals, proximal information cannot be exploited in a blindly myopic way, as relevant future information must also be considered. But when a subgoal must be resolved first, irrelevant future information should not interfere with the processing of more proximal, subgoal-relevant information. We explore the idea that decision making in both situations relies on the flexible modulation of the degree to which different pieces of information under consideration are weighted, rather than explicitly decomposing a problem into smaller parts and solving each part independently. We asked participants to find the shortest goal-reaching paths in mazes and modeled their initial path choices as a noisy, weighted information integration process. In a base task where choosing the optimal initial path required weighting starting-point and goal-proximal factors equally, participants did take both constraints into account, with participants who made more accurate choices tending to exhibit more balanced weighting. The base task was then embedded as an initial subtask in a larger maze, where the same two factors constrained the optimal path to a subgoal, and the final goal position was irrelevant to the initial path choice. In this more complex task, participants’ choices reflected predominant consideration of the subgoal-relevant constraints, but also some influence of the initially-irrelevant final goal. More accurate participants placed much less weight on the optimality-irrelevant goal and again tended to weight the two initially-relevant constraints more equally. These findings suggest that humans may rely on a graded, task-sensitive weighting of multiple constraints to generate approximately optimal decision outcomes in both hierarchical and non-hierarchical goal-directed tasks. Different problems require the consideration of different information sources, including often useful long-range, future information that may impact our immediate decisions. 
However, when future information is irrelevant to a key subgoal, it can be desirable to focus on achieving the subgoal first. We suggest that humans rely on appropriately weighting relevant information over irrelevant information to generate decision outcomes in both types of situations. We conducted behavioral experiments and fitted models of decision processes to understand to what extent people considered various task factors in choosing the initial path in different mazes, both when a simple maze occurred alone or was embedded as an initial part in a larger maze. Our results show that people approximate the optimal decision outcomes in both tasks by modulating the weighting of different factors during planning, and that people who made more accurate initial path choices modulated these weightings more successfully than those who made less accurate choices.
13
Efficient coding of cognitive variables underlies dopamine response and choice behavior. Nat Neurosci 2022; 25:738-748. [PMID: 35668173 DOI: 10.1038/s41593-022-01085-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/10/2020] [Accepted: 04/26/2022] [Indexed: 11/26/2022]
Abstract
Reward expectations based on internal knowledge of the external environment are a core component of adaptive behavior. However, internal knowledge may be inaccurate or incomplete due to errors in sensory measurements. Some features of the environment may also be encoded inaccurately to minimize representational costs associated with their processing. In this study, we investigated how reward expectations are affected by features of internal representations by studying behavior and dopaminergic activity while mice make time-based decisions. We show that several possible representations allow a reinforcement learning agent to model animals' overall performance during the task. However, only a small subset of highly compressed representations simultaneously reproduced the co-variability in animals' choice behavior and dopaminergic activity. Strikingly, these representations predict an unusual distribution of response times that closely match animals' behavior. These results inform how constraints of representational efficiency may be expressed in encoding representations of dynamic cognitive variables used for reward-based computations.
14
Brain-inspired meta-reinforcement learning cognitive control in conflictual inhibition decision-making task for artificial agents. Neural Netw 2022; 154:283-302. [DOI: 10.1016/j.neunet.2022.06.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2021] [Revised: 06/09/2022] [Accepted: 06/16/2022] [Indexed: 11/21/2022]
15
People construct simplified mental representations to plan. Nature 2022; 606:129-136. [PMID: 35589843 DOI: 10.1038/s41586-022-04743-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 01/25/2021] [Accepted: 04/07/2022] [Indexed: 11/09/2022]
Abstract
One of the most striking features of human cognition is the ability to plan. Two aspects of human planning stand out: its efficiency and flexibility. Efficiency is especially impressive because plans must often be made in complex environments, and yet people successfully plan solutions to many everyday problems despite having limited cognitive resources [1-3]. Standard accounts in psychology, economics and artificial intelligence have suggested that human planning succeeds because people have a complete representation of a task and then use heuristics to plan future actions in that representation [4-11]. However, this approach generally assumes that task representations are fixed. Here we propose that task representations can be controlled and that such control provides opportunities to quickly simplify problems and more easily reason about them. We propose a computational account of this simplification process and, in a series of preregistered behavioural experiments, show that it is subject to online cognitive control [12-14] and that people optimally balance the complexity of a task representation and its utility for planning and acting. These results demonstrate how strategically perceiving and conceiving problems facilitates the effective use of limited cognitive resources.
16
Eye movements reveal spatiotemporal dynamics of visually-informed planning in navigation. eLife 2022; 11:73097. [PMID: 35503099 PMCID: PMC9135400 DOI: 10.7554/elife.73097] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/16/2021] [Accepted: 05/01/2022] [Indexed: 11/28/2022]
Abstract
Goal-oriented navigation is widely understood to depend upon internal maps. Although this may be the case in many settings, humans tend to rely on vision in complex, unfamiliar environments. To study the nature of gaze during visually-guided navigation, we tasked humans to navigate to transiently visible goals in virtual mazes of varying levels of difficulty, observing that they took near-optimal trajectories in all arenas. By analyzing participants’ eye movements, we gained insights into how they performed visually-informed planning. The spatial distribution of gaze revealed that environmental complexity mediated a striking trade-off in the extent to which attention was directed towards two complementary aspects of the world model: the reward location and task-relevant transitions. The temporal evolution of gaze revealed rapid, sequential prospection of the future path, evocative of neural replay. These findings suggest that the spatiotemporal characteristics of gaze during navigation are significantly shaped by the unique cognitive computations underlying real-world, sequential decision making.
17
Rational use of cognitive resources in human planning. Nat Hum Behav 2022; 6:1112-1125. [PMID: 35484209 DOI: 10.1038/s41562-022-01332-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 05/03/2021] [Accepted: 03/03/2022] [Indexed: 12/19/2022]
Abstract
Making good decisions requires thinking ahead, but the huge number of actions and outcomes one could consider makes exhaustive planning infeasible for computationally constrained agents, such as humans. How people are nevertheless able to solve novel problems when their actions have long-reaching consequences is thus a long-standing question in cognitive science. To address this question, we propose a model of resource-constrained planning that allows us to derive optimal planning strategies. We find that previously proposed heuristics such as best-first search are near optimal under some circumstances but not others. In a mouse-tracking paradigm, we show that people adapt their planning strategies accordingly, planning in a manner that is broadly consistent with the optimal model but not with any single heuristic model. We also find systematic deviations from the optimal model that might result from additional cognitive constraints that are yet to be uncovered.
18
Neurophysiological Evidence for Cognitive Map Formation during Sequence Learning. eNeuro 2022; 9:ENEURO.0361-21.2022. [PMID: 35105662 PMCID: PMC8896554 DOI: 10.1523/eneuro.0361-21.2022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/07/2021] [Revised: 12/03/2021] [Accepted: 01/03/2022] [Indexed: 12/29/2022]
Abstract
Humans deftly parse statistics from sequences. Some theories posit that humans learn these statistics by forming cognitive maps, or underlying representations of the latent space which links items in the sequence. Here, an item in the sequence is a node, and the probability of transitioning between two items is an edge. Sequences can then be generated from walks through the latent space, with different spaces giving rise to different sequence statistics. Individual or group differences in sequence learning can be modeled by changing the time scale over which estimates of transition probabilities are built, or in other words, by changing the amount of temporal discounting. Latent space models with temporal discounting bear a resemblance to models of navigation through Euclidean spaces. However, few explicit links have been made between predictions from Euclidean spatial navigation and neural activity during human sequence learning. Here, we use a combination of behavioral modeling and intracranial encephalography (iEEG) recordings to investigate how neural activity might support the formation of space-like cognitive maps through temporal discounting during sequence learning. Specifically, we acquire human reaction times from a sequential reaction time task, to which we fit a model that formulates the amount of temporal discounting as a single free parameter. From the parameter, we calculate each individual's estimate of the latent space. We find that neural activity reflects these estimates mostly in the temporal lobe, including areas involved in spatial navigation. Similar to spatial navigation, we find that low-dimensional representations of neural activity allow for easy separation of important features, such as modules, in the latent space. Lastly, we take advantage of the high temporal resolution of iEEG data to determine the time scale on which latent spaces are learned. 
We find that learning typically happens within the first 500 trials, and is modulated by the underlying latent space and the amount of temporal discounting characteristic of each participant. Ultimately, this work provides important links between behavioral models of sequence learning and neural activity during the same behavior, and contextualizes these results within a broader framework of domain general cognitive maps.
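The abstract above does not state the model's formula. As a minimal sketch, assume the learner's estimate of the transition matrix A is a discounted mixture of multi-step transitions, (1 - eta) * sum over k >= 0 of eta^k * A^(k+1), with a single discounting parameter eta. The function names, the parameter name `eta`, and the toy ring graph are all hypothetical and are not taken from the paper.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def discounted_estimate(A, eta, n_terms=200):
    """Approximate (1 - eta) * sum_{k>=0} eta**k * A**(k+1) by truncation.

    A   : row-stochastic transition matrix (list of rows)
    eta : hypothetical discounting parameter in [0, 1); 0 means no blurring
    """
    n = len(A)
    estimate = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in A]   # holds A**(k+1), starting at k = 0
    weight = 1.0 - eta              # holds (1 - eta) * eta**k
    for _ in range(n_terms):
        for i in range(n):
            for j in range(n):
                estimate[i][j] += weight * power[i][j]
        power = matmul(power, A)
        weight *= eta
    return estimate


# Toy latent space (an assumption): a 4-node ring, each node steps left or right.
ring = [
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
]
exact = discounted_estimate(ring, eta=0.0)    # no discounting: recovers the ring
blurred = discounted_estimate(ring, eta=0.3)  # mass leaks onto multi-step neighbours
```

Under this assumed form, a larger `eta` spreads probability mass onto nodes that are several steps apart, which is one way a single parameter could yield different latent-space estimates for different participants.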
|
19
|
Abstract
Recent breakthroughs in artificial intelligence (AI) have enabled machines to plan in tasks previously thought to be uniquely human. Meanwhile, the planning algorithms implemented by the brain itself remain largely unknown. Here, we review neural and behavioral data in sequential decision-making tasks that elucidate the ways in which the brain does, and does not, plan. To systematically review available biological data, we create a taxonomy of planning algorithms by summarizing the relevant design choices for such algorithms in AI. Across species, recording techniques, and task paradigms, we find converging evidence that the brain represents future states consistent with a class of planning algorithms within our taxonomy: focused, depth-limited, and serial. However, we argue that current data are insufficient for addressing more detailed algorithmic questions. We propose a new approach leveraging AI advances to drive experiments that can adjudicate between competing candidate algorithms.
|
20
|
Advances in modeling learning and decision-making in neuroscience. Neuropsychopharmacology 2022; 47:104-118. [PMID: 34453117 PMCID: PMC8617262 DOI: 10.1038/s41386-021-01126-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 04/03/2021] [Revised: 07/14/2021] [Accepted: 07/22/2021] [Indexed: 02/07/2023]
Abstract
An organism's survival depends on its ability to learn about its environment and to make adaptive decisions in the service of achieving the best possible outcomes in that environment. To study the neural circuits that support these functions, researchers have increasingly relied on models that formalize the computations required to carry them out. Here, we review the recent history of computational modeling of learning and decision-making, and how these models have been used to advance understanding of prefrontal cortex function. We discuss how such models have advanced from their origins in basic algorithms of updating and action selection to increasingly account for complexities in the cognitive processes required for learning and decision-making, and the representations over which they operate. We further discuss how a deeper understanding of the real-world complexities in these computations has shed light on the fundamental constraints on optimal behavior, and on the complex interactions between corticostriatal pathways to determine such behavior. The continuing and rapid development of these models holds great promise for understanding the mechanisms by which animals adapt to their environments, and what leads to maladaptive forms of learning and decision-making within clinical populations.
|
21
|
|
22
|
Reinforcement learning and its connections with neuroscience and psychology. Neural Netw 2021; 145:271-287. [PMID: 34781215 DOI: 10.1016/j.neunet.2021.10.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/31/2021] [Revised: 09/26/2021] [Accepted: 10/01/2021] [Indexed: 11/19/2022]
Abstract
Reinforcement learning methods have recently been very successful at performing complex sequential tasks like playing Atari games, Go and Poker. These algorithms have outperformed humans in several tasks by learning from scratch, using only scalar rewards obtained through interaction with their environment. While there certainly has been considerable independent innovation to produce such results, many core ideas in reinforcement learning are inspired by phenomena in animal learning, psychology and neuroscience. In this paper, we comprehensively review a large number of findings in both neuroscience and psychology that evidence reinforcement learning as a promising candidate for modeling learning and decision making in the brain. In doing so, we construct a mapping between various classes of modern RL algorithms and specific findings in both neurophysiological and behavioral literature. We then discuss the implications of this observed relationship between RL, neuroscience and psychology and its role in advancing research in both AI and brain science.
|
23
|
How the Mind Creates Structure: Hierarchical Learning of Action Sequences. COGSCI ... ANNUAL CONFERENCE OF THE COGNITIVE SCIENCE SOCIETY. COGNITIVE SCIENCE SOCIETY (U.S.). CONFERENCE 2021; 43:618-624. [PMID: 34964045 PMCID: PMC8711273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Humans have the astonishing capacity to quickly adapt to varying environmental demands and reach complex goals in the absence of extrinsic rewards. Part of what underlies this capacity is the ability to flexibly reuse and recombine previous experiences, and to plan future courses of action in a psychological space that is shaped by these experiences. Decades of research have suggested that humans use hierarchical representations for efficient planning and flexibility, but the origin of these representations has remained elusive. This study investigates how 73 participants learned hierarchical representations through experience, in a task in which they had to perform complex action sequences to obtain rewards. Complex action sequences were composed of simpler action sequences, which were not rewarded, but whose completion was signaled to participants. We investigated the process with which participants learned to perform simpler action sequences and combined them into complex action sequences. After learning action sequences, participants completed a transfer phase in which either simple sequences or complex sequences were manipulated without notice. Relearning progressed more slowly when simple rather than complex sequences were changed, in accordance with hierarchical representations in which lower levels are quickly consolidated, potentially stabilizing exploration, while higher levels remain malleable, with benefits for flexible recombination.
|
24
|
Abstract
Humans use prior knowledge to efficiently solve novel tasks, but how they structure past knowledge during learning to enable such fast generalization is not well understood. We recently proposed that hierarchical state abstraction enabled generalization of simple one-step rules, by inferring context clusters for each rule. However, humans' daily tasks are often temporally extended, and necessitate more complex multi-step, hierarchically structured strategies. The options framework in hierarchical reinforcement learning provides a theoretical framework for representing such transferable strategies. Options are abstract multi-step policies, assembled from simpler one-step actions or other options, that can represent meaningful reusable strategies as temporal abstractions. We developed a novel sequential decision-making protocol to test if humans learn and transfer multi-step options. In a series of four experiments, we found transfer effects at multiple hierarchical levels of abstraction that could not be explained by flat reinforcement learning models or hierarchical models lacking temporal abstractions. We extended the options framework to develop a quantitative model that blends temporal and state abstractions. Our model captures the transfer effects observed in human participants. Our results provide evidence that humans create and compose hierarchical options, and use them to explore in novel contexts, consequently transferring past knowledge and speeding up learning. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
|
25
|
|
26
|
How higher goals are constructed and collapse under stress: A hierarchical Bayesian control systems perspective. Neurosci Biobehav Rev 2021; 123:257-285. [PMID: 33497783 DOI: 10.1016/j.neubiorev.2020.12.021] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/28/2020] [Revised: 11/19/2020] [Accepted: 12/19/2020] [Indexed: 01/26/2023]
Abstract
In this paper, we show that organisms can be modeled as hierarchical Bayesian control systems with small world and information bottleneck (bow-tie) network structure. Such systems combine hierarchical perception with hierarchical goal setting and hierarchical action control. We argue that hierarchical Bayesian control systems produce deep hierarchies of goal states, from which it follows that organisms must have some form of 'highest goals'. For all organisms, these involve internal (self) models, external (social) models and overarching (normative) models. We show that goal hierarchies tend to decompose in a top-down manner under severe and prolonged levels of stress. This produces behavior that favors short-term and self-referential goals over long term, social and/or normative goals. The collapse of goal hierarchies is universally accompanied by an increase in entropy (disorder) in control systems that can serve as an early warning sign for tipping points (disease or death of the organism). In humans, learning goal hierarchies corresponds to personality development (maturation). The failure of goal hierarchies to mature properly corresponds to personality deficits. A top-down collapse of such hierarchies under stress is identified as a common factor in all forms of episodic mental disorders (psychopathology). The paper concludes by discussing ways of testing these hypotheses empirically.
|
27
|
Structuring Knowledge with Cognitive Maps and Cognitive Graphs. Trends Cogn Sci 2021; 25:37-54. [PMID: 33248898 PMCID: PMC7746605 DOI: 10.1016/j.tics.2020.10.004] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Received: 07/28/2020] [Revised: 10/16/2020] [Accepted: 10/17/2020] [Indexed: 12/21/2022]
Abstract
Humans and animals use mental representations of the spatial structure of the world to navigate. The classical view is that these representations take the form of Euclidean cognitive maps, but alternative theories suggest that they are cognitive graphs consisting of locations connected by paths. We review evidence suggesting that both map-like and graph-like representations exist in the mind/brain that rely on partially overlapping neural systems. Maps and graphs can operate simultaneously or separately, and they may be applied to both spatial and nonspatial knowledge. By providing structural frameworks for complex information, cognitive maps and cognitive graphs may provide fundamental organizing schemata that allow us to navigate in physical, social, and conceptual spaces.
|
28
|
Computational evidence for hierarchically structured reinforcement learning in humans. Proc Natl Acad Sci U S A 2020; 117:29381-29389. [PMID: 33229518 PMCID: PMC7703642 DOI: 10.1073/pnas.1912330117] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/16/2022] Open
Abstract
Humans have the fascinating ability to achieve goals in a complex and constantly changing world, still surpassing modern machine-learning algorithms in terms of flexibility and learning speed. It is generally accepted that a crucial factor for this ability is the use of abstract, hierarchical representations, which employ structure in the environment to guide learning and decision making. Nevertheless, how we create and use these hierarchical representations is poorly understood. This study presents evidence that human behavior can be characterized as hierarchical reinforcement learning (RL). We designed an experiment to test specific predictions of hierarchical RL using a series of subtasks in the realm of context-based learning and observed several behavioral markers of hierarchical RL, such as asymmetric switch costs between changes in higher-level versus lower-level features, faster learning in higher-valued compared to lower-valued contexts, and preference for higher-valued compared to lower-valued contexts. We replicated these results across three independent samples. We simulated three models (a classic RL, a hierarchical RL, and a hierarchical Bayesian model) and compared their behavior to human results. While the flat RL model captured some aspects of participants' sensitivity to outcome values, and the hierarchical Bayesian model captured some markers of transfer, only hierarchical RL accounted for all patterns observed in human behavior. This work shows that hierarchical RL, a biologically inspired and computationally simple algorithm, can capture human behavior in complex, hierarchical environments and opens the avenue for future research in this field.
|
29
|
Discovery of hierarchical representations for efficient planning. PLoS Comput Biol 2020; 16:e1007594. [PMID: 32251444 PMCID: PMC7162548 DOI: 10.1371/journal.pcbi.1007594] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 07/22/2019] [Revised: 04/16/2020] [Accepted: 12/10/2019] [Indexed: 12/12/2022] Open
Abstract
We propose that humans spontaneously organize environments into clusters of states that support hierarchical planning, enabling them to tackle challenging problems by breaking them down into sub-problems at various levels of abstraction. People constantly rely on such hierarchical representations to accomplish tasks big and small, from planning one's day, to organizing a wedding, to getting a PhD, often succeeding on the very first attempt. We formalize a Bayesian model of hierarchy discovery that explains how humans discover such useful abstractions. Building on principles developed in structure learning and robotics, the model predicts that hierarchy discovery should be sensitive to the topological structure, reward distribution, and distribution of tasks in the environment. In five simulations, we show that the model accounts for previously reported effects of environment structure on planning behavior, such as detection of bottleneck states and transitions. We then test the novel predictions of the model in eight behavioral experiments, demonstrating how the distribution of tasks and rewards can influence planning behavior via the discovered hierarchy, sometimes facilitating and sometimes hindering performance. We find evidence that the hierarchy discovery process unfolds incrementally across trials. Finally, we propose how hierarchy discovery and hierarchical planning might be implemented in the brain. Together, these findings present an important advance in our understanding of how the brain might use Bayesian inference to discover and exploit the hidden hierarchical structure of the environment.
|
30
|
Abstract
Psychological theories posit that affective experiences can be decomposed into component constituents, yet disagree on the level of representation of these components. Affective experiences have been previously described as emerging from core dimensions of valence and arousal. However, this view needs to be reconciled with accounts of valence processing in appetitive and aversive circuits from the neuroscience literature. Here we offer an account of affect that allows for both perspectives but compares across levels of analysis. At one level of analysis, valence and arousal are observed already in the properties of encountered stimuli and the appetitive and aversive neural circuits that engage accordingly. At another level of analysis, the explicit experiential aspects of affective processes are compressed and appraised in a manner that allows these experiences to be organized along valence and arousal axes. We review both the behavioral neuroscience evidence on appetitive and aversive circuits as well as the cognitive neuroscience literature on compression in information coding across multiple domains of processing. We argue that these processes are domain-general and adapt these principles to provide a perspective on how valence can be represented at multiple scales in the brain.
|
31
|
Bayesian Behavioral Systems Theory. Behav Processes 2019; 168:103904. [DOI: 10.1016/j.beproc.2019.103904] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/23/2019] [Revised: 06/30/2019] [Accepted: 07/08/2019] [Indexed: 12/29/2022]
|
32
|
|
33
|
|
34
|
|
35
|
Being right matters: Model-compliant events in predictive processing. PLoS One 2019; 14:e0218311. [PMID: 31194829 PMCID: PMC6565358 DOI: 10.1371/journal.pone.0218311] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/23/2018] [Accepted: 05/31/2019] [Indexed: 11/23/2022] Open
Abstract
While prediction errors (PE) have been established to drive learning through adaptation of internal models, the role of model-compliant events in predictive processing is less clear. Checkpoints (CP) were recently introduced as points in time where expected sensory input resolved ambiguity regarding the validity of the internal model. Conceivably, these events serve as on-line reference points for model evaluation, particularly in uncertain contexts. Evidence from fMRI has shown functional similarities of CP and PE to be independent of event-related surprise, raising the important question of how these event classes relate to one another. Consequently, the aim of the present study was to characterise the functional relationship of checkpoints and prediction errors in a serial pattern detection task using electroencephalography (EEG). Specifically, we first hypothesised a joint P3b component of both event classes to index recourse to the internal model (compared to non-informative standards, STD). Second, we assumed the mismatch signal of PE to be reflected in an N400 component when compared to CP. Event-related findings supported these hypotheses. We suggest that while model adaptation is instigated by prediction errors, checkpoints are similarly used for model evaluation. Intriguingly, behavioural subgroup analyses showed that the exploitation of potentially informative reference points may depend on initial cue learning: Strict reliance on cue-based predictions may result in less attentive processing of these reference points, thus impeding upregulation of response gain that would prompt flexible model adaptation. Overall, present results highlight the role of checkpoints as model-compliant, informative reference points and stimulate important research questions about their processing as a function of learning and uncertainty.
|
36
|
Abstract
A generally intelligent agent faces a dilemma: it requires a complex sensorimotor space to be capable of solving a wide range of problems, but many tasks are only feasible given the right problem-specific formulation. I argue that a necessary but understudied requirement for general intelligence is the ability to form task-specific abstract representations. I show that the reinforcement learning paradigm structures this question into how to learn action abstractions and how to learn state abstractions, and discuss the field's progress on these topics.
|
37
|
Modeling sensory-motor decisions in natural behavior. PLoS Comput Biol 2018; 14:e1006518. [PMID: 30359364 PMCID: PMC6219815 DOI: 10.1371/journal.pcbi.1006518] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/17/2018] [Revised: 11/06/2018] [Accepted: 09/18/2018] [Indexed: 11/18/2022] Open
Abstract
Although a standard reinforcement learning model can capture many aspects of reward-seeking behaviors, it may not be practical for modeling human natural behaviors because of the richness of dynamic environments and limitations in cognitive resources. We propose a modular reinforcement learning model that addresses these factors. Based on this model, a modular inverse reinforcement learning algorithm is developed to estimate both the rewards and discount factors from human behavioral data, which allows predictions of human navigation behaviors in virtual reality with high accuracy across different subjects and with different tasks. Complex human navigation trajectories in novel environments can be reproduced by an artificial agent that is based on the modular model. This model provides a strategy for estimating the subjective value of actions and how they influence sensory-motor decisions in natural behavior.
|
38
|
Model-based spatial navigation in the hippocampus-ventral striatum circuit: A computational analysis. PLoS Comput Biol 2018; 14:e1006316. [PMID: 30222746 PMCID: PMC6160242 DOI: 10.1371/journal.pcbi.1006316] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 10/27/2017] [Revised: 09/27/2018] [Accepted: 06/20/2018] [Indexed: 12/26/2022] Open
Abstract
While the neurobiology of simple and habitual choices is relatively well known, our current understanding of goal-directed choices and planning in the brain is still limited. Theoretical work suggests that goal-directed computations can be productively associated with model-based (reinforcement learning) computations, yet a detailed mapping between computational processes and neuronal circuits remains to be fully established. Here we report a computational analysis that aligns Bayesian nonparametrics and model-based reinforcement learning (MB-RL) to the functioning of the hippocampus (HC) and the ventral striatum (vStr), a neuronal circuit that is increasingly recognized to be an appropriate model system to understand goal-directed (spatial) decisions and planning mechanisms in the brain. We test the MB-RL agent in a contextual conditioning task that depends on intact hippocampus and ventral striatal (shell) function and show that it solves the task while showing key behavioral and neuronal signatures of the HC-vStr circuit. Our simulations also explore the benefits of biological forms of look-ahead prediction (forward sweeps) during both learning and control. This article thus contributes to filling the gap between our current understanding of computational algorithms and biological realizations of (model-based) reinforcement learning.
|
39
|
Planning and navigation as active inference. BIOLOGICAL CYBERNETICS 2018; 112:323-343. [PMID: 29572721 PMCID: PMC6060791 DOI: 10.1007/s00422-018-0753-2] [Citation(s) in RCA: 79] [Impact Index Per Article: 13.2] [Received: 01/17/2017] [Accepted: 03/07/2018] [Indexed: 05/05/2023]
Abstract
This paper introduces an active inference formulation of planning and navigation. It illustrates how the exploitation-exploration dilemma is dissolved by acting to minimise uncertainty (i.e. expected surprise or free energy). We use simulations of a maze problem to illustrate how agents can solve quite complicated problems using context sensitive prior preferences to form subgoals. Our focus is on how epistemic behaviour, driven by novelty and the imperative to reduce uncertainty about the world, contextualises pragmatic or goal-directed behaviour. Using simulations, we illustrate the underlying process theory with synthetic behavioural and electrophysiological responses during exploration of a maze and subsequent navigation to a target location. An interesting phenomenon that emerged from the simulations was a putative distinction between 'place cells', which fire when a subgoal is reached, and 'path cells', which fire until a subgoal is reached.
|
40
|
SWIRL: A sequential windowed inverse reinforcement learning algorithm for robot tasks with delayed rewards. Int J Rob Res 2018. [DOI: 10.1177/0278364918784350] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 11/17/2022]
Abstract
We present sequential windowed inverse reinforcement learning (SWIRL), a policy search algorithm that is a hybrid of exploration and demonstration paradigms for robot learning. We apply unsupervised learning to a small number of initial expert demonstrations to structure future autonomous exploration. SWIRL approximates a long time horizon task as a sequence of local reward functions and subtask transition conditions. Over this approximation, SWIRL applies Q-learning to compute a policy that maximizes rewards. Experiments suggest that SWIRL requires significantly fewer rollouts than pure reinforcement learning and fewer expert demonstrations than behavioral cloning to learn a policy. We evaluate SWIRL in two simulated control tasks, parallel parking and a two-link pendulum. On the parallel parking task, SWIRL achieves the maximum reward on the task with 85% fewer rollouts than Q-learning, and one-eighth of the demonstrations needed by behavioral cloning. We also consider physical experiments on surgical tensioning and cutting deformable sheets using a da Vinci surgical robot. On the deformable tensioning task, SWIRL achieves a 36% relative improvement in reward compared with a baseline of behavioral cloning with segmentation.
|
41
|
Compositional clustering in task structure learning. PLoS Comput Biol 2018; 14:e1006116. [PMID: 29672581 PMCID: PMC5929577 DOI: 10.1371/journal.pcbi.1006116] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 10/02/2017] [Revised: 05/01/2018] [Accepted: 04/03/2018] [Indexed: 11/18/2022] Open
Abstract
Humans are remarkably adept at generalizing knowledge between experiences in a way that can be difficult for computers. Often, this entails generalizing constituent pieces of experiences that do not fully overlap, but nonetheless share useful similarities with, previously acquired knowledge. However, it is often unclear how knowledge gained in one context should generalize to another. Previous computational models and data suggest that rather than learning about each individual context, humans build latent abstract structures and learn to link these structures to arbitrary contexts, facilitating generalization. In these models, task structures that are more popular across contexts are more likely to be revisited in new contexts. However, these models can only re-use policies as a whole and are unable to transfer knowledge about the transition structure of the environment even if only the goal has changed (or vice-versa). This contrasts with ecological settings, where some aspects of task structure, such as the transition function, will be shared between contexts separately from other aspects, such as the reward function. Here, we develop a novel non-parametric Bayesian agent that forms independent latent clusters for transition and reward functions, affording separable transfer of their constituent parts across contexts. We show that the relative performance of this agent compared to an agent that jointly clusters reward and transition functions depends on environmental task statistics: the mutual information between transition and reward functions and the stochasticity of the observations. We formalize our analysis through an information theoretic account of the priors, and propose a meta learning agent that dynamically arbitrates between strategies across task domains to optimize a statistical tradeoff.
|
42
|
Abstract
The need for high-throughput, precise, and meaningful methods for measuring behavior has been amplified by our recent successes in measuring and manipulating neural circuitry. The largest challenges associated with moving in this direction, however, are not technical but are instead conceptual: what numbers should one put on the movements an animal is performing (or not performing)? In this review, I will describe how theoretical and data analytical ideas are interfacing with recently-developed computational and experimental methodologies to answer these questions across a variety of contexts, length scales, and time scales. I will attempt to highlight commonalities between approaches and areas where further advances are necessary to place behavior on the same quantitative footing as other scientific fields.
|
43
|
The hippocampus as a predictive map. Nat Neurosci 2017; 20:1643-1653. [PMID: 28967910 DOI: 10.1038/nn.4650] [Citation(s) in RCA: 347] [Impact Index Per Article: 49.6] [Received: 12/19/2016] [Accepted: 08/29/2017] [Indexed: 12/19/2022]
Abstract
A cognitive map has long been the dominant metaphor for hippocampal function, embracing the idea that place cells encode a geometric representation of space. However, evidence for predictive coding, reward sensitivity and policy dependence in place cells suggests that the representation is not purely spatial. We approach this puzzle from a reinforcement learning perspective: what kind of spatial representation is most useful for maximizing future reward? We show that the answer takes the form of a predictive representation. This representation captures many aspects of place cell responses that fall outside the traditional view of a cognitive map. Furthermore, we argue that entorhinal grid cells encode a low-dimensionality basis set for the predictive representation, useful for suppressing noise in predictions and extracting multiscale structure for hierarchical planning.
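One standard formalization of such a predictive representation is the successor representation, where M[s][j] is the expected discounted number of future visits to state j when starting in state s. The sketch below learns M by temporal-difference updates. The learning rule, the discount `gamma`, the step size `alpha`, and the three-state loop are illustrative assumptions rather than details taken from the abstract.

```python
def learn_successor_representation(trajectory, n_states, gamma=0.9, alpha=0.5):
    """TD-learn M[s][j], the expected discounted future visits to j from s."""
    M = [[0.0] * n_states for _ in range(n_states)]
    for s, s_next in zip(trajectory, trajectory[1:]):
        for j in range(n_states):
            indicator = 1.0 if j == s else 0.0   # the current state counts as a visit
            td_error = indicator + gamma * M[s_next][j] - M[s][j]
            M[s][j] += alpha * td_error
    return M


# Hypothetical toy world: a deterministic 3-state loop 0 -> 1 -> 2 -> 0 -> ...
trajectory = [t % 3 for t in range(3001)]
M = learn_successor_representation(trajectory, n_states=3)
```

In this toy loop each row of `M` sums to roughly 1 / (1 - gamma), and entries decay with the number of steps needed to reach a state, so states that are about to be visited are represented more strongly than states further along the path.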
|
44
|
Neuroscience-Inspired Artificial Intelligence. Neuron 2017; 95:245-258. [PMID: 28728020 DOI: 10.1016/j.neuron.2017.06.011] [Citation(s) in RCA: 443] [Impact Index Per Article: 63.3] [Received: 04/01/2017] [Revised: 06/03/2017] [Accepted: 06/06/2017] [Indexed: 01/29/2023]
Abstract
The fields of neuroscience and artificial intelligence (AI) have a long and intertwined history. In more recent times, however, communication and collaboration between the two fields has become less commonplace. In this article, we argue that better understanding biological brains could play a vital role in building intelligent machines. We survey historical interactions between the AI and neuroscience fields and emphasize current advances in AI that have been inspired by the study of neural computation in humans and other animals. We conclude by highlighting shared themes that may be key for advancing future research in both fields.
|
45
|
Neural Mechanisms of Hierarchical Planning in a Virtual Subway Network. Neuron 2016; 90:893-903. [PMID: 27196978 PMCID: PMC4882377 DOI: 10.1016/j.neuron.2016.03.037] [Citation(s) in RCA: 86] [Impact Index Per Article: 12.3] [Received: 11/25/2015] [Revised: 02/15/2016] [Accepted: 03/31/2016] [Indexed: 11/17/2022]
Abstract
Planning allows actions to be structured in pursuit of a future goal. However, in natural environments, planning over multiple possible future states incurs prohibitive computational costs. To represent plans efficiently, states can be clustered hierarchically into “contexts”. For example, representing a journey through a subway network as a succession of individual states (stations) is more costly than encoding a sequence of contexts (lines) and context switches (line changes). Here, using functional brain imaging, we asked humans to perform a planning task in a virtual subway network. Behavioral analyses revealed that humans executed a hierarchically organized plan. Brain activity in the dorsomedial prefrontal cortex and premotor cortex scaled with the cost of hierarchical plan representation and unique neural signals in these regions signaled contexts and context switches. These results suggest that humans represent hierarchical plans using a network of caudal prefrontal structures.
Humans represent plans in a hierarchical fashion, over contexts as well as states. Hierarchical plan complexity is encoded in caudal prefrontal cortex. Ventromedial prefrontal cortex and hippocampus encode proximity to a goal state. The current context can be decoded from the dorsomedial prefrontal cortex.
|
46
|
Abstract
We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain’s general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model’s behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions.
|
47
|
Event-related potentials and neural oscillations dissociate levels of cognitive control. Behav Brain Res 2017; 320:154-164. [DOI: 10.1016/j.bbr.2016.12.012] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 08/07/2016] [Revised: 11/28/2016] [Accepted: 12/10/2016] [Indexed: 12/01/2022]
|
48
|
Abstract
To many, the poster child for David Marr's famous three levels of scientific inquiry is reinforcement learning: a computational theory of reward optimization, which readily prescribes algorithmic solutions that bear a striking resemblance to signals found in the brain, suggesting a straightforward neural implementation. Here we review questions that remain open at each level of analysis, concluding that the path forward to their resolution calls for inspiration across levels, rather than a focus on mutual constraints.
|
49
|
Constructing Abstraction Hierarchies Using a Skill-Symbol Loop. IJCAI: Proceedings of the Conference 2016; 2016:1648-1654. [PMID: 28579718 PMCID: PMC5455777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
We describe a framework for building abstraction hierarchies whereby an agent alternates skill- and representation-construction phases to construct a sequence of increasingly abstract Markov decision processes. Our formulation builds on recent results showing that the appropriate abstract representation of a problem is specified by the agent's skills. We describe how such a hierarchy can be used for fast planning, and illustrate the construction of an appropriate hierarchy for the Taxi domain.
|
50
|
Problem Solving as Probabilistic Inference with Subgoaling: Explaining Human Successes and Pitfalls in the Tower of Hanoi. PLoS Comput Biol 2016; 12:e1004864. [PMID: 27074140 PMCID: PMC4830581 DOI: 10.1371/journal.pcbi.1004864] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 07/30/2015] [Accepted: 03/13/2016] [Indexed: 11/18/2022] Open
Abstract
How do humans and other animals face novel problems for which predefined solutions are not available? Human problem solving links to flexible reasoning and inference rather than to slow trial-and-error learning. It has received considerable attention since the early days of cognitive science, giving rise to well-known cognitive architectures such as SOAR and ACT-R, but its computational and brain mechanisms remain incompletely known. Furthermore, it is still unclear whether problem solving is a “specialized” domain or module of cognition, in the sense that it requires computations that are fundamentally different from those supporting perception and action systems. Here we advance a novel view of human problem solving as probabilistic inference with subgoaling. In this perspective, key insights from cognitive architectures are retained, such as the importance of using subgoals to split problems into subproblems. However, here the underlying computations use probabilistic inference methods analogous to those that are increasingly popular in the study of perception and action systems. To test our model we focus on the widely used Tower of Hanoi (ToH) task, and show that our proposed method can reproduce characteristic idiosyncrasies of human problem solvers: their sensitivity to the “community structure” of the ToH and their difficulties in executing so-called “counterintuitive” movements. Our analysis reveals that subgoals have two key roles in probabilistic inference and problem solving. First, prior beliefs on (likely) useful subgoals carve the problem space and define an implicit metric for the problem at hand, a metric to which humans are sensitive. Second, subgoals are used as waypoints in the probabilistic problem solving inference and make it possible to find effective solutions; when such solutions are unavailable, problem solving deficits result. Our study thus suggests that a probabilistic inference scheme enhanced with subgoals provides a comprehensive framework to study problem solving and its deficits.
How humans solve challenging problems such as the Tower of Hanoi (ToH) or related puzzles is still largely unknown. Here we advance a computational model that uses the same probabilistic inference methods as those that are increasingly popular in the study of perception and action systems, thus making the point that problem solving does not need to be a specialized module or domain of cognition, but can use the same computations underlying sensorimotor behavior. Crucially, we augment the probabilistic inference methods with subgoaling mechanisms that essentially make it possible to split the problem space into more manageable subparts, which are easier to solve. We show that our computational model can correctly reproduce important characteristics (and pitfalls) of human problem solving, including the sensitivity to the “community structure” of the ToH and the difficulty of executing so-called “counterintuitive” movements, which require (temporarily) moving away from the final goal in order to subsequently achieve it.
|