1. Consistency and Variation in Reasoning About Physical Assembly. Cogn Sci 2023; 47:e13397. PMID: 38146204. DOI: 10.1111/cogs.13397.
Abstract
The ability to reason about how things were made is a pervasive aspect of how humans make sense of physical objects. Such reasoning is useful for a range of everyday tasks, from assembling a piece of furniture to making a sandwich and knitting a sweater. What enables people to reason in this way even about novel objects, and how do people draw upon prior experience with an object to continually refine their understanding of how to create it? To explore these questions, we developed a virtual task environment to investigate how people come up with step-by-step procedures for recreating block towers whose composition was not readily apparent, and analyzed how the procedures they used to build them changed across repeated attempts. Specifically, participants (N = 105) viewed 2D silhouettes of eight unique block towers in a virtual environment simulating rigid-body physics, and aimed to reconstruct each one in less than 60 s. We found that people built each tower more accurately and quickly across repeated attempts, and that this improvement reflected both group-level convergence upon a tiny fraction of all possible viable procedures, as well as error-dependent updating across successive attempts by the same individual. Taken together, our study presents a scalable approach to measuring consistency and variation in how people infer solutions to physical assembly problems.

2. Competing roles of intention and habit in predicting behavior: A comprehensive literature review, synthesis, and longitudinal field study. International Journal of Information Management 2023. DOI: 10.1016/j.ijinfomgt.2023.102644.

3. Evidence for entropy maximisation in human free choice behaviour. Cognition 2023; 232:105328. PMID: 36463639. DOI: 10.1016/j.cognition.2022.105328.
Abstract
The freedom to choose between options is strongly linked to notions of free will. Accordingly, several studies have shown that individuals demonstrate a preference for choice, or the availability of multiple options, over and above utilitarian value. Yet we lack a decision-making framework that integrates preference for choice with traditional utility maximisation in free choice behaviour. Here we test the predictions of an inference-based model of decision-making in which an agent actively seeks states yielding entropy (availability of options) in addition to utility (economic reward). We designed a study in which participants freely navigated a virtual environment consisting of two consecutive choices leading to reward locations in separate rooms. Critically, the choice of one room always led to two final doors while, in the second room, only one door was permissible to choose. This design allowed us to separately determine the influence of utility and entropy on participants' choice behaviour and their self-evaluation of free will. We found that choice behaviour was better predicted by an inference-based model than by expected utility alone, and that both the availability of options and the value of the context positively influenced participants' perceived freedom of choice. Moreover, this consideration of options was apparent in the ongoing motion dynamics as individuals navigated the environment. In a second study, in which participants selected between rooms that gave access to three or four doors, we observed a similar pattern of results, with participants preferring the room that gave access to more options and feeling freer in it. These results suggest that free choice behaviour is well explained by an inference-based framework in which both utility and entropy are optimised, and they support the idea that the feeling of having free will is tightly related to the availability of options.
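The inference-based account above, in which agents value both economic utility and the entropy afforded by available options, can be sketched in a few lines. The additive value rule, the `beta` weight, the uniform entropy over doors, and the softmax choice rule are illustrative assumptions, not the authors' fitted model:

```python
import math

def entropy(n_options):
    # Shannon entropy of a uniform distribution over n equally likely options
    return math.log(n_options) if n_options > 1 else 0.0

def room_value(utility, n_doors, beta=1.0):
    # Value = economic utility plus an entropy bonus for the options ahead
    return utility + beta * entropy(n_doors)

def choice_probs(values, temperature=1.0):
    # Softmax choice rule over room values
    exps = [math.exp(v / temperature) for v in values]
    z = sum(exps)
    return [e / z for e in exps]

# Equal reward in both rooms, but one room leads to two final doors
p_two_doors, p_one_door = choice_probs([room_value(1.0, 2), room_value(1.0, 1)])
```

With equal utilities, the entropy bonus alone predicts a preference for the room with more doors, mirroring the reported preference for choice over and above utilitarian value.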

4. Design Principles for Neurorobotics. Front Neurorobot 2022; 16:882518. PMID: 35692490. PMCID: PMC9174684. DOI: 10.3389/fnbot.2022.882518.
Abstract
In their book “How the Body Shapes the Way We Think: A New View of Intelligence,” Pfeifer and Bongard put forth an embodied approach to cognition. Because of this position, many of their robot examples demonstrated “intelligent” behavior despite limited neural processing. It is our belief that neurorobots should attempt to follow many of these principles. In this article, we discuss a number of principles to consider when designing neurorobots and experiments using robots to test brain theories. These principles are strongly inspired by Pfeifer and Bongard, but build on their design principles by grounding them in neuroscience and by adding principles based on neuroscience research. Our design principles fall into three categories. First, organisms must react quickly and appropriately to events. Second, organisms must have the ability to learn and remember over their lifetimes. Third, organisms must weigh options that are crucial for survival. We believe that by following these design principles a robot's behavior will be more naturalistic and more successful.

5. Reinforcement learning and Bayesian inference provide complementary models for the unique advantage of adolescents in stochastic reversal. Dev Cogn Neurosci 2022; 55:101106. PMID: 35537273. PMCID: PMC9108470. DOI: 10.1016/j.dcn.2022.101106.
Abstract
During adolescence, youth venture out, explore the wider world, and are challenged to learn how to navigate novel and uncertain environments. We investigated how performance changes across adolescent development in a stochastic, volatile reversal-learning task that uniquely taxes the balance of persistence and flexibility. In a sample of 291 participants aged 8–30, we found that in the mid-teen years, adolescents outperformed both younger and older participants. We developed two independent cognitive models, based on Reinforcement learning (RL) and Bayesian inference (BI). The RL parameter for learning from negative outcomes and the BI parameters specifying participants’ mental models were closest to optimal in mid-teen adolescents, suggesting a central role in adolescent cognitive processing. By contrast, persistence and noise parameters improved monotonically with age. We distilled the insights of RL and BI using principal component analysis and found that three shared components interacted to form the adolescent performance peak: adult-like behavioral quality, child-like time scales, and developmentally-unique processing of positive feedback. This research highlights adolescence as a neurodevelopmental window that can create performance advantages in volatile and uncertain environments. It also shows how detailed insights can be gleaned by using cognitive models in new ways.
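A toy version of the RL component described above, with separate learning rates for positive and negative prediction errors in a stochastic reversal task. The task parameters and greedy choice rule are simplifying assumptions, not the authors' fitted model:

```python
import random

def run_reversal(alpha_pos, alpha_neg, n_trials=400, p_reward=0.75, seed=0):
    # Two-armed stochastic task; the better arm switches halfway through.
    rng = random.Random(seed)
    q = [0.5, 0.5]       # learned action values
    best = 0             # index of the currently better arm
    correct = 0
    for t in range(n_trials):
        if t == n_trials // 2:
            best = 1 - best  # reversal
        choice = 0 if q[0] >= q[1] else 1
        p = p_reward if choice == best else 1 - p_reward
        reward = 1.0 if rng.random() < p else 0.0
        delta = reward - q[choice]
        # Separate learning rates for positive vs. negative prediction errors
        alpha = alpha_pos if delta >= 0 else alpha_neg
        q[choice] += alpha * delta
        correct += (choice == best)
    return correct / n_trials
```

Varying `alpha_neg` while holding the other parameters fixed shows how learning from negative outcomes governs how quickly behaviour adapts after the reversal.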

6.
Abstract
A hallmark of adaptation in humans and other animals is our ability to control how we think and behave across different settings. Research has characterized the various forms cognitive control can take (including enhancement of goal-relevant information, suppression of goal-irrelevant information, and overall inhibition of potential responses) and has identified computations and neural circuits that underpin this multitude of control types. Studies have also identified a wide range of situations that elicit adjustments in control allocation (e.g., those eliciting signals indicating an error or increased processing conflict), but the rules governing when a given situation will give rise to a given control adjustment remain poorly understood. Significant progress has recently been made on this front by casting the allocation of control as a decision-making problem. This approach has developed unifying and normative models that prescribe when and how a change in incentives and task demands will result in changes in a given form of control. Despite their successes, these models, and the experiments that have been developed to test them, have yet to face their greatest challenge: deciding how to select among the multiplicity of configurations that control can take at any given time. Here, we lay out the complexities of the inverse problem inherent to cognitive control allocation, and their close parallels to inverse problems within motor control (e.g., choosing between redundant limb movements). We discuss existing solutions to motor control's inverse problems drawn from optimal control theory, which have proposed that effort costs act to regularize actions and transform motor planning into a well-posed problem. These same principles may help shed light on how our brains optimize over complex control configurations, while providing a new normative perspective on the origins of mental effort.
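The idea that effort costs regularize control allocation, turning an otherwise ill-posed selection problem into a well-posed one, can be sketched as a small optimization. The quadratic cost terms and the `effort_weight` parameter are illustrative assumptions:

```python
def allocation_cost(control, demand, effort_weight=0.5):
    # Performance shortfall shrinks as control approaches task demand;
    # effort cost grows quadratically with control intensity
    error_cost = (demand - control) ** 2 if control < demand else 0.0
    effort_cost = effort_weight * control ** 2
    return error_cost + effort_cost

def best_allocation(demand, effort_weight=0.5, grid=101):
    # Grid search over control intensities in [0, demand]
    candidates = [demand * i / (grid - 1) for i in range(grid)]
    return min(candidates, key=lambda c: allocation_cost(c, demand, effort_weight))

c = best_allocation(1.0)
```

The optimum sits strictly below the level the task demands, because marginal effort eventually outweighs marginal performance gains; this is the regularizing role effort costs play in the motor-control solutions discussed above.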

7.
Abstract
Recent breakthroughs in artificial intelligence (AI) have enabled machines to plan in tasks previously thought to be uniquely human. Meanwhile, the planning algorithms implemented by the brain itself remain largely unknown. Here, we review neural and behavioral data in sequential decision-making tasks that elucidate the ways in which the brain does, and does not, plan. To systematically review available biological data, we create a taxonomy of planning algorithms by summarizing the relevant design choices for such algorithms in AI. Across species, recording techniques, and task paradigms, we find converging evidence that the brain represents future states consistent with a class of planning algorithms within our taxonomy: focused, depth-limited, and serial. However, we argue that current data are insufficient for addressing more detailed algorithmic questions. We propose a new approach leveraging AI advances to drive experiments that can adjudicate between competing candidate algorithms.
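A minimal sketch of the class of algorithms the review converges on, a focused, depth-limited, serial lookahead. The toy task graph, action names, and reward values are hypothetical:

```python
def plan(state, transitions, rewards, depth):
    # Serial, depth-limited lookahead: expand one branch at a time,
    # stopping at the depth horizon or at terminal states
    if depth == 0 or state not in transitions:
        return 0.0, []
    best_value, best_path = float('-inf'), []
    for action, next_state in transitions[state].items():
        future, path = plan(next_state, transitions, rewards, depth - 1)
        value = rewards.get((state, action), 0.0) + future
        if value > best_value:
            best_value, best_path = value, [action] + path
    return best_value, best_path

# Tiny deterministic task graph: a small immediate reward competes
# with a larger reward one step deeper
T = {'s0': {'a': 's1', 'b': 's2'}, 's1': {'a': 's3'}, 's2': {'a': 's3'}}
R = {('s0', 'a'): 0.0, ('s0', 'b'): 1.0, ('s1', 'a'): 5.0, ('s2', 'a'): 0.0}
value, path = plan('s0', T, R, depth=2)
```

With `depth=1` the planner prefers the immediately rewarding action `'b'`, while `depth=2` uncovers the larger delayed reward, illustrating how a depth limit shapes behaviour.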

8. Advances in modeling learning and decision-making in neuroscience. Neuropsychopharmacology 2022; 47:104-118. PMID: 34453117. PMCID: PMC8617262. DOI: 10.1038/s41386-021-01126-y.
Abstract
An organism's survival depends on its ability to learn about its environment and to make adaptive decisions in the service of achieving the best possible outcomes in that environment. To study the neural circuits that support these functions, researchers have increasingly relied on models that formalize the computations required to carry them out. Here, we review the recent history of computational modeling of learning and decision-making, and how these models have been used to advance understanding of prefrontal cortex function. We discuss how such models have advanced from their origins in basic algorithms of updating and action selection to increasingly account for complexities in the cognitive processes required for learning and decision-making, and the representations over which they operate. We further discuss how a deeper understanding of the real-world complexities in these computations has shed light on the fundamental constraints on optimal behavior, and on the complex interactions between corticostriatal pathways to determine such behavior. The continuing and rapid development of these models holds great promise for understanding the mechanisms by which animals adapt to their environments, and what leads to maladaptive forms of learning and decision-making within clinical populations.

9. Neuronal origins of reduced accuracy and biases in economic choices under sequential offers. eLife 2022; 11:75910. PMID: 35416775. PMCID: PMC9045815. DOI: 10.7554/elife.75910.
Abstract
Economic choices are characterized by a variety of biases. Understanding their origins is a long-term goal for neuroeconomics, but progress on this front has been limited. Here, we examined choice biases observed when two goods are offered sequentially. In the experiments, rhesus monkeys chose between different juices offered simultaneously or in sequence. Choices under sequential offers were less accurate (higher variability). They were also biased in favor of the second offer (order bias) and in favor of the preferred juice (preference bias). Analysis of neuronal activity recorded in the orbitofrontal cortex revealed that these phenomena emerged at different computational stages. Lower choice accuracy reflected weaker offer value signals (valuation stage), the order bias emerged during value comparison (decision stage), and the preference bias emerged late in the trial (post-comparison). By neuronal measures, each phenomenon reduced the value obtained on average in each trial and was thus costly to the monkey.

10. Examining the effect of depressive symptoms on habit formation and habit-breaking. J Behav Ther Exp Psychiatry 2021; 73:101676. PMID: 34298256. DOI: 10.1016/j.jbtep.2021.101676.
Abstract
Background and objectives: Dysfunction in reward processing is a hallmark feature of depression. In the context of reinforcement learning, previous research has linked depression with reliance on simple habit-driven ('model-free') learning strategies over more complex, goal-directed ('model-based') strategies. However, the relationship between depression and habit-breaking remains an under-explored research area. The current study sought to bridge this gap by investigating the effect of depressive symptoms on habit formation and habit-breaking under monetary and social feedback conditions. Additionally, we examined whether spontaneous eyeblink rate (EBR), an indirect marker for striatal dopamine levels, would modulate such effects.
Methods: Depressive symptoms were operationalized using self-report measures. To examine differences in habit formation and habit-breaking, undergraduate participants (N = 156) completed a two-stage reinforcement learning task with a devaluation procedure using either monetary or social feedback.
Results: Regression results showed that in the monetary feedback condition, spontaneous EBR moderated the relationship between depressive symptoms and model-free strategies; individuals with more depressive symptomatology and high EBR (higher dopamine levels) exhibited increased reliance on model-free strategies. Depressive symptoms negatively predicted devaluation sensitivity, indicative of difficulty in habit-breaking, in both monetary and social feedback contexts.
Limitations: Social feedback relied on fixed feedback rather than real-time peer evaluations; depressive symptoms were measured using self-report rather than diagnostic criteria for Major Depressive Disorder; dopaminergic functioning was measured using EBR rather than PET imaging; potential confounds were not controlled for.
Conclusions: These findings have implications for identifying altered patterns of habit formation and deficits in habit-breaking among those experiencing depressive symptoms.
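The model-based/model-free distinction underlying this task is commonly formalized as a weighted mixture of two value estimates, where only the model-based estimate updates immediately after outcome devaluation. A minimal sketch; the mixture rule and numbers are illustrative, not the study's fitted regression model:

```python
def hybrid_value(q_mb, q_mf, w):
    # Weighted mixture of model-based and model-free action values;
    # lower w means more habit-driven ('model-free') control
    return w * q_mb + (1 - w) * q_mf

def devaluation_effect(q_mb_old, q_mb_new, q_mf, w):
    # Drop in an action's value after outcome devaluation updates only the
    # model-based estimate; habitual (model-free) values lag behind
    before = hybrid_value(q_mb_old, q_mf, w)
    after = hybrid_value(q_mb_new, q_mf, w)
    return before - after

# Same devaluation, different control regimes (hypothetical values)
goal_directed = devaluation_effect(1.0, 0.0, 1.0, w=0.9)
habitual = devaluation_effect(1.0, 0.0, 1.0, w=0.1)
```

A small devaluation effect, as in `habitual` here, is the signature of reduced devaluation sensitivity that the study links to depressive symptoms.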

11. Mixing memory and desire: How memory reactivation supports deliberative decision-making. Wiley Interdisciplinary Reviews: Cognitive Science 2021; 13:e1581. PMID: 34665529. DOI: 10.1002/wcs.1581.
Abstract
Memories affect nearly every aspect of our mental life. They allow us both to resolve uncertainty in the present and to construct plans for the future. Recently, renewed interest in the role memory plays in adaptive behavior has led to new theoretical advances and empirical observations. We review key findings, with particular emphasis on how the retrieval of many kinds of memories affects deliberative action selection. These results are interpreted in a sequential inference framework, in which reinstatements from memory serve as "samples" of potential action outcomes. The resulting model suggests a central role for the dynamics of memory reactivation in determining the influence of different kinds of memory in decisions. We propose that representation-specific dynamics can implement a bottom-up "product of experts" rule that integrates multiple sets of action-outcome predictions weighted based on their uncertainty. We close by reviewing related findings and identifying areas for further research. This article is categorized under: Psychology > Reasoning and Decision Making; Neuroscience > Cognition; Neuroscience > Computation.
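The uncertainty-weighted "product of experts" rule mentioned above has a simple closed form when each memory system's prediction is treated as a Gaussian. A sketch; the example means and variances are hypothetical:

```python
def product_of_experts(predictions):
    # Each expert supplies (mean, variance) for a predicted action outcome;
    # the Gaussian product weights every mean by its precision (1/variance)
    precisions = [1.0 / var for _, var in predictions]
    total = sum(precisions)
    mean = sum(m * p for (m, _), p in zip(predictions, precisions)) / total
    return mean, 1.0 / total

# A precise episodic memory dominates a vague semantic expectation
mean, var = product_of_experts([(10.0, 1.0), (0.0, 9.0)])
```

The combined estimate lands near the more certain prediction and is itself less uncertain than either input, which is how precision weighting arbitrates between memory sources.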

12. The psychology of ultimate values: A computational perspective. Journal for the Theory of Social Behaviour 2021. DOI: 10.1111/jtsb.12311.

13.

14.
Abstract
An instrumental action can be goal-directed after a moderate amount of practice and then convert to habit after more extensive practice. Recent evidence suggests, however, that habits can return to action status after different environmental manipulations. The present experiments therefore asked whether habit learning interferes with goal direction in a context-dependent manner like other types of retroactive interference (e.g., extinction, punishment, counterconditioning). In Experiment 1, rats were given a moderate amount of instrumental training to form an action in one context (Context A) and then more extended training of the same response to form a habit in another context (Context B). We then performed reinforcer devaluation with taste aversion conditioning in both contexts, and tested the response in both contexts. The response remained habitual in Context B, but was goal-directed in Context A, indicating renewal of goal direction after habit learning. Experiment 2 expanded on Experiment 1 by testing the response in a third context (Context C). It found that the habitual response also renewed as action in this context. Together, the results establish a parallel between habit and extinction learning: Conversion to habit does not destroy action knowledge, but interferes with it in a context-specific way. They are also consistent with other results suggesting that habit is specific to the context in which it is learned, whereas goal-direction can transfer between contexts.

15. How higher goals are constructed and collapse under stress: A hierarchical Bayesian control systems perspective. Neurosci Biobehav Rev 2021; 123:257-285. PMID: 33497783. DOI: 10.1016/j.neubiorev.2020.12.021.
Abstract
In this paper, we show that organisms can be modeled as hierarchical Bayesian control systems with small world and information bottleneck (bow-tie) network structure. Such systems combine hierarchical perception with hierarchical goal setting and hierarchical action control. We argue that hierarchical Bayesian control systems produce deep hierarchies of goal states, from which it follows that organisms must have some form of 'highest goals'. For all organisms, these involve internal (self) models, external (social) models and overarching (normative) models. We show that goal hierarchies tend to decompose in a top-down manner under severe and prolonged levels of stress. This produces behavior that favors short-term and self-referential goals over long term, social and/or normative goals. The collapse of goal hierarchies is universally accompanied by an increase in entropy (disorder) in control systems that can serve as an early warning sign for tipping points (disease or death of the organism). In humans, learning goal hierarchies corresponds to personality development (maturation). The failure of goal hierarchies to mature properly corresponds to personality deficits. A top-down collapse of such hierarchies under stress is identified as a common factor in all forms of episodic mental disorders (psychopathology). The paper concludes by discussing ways of testing these hypotheses empirically.

16. Neural Mechanisms of Human Decision-Making. Cognitive, Affective, & Behavioral Neuroscience 2021; 21:35-57. PMID: 33409958. DOI: 10.3758/s13415-020-00842-0.
Abstract
We present a theory and neural network model of the neural mechanisms underlying human decision-making. We propose a detailed model of the interaction between brain regions, under a proposer-predictor-actor-critic framework. This theory is based on detailed animal data and theories of action-selection. Those theories are adapted to serial operation to bridge levels of analysis and explain human decision-making. Task-relevant areas of cortex propose a candidate plan using fast, model-free, parallel neural computations. Other areas of cortex and medial temporal lobe can then predict likely outcomes of that plan in this situation. This optional prediction- (or model-) based computation can produce better accuracy and generalization, at the expense of speed. Next, linked regions of basal ganglia act to accept or reject the proposed plan based on its reward history in similar contexts. If that plan is rejected, the process repeats to consider a new option. The reward-prediction system acts as a critic to determine the value of the outcome relative to expectations and produce dopamine as a training signal for cortex and basal ganglia. By operating sequentially and hierarchically, the same mechanisms previously proposed for animal action-selection could explain the most complex human plans and decisions. We discuss explanations of model-based decisions, habitization, and risky behavior based on the computational model.
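The serial propose-predict-accept/reject loop described above can be caricatured in a few lines. The habit values, predicted outcomes, and acceptance threshold are illustrative assumptions, not the published network model:

```python
def decide(options, habit_value, predict_outcome, threshold, max_considered=None):
    # Proposer: candidate plans arrive in order of fast, model-free habit strength.
    # Predictor: an optional model-based outcome estimate for each proposal.
    # Actor (basal ganglia gate): accept the plan if its predicted value
    # clears the threshold; otherwise reject and consider the next option.
    candidates = sorted(options, key=habit_value, reverse=True)
    for plan in candidates[:max_considered]:
        if predict_outcome(plan) >= threshold:
            return plan
    return None  # no plan accepted

habit = {'coffee': 0.9, 'tea': 0.4}.get
outcome = {'coffee': 0.2, 'tea': 0.8}.get  # model predicts the coffee machine is broken
choice = decide(['coffee', 'tea'], habit, outcome, threshold=0.5)
```

Here habit proposes coffee first, but the predicted outcome fails the gate, so the loop serially falls back to the next proposal, trading speed for accuracy exactly as the model-based pathway is described to do.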

17. Transfer of information across repeated decisions in general and in obsessive-compulsive disorder. Proc Natl Acad Sci U S A 2021; 118:2014271117. PMID: 33443150. DOI: 10.1073/pnas.2014271117.
Abstract
Real-life decisions are often repeated. Whether considering taking a job in a new city, or doing something mundane like checking if the stove is off, decisions are frequently revisited even if no new information is available. This mode of behavior takes a particularly pathological form in obsessive-compulsive disorder (OCD), which is marked by individuals' redeliberating previously resolved decisions. Surprisingly, little is known about how information is transferred across decision episodes in such circumstances, and whether and how such transfer varies in OCD. In two experiments, data from a repeated decision-making task and computational modeling revealed that both implicit and explicit memories of previous decisions affected subsequent decisions by biasing the rate of evidence integration. Further, we replicated previous work demonstrating impairments in baseline decision-making as a function of self-reported OCD symptoms, and found that information transfer effects specifically due to implicit memory were reduced, offering computational insight into checking behavior.
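Biasing the rate of evidence integration is naturally expressed in a drift-diffusion sketch, where a memory trace of the previous decision shifts the effective drift. All parameter values are illustrative; this is not the authors' fitted model:

```python
import random

def biased_diffusion(drift, memory_bias, threshold=1.0, noise=0.1,
                     dt=0.01, seed=1, max_steps=100000):
    # Evidence accumulates toward +/- threshold; memory of a prior decision
    # shifts the effective drift, speeding re-decisions in the same direction
    rng = random.Random(seed)
    x, steps = 0.0, 0
    effective_drift = drift + memory_bias
    while abs(x) < threshold and steps < max_steps:
        x += effective_drift * dt + rng.gauss(0.0, noise) * dt ** 0.5
        steps += 1
    return (1 if x > 0 else -1), steps
```

With a positive memory bias the same decision is reached faster; a weakened bias, as reported for implicit memory in OCD symptomatology, would correspondingly slow redeliberation.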

18. Tea With Milk? A Hierarchical Generative Framework of Sequential Event Comprehension. Top Cogn Sci 2021; 13:256-298. PMID: 33025701. PMCID: PMC7897219. DOI: 10.1111/tops.12518.
Abstract
To make sense of the world around us, we must be able to segment a continual stream of sensory inputs into discrete events. In this review, I propose that in order to comprehend events, we engage hierarchical generative models that "reverse engineer" the intentions of other agents as they produce sequential action in real time. By generating probabilistic predictions for upcoming events, generative models ensure that we are able to keep up with the rapid pace at which perceptual inputs unfold. By tracking our certainty about other agents' goals and the magnitude of prediction errors at multiple temporal scales, generative models enable us to detect event boundaries by inferring when a goal has changed. Moreover, by adapting flexibly to the broader dynamics of the environment and our own comprehension goals, generative models allow us to optimally allocate limited resources. Finally, I argue that we use generative models not only to comprehend events but also to produce events (carry out goal-relevant sequential action) and to continually learn about new events from our surroundings. Taken together, this hierarchical generative framework provides new insights into how the human brain processes events so effortlessly while highlighting the fundamental links between event comprehension, production, and learning.

19. A generative spiking neural-network model of goal-directed behaviour and one-step planning. PLoS Comput Biol 2020; 16:e1007579. PMID: 33290414. PMCID: PMC7748287. DOI: 10.1371/journal.pcbi.1007579.
Abstract
In mammals, goal-directed and planning processes support flexible behaviour used to face new situations that cannot be tackled through more efficient but rigid habitual behaviours. Within the Bayesian modelling approach of brain and behaviour, models have been proposed to perform planning as probabilistic inference, but this approach encounters a crucial problem: explaining how such inference might be implemented in brain spiking networks. Recently, the literature has proposed some models that face this problem through recurrent spiking neural networks able to internally simulate state trajectories, the core function at the basis of planning. However, the proposed models have relevant limitations that make them biologically implausible, namely their world model is trained 'off-line' before solving the target tasks, and they are trained with supervised learning procedures that are not biologically and ecologically plausible. Here we propose two novel hypotheses on how the brain might overcome these problems, and operationalise them in a novel architecture pivoting on a spiking recurrent neural network. The first hypothesis allows the architecture to learn the world model in parallel with its use for planning: for this purpose, a new arbitration mechanism decides when to explore, for learning the world model, or when to exploit it, for planning, based on the entropy of the world model itself. The second hypothesis allows the architecture to use an unsupervised learning process to learn the world model by observing the effects of actions. The architecture is validated by reproducing and accounting for the learning profiles and reaction times of human participants learning to solve a visuomotor learning task that is new for them. Overall, the architecture represents the first instance of a model bridging probabilistic planning and spiking processes that has a degree of autonomy analogous to that of real organisms.
Goal-directed behaviour relies on brain processes supporting planning of actions based on their expected consequences before performing them in the environment. An important computational modelling approach proposes that the brain performs goal-directed processes on the basis of probability distributions and computations on them. A key challenge of this approach is to explain how these probabilistic processes can rely on the spiking processes of the brain. The literature has recently proposed some models that do so by ‘thinking ahead’ alternative possible action-outcomes based on low-level neuronal stochastic events. However, these models have a limited autonomy as they require to learn how the environment works (‘world model’) before solving the tasks, and use a biologically implausible learning process requiring an ‘external teacher’ to tell how their internal units should respond. Here we present a novel architecture proposing how organisms might overcome these challenging problems. First, the architecture can decide if exploring, to learn the world model, or planning, using such model, by evaluating how confident it is on the model knowledge. Second, the architecture can autonomously learn the world model based on experience. The architecture represents a first fully autonomous planning model relying on a spiking neural network.

20. Instrumental behavior in humans is sensitive to the correlation between response rate and reward rate. Psychon Bull Rev 2020; 28:649-656. PMID: 33258082. DOI: 10.3758/s13423-020-01830-8.
Abstract
Recent theories of instrumental behavior postulate that the correlation between response and reward rate is a critical factor in instrumental goal-directed performance. However, it is still not clear whether human actions can be sensitive to rate correlation. Using a novel within-subject design, participants were trained under ratio and interval contingencies of reinforcement matching both reward probabilities and reward rates between conditions. The impact of rate correlation on performance was evident in the higher performance observed under ratio contingencies for both types of matching. Moreover, there was no difference in performance between two classes of interval schedules with equivalent correlational properties but different reward probabilities. These results are discussed in terms of a recent dual-system model of instrumental behavior.
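The correlational difference between ratio and interval schedules can be made concrete with idealized reward-rate functions. The linear ratio rule and the `min` saturation rule for interval schedules are simplifying assumptions:

```python
def ratio_reward_rate(response_rate, ratio=10):
    # Ratio schedule: every `ratio` responses earns a reward,
    # so reward rate grows linearly with response rate
    return response_rate / ratio

def interval_reward_rate(response_rate, interval=10.0):
    # Interval schedule: at most one reward per `interval` seconds;
    # reward rate saturates once responding is frequent enough
    return min(response_rate, 1.0 / interval)

rates = [0.1, 0.5, 1.0, 2.0]  # responses per second (illustrative)
ratio_gain = ratio_reward_rate(rates[-1]) - ratio_reward_rate(rates[0])
interval_gain = interval_reward_rate(rates[-1]) - interval_reward_rate(rates[0])
```

Responding faster on a ratio schedule always raises the reward rate, while on an interval schedule it quickly stops paying; this difference in the response-reward rate correlation is the factor the study isolates while matching reward probabilities and rates.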
|
21
|
Affect-biased attention and predictive processing. Cognition 2020; 203:104370. [DOI: 10.1016/j.cognition.2020.104370] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2019] [Revised: 05/22/2020] [Accepted: 06/03/2020] [Indexed: 01/22/2023]
|
22
|
Abstract
This paper describes a framework for modelling dopamine function in the mammalian brain. It proposes that both learning and action planning involve processes minimizing prediction errors encoded by dopaminergic neurons. In this framework, dopaminergic neurons projecting to different parts of the striatum encode errors in predictions made by the corresponding systems within the basal ganglia. The dopaminergic neurons encode differences between rewards and expectations in the goal-directed system, and differences between the chosen and habitual actions in the habit system. These prediction errors trigger learning about rewards and habit formation, respectively. Additionally, dopaminergic neurons in the goal-directed system play a key role in action planning: They compute the difference between a desired reward and the reward expected from the current motor plan, and they facilitate action planning until this difference diminishes. The presented models account for dopaminergic responses during movements and for the effects of dopamine depletion on behaviour, and they make several experimental predictions.

In the brain, chemicals such as dopamine allow nerve cells to ‘talk’ to each other and to relay information from and to the environment. Dopamine, in particular, is released when pleasant surprises are experienced: this helps the organism to learn about the consequences of certain actions. If a new flavour of ice-cream tastes better than expected, for example, the release of dopamine tells the brain that this flavour is worth choosing again. However, dopamine has an additional role in controlling movement. When the cells that produce dopamine die, for instance in Parkinson’s disease, individuals may find it difficult to initiate deliberate movements. Here, Rafal Bogacz aimed to develop a comprehensive framework that could reconcile the two seemingly unrelated roles played by dopamine.
The new theory proposes that dopamine is released when an outcome differs from expectations, which helps the organism to adjust and minimise these differences. In the ice-cream example, the difference is between how good the treat is expected to taste, and how tasty it really is. By learning to select the same flavour repeatedly, the brain aligns expectation and the result of the choice. This ability would also apply when movements are planned. In this case, the brain compares the desired reward with the predicted results of the planned actions. For example, while planning to get a spoonful of ice-cream, the brain compares the pleasure expected from the movement that is currently planned, and the pleasure of eating a full spoon of the treat. If the two differ, for example because no movement has been planned yet, the brain releases dopamine to form a better version of the action plan. The theory was then tested using a computer simulation of nerve cells that release dopamine; this showed that the behaviour of the virtual cells closely matched that of their real-life counterparts. This work offers a comprehensive description of the fundamental role of dopamine in the brain. The model now needs to be verified through experiments on living nerve cells; ultimately, it could help doctors and researchers to develop better treatments for conditions such as Parkinson’s disease or ADHD, which are linked to a lack of dopamine.
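The learning half of this account rests on the classic reward prediction error idea: the dopamine-like signal is the difference between outcome and expectation, and learning shrinks that difference. A minimal sketch (the learning rate and reward values are illustrative, not the paper's):

```python
# Minimal reward prediction error (RPE) sketch of the ice-cream example:
# delta plays the role of the dopamine-like surprise signal.
def rpe_update(expected_value, reward, learning_rate=0.1):
    """Return (prediction_error, updated_expectation)."""
    delta = reward - expected_value          # outcome minus expectation
    expected_value += learning_rate * delta  # learning reduces future errors
    return delta, expected_value

V = 0.0                        # initial expectation about the new flavour
for _ in range(100):           # repeated tastings, each worth reward = 1
    delta, V = rpe_update(V, 1.0)
# the expectation approaches the true reward, so the surprise signal fades
```

Repeated experience drives the expectation toward the actual reward, at which point the prediction error (and hence the dopamine response to the outcome) approaches zero.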
|
23
|
An Investigation of the Free Energy Principle for Emotion Recognition. Front Comput Neurosci 2020; 14:30. [PMID: 32390817 PMCID: PMC7189749 DOI: 10.3389/fncom.2020.00030] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2019] [Accepted: 03/23/2020] [Indexed: 01/23/2023] Open
Abstract
This paper offers a prospectus of what might be achievable in the development of emotional recognition devices. It provides a conceptual overview of the free energy principle, including Markov blankets, active inference, and, in particular, a discussion of selfhood and theory of mind, followed by a brief explanation of how these concepts can explain both neural and cultural models of emotional inference. The underlying hypothesis is that emotion recognition and inference devices will evolve from state-of-the-art deep learning models into active inference schemes that go beyond marketing applications and become adjunct to psychiatric practice. Specifically, this paper proposes that a second wave of emotion recognition devices will be equipped with an emotional lexicon (or the ability to epistemically search for one), allowing the device to resolve uncertainty about emotional states by actively eliciting responses from the user and learning from these responses. Following this, a third wave of emotional devices will converge upon the user's generative model, resulting in the machine and human engaging in a reciprocal, prosocial emotional interaction, i.e., sharing a generative model of emotional states.
|
24
|
Abstract
Economic choices entail computing and comparing subjective values. Evidence from primates indicates that this behavior relies on the orbitofrontal cortex. Conversely, previous work in rodents provided conflicting results. Here we present a mouse model of economic choice behavior, and we show that the lateral orbital (LO) area is intimately related to the decision process. In the experiments, mice chose between different juices offered in variable amounts. Choice patterns closely resembled those measured in primates. Optogenetic inactivation of LO dramatically disrupted choices by inducing erratic changes of relative value and by increasing choice variability. Neuronal recordings revealed that different groups of cells encoded the values of individual options, the binary choice outcome and the chosen value. These groups match those previously identified in primates, except that the neuronal representation in mice is spatial (in monkeys it is good-based). Our results lay the foundations for a circuit-level analysis of economic decisions.
|
25
|
Economic Decisions through Circuit Inhibition. Curr Biol 2019; 29:3814-3824.e5. [PMID: 31679936 DOI: 10.1016/j.cub.2019.09.027] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2019] [Revised: 09/04/2019] [Accepted: 09/11/2019] [Indexed: 11/21/2022]
Abstract
Economic choices between goods are thought to rely on the orbitofrontal cortex (OFC), but the decision mechanisms remain poorly understood. To shed light on this fundamental issue, we recorded from the OFC of monkeys choosing between two juices offered sequentially. An analysis of firing rates across time windows revealed the presence of different groups of neurons similar to those previously identified under simultaneous offers. This observation suggested that economic decisions in the two modalities are formed in the same neural circuit. We then examined several hypotheses on the decision mechanisms. OFC neurons encoded good identities and values in a juice-based representation (labeled lines). Contrary to previous assessments, our data argued against the idea that decisions rely on mutual inhibition at the level of offer values. In fact, we showed that previous arguments for mutual inhibition were confounded by differences in value ranges. Instead, decisions seemed to involve mechanisms of circuit inhibition, whereby each offer value indirectly inhibited neurons encoding the opposite choice outcome. Our results reconcile a variety of previous findings and provide a general account for the neuronal underpinnings of economic choices.
|
26
|
Categorical encoding of decision variables in orbitofrontal cortex. PLoS Comput Biol 2019; 15:e1006667. [PMID: 31609973 PMCID: PMC6812845 DOI: 10.1371/journal.pcbi.1006667] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2018] [Revised: 10/24/2019] [Accepted: 09/02/2019] [Indexed: 11/18/2022] Open
Abstract
A fundamental and recurrent question in systems neuroscience is that of assessing what variables are encoded by a given population of neurons. Such assessments are often challenging because neurons in one brain area may encode multiple variables, and because neuronal representations might be categorical or non-categorical. These issues are particularly pertinent to the representation of decision variables in the orbitofrontal cortex (OFC)-an area implicated in economic choices. Here we present a new algorithm to assess whether a neuronal representation is categorical or non-categorical, and to identify the encoded variables if the representation is indeed categorical. The algorithm is based on two clustering procedures, one variable-independent and the other variable-based. The two partitions are then compared through adjusted mutual information. The present algorithm overcomes limitations of previous approaches and is widely applicable. We tested the algorithm on synthetic data and then used it to examine neuronal data recorded in the primate OFC during economic decisions. Confirming previous assessments, we found the neuronal representation in OFC to be categorical in nature. We also found that neurons in this area encode the value of individual offers, the binary choice outcome and the chosen value. In other words, during economic choice, neurons in the primate OFC encode decision variables in a categorical way.
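The comparison step described above can be illustrated with plain mutual information between two partitions of the same neurons (the paper itself uses the adjusted MI, which additionally corrects for chance agreement; the labelings below are toy examples):

```python
# Sketch: agreement between two cluster labelings of the same items,
# measured as mutual information (in nats). Identical partitions score
# their full entropy; independent partitions score ~0.
from collections import Counter
from math import log

def mutual_information(labels_a, labels_b):
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    pab = Counter(zip(labels_a, labels_b))
    return sum(
        (nab / n) * log((nab / n) / ((pa[a] / n) * (pb[b] / n)))
        for (a, b), nab in pab.items()
    )
```

The adjusted MI subtracts the expected value of this quantity under random labelings, so that unrelated partitions score zero on average regardless of the number of clusters.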
|
27
|
Loss Aversion Correlates With the Propensity to Deploy Model-Based Control. Front Neurosci 2019; 13:915. [PMID: 31555082 PMCID: PMC6743018 DOI: 10.3389/fnins.2019.00915] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2019] [Accepted: 08/16/2019] [Indexed: 11/13/2022] Open
Abstract
Reward-based decision making is thought to be driven by at least two different types of decision systems: a simple stimulus–response cache-based system which embodies the common-sense notion of “habit,” for which model-free reinforcement learning serves as a computational substrate, and a more deliberate, prospective, model-based planning system. Previous work has shown that loss aversion, a well-studied measure of how much more on average individuals weigh losses relative to gains during decision making, is reduced when participants take all possible decisions and outcomes into account including future ones, relative to when they myopically focus on the current decision. Model-based control offers a putative mechanism for implementing such foresight. Using a well-powered data set (N = 117) in which participants completed two different tasks designed to measure each of the two quantities of interest, and four models of choice data for these tasks, we found consistent evidence of a relationship between loss aversion and model-based control but in the direction opposite to that expected based on previous work: loss aversion had a positive relationship with model-based control. We did not find evidence for a relationship between either decision system and risk aversion, a related aspect of subjective utility.
|
28
|
Abstract
Rationality principles such as optimal feedback control and Bayesian inference underpin a probabilistic framework that has accounted for a range of empirical phenomena in biological sensorimotor control. To facilitate the optimization of flexible and robust behaviors consistent with these theories, the ability to construct internal models of the motor system and environmental dynamics can be crucial. In the context of this theoretic formalism, we review the computational roles played by such internal models and the neural and behavioral evidence for their implementation in the brain.
|
29
|
Regimes of Expectations: An Active Inference Model of Social Conformity and Human Decision Making. Front Psychol 2019; 10:679. [PMID: 30988668 PMCID: PMC6452780 DOI: 10.3389/fpsyg.2019.00679] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2018] [Accepted: 03/11/2019] [Indexed: 01/06/2023] Open
Abstract
How do humans come to acquire shared expectations about how they ought to behave in distinct normalized social settings? This paper offers a normative framework to answer this question. We introduce the computational construct of 'deontic value' - based on active inference and Markov decision processes - to formalize conceptions of social conformity and human decision-making. Deontic value is an attribute of choices, behaviors, or action sequences that inherit directly from deontic cues in our econiche (e.g., red traffic lights); namely, cues that denote an obligatory social rule. Crucially, the prosocial aspect of deontic value rests upon a particular form of circular causality: deontic cues exist in the environment in virtue of the environment being modified by repeated actions, while action itself is contingent upon the deontic value of environmental cues. We argue that this construction of deontic cues enables the epistemic (i.e., information-seeking) and pragmatic (i.e., goal-seeking) values of any behavior to be 'cached' or 'outsourced' to the environment, where the environment effectively 'learns' about the behavior of its denizens. We describe the process whereby this particular aspect of value enables learning of habitual behavior over neurodevelopmental and transgenerational timescales.
|
30
|
Looking for Mr(s) Right: Decision bias can prevent us from finding the most attractive face. Cogn Psychol 2019; 111:1-14. [PMID: 30826584 DOI: 10.1016/j.cogpsych.2019.02.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2018] [Revised: 12/21/2018] [Accepted: 02/22/2019] [Indexed: 01/28/2023]
Abstract
In realistic and challenging decision contexts, people may show biases that prevent them from choosing their favored options. For example, astronomer Johannes Kepler famously interviewed several candidate fiancées sequentially, but was rejected when attempting to return to a previous candidate. Similarly, we examined human performance on searches for attractive faces through fixed-length sequences by adapting optimal stopping computational theory developed from behavioral ecology and economics. Although economics studies have repeatedly found that participants sample too few options before choosing the best-ranked number from a series, we instead found overlong searches with many sequences ending without choice. Participants employed irrationally high choice thresholds, compared to the more lax, realistic standards of a Bayesian ideal observer, which achieved better-ranked faces. We consider several computational accounts and find that participants most resemble a Bayesian model that decides based on altered attractiveness values. These values may produce starkly different biases in the facial attractiveness domain than in other decision domains.
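A toy simulation (not the authors' Bayesian model; sequence length, thresholds, and the uniform attractiveness distribution are all illustrative) shows how an irrationally high choice threshold leaves many fixed-length searches without any choice at all:

```python
# Sketch: search a fixed-length sequence of candidates and accept the first
# whose attractiveness clears a threshold; count searches that end empty-handed.
import random

def fraction_without_choice(threshold, seq_len=8, trials=10_000, seed=0):
    rng = random.Random(seed)
    no_choice = 0
    for _ in range(trials):
        # candidates drawn uniformly in [0, 1); none may clear a strict bar
        if not any(rng.random() > threshold for _ in range(seq_len)):
            no_choice += 1
    return no_choice / trials

lax = fraction_without_choice(threshold=0.6)     # more realistic standard
strict = fraction_without_choice(threshold=0.95) # irrationally high standard
```

Analytically the no-choice probability is threshold**seq_len, so the strict searcher ends roughly two thirds of sequences without a choice while the lax one almost always chooses, echoing the overlong, often choiceless searches reported above.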
|
31
|
Advantage of prediction and mental imagery for goal‐directed behaviour in agents and robots. COGNITIVE COMPUTATION AND SYSTEMS 2019. [DOI: 10.1049/ccs.2018.0002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
|
32
|
Abstract
Habits form a crucial component of behavior. In recent years, key computational models have conceptualized habits as arising from model-free reinforcement learning mechanisms, which typically select between available actions based on the future value expected to result from each. Traditionally, however, habits have been understood as behaviors that can be triggered directly by a stimulus, without requiring the animal to evaluate expected outcomes. Here, we develop a computational model instantiating this traditional view, in which habits develop through the direct strengthening of recently taken actions rather than through the encoding of outcomes. We demonstrate that this model accounts for key behavioral manifestations of habits, including insensitivity to outcome devaluation and contingency degradation, as well as the effects of reinforcement schedule on the rate of habit formation. The model also explains the prevalent observation of perseveration in repeated-choice tasks as an additional behavioral manifestation of the habit system. We suggest that mapping habitual behaviors onto value-free mechanisms provides a parsimonious account of existing behavioral and neural data. This mapping may provide a new foundation for building robust and comprehensive models of the interaction of habits with other, more goal-directed types of behaviors and help to better guide research into the neural mechanisms underlying control of instrumental behavior more generally. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
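The core value-free mechanism described above can be sketched as direct strengthening of whichever action was just taken, with no outcome term anywhere in the update (the step size and action names are illustrative, not the paper's fitted model):

```python
# Sketch of a value-free habit update: the taken action is strengthened
# toward 1 and the others decay toward 0; reward never enters the rule.
def update_habits(habit, taken, step=0.05):
    """Strengthen the taken action; let the alternatives decay."""
    for action in habit:
        target = 1.0 if action == taken else 0.0
        habit[action] += step * (target - habit[action])
    return habit

habits = {"left": 0.0, "right": 0.0}
for _ in range(200):                 # repeating 'left' builds a habit for it,
    update_habits(habits, "left")    # regardless of any reward it produces
```

Because no outcome value appears in the rule, the resulting behavior is trivially insensitive to outcome devaluation, and repetition alone produces the perseveration the abstract mentions.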
|
33
|
Model-based spatial navigation in the hippocampus-ventral striatum circuit: A computational analysis. PLoS Comput Biol 2018; 14:e1006316. [PMID: 30222746 PMCID: PMC6160242 DOI: 10.1371/journal.pcbi.1006316] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2017] [Revised: 09/27/2018] [Accepted: 06/20/2018] [Indexed: 12/26/2022] Open
Abstract
While the neurobiology of simple and habitual choices is relatively well known, our current understanding of goal-directed choices and planning in the brain is still limited. Theoretical work suggests that goal-directed computations can be productively associated with model-based (reinforcement learning) computations, yet a detailed mapping between computational processes and neuronal circuits remains to be fully established. Here we report a computational analysis that aligns Bayesian nonparametrics and model-based reinforcement learning (MB-RL) with the functioning of the hippocampus (HC) and the ventral striatum (vStr) - a neuronal circuit that is increasingly recognized as an appropriate model system for understanding goal-directed (spatial) decisions and planning mechanisms in the brain. We test the MB-RL agent in a contextual conditioning task that depends on intact hippocampus and ventral striatal (shell) function and show that it solves the task while showing key behavioral and neuronal signatures of the HC-vStr circuit. Our simulations also explore the benefits of biological forms of look-ahead prediction (forward sweeps) during both learning and control. This article thus contributes to fill the gap between our current understanding of computational algorithms and biological realizations of (model-based) reinforcement learning.
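The look-ahead ("forward sweep") component can be caricatured as a small depth-limited search through a learned world model. The toy transition model and discount factor below are illustrative stand-ins, not the paper's MB-RL agent:

```python
# Sketch: evaluate a state by sweeping forward through a learned model,
# picking the best discounted return reachable within a fixed depth.
def sweep_value(state, depth, model, gamma=0.9):
    """Best discounted return from `state` within `depth` steps."""
    if depth == 0:
        return 0.0
    return max(
        reward + gamma * sweep_value(next_state, depth - 1, model, gamma)
        for next_state, reward in model[state]
    )

# toy learned world model: state -> [(next_state, reward), ...]
model = {
    "A": [("B", 0.0), ("C", 1.0)],
    "B": [("B", 0.0)],
    "C": [("C", 1.0)],
}
best = sweep_value("A", depth=3, model=model)  # prefers the rewarding branch
```

With depth 3 the sweep from "A" returns 1 + 0.9 * 1 + 0.81 * 1 = 2.71, committing the agent to the rewarding branch without any trial-and-error experience of it, which is the computational benefit such sweeps are proposed to provide.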
|
34
|
Manipulating the revision of reward value during the intertrial interval increases sign tracking and dopamine release. PLoS Biol 2018; 16:e2004015. [PMID: 30256785 PMCID: PMC6175531 DOI: 10.1371/journal.pbio.2004015] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2017] [Revised: 10/08/2018] [Accepted: 09/11/2018] [Indexed: 11/29/2022] Open
Abstract
Recent computational models of sign tracking (ST) and goal tracking (GT) have accounted for observations that dopamine (DA) is not necessary for all forms of learning and have provided a set of predictions to further their validity. Among these, a central prediction is that manipulating the intertrial interval (ITI) during autoshaping should change the relative ST-GT proportion as well as DA phasic responses. Here, we tested these predictions and found that lengthening the ITI increased ST, i.e., behavioral engagement with conditioned stimuli (CS) and cue-induced phasic DA release. Importantly, DA release was also present at the time of reward delivery, even after learning, and DA release was correlated with time spent in the food cup during the ITI. During conditioning with shorter ITIs, GT was prominent (i.e., engagement with food cup), and DA release responded to the CS while being absent at the time of reward delivery after learning. Hence, shorter ITIs restored the classical DA reward prediction error (RPE) pattern. These results validate the computational hypotheses, opening new perspectives on the understanding of individual differences in Pavlovian conditioning and DA signaling.
|
35
|
Planning and navigation as active inference. BIOLOGICAL CYBERNETICS 2018; 112:323-343. [PMID: 29572721 PMCID: PMC6060791 DOI: 10.1007/s00422-018-0753-2] [Citation(s) in RCA: 79] [Impact Index Per Article: 13.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/17/2017] [Accepted: 03/07/2018] [Indexed: 05/05/2023]
Abstract
This paper introduces an active inference formulation of planning and navigation. It illustrates how the exploitation-exploration dilemma is dissolved by acting to minimise uncertainty (i.e. expected surprise or free energy). We use simulations of a maze problem to illustrate how agents can solve quite complicated problems using context sensitive prior preferences to form subgoals. Our focus is on how epistemic behaviour, driven by novelty and the imperative to reduce uncertainty about the world, contextualises pragmatic or goal-directed behaviour. Using simulations, we illustrate the underlying process theory with synthetic behavioural and electrophysiological responses during exploration of a maze and subsequent navigation to a target location. An interesting phenomenon that emerged from the simulations was a putative distinction between 'place cells', which fire when a subgoal is reached, and 'path cells', which fire until a subgoal is reached.
|
36
|
Abstract
When modeling goal-directed behavior in the presence of various sources of uncertainty, planning can be described as an inference process. A solution to the problem of planning as inference was previously proposed in the active inference framework in the form of an approximate inference scheme based on variational free energy. However, this approximate scheme was based on the mean-field approximation, which assumes statistical independence of hidden variables and is known to show overconfidence and may converge to local minima of the free energy. To better capture the spatiotemporal properties of an environment, we reformulated the approximate inference process using the so-called Bethe approximation. Importantly, the Bethe approximation allows for representation of pairwise statistical dependencies. Under these assumptions, the minimizer of the variational free energy corresponds to the belief propagation algorithm, commonly used in machine learning. To illustrate the differences between the mean-field approximation and the Bethe approximation, we have simulated agent behavior in a simple goal-reaching task with different types of uncertainties. Overall, the Bethe agent achieves higher success rates in reaching goal states. We relate the better performance of the Bethe agent to more accurate predictions about the consequences of its own actions. Consequently, active inference based on the Bethe approximation extends the application range of active inference to more complex behavioral tasks.
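The paper's point that the minimizer of the Bethe free energy corresponds to belief propagation can be illustrated on the smallest case with a pairwise dependency: two binary variables coupled by one factor, where the sum-product message computation recovers the exact marginal (all numbers below are toy values, not the paper's task):

```python
# Sketch: sum-product belief propagation on a 2-node chain with one pairwise
# factor. On tree-structured models like this, BP is exact, so the result
# can be checked against brute-force enumeration.
def bp_marginal_x0(prior0, prior1, pairwise):
    """Posterior marginal of binary x0, given priors and a pairwise factor."""
    # message from x1 to x0: m(x0) = sum_x1 pairwise[x0][x1] * prior1[x1]
    msg = [sum(pairwise[a][b] * prior1[b] for b in range(2)) for a in range(2)]
    unnorm = [prior0[a] * msg[a] for a in range(2)]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```

The pairwise factor is exactly the kind of statistical dependency the mean-field approximation discards and the Bethe approximation retains.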
|
37
|
Orbitofrontal Cortex: A Neural Circuit for Economic Decisions. Neuron 2017; 96:736-754. [PMID: 29144973 PMCID: PMC5726577 DOI: 10.1016/j.neuron.2017.09.031] [Citation(s) in RCA: 137] [Impact Index Per Article: 19.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2017] [Revised: 09/14/2017] [Accepted: 09/20/2017] [Indexed: 11/24/2022]
Abstract
Economic choice behavior entails the computation and comparison of subjective values. A central contribution of neuroeconomics has been to show that subjective values are represented explicitly at the neuronal level. With this result at hand, the field has increasingly focused on the difficult question of where in the brain and how exactly subjective values are compared to make a decision. Here, we review a broad range of experimental and theoretical results suggesting that good-based decisions are generated in a neural circuit within the orbitofrontal cortex (OFC). The main lines of evidence supporting this proposal include the fact that goal-directed behavior is specifically disrupted by OFC lesions, the fact that different groups of neurons in this area encode the input and the output of the decision process, the fact that activity fluctuations in each of these cell groups correlate with choice variability, and the fact that these groups of neurons are computationally sufficient to generate decisions. Results from other brain regions are consistent with the idea that good-based decisions take place in OFC and indicate that value signals inform a variety of mental functions. We also contrast the present proposal with other leading models for the neural mechanisms of economic decisions. Finally, we indicate open questions and suggest possible directions for future research.
|
38
|
Continuous track paths reveal additive evidence integration in multistep decision making. Proc Natl Acad Sci U S A 2017; 114:10618-10623. [PMID: 28923918 DOI: 10.1073/pnas.1710913114] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Multistep decision making pervades daily life, but its underlying mechanisms remain obscure. We distinguish four prominent models of multistep decision making, namely serial stage, hierarchical evidence integration, hierarchical leaky competing accumulation (HLCA), and probabilistic evidence integration (PEI). To empirically disentangle these models, we design a two-step reward-based decision paradigm and implement it in a reaching task experiment. In a first step, participants choose between two potential upcoming choices, each associated with two rewards. In a second step, participants choose between the two rewards selected in the first step. Strikingly, as predicted by the HLCA and PEI models, the first-step decision dynamics were initially biased toward the choice representing the highest sum/mean before being redirected toward the choice representing the maximal reward (i.e., initial dip). Only HLCA and PEI predicted this initial dip, suggesting that first-step decision dynamics depend on additive integration of competing second-step choices. Our data suggest that potential future outcomes are progressively unraveled during multistep decision making.
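The leaky competing accumulation dynamics referred to above can be sketched generically. This is a plain one-level LCA step with illustrative parameters, not the authors' hierarchical HLCA model:

```python
# Sketch of leaky competing accumulator (LCA) dynamics: each unit integrates
# its input, leaks, and is inhibited by its competitors; activations are
# rectified at zero.
def lca_step(x, inputs, leak=0.5, inhib=0.2, dt=0.1):
    """One Euler step of the LCA dynamics."""
    return [
        max(0.0, xi + dt * (inp - leak * xi - inhib * (sum(x) - xi)))
        for xi, inp in zip(x, inputs)
    ]

x = [0.0, 0.0]
for _ in range(200):
    x = lca_step(x, inputs=[1.0, 0.8])  # unit 0 receives the stronger evidence
# the unit with stronger input settles at the higher activation
```

In the hierarchical variant, upper-level units of this kind receive the summed evidence for their subordinate options, which is what produces the initial bias toward the branch with the highest sum before the maximal reward wins out.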
|
39
|
A unifying Bayesian account of contextual effects in value-based choice. PLoS Comput Biol 2017; 13:e1005769. [PMID: 28981514 PMCID: PMC5645156 DOI: 10.1371/journal.pcbi.1005769] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2017] [Revised: 10/17/2017] [Accepted: 09/11/2017] [Indexed: 11/18/2022] Open
Abstract
Empirical evidence suggests the incentive value of an option is affected by other options available during choice and by options presented in the past. These contextual effects are hard to reconcile with classical theories and have inspired accounts where contextual influences play a crucial role. However, each account only addresses one or the other of the empirical findings and a unifying perspective has been elusive. Here, we offer a unifying theory of context effects on incentive value attribution and choice based on normative Bayesian principles. This formulation assumes that incentive value corresponds to a precision-weighted prediction error, where predictions are based upon expectations about reward. We show that this scheme explains a wide range of contextual effects, such as those elicited by other options available during choice (or within-choice context effects). These include both conditions in which choice requires an integration of multiple attributes and conditions where a multi-attribute integration is not necessary. Moreover, the same scheme explains context effects elicited by options presented in the past or between-choice context effects. Our formulation encompasses a wide range of contextual influences (comprising both within- and between-choice effects) by calling on Bayesian principles, without invoking ad-hoc assumptions. This helps clarify the contextual nature of incentive value and choice behaviour and may offer insights into psychopathologies characterized by dysfunctional decision-making, such as addiction and pathological gambling.
|
40
|
The Neural Basis of Aversive Pavlovian Guidance during Planning. J Neurosci 2017; 37:10215-10229. [PMID: 28924006 DOI: 10.1523/jneurosci.0085-17.2017] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2017] [Revised: 06/19/2017] [Indexed: 01/21/2023] Open
Abstract
Important real-world decisions are often arduous as they frequently involve sequences of choices, with initial selections affecting future options. Evaluating every possible combination of choices is computationally intractable, particularly for longer multistep decisions. Therefore, humans frequently use heuristics to reduce the complexity of decisions. We recently used a goal-directed planning task to demonstrate the profound behavioral influence and ubiquity of one such shortcut, namely aversive pruning, a reflexive Pavlovian process that involves neglecting parts of the decision space residing beyond salient negative outcomes. However, how the brain implements this important decision heuristic and what underlies individual differences have hitherto remained unanswered. Therefore, we administered an adapted version of the same planning task to healthy male and female volunteers undergoing functional magnetic resonance imaging (fMRI) to determine the neural basis of aversive pruning. Through both computational and standard categorical fMRI analyses, we show that when planning was influenced by aversive pruning, the subgenual cingulate cortex was robustly recruited. This neural signature was distinct from those associated with general planning and valuation, two fundamental cognitive components elicited by our task but which are complementary to aversive pruning. Furthermore, we found that individual variation in levels of aversive pruning was associated with the responses of insula and dorsolateral prefrontal cortices to the receipt of large monetary losses, and also with subclinical levels of anxiety. In summary, our data reveal the neural signatures of an important reflexive Pavlovian process that shapes goal-directed evaluations and thereby determines the outcome of high-level sequential cognitive processes.

SIGNIFICANCE STATEMENT: Multistep decisions are complex because initial choices constrain future options.
Evaluating every path for long decision sequences is often impractical; thus, cognitive shortcuts are often essential. One pervasive and powerful heuristic is aversive pruning, in which potential decision-making avenues are curtailed at immediate negative outcomes. We used neuroimaging to examine how humans implement such pruning. We found it to be associated with activity in the subgenual cingulate cortex, with neural signatures that were distinguishable from those covarying with planning and valuation. Individual variations in aversive pruning levels related to subclinical anxiety levels and insular cortex activation. These findings reveal the neural mechanisms by which basic negative Pavlovian influences guide decision-making during planning, with implications for disrupted decision-making in psychiatric disorders.
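The pruning heuristic described in this abstract can be made concrete with a small sketch: in a depth-limited tree evaluation, any branch whose immediate outcome falls below a threshold is discarded outright, so value hidden behind a large loss is never inspected. This is an illustrative toy, not the authors' task or code; the tree, rewards, and threshold are assumptions.

```python
# Illustrative sketch of aversive pruning in depth-limited tree evaluation
# (not the article's paradigm or fitted model). Branches whose immediate
# reward falls at or below `prune_threshold` are skipped entirely.

def evaluate(tree, node, depth, prune_threshold=-70):
    """Best achievable cumulative reward from `node`.

    `tree` maps each node to a list of (reward, child) transitions;
    leaves map to an empty list.
    """
    if depth == 0 or not tree[node]:
        return 0.0
    best = float("-inf")
    for reward, child in tree[node]:
        if reward <= prune_threshold:  # aversive pruning: never look past a big loss
            continue
        best = max(best, reward + evaluate(tree, child, depth - 1, prune_threshold))
    return best if best != float("-inf") else 0.0

# Toy decision tree: the -100 branch hides a +200 payoff.
toy = {
    "start": [(-100, "a"), (-20, "b")],
    "a": [(200, "leafA")],
    "b": [(30, "leafB")],
    "leafA": [], "leafB": [],
}

full = evaluate(toy, "start", depth=2, prune_threshold=float("-inf"))  # 100
pruned = evaluate(toy, "start", depth=2)                               # 10
```

The pruned evaluation settles for the smaller payoff (10 rather than 100) because the path through the salient -100 loss is reflexively cut from the search.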
Collapse
|
41
|
Modeling Search Behaviors during the Acquisition of Expertise in a Sequential Decision-Making Task. Front Comput Neurosci 2017; 11:80. [PMID: 28943847 PMCID: PMC5596102 DOI: 10.3389/fncom.2017.00080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2017] [Accepted: 08/04/2017] [Indexed: 11/13/2022] Open
Abstract
Our daily interaction with the world is replete with situations in which we develop expertise through self-motivated repetition of the same task. In many of these interactions, and especially when dealing with computer and machine interfaces, we must deal with sequences of decisions and actions. For instance, when drawing cash from an ATM, choices are presented in a step-by-step fashion and a specific sequence of choices must be performed in order to produce the expected outcome. But, as we become experts in the use of such interfaces, is it possible to identify specific search and learning strategies? And if so, can we use this information to predict future actions? In addition to better understanding the cognitive processes underlying sequential decision making, this could allow building adaptive interfaces that can facilitate interaction at different moments of the learning curve. Here we tackle the question of modeling sequential decision-making behavior in a simple human-computer interface that instantiates a 4-level binary decision tree (BDT) task. We record behavioral data from voluntary participants while they attempt to solve the task. Using a Hidden Markov Model-based approach that capitalizes on the hierarchical structure of behavior, we then model their performance during the interaction. Our results show that partitioning the problem space into a small set of hierarchically related stereotyped strategies can potentially capture a host of individual decision-making policies. This allows us to follow how participants learn and develop expertise in the use of the interface. Moreover, using a Mixture of Experts based on these stereotyped strategies, the model is able to predict the behavior of participants who master the task.
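The core machinery behind an HMM account of stereotyped strategies is the forward algorithm, which scores an observed choice sequence under hidden strategy states. A minimal sketch follows; the two strategies, their matrices, and the sequence are illustrative assumptions, not the authors' fitted model.

```python
import numpy as np

# Minimal forward-algorithm sketch for scoring an action sequence under a
# hidden-strategy HMM (illustrative assumptions, not the paper's model).

def forward_loglik(obs, log_pi, log_A, log_B):
    """Log-likelihood of observation indices `obs` under an HMM.

    log_pi: (K,) initial log-probs; log_A: (K, K) transition log-probs;
    log_B: (K, M) emission log-probs over M observable choices.
    """
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # log-sum-exp over the previous hidden state, then emit
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

# Two hypothetical strategies: "explore" (uniform choices) vs "exploit" (biased).
log_pi = np.log(np.array([0.5, 0.5]))
log_A = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_B = np.log(np.array([[0.5, 0.5], [0.9, 0.1]]))
seq = [0, 0, 0, 0]  # four repeats of the same choice
ll = forward_loglik(seq, log_pi, log_A, log_B)
```

Comparing such likelihoods across a small library of stereotyped strategy HMMs is one way to track which policy a participant is currently following.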
Collapse
|
42
|
Predictive representations can link model-based reinforcement learning to model-free mechanisms. PLoS Comput Biol 2017; 13:e1005768. [PMID: 28945743 PMCID: PMC5628940 DOI: 10.1371/journal.pcbi.1005768] [Citation(s) in RCA: 122] [Impact Index Per Article: 17.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2016] [Revised: 10/05/2017] [Accepted: 09/04/2017] [Indexed: 11/19/2022] Open
Abstract
Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation.
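The successor representation at the heart of this framework can be learned with an ordinary TD rule. Below is a minimal sketch; the tiny cyclic state space, parameters, and reward vector are illustrative assumptions, not the paper's simulations.

```python
import numpy as np

# Minimal successor-representation (SR) learning via a TD rule (illustrative,
# not the article's simulations). M[s, s'] estimates the discounted expected
# future occupancy of s' starting from s; values follow as V = M @ w.

def td_update_sr(M, s, s_next, gamma=0.95, alpha=0.1):
    """One TD update of the SR after observing a transition s -> s_next."""
    onehot = np.eye(M.shape[0])[s]
    td_error = onehot + gamma * M[s_next] - M[s]
    M[s] = M[s] + alpha * td_error
    return M

n_states = 3
M = np.eye(n_states)              # start with immediate occupancy only
for _ in range(2000):             # deterministic cycle 0 -> 1 -> 2 -> 0 ...
    for s in range(n_states):
        M = td_update_sr(M, s, (s + 1) % n_states)

w = np.array([0.0, 0.0, 1.0])     # reward only in state 2
V = M @ w                         # state values read out from the SR
```

Because reward weights `w` enter only at readout, the same learned `M` instantly revalues all states when rewards change, which is the model-based-like flexibility the abstract attributes to SR-based evaluation.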
Collapse
|
43
|
Abstract
This article offers a formal account of curiosity and insight in terms of active (Bayesian) inference. It deals with the dual problem of inferring states of the world and learning its statistical structure. In contrast to current trends in machine learning (e.g., deep learning), we focus on how people attain insight and understanding using just a handful of observations, which are solicited through curious behavior. We use simulations of abstract rule learning and approximate Bayesian inference to show that minimizing (expected) variational free energy leads to active sampling of novel contingencies. This epistemic behavior closes explanatory gaps in generative models of the world, thereby reducing uncertainty and satisfying curiosity. We then move from epistemic learning to model selection or structure learning to show how abductive processes emerge when agents test plausible hypotheses about symmetries (i.e., invariances or rules) in their generative models. The ensuing Bayesian model reduction evinces mechanisms associated with sleep and has all the hallmarks of "aha" moments. This formulation moves toward a computational account of consciousness in the pre-Cartesian sense of sharable knowledge (i.e., con: "together"; scire: "to know").
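The quantity minimized throughout this account has a standard variational form; as a reminder, this is the textbook definition, not an equation reproduced from the article:

```latex
F[q] = \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o, s)\right]
     = \underbrace{D_{\mathrm{KL}}\!\left[q(s)\,\|\,p(s \mid o)\right]}_{\ge 0} \;-\; \ln p(o)
```

Minimizing $F$ over $q$ therefore both fits the posterior over hidden states $s$ (the KL term) and bounds surprise about observations $o$ (the log-evidence term).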
Collapse
|
44
|
Internally generated hippocampal sequences as a vantage point to probe future-oriented cognition. Ann N Y Acad Sci 2017; 1396:144-165. [PMID: 28548460 DOI: 10.1111/nyas.13329] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2016] [Revised: 01/31/2017] [Accepted: 02/07/2017] [Indexed: 12/22/2022]
Abstract
Information processing in the rodent hippocampus is fundamentally shaped by internally generated sequences (IGSs), expressed during two different network states: theta sequences, which repeat and reset at the ∼8 Hz theta rhythm associated with active behavior, and punctate sharp wave-ripple (SWR) sequences associated with wakeful rest or slow-wave sleep. A potpourri of diverse functional roles has been proposed for these IGSs, resulting in a fragmented conceptual landscape. Here, we advance a unitary view of IGSs, proposing that they reflect an inferential process that samples a policy from the animal's generative model, supported by hippocampus-specific priors. The same inference affords different cognitive functions when the animal is in distinct dynamical modes, associated with specific functional networks. Theta sequences arise when inference is coupled to the animal's action-perception cycle, supporting online spatial decisions, predictive processing, and episode encoding. SWR sequences arise when the animal is decoupled from the action-perception cycle and may support offline cognitive processing, such as memory consolidation, the prospective simulation of spatial trajectories, and imagination. We discuss the empirical bases of this proposal in relation to rodent studies and highlight how the proposed computational principles can shed light on the mechanisms of future-oriented cognition in humans.
Collapse
|
45
|
Independent Neural Computation of Value from Other People's Confidence. J Neurosci 2017; 37:673-684. [PMID: 28100748 DOI: 10.1523/jneurosci.4490-15.2016] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2015] [Revised: 11/04/2016] [Accepted: 12/01/2016] [Indexed: 11/21/2022] Open
Abstract
Expectation of reward can be shaped by the observation of actions and expressions of other people in one's environment. A person's apparent confidence in the likely reward of an action, for instance, makes qualities of their evidence, not observed directly, socially accessible. This strategy is computationally distinguished from associative learning methods that rely on direct observation, by its use of inference from indirect evidence. In twenty-three healthy human subjects, we isolated effects of first-hand experience, other people's choices, and the mediating effect of their confidence, on decision-making and neural correlates of value within ventromedial prefrontal cortex (vmPFC). Value derived from first-hand experience and other people's choices (regardless of confidence) were indiscriminately represented across vmPFC. However, value computed from agent choices weighted by their associated confidence was represented with specificity for ventromedial area 10. This pattern corresponds to shifts of connectivity and overlapping cognitive processes along a posterior-anterior vmPFC axis. Task behavior correlated with self-reported self-reliance for decision-making in other social contexts. The tendency to conform in other social contexts corresponded to increased activation in cortical regions previously shown to respond to social conflict in proportion to subsequent conformity (Campbell-Meiklejohn et al., 2010). The tendency to self-monitor predicted a selectively enhanced response to accordance with others in the right temporoparietal junction (rTPJ). The findings anatomically decompose vmPFC value representations according to computational requirements and provide biological insight into the social transmission of preference and reassurance gained from the confidence of others. SIGNIFICANCE STATEMENT Decades of research have provided evidence that the ventromedial prefrontal cortex (vmPFC) signals the satisfaction we expect from imminent actions. However, we have a surprisingly modest understanding of the organization of value across this substantial and varied region. This study finds that using cues of the reliability of other people's knowledge to enhance expectation of personal success generates value correlates that are anatomically distinct from those concurrently computed from direct, personal experience. This suggests that representation of decision values in vmPFC is suborganized according to the underlying computation, consistent with what we know about the anatomical heterogeneity of the region. These results also provide insight into the observational learning process by which someone else's confidence can sway and reassure our choices.
Collapse
|
46
|
|
47
|
Abstract
Stress, pervasive in modern society, impairs prefrontal cortex (PFC)-dependent cognitive processes, an action implicated in multiple psychopathologies and estimated to contribute to nearly half of all workplace accidents. However, the neurophysiological bases for stress-related impairment of PFC-dependent function remain poorly understood. The current studies examined the effects of stress on PFC neural coding during a working memory task in rats. Stress suppressed responses of medial PFC (mPFC) neurons strongly tuned to a diversity of task events, including delay and outcome (reward, error). Stress-related impairment of task-related neuronal activity included multidimensional coding by PFC neurons, an action that significantly predicted cognitive impairment. Importantly, the effects of stress on PFC neuronal signaling were highly conditional on tuning strength: stress increased task-related activity in the larger population of PFC neurons weakly tuned to task events. Combined, these findings indicate that stress elicits a profound collapse of task representations across the broader population of PFC neurons.
Collapse
|
48
|
Abstract
Categorization is a fundamental ability for efficient behavioral control. It allows organisms to remember the correct responses to categorical cues rather than to every stimulus encountered (hence eluding computational cost or complexity), and to generalize appropriate responses to novel stimuli dependent on category assignment. Assuming the brain performs Bayesian inference, based on a generative model of the external world and future goals, we propose a computational model of categorization in which important properties emerge. These properties comprise the ability to infer latent causes of sensory experience, a hierarchical organization of latent causes, and an explicit inclusion of context and action representations. Crucially, these aspects derive from considering the environmental statistics that are relevant to achieve goals, and from the fundamental Bayesian principle that any generative model should be preferred over alternative models based on an accuracy-complexity trade-off. Our account is a step toward elucidating computational principles of categorization and its role within the Bayesian brain hypothesis.
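The accuracy-complexity trade-off invoked here corresponds to the standard decomposition of the variational bound on log model evidence; this is the textbook statement, not a formula from the article:

```latex
\ln p(o \mid m) \;\ge\;
\underbrace{\mathbb{E}_{q(s)}\!\big[\ln p(o \mid s, m)\big]}_{\text{accuracy}}
\;-\;
\underbrace{D_{\mathrm{KL}}\!\big[q(s)\,\|\,p(s \mid m)\big]}_{\text{complexity}}
```

A generative model $m$ is preferred when it explains the observations $o$ well (high accuracy) without departing far from its priors over latent causes $s$ (low complexity).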
Collapse
|
49
|
What to Choose Next? A Paradigm for Testing Human Sequential Decision Making. Front Psychol 2017; 8:312. [PMID: 28326050 PMCID: PMC5339299 DOI: 10.3389/fpsyg.2017.00312] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2016] [Accepted: 02/20/2017] [Indexed: 11/13/2022] Open
Abstract
Many of the decisions we make in our everyday lives are sequential and entail sparse rewards. While sequential decision-making has been extensively investigated in theory (e.g., by reinforcement learning models) there is no systematic experimental paradigm to test it. Here, we developed such a paradigm and investigated key components of reinforcement learning models: the eligibility trace (i.e., the memory trace of previous decision steps), the external reward, and the ability to exploit the statistics of the environment's structure (model-free vs. model-based mechanisms). We show that the eligibility trace decays not with sheer time, but rather with the number of discrete decision steps made by the participants. We further show that, unexpectedly, neither monetary rewards nor the environment's spatial regularity significantly modulate behavioral performance. Finally, we found that model-free learning algorithms describe human performance better than model-based algorithms.
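The eligibility-trace finding above — decay per discrete decision step rather than with elapsed time — can be sketched with a standard TD(λ) update in which the trace is decayed exactly once per step. The chain, rewards, and parameters are illustrative assumptions, not the paradigm's fitted model.

```python
# Minimal TD(lambda) sketch where the eligibility trace decays once per
# discrete decision step, not with clock time (illustrative assumptions,
# not the article's fitted model).

def td_lambda_episode(transitions, n_states, gamma=0.9, lam=0.8, alpha=0.1):
    """Run one episode of TD(lambda) over (state, reward, next_state) steps."""
    V = [0.0] * n_states
    e = [0.0] * n_states                  # eligibility trace per state
    for s, r, s_next in transitions:
        delta = r + gamma * V[s_next] - V[s]
        e[s] += 1.0                       # accumulate on the visited state
        for i in range(n_states):
            V[i] += alpha * delta * e[i]
            e[i] *= gamma * lam           # decay once per decision step
    return V

# Three-step chain 0 -> 1 -> 2 with a sparse reward on the final step
# (the terminal state self-loops as a simplification).
V = td_lambda_episode([(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, 2)], n_states=3)
```

Because the trace is still nonzero on earlier states when the sparse reward finally arrives, a single rewarded step updates the whole preceding sequence of decisions at once, with credit falling off by the number of intervening steps.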
Collapse
|
50
|
A social Bayesian brain: How social knowledge can shape visual perception. Brain Cogn 2017; 112:69-77. [DOI: 10.1016/j.bandc.2016.05.002] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2015] [Revised: 04/06/2016] [Accepted: 05/10/2016] [Indexed: 01/25/2023]
|