1
Ishizu K, Nishimoto S, Ueoka Y, Funamizu A. Localized and global representation of prior value, sensory evidence, and choice in male mouse cerebral cortex. Nat Commun 2024; 15:4071. PMID: 38778078; PMCID: PMC11111702; DOI: 10.1038/s41467-024-48338-6.
Abstract
Adaptive behavior requires integrating prior knowledge of action outcomes and sensory evidence for making decisions while maintaining prior knowledge for future actions. As outcome- and sensory-based decisions are often tested separately, it is unclear how these processes are integrated in the brain. In a tone frequency discrimination task with two sound durations and asymmetric reward blocks, we found that neurons in the medial prefrontal cortex of male mice represented the additive combination of prior reward expectations and choices. The sensory inputs and choices were selectively decoded from the auditory cortex irrespective of reward priors and the secondary motor cortex, respectively, suggesting localized computations of task variables are required within single trials. In contrast, all the recorded regions represented prior values that needed to be maintained across trials. We propose localized and global computations of task variables in different time scales in the cerebral cortex.
Affiliation(s)
- Kotaro Ishizu
- Institute for Quantitative Biosciences, University of Tokyo, Laboratory of Neural Computation, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-0032, Japan
- Shosuke Nishimoto
- Institute for Quantitative Biosciences, University of Tokyo, Laboratory of Neural Computation, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-0032, Japan
- Department of Life Sciences, Graduate School of Arts and Sciences, University of Tokyo, 3-8-2, Komaba, Meguro-ku, Tokyo, 153-8902, Japan
- Yutaro Ueoka
- Institute for Quantitative Biosciences, University of Tokyo, Laboratory of Neural Computation, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-0032, Japan
- Akihiro Funamizu
- Institute for Quantitative Biosciences, University of Tokyo, Laboratory of Neural Computation, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-0032, Japan
- Department of Life Sciences, Graduate School of Arts and Sciences, University of Tokyo, 3-8-2, Komaba, Meguro-ku, Tokyo, 153-8902, Japan
2
Cleaveland JM. The active time model of concurrent choice. PLoS One 2024; 19:e0301173. PMID: 38771859; PMCID: PMC11108226; DOI: 10.1371/journal.pone.0301173.
Abstract
The following paper describes a steady-state model of concurrent choice, termed the active time model (ATM). ATM is derived from maximization principles and is characterized by a semi-Markov process. The model proposes that the controlling stimulus in concurrent variable-interval (VI) VI schedules of reinforcement is the time interval since the most recent response, termed here "the active interresponse time" or simply "active time." In the model, after a response is generated, it is categorized by a function that relates active times to switch/stay probabilities. The output of ATM is compared with predictions made by three other models of operant conditioning: melioration, a version of scalar expectancy theory (SET), and momentary maximization. Data sets considered include preferences in multiple-concurrent VI VI schedules, molecular choice patterns, correlations between switching and perseveration, and molar choice proportions. ATM accounts for all of these data sets, while the other models produce more limited fits. However, rather than argue that ATM is the singular model for concurrent VI VI choice, a consideration of its concept space leads to the conclusion that operant choice is multiply determined, and that an adaptive viewpoint, one that considers experimental procedures both as selecting mechanisms for animal choice and as tests of the controlling variables of that choice, is warranted.
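The semi-Markov switch/stay mechanism at the heart of ATM can be caricatured in a few lines. This is only an illustrative sketch, not the published model: the interresponse-time distribution, the switch-probability function, and every parameter below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_atm(switch_halflife=3.0, n_responses=20000):
    """Toy semi-Markov choice process: after each response, the elapsed
    'active time' since the previous response sets the switch probability.
    Illustrative only; the functional form and parameters are assumed."""
    side = 0
    counts = np.zeros(2)
    for _ in range(n_responses):
        active_time = rng.exponential(1.0)  # assumed interresponse-time distribution
        # longer active times make switching more likely (assumed monotone form)
        p_switch = 1.0 - np.exp(-active_time / switch_halflife)
        if rng.random() < p_switch:
            side = 1 - side
        counts[side] += 1
    return counts / counts.sum()

props = simulate_atm()  # choice proportions on the two alternatives
```

In the full model the categorization function would be tuned by the two VI reinforcement rates; here it only demonstrates how active time, rather than molar reinforcement rate, drives each switch/stay decision.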
Affiliation(s)
- J. Mark Cleaveland
- Department of Psychological Science, Vassar College, Poughkeepsie, NY, United States of America
3
Dillon DG, Belleau EL, Origlio J, McKee M, Jahan A, Meyer A, Souther MK, Brunner D, Kuhn M, Ang YS, Cusin C, Fava M, Pizzagalli DA. Using drift diffusion and RL models to disentangle effects of depression on decision-making vs. learning in the Probabilistic Reward Task. Comput Psychiatry 2024; 8:46-69. PMID: 38774430; PMCID: PMC11104335; DOI: 10.5334/cpsy.108.
Abstract
The Probabilistic Reward Task (PRT) is widely used to investigate the impact of Major Depressive Disorder (MDD) on reinforcement learning (RL), and recent studies have used it to provide insight into decision-making mechanisms affected by MDD. The current project used PRT data from unmedicated, treatment-seeking adults with MDD to extend these efforts by: (1) providing a more detailed analysis of standard PRT metrics-response bias and discriminability-to better understand how the task is performed; (2) analyzing the data with two computational models and providing psychometric analyses of both; and (3) determining whether response bias, discriminability, or model parameters predicted responses to treatment with placebo or the atypical antidepressant bupropion. Analysis of standard metrics replicated recent work by demonstrating a dependency between response bias and response time (RT), and by showing that reward totals in the PRT are governed by discriminability. Behavior was well-captured by the Hierarchical Drift Diffusion Model (HDDM), which models decision-making processes; the HDDM showed excellent internal consistency and acceptable retest reliability. A separate "belief" model reproduced the evolution of response bias over time better than the HDDM, but its psychometric properties were weaker. Finally, the predictive utility of the PRT was limited by small samples; nevertheless, depressed adults who responded to bupropion showed larger pre-treatment starting point biases in the HDDM than non-responders, indicating greater sensitivity to the PRT's asymmetric reinforcement contingencies. Together, these findings enhance our understanding of reward and decision-making mechanisms that are implicated in MDD and probed by the PRT.
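A generic drift-diffusion simulation illustrates the decision process that the HDDM fits to PRT data; a starting point shifted toward one boundary plays the role of the pre-treatment starting point bias discussed in this abstract. All parameter values are arbitrary, not fitted values from the study.

```python
import numpy as np

rng = np.random.default_rng(1)

def ddm_trial(drift=0.3, threshold=1.0, start=0.0, noise=1.0, dt=0.001, t_max=5.0):
    """Simulate one drift-diffusion trial between boundaries at +/- threshold.
    Returns (choice, reaction time); a textbook DDM sketch, not the HDDM."""
    x, t = start, 0.0
    while abs(x) < threshold and t < t_max:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return (1 if x >= threshold else 0), t

# positive drift biases choices toward the upper boundary
choices = [ddm_trial()[0] for _ in range(500)]
p_upper = sum(choices) / len(choices)
```

Shifting `start` toward a boundary (e.g. `start=0.3`) reproduces a response bias even with zero drift, which is how a starting-point bias captures sensitivity to asymmetric reinforcement.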
Affiliation(s)
- Daniel G. Dillon
- Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont MA, USA
- Harvard Medical School, Boston MA, USA
- Emily L. Belleau
- Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont MA, USA
- Harvard Medical School, Boston MA, USA
- Julianne Origlio
- Depression Clinical and Research Program, Massachusetts General Hospital, Boston MA, USA
- Madison McKee
- Depression Clinical and Research Program, Massachusetts General Hospital, Boston MA, USA
- Aava Jahan
- Depression Clinical and Research Program, Massachusetts General Hospital, Boston MA, USA
- Ashley Meyer
- Depression Clinical and Research Program, Massachusetts General Hospital, Boston MA, USA
- Min Kang Souther
- Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont MA, USA
- Devon Brunner
- Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont MA, USA
- Manuel Kuhn
- Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont MA, USA
- Depression Clinical and Research Program, Massachusetts General Hospital, Boston MA, USA
- Yuen Siang Ang
- Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont MA, USA
- Cristina Cusin
- Harvard Medical School, Boston MA, USA
- Depression Clinical and Research Program, Massachusetts General Hospital, Boston MA, USA
- Maurizio Fava
- Harvard Medical School, Boston MA, USA
- Depression Clinical and Research Program, Massachusetts General Hospital, Boston MA, USA
- Diego A. Pizzagalli
- Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont MA, USA
- Depression Clinical and Research Program, Massachusetts General Hospital, Boston MA, USA
4
Alejandro RJ, Holroyd CB. Hierarchical control over foraging behavior by anterior cingulate cortex. Neurosci Biobehav Rev 2024; 160:105623. PMID: 38490499; DOI: 10.1016/j.neubiorev.2024.105623.
Abstract
Foraging is a natural behavior that involves making sequential decisions to maximize rewards while minimizing the costs incurred when doing so. The prevalence of foraging across species suggests that a common brain computation underlies its implementation. Although anterior cingulate cortex is believed to contribute to foraging behavior, its specific role has been contentious, with predominant theories arguing either that it encodes environmental value or choice difficulty. Additionally, recent attempts to characterize foraging have taken place within the reinforcement learning framework, with increasingly complex models scaling with task complexity. Here we review reinforcement learning foraging models, highlighting the hierarchical structure of many foraging problems. We extend this literature by proposing that ACC guides foraging according to principles of model-based hierarchical reinforcement learning. This idea holds that ACC function is organized hierarchically along a rostral-caudal gradient, with rostral structures monitoring the status and completion of high-level task goals (like finding food), and midcingulate structures overseeing the execution of task options (subgoals, like harvesting fruit) and lower-level actions (such as grabbing an apple).
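The goal/subgoal/action decomposition the review ascribes to the rostral-caudal ACC gradient can be made concrete with a toy task hierarchy. The tasks and their structure below are invented for illustration; this is a sketch of hierarchical task decomposition, not a full options-framework or model-based HRL implementation.

```python
# High-level goals expand into options (subgoals), which expand into
# primitive actions; tasks absent from the table are treated as primitives.
hierarchy = {
    "find_food": ["locate_tree", "harvest_fruit"],     # rostral: goal level
    "harvest_fruit": ["reach", "grab_apple", "pull"],  # midcingulate: option level
}

def expand(task):
    """Recursively expand a task into its primitive action sequence."""
    if task not in hierarchy:
        return [task]
    return [action for subtask in hierarchy[task] for action in expand(subtask)]

plan = expand("find_food")  # -> ["locate_tree", "reach", "grab_apple", "pull"]
```

In model-based HRL, each level would additionally monitor its own pseudo-reward for subgoal completion; the sketch shows only the hierarchical structure itself.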
Affiliation(s)
- Clay B Holroyd
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
5
Kang JU, Mooshagian E, Snyder LH. Functional organization of posterior parietal cortex circuitry based on inferred information flow. Cell Rep 2024; 43:114028. PMID: 38581681; PMCID: PMC11090617; DOI: 10.1016/j.celrep.2024.114028.
Abstract
Many studies infer the role of neurons by asking what information can be decoded from their activity or by observing the consequences of perturbing their activity. An alternative approach is to consider information flow between neurons. We applied this approach to the parietal reach region (PRR) and the lateral intraparietal area (LIP) in posterior parietal cortex. Two complementary methods imply that across a range of reaching tasks, information flows primarily from PRR to LIP. This indicates that during a coordinated reach task, LIP has minimal influence on PRR and rules out the idea that LIP forms a general purpose spatial processing hub for action and cognition. Instead, we conclude that PRR and LIP operate in parallel to plan arm and eye movements, respectively, with asymmetric interactions that likely support eye-hand coordination. Similar methods can be applied to other areas to infer their functional relationships based on inferred information flow.
Affiliation(s)
- Jung Uk Kang
- Department of Neuroscience, Washington University School of Medicine, St. Louis, MO 63110, USA
- Eric Mooshagian
- Department of Neuroscience, Washington University School of Medicine, St. Louis, MO 63110, USA
- Lawrence H Snyder
- Department of Neuroscience, Washington University School of Medicine, St. Louis, MO 63110, USA
6
Pereira-Obilinovic U, Hou H, Svoboda K, Wang XJ. Brain mechanism of foraging: Reward-dependent synaptic plasticity versus neural integration of values. Proc Natl Acad Sci U S A 2024; 121:e2318521121. PMID: 38551832; PMCID: PMC10998608; DOI: 10.1073/pnas.2318521121.
Abstract
During foraging behavior, action values are persistently encoded in neural activity and updated depending on the history of choice outcomes. What is the neural mechanism for action value maintenance and updating? Here, we explore two contrasting network models: synaptic learning of action value versus neural integration. We show that both models can reproduce extant experimental data, but they yield distinct predictions about the underlying biological neural circuits. In particular, the neural integrator model but not the synaptic model requires that reward signals are mediated by neural pools selective for action alternatives and their projections are aligned with linear attractor axes in the valuation system. We demonstrate experimentally observable neural dynamical signatures and feasible perturbations to differentiate the two contrasting scenarios, suggesting that the synaptic model is a more robust candidate mechanism. Overall, this work provides a modeling framework to guide future experimental research on probabilistic foraging.
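The two candidate mechanisms contrasted in this abstract can be caricatured as one-line update rules: a reward-dependent delta rule on synaptic weights versus a leaky integrator whose action-selective reward input is aligned with an attractor axis. These are illustrative reductions, not the paper's network models; all parameters are arbitrary.

```python
import numpy as np

def synaptic_update(w, chosen, reward, lr=0.1):
    """Synaptic-learning scenario: only the chosen action's weight moves
    toward the obtained outcome (a standard delta-rule sketch)."""
    w = w.copy()
    w[chosen] += lr * (reward - w[chosen])
    return w

def integrator_update(v, chosen, reward, leak=0.05, gain=0.1):
    """Neural-integrator scenario: activity along a (here one value per
    action) attractor axis slowly leaks and is pushed by action-selective
    reward input projected onto that axis."""
    v = v.copy()
    v *= 1.0 - leak          # leak of the line attractor
    v[chosen] += gain * reward
    return v

w = synaptic_update(np.zeros(2), chosen=0, reward=1.0)
v = integrator_update(np.zeros(2), chosen=0, reward=1.0)
```

The paper's key distinction is visible even here: the integrator requires the reward signal to arrive through action-selective pools aligned with the integration axis, whereas the synaptic rule stores the value directly in the weights.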
Affiliation(s)
- Ulises Pereira-Obilinovic
- Center for Neural Science, New York University, New York, NY 10003
- Allen Institute for Neural Dynamics, Seattle, WA 98109
- Han Hou
- Allen Institute for Neural Dynamics, Seattle, WA 98109
- Karel Svoboda
- Allen Institute for Neural Dynamics, Seattle, WA 98109
- Xiao-Jing Wang
- Center for Neural Science, New York University, New York, NY 10003
7
Mohebi A, Wei W, Pelattini L, Kim K, Berke JD. Dopamine transients follow a striatal gradient of reward time horizons. Nat Neurosci 2024; 27:737-746. PMID: 38321294; PMCID: PMC11001583; DOI: 10.1038/s41593-023-01566-3.
Abstract
Animals make predictions to guide their behavior and update those predictions through experience. Transient increases in dopamine (DA) are thought to be critical signals for updating predictions. However, it is unclear how this mechanism handles a wide range of behavioral timescales-from seconds or less (for example, if singing a song) to potentially hours or more (for example, if hunting for food). Here we report that DA transients in distinct rat striatal subregions convey prediction errors based on distinct time horizons. DA dynamics systematically accelerated from ventral to dorsomedial to dorsolateral striatum, in the tempo of spontaneous fluctuations, the temporal integration of prior rewards and the discounting of future rewards. This spectrum of timescales for evaluative computations can help achieve efficient learning and adaptive motivation for a broad range of behaviors.
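The gradient of reward time horizons can be illustrated with exponential discounting under different discount factors. The γ values below are arbitrary stand-ins for a fast (dorsolateral-like) and a slow (ventral-like) regional timescale, not values fitted in the study.

```python
def discounted_value(rewards, gamma):
    """Present value of a future reward stream under exponential
    discounting: sum of r_t * gamma**t."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

rewards = [0.0, 0.0, 1.0]                        # reward arrives after two steps
v_fast = discounted_value(rewards, gamma=0.5)    # short horizon -> 0.25
v_slow = discounted_value(rewards, gamma=0.95)   # long horizon  -> 0.9025
```

A region with a short horizon barely values the delayed reward, while a long-horizon region values it almost fully; prediction errors computed against these two values would differ accordingly.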
Affiliation(s)
- Ali Mohebi
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Wei Wei
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Lilian Pelattini
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Kyoungjun Kim
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Joshua D Berke
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Department of Psychiatry and Behavioral Sciences, University of California San Francisco, San Francisco, CA, USA
- Neuroscience Graduate Program, University of California San Francisco, San Francisco, CA, USA
- Kavli Institute for Fundamental Neuroscience, University of California San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA, USA
8
Lindeman S, Fu X, Reinert JK, Fukunaga I. Value-related learning in the olfactory bulb occurs through pathway-dependent perisomatic inhibition of mitral cells. PLoS Biol 2024; 22:e3002536. PMID: 38427708; PMCID: PMC10936853; DOI: 10.1371/journal.pbio.3002536.
Abstract
Associating values to environmental cues is a critical aspect of learning from experiences, allowing animals to predict and maximise future rewards. Value-related signals in the brain were once considered a property of higher sensory regions, but their wide distribution across many brain regions is increasingly recognised. Here, we investigate how reward-related signals begin to be incorporated, mechanistically, at the earliest stage of olfactory processing, namely, in the olfactory bulb. In head-fixed mice performing Go/No-Go discrimination of closely related olfactory mixtures, rewarded odours evoke widespread inhibition in one class of output neurons, that is, in mitral cells but not tufted cells. The temporal characteristics of this reward-related inhibition suggest it is odour-driven, but it is also context-dependent since it is absent during pseudo-conditioning and pharmacological silencing of the piriform cortex. Further, the reward-related modulation is present in the somata but not in the apical dendritic tuft of mitral cells, suggesting an involvement of circuit components located deep in the olfactory bulb. Depth-resolved imaging from granule cell dendritic gemmules suggests that granule cells that target mitral cells receive a reward-related extrinsic drive. Thus, our study supports the notion that value-related modulation of olfactory signals is a characteristic of olfactory processing in the primary olfactory area and narrows down the possible underlying mechanisms to deeper circuit components that contact mitral cells perisomatically.
Affiliation(s)
- Sander Lindeman
- Sensory and Behavioural Neuroscience Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Xiaochen Fu
- Sensory and Behavioural Neuroscience Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Janine Kristin Reinert
- Sensory and Behavioural Neuroscience Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Izumi Fukunaga
- Sensory and Behavioural Neuroscience Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
9
Prat-Carrabin A, Meyniel F, Azeredo da Silveira R. Resource-rational account of sequential effects in human prediction. eLife 2024; 13:e81256. PMID: 38224341; PMCID: PMC10789490; DOI: 10.7554/eLife.81256.
Abstract
An abundant literature reports on 'sequential effects' observed when humans make predictions on the basis of stochastic sequences of stimuli. Such sequential effects represent departures from an optimal, Bayesian process. A prominent explanation posits that humans are adapted to changing environments, and erroneously assume non-stationarity of the environment, even if the latter is static. As a result, their predictions fluctuate over time. We propose a different explanation in which sub-optimal and fluctuating predictions result from cognitive constraints (or costs), under which humans however behave rationally. We devise a framework of costly inference, in which we develop two classes of models that differ by the nature of the constraints at play: in one case the precision of beliefs comes at a cost, resulting in an exponential forgetting of past observations, while in the other beliefs with high predictive power are favored. To compare model predictions to human behavior, we carry out a prediction task that uses binary random stimuli, with probabilities ranging from 0.05 to 0.95. Although in this task the environment is static and the Bayesian belief converges, subjects' predictions fluctuate and are biased toward the recent stimulus history. Both classes of models capture this 'attractive effect', but they depart in their characterization of higher-order effects. Only the precision-cost model reproduces a 'repulsive effect', observed in the data, in which predictions are biased away from stimuli presented in more distant trials. Our experimental results reveal systematic modulations in sequential effects, which our theoretical approach accounts for in terms of rationality under cognitive constraints.
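The precision-cost idea, exponential forgetting of past observations, can be sketched as a Beta-Bernoulli observer whose sufficient statistics (counts) decay over time. The decay parameter and the exact update form are assumptions in the spirit of the model, not the authors' formulation.

```python
def leaky_beta_prediction(observations, decay=0.9):
    """Predict P(x=1) for a Bernoulli source from a Beta posterior whose
    evidence counts are exponentially forgotten; recent observations
    therefore dominate, producing the attractive sequential effect."""
    a = b = 1.0  # Beta(1, 1) prior
    for x in observations:
        a = decay * (a - 1.0) + 1.0 + x          # forget, then add evidence
        b = decay * (b - 1.0) + 1.0 + (1 - x)
    return a / (a + b)                            # posterior-mean prediction

# same overall frequency (3 of 6), different recent history
recent_ones = leaky_beta_prediction([0, 0, 0, 1, 1, 1])
recent_zeros = leaky_beta_prediction([1, 1, 1, 0, 0, 0])
```

With a static source the exact Bayesian posterior would give 0.5 in both cases; the leaky observer's predictions are pulled toward the recent stimuli, as in the subjects' data.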
Affiliation(s)
- Arthur Prat-Carrabin
- Department of Economics, Columbia University, New York, United States
- Laboratoire de Physique de l’École Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, France
- Florent Meyniel
- Cognitive Neuroimaging Unit, Institut National de la Santé et de la Recherche Médicale, Commissariat à l’Energie Atomique et aux Energies Alternatives, Centre National de la Recherche Scientifique, Université Paris-Saclay, NeuroSpin center, Gif-sur-Yvette, France
- Institut de neuromodulation, GHU Paris, Psychiatrie et Neurosciences, Centre Hospitalier Sainte-Anne, Pôle Hospitalo-Universitaire 15, Université Paris Cité, Paris, France
- Rava Azeredo da Silveira
- Laboratoire de Physique de l’École Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, France
- Institute of Molecular and Clinical Ophthalmology Basel, Basel, Switzerland
- Faculty of Science, University of Basel, Basel, Switzerland
10
Panidi K, Vorobiova AN, Feurra M, Klucharev V. Posterior parietal cortex is causally involved in reward valuation but not in probability weighting during risky choice. Cereb Cortex 2024; 34:bhad446. PMID: 38011084; DOI: 10.1093/cercor/bhad446.
Abstract
This study provides evidence that the posterior parietal cortex is causally involved in risky decision making via the processing of reward values but not reward probabilities. In a within-group experimental design, participants performed a binary lottery choice task following transcranial magnetic stimulation of the right posterior parietal cortex, the left posterior parietal cortex, or a right posterior parietal cortex sham (placebo) stimulation. A continuous theta-burst stimulation protocol, which is thought to downregulate cortical excitability, was used. Both the mean-variance and the prospect theory analyses of risky choice showed that posterior parietal cortex stimulation shifted participants toward greater risk aversion compared with sham. On the behavioral level, after posterior parietal cortex stimulation, the likelihood of choosing a safer option became more sensitive to the difference in standard deviations between lotteries, compared with sham, indicating greater risk avoidance within the mean-variance framework. We also estimated the shift in prospect theory parameters of risk preferences after posterior parietal cortex stimulation. A hierarchical Bayesian approach showed moderate evidence for a credible change in the risk aversion parameter toward lower marginal reward value (and, hence, lower risk tolerance), while no credible change in probability weighting was observed. In addition, we observed anecdotal evidence for a credible increase in the consistency of responses after left posterior parietal cortex stimulation compared with sham.
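The two prospect-theory components the study dissociates, marginal reward value and probability weighting, can be sketched with standard functional forms. A power value function and a one-parameter Prelec weighting function are assumed here for illustration; the parameter values are arbitrary, not the estimates from the study.

```python
import numpy as np

def pt_value(x, alpha=0.7):
    """Prospect-theory value for gains: concave power utility. Lower alpha
    means lower marginal reward value and hence more risk aversion."""
    return x ** alpha

def pt_weight(p, gamma=0.65):
    """Prelec probability weighting (assumed form): gamma < 1 overweights
    small probabilities and underweights moderate-to-large ones."""
    return np.exp(-((-np.log(p)) ** gamma))

# a 50% chance of 100 vs. a sure 50: concavity makes the agent risk averse
subjective_lottery = pt_weight(0.5) * pt_value(100.0)
sure_50 = pt_value(50.0)
```

In this framework, the reported TMS effect corresponds to a credible decrease in `alpha` (flatter utility, stronger preference for `sure_50`) with no credible change in `gamma`.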
Affiliation(s)
- Ksenia Panidi
- Centre for Cognition and Decision Making, Institute for Cognitive Neuroscience, HSE University, ul. Myasnitskaya 20, Moscow 101000, Russian Federation
- Alicia N Vorobiova
- Centre for Cognition and Decision Making, Institute for Cognitive Neuroscience, HSE University, ul. Myasnitskaya 20, Moscow 101000, Russian Federation
- Matteo Feurra
- Centre for Cognition and Decision Making, Institute for Cognitive Neuroscience, HSE University, ul. Myasnitskaya 20, Moscow 101000, Russian Federation
- Vasily Klucharev
- Centre for Cognition and Decision Making, Institute for Cognitive Neuroscience, HSE University, ul. Myasnitskaya 20, Moscow 101000, Russian Federation
- Graduate School of Business, HSE University, ul. Shabolovka, 26, Moscow 119049, Russian Federation
|
11
Xu S, Ren W. Distinct processing of the state prediction error signals in frontal and parietal correlates in learning the environment model. Cereb Cortex 2024; 34:bhad449. PMID: 38037370; DOI: 10.1093/cercor/bhad449.
Abstract
Goal-directed reinforcement learning constructs a model of how the states in the environment are connected and prospectively evaluates action values by simulating experience. State prediction error (SPE) is theorized as a crucial signal for learning the environment model. However, the underlying neural mechanisms remain unclear. Here, using electroencephalogram, we verified in a two-stage Markov task two neural correlates of SPEs: an early negative correlate transferring from frontal to central electrodes and a late positive correlate over parietal regions. Furthermore, by investigating the effects of explicit knowledge about the environment model and rewards in the environment, we found that, for the parietal correlate, rewards enhanced the representation efficiency (beta values of regression coefficient) of SPEs, whereas explicit knowledge elicited a larger SPE representation (event-related potential activity) for rare transitions. However, for the frontal and central correlates, rewards increased activities in a content-independent way and explicit knowledge enhanced activities only for common transitions. Our results suggest that the parietal correlate of SPEs is responsible for the explicit learning of state transition structure, whereas the frontal and central correlates may be involved in cognitive control. Our study provides novel evidence for distinct roles of the frontal and the parietal cortices in processing SPEs.
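The state prediction error construct from the two-stage Markov task can be written down directly: the SPE is one minus the predicted probability of the transition that actually occurred, and it drives the update of the learned transition model. This is a generic tabular sketch; the learning rate and task dimensions are arbitrary.

```python
import numpy as np

def spe_update(T, s, a, s_next, lr=0.2):
    """Compute the state prediction error for an observed transition
    (s, a) -> s_next and update the transition model T toward it."""
    spe = 1.0 - T[s, a, s_next]            # surprise about the observed state
    T = T.copy()
    T[s, a, s_next] += lr * spe            # move prediction toward observation
    T[s, a] /= T[s, a].sum()               # renormalize to a distribution
    return T, spe

# 1 first-stage state, 2 actions, 2 second-stage states, uninformed prior
T = np.full((1, 2, 2), 0.5)
T, spe_common = spe_update(T, 0, 0, 0)     # now-expected ("common") transition
_, spe_rare = spe_update(T, 0, 0, 1)       # surprising ("rare") transition
```

Once the model has learned the structure, rare transitions yield larger SPEs than common ones, which is the contrast the EEG correlates in this study track.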
Affiliation(s)
- Shuyuan Xu
- MOE Key Laboratory of Modern Teaching Technology, Shaanxi Normal University, Xi'an, Shaanxi, China
- Wei Ren
- MOE Key Laboratory of Modern Teaching Technology, Shaanxi Normal University, Xi'an, Shaanxi, China
- Faculty of Education, Shaanxi Normal University, Xi'an, Shaanxi, China
12
Valdebenito-Oyarzo G, Martínez-Molina MP, Soto-Icaza P, Zamorano F, Figueroa-Vargas A, Larraín-Valenzuela J, Stecher X, Salinas C, Bastin J, Valero-Cabré A, Polania R, Billeke P. The parietal cortex has a causal role in ambiguity computations in humans. PLoS Biol 2024; 22:e3002452. PMID: 38198502; PMCID: PMC10824459; DOI: 10.1371/journal.pbio.3002452.
Abstract
Humans often face the challenge of making decisions between ambiguous options. The level of ambiguity in decision-making has been linked to activity in the parietal cortex, but its exact computational role remains elusive. To test the hypothesis that the parietal cortex plays a causal role in computing ambiguous probabilities, we conducted consecutive fMRI and TMS-EEG studies. We found that participants assigned unknown probabilities to objective probabilities, elevating the uncertainty of their decisions. Parietal cortex activity correlated with the objective degree of ambiguity and with a process that underestimates the uncertainty during decision-making. Conversely, the midcingulate cortex (MCC) encodes prediction errors and increases its connectivity with the parietal cortex during outcome processing. Disruption of the parietal activity increased the uncertainty evaluation of the options, decreasing cingulate cortex oscillations during outcome evaluation and lateral frontal oscillations related to the valuation of ambiguous probabilities. These results provide evidence for a causal role of the parietal cortex in computing uncertainty during ambiguous decisions made by humans.
Affiliation(s)
- Gabriela Valdebenito-Oyarzo
- Laboratorio de Neurociencia Social y Neuromodulación, Centro de Investigación en Complejidad Social (neuroCICS), Facultad de Gobierno, Universidad del Desarrollo, Santiago, Chile
- María Paz Martínez-Molina
- Laboratorio de Neurociencia Social y Neuromodulación, Centro de Investigación en Complejidad Social (neuroCICS), Facultad de Gobierno, Universidad del Desarrollo, Santiago, Chile
- Patricia Soto-Icaza
- Laboratorio de Neurociencia Social y Neuromodulación, Centro de Investigación en Complejidad Social (neuroCICS), Facultad de Gobierno, Universidad del Desarrollo, Santiago, Chile
- Francisco Zamorano
- Unidad de Neuroimágenes Cuantitativas avanzadas (UNICA), Departamento de Imágenes, Clínica Alemana de Santiago, Santiago, Chile
- Facultad de Ciencias para el Cuidado de la Salud, Campus Los Leones, Universidad San Sebastián, Santiago, Chile
- Alejandra Figueroa-Vargas
- Laboratorio de Neurociencia Social y Neuromodulación, Centro de Investigación en Complejidad Social (neuroCICS), Facultad de Gobierno, Universidad del Desarrollo, Santiago, Chile
- Josefina Larraín-Valenzuela
- Laboratorio de Neurociencia Social y Neuromodulación, Centro de Investigación en Complejidad Social (neuroCICS), Facultad de Gobierno, Universidad del Desarrollo, Santiago, Chile
- Ximena Stecher
- Unidad de Neuroimágenes Cuantitativas avanzadas (UNICA), Departamento de Imágenes, Clínica Alemana de Santiago, Santiago, Chile
- César Salinas
- Unidad de Neuroimágenes Cuantitativas avanzadas (UNICA), Departamento de Imágenes, Clínica Alemana de Santiago, Santiago, Chile
- Julien Bastin
- Univ. Grenoble Alpes, Inserm, U1216, Grenoble Institut Neurosciences, Grenoble, France
- Antoni Valero-Cabré
- Causal Dynamics, Plasticity and Rehabilitation Group, FRONTLAB team, Institut du Cerveau et de la Moelle Epinière (ICM), CNRS UMR 7225, INSERM U 1127 and Sorbonne Université, Paris, France
- Cognitive Neuroscience and Information Technology Research Program, Open University of Catalonia (UOC), Barcelona, Spain
- Laboratory for Cerebral Dynamics Plasticity and Rehabilitation, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Rafael Polania
- Decision Neuroscience Lab, Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland
- Pablo Billeke
- Laboratorio de Neurociencia Social y Neuromodulación, Centro de Investigación en Complejidad Social (neuroCICS), Facultad de Gobierno, Universidad del Desarrollo, Santiago, Chile
13
|
Brown LS, Cho JR, Bolkan SS, Nieh EH, Schottdorf M, Tank DW, Brody CD, Witten IB, Goldman MS. Neural circuit models for evidence accumulation through choice-selective sequences. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.01.555612. [PMID: 38234715 PMCID: PMC10793437 DOI: 10.1101/2023.09.01.555612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/19/2024]
Abstract
Decision making is traditionally thought to be mediated by populations of neurons whose firing rates persistently accumulate evidence across time. However, recent decision-making experiments in rodents have observed neurons across the brain that fire sequentially as a function of spatial position or time, rather than persistently, with the subset of neurons in the sequence depending on the animal's choice. We develop two new candidate circuit models, in which evidence is encoded either in the relative firing rates of two competing chains of neurons or in the network location of a stereotyped pattern ("bump") of neural activity. Encoded evidence is then faithfully transferred between neuronal populations representing different positions or times. Neural recordings from four different brain regions during a decision-making task showed that, during the evidence accumulation period, different brain regions displayed tuning curves consistent with different candidate models for evidence accumulation. This work provides mechanistic models and potential neural substrates for how graded-value information may be precisely accumulated within and transferred between neural populations, a set of computations fundamental to many cognitive operations.
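The competing-chains scheme described above can be illustrated with a toy model (the rates, gain, and baseline here are illustrative placeholders, not the paper's fitted parameters): accumulated evidence is carried in the firing-rate difference between a left-preferring and a right-preferring chain as activity passes from one position to the next.

```python
def two_chain_readout(cues, baseline=5.0, gain=2.0):
    """Toy version of the competing-chains model: at each position one
    unit from each chain is active, and accumulated evidence is encoded
    in the rate difference between the two chains.

    cues: per-position evidence, +1 for a left cue, -1 for a right cue.
    Returns the (left_rate, right_rate) pair at each position.
    """
    rates = []
    acc = 0
    for c in cues:
        acc += c  # evidence accumulated so far
        left = max(0.0, baseline + gain * acc)   # left-chain unit
        right = max(0.0, baseline - gain * acc)  # right-chain unit
        rates.append((left, right))
    return rates
```

Reading out left minus right at any position recovers the running evidence total, which is the sense in which graded evidence is "faithfully transferred" along the sequence.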
Collapse
|
14
|
Hattori R, Hedrick NG, Jain A, Chen S, You H, Hattori M, Choi JH, Lim BK, Yasuda R, Komiyama T. Meta-reinforcement learning via orbitofrontal cortex. Nat Neurosci 2023; 26:2182-2191. [PMID: 37957318 PMCID: PMC10689244 DOI: 10.1038/s41593-023-01485-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2023] [Accepted: 10/06/2023] [Indexed: 11/15/2023]
Abstract
The meta-reinforcement learning (meta-RL) framework, which involves RL over multiple timescales, has been successful in training deep RL models that generalize to new environments. It has been hypothesized that the prefrontal cortex may mediate meta-RL in the brain, but the evidence is scarce. Here we show that the orbitofrontal cortex (OFC) mediates meta-RL. We trained mice and deep RL models on a probabilistic reversal learning task across sessions during which they improved their trial-by-trial RL policy through meta-learning. Ca2+/calmodulin-dependent protein kinase II-dependent synaptic plasticity in OFC was necessary for this meta-learning but not for the within-session trial-by-trial RL in experts. After meta-learning, OFC activity robustly encoded value signals, and OFC inactivation impaired the RL behaviors. Longitudinal tracking of OFC activity revealed that meta-learning gradually shapes population value coding to guide the ongoing behavioral policy. Our results indicate that two distinct RL algorithms with distinct neural mechanisms and timescales coexist in OFC to support adaptive decision-making.
Collapse
Affiliation(s)
- Ryoma Hattori
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA.
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA.
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA.
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA.
- Department of Neuroscience, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation & Technology, University of Florida, Jupiter, FL, USA.
| | - Nathan G Hedrick
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
| | - Anant Jain
- Max Planck Florida Institute for Neuroscience, Jupiter, FL, USA
| | - Shuqi Chen
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
| | - Hanjia You
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
| | - Mariko Hattori
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
| | - Jun-Hyeok Choi
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
| | - Byung Kook Lim
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
| | - Ryohei Yasuda
- Max Planck Florida Institute for Neuroscience, Jupiter, FL, USA
| | - Takaki Komiyama
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA.
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA.
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA.
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA.
| |
Collapse
|
15
|
Danskin BP, Hattori R, Zhang YE, Babic Z, Aoi M, Komiyama T. Exponential history integration with diverse temporal scales in retrosplenial cortex supports hyperbolic behavior. SCIENCE ADVANCES 2023; 9:eadj4897. [PMID: 38019904 PMCID: PMC10686558 DOI: 10.1126/sciadv.adj4897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/29/2023] [Accepted: 10/27/2023] [Indexed: 12/01/2023]
Abstract
Animals use past experience to guide future choices. The integration of experiences typically follows a hyperbolic, rather than exponential, decay pattern with a heavy tail for distant history. Hyperbolic integration affords sensitivity to both recent environmental dynamics and long-term trends. However, it is unknown how the brain implements hyperbolic integration. We found that mouse behavior in a foraging task showed hyperbolic decay of past experience, but the activity of cortical neurons showed exponential decay. We resolved this apparent mismatch by observing that cortical neurons encode history information with heterogeneous exponential time constants that vary across neurons. A model combining these diverse timescales recreated the heavy-tailed, hyperbolic history integration observed in behavior. In particular, the time constants of retrosplenial cortex (RSC) neurons best matched the behavior, and optogenetic inactivation of RSC uniquely reduced behavioral history dependence. These results indicate that behavior-relevant history information is maintained across multiple timescales in parallel and that RSC is a critical reservoir of information guiding decision-making.
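The central computational claim, that a mixture of exponential decays with diverse time constants reproduces heavy-tailed hyperbolic decay, follows from the identity 1/(1 + kt) = ∫₀^∞ e⁻ˢ · e⁻ˢᵏᵗ ds: the hyperbola is an exponentially weighted sum of exponentials with a continuum of rates. A minimal numerical sketch (the discretization parameters are illustrative):

```python
import math

def hyperbolic(t, k=1.0):
    """Heavy-tailed hyperbolic decay, as observed in behavior."""
    return 1.0 / (1.0 + k * t)

def exp_mixture(t, k=1.0, n=2000, s_max=20.0):
    """Weighted sum of exponentials with diverse time constants,
    discretizing 1/(1+kt) = integral of exp(-s) * exp(-s*k*t) ds."""
    ds = s_max / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * ds  # midpoint rule over decay rates s*k
        total += math.exp(-s) * math.exp(-s * k * t) * ds
    return total
```

At long lags the mixture stays well above any single exponential, reproducing the heavy tail that single-time-constant neurons lack on their own.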
Collapse
Affiliation(s)
- Bethanny P. Danskin
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
| | - Ryoma Hattori
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
| | - Yu E. Zhang
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
| | - Zeljana Babic
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
| | - Mikio Aoi
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
| | - Takaki Komiyama
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
| |
Collapse
|
16
|
Schamiloglu S, Wu H, Zhou M, Kwan AC, Bender KJ. Dynamic Foraging Behavior Performance Is Not Affected by Scn2a Haploinsufficiency. eNeuro 2023; 10:ENEURO.0367-23.2023. [PMID: 38151324 PMCID: PMC10755640 DOI: 10.1523/eneuro.0367-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2023] [Revised: 10/23/2023] [Accepted: 11/14/2023] [Indexed: 12/29/2023] Open
Abstract
Dysfunction in the gene SCN2A, which encodes the voltage-gated sodium channel Nav1.2, is strongly associated with neurodevelopmental disorders including autism spectrum disorder and intellectual disability (ASD/ID). This dysfunction typically manifests in these disorders as a haploinsufficiency, where loss of one copy of a gene cannot be compensated for by the other allele. Scn2a haploinsufficiency affects a range of cells and circuits across the brain, including associative neocortical circuits that are important for cognitive flexibility and decision-making behaviors. Here, we tested whether Scn2a haploinsufficiency has any effect on a dynamic foraging task that engages such circuits. Scn2a +/- mice and wild-type (WT) littermates were trained on a choice behavior where the probability of reward between two options varied dynamically across trials and where the location of the high reward underwent uncued reversals. Despite impairments in Scn2a-related neuronal excitability, we found that both male and female Scn2a +/- mice performed these tasks as well as wild-type littermates, with no behavioral difference across genotypes in learning or performance parameters. Varying the number of trials between reversals or probabilities of receiving reward did not result in an observable behavioral difference, either. These data suggest that, despite heterozygous loss of Scn2a, mice can perform relatively complex foraging tasks that make use of higher-order neuronal circuits.
Collapse
Affiliation(s)
- Selin Schamiloglu
- Neuroscience Graduate Program, University of California, San Francisco, CA 94158
- Center for Integrative Neuroscience, Department of Neurology, University of California, San Francisco, CA 94158
| | - Hao Wu
- Interdepartmental Neuroscience Program, Yale University School of Medicine, New Haven, CT 06511
| | - Mingkang Zhou
- Neuroscience Graduate Program, University of California, San Francisco, CA 94158
- Center for Integrative Neuroscience, Department of Neurology, University of California, San Francisco, CA 94158
| | - Alex C Kwan
- Interdepartmental Neuroscience Program, Yale University School of Medicine, New Haven, CT 06511
- Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY 14853
| | - Kevin J Bender
- Center for Integrative Neuroscience, Department of Neurology, University of California, San Francisco, CA 94158
| |
Collapse
|
17
|
Naamani G, Shahar N, Ger Y, Yovel Y. Fruit bats adjust their decision-making process according to environmental dynamics. BMC Biol 2023; 21:278. [PMID: 38031023 PMCID: PMC10687778 DOI: 10.1186/s12915-023-01774-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2023] [Accepted: 11/21/2023] [Indexed: 12/01/2023] Open
Abstract
One of the main functions of behavioral plasticity lies in the ability to contend with dynamic environments. Indeed, while numerous studies have shown that animals adapt their behavior to the environment, how they adapt their latent learning and decision strategies to changes in the environment is less understood. Here, we used a controlled experiment to examine fruit bats' ability to adjust their decision strategy according to the environmental dynamics. Twenty-five Egyptian fruit bats were placed individually in either a stable or a volatile environment for four consecutive nights. In the stable environment, two feeders offered food, each with a different reward probability (0.2 vs. 0.8); the probabilities remained fixed over two nights and were then switched. In the volatile environment, the positions of the more and the less rewarding feeders were changed every hour. We then fit two commonly used alternative models, namely reinforcement learning and win-stay-lose-shift strategies, to the bats' behavior. We found that while the bats adapted their decision-making strategy to the environmental dynamics, they seemed to be limited in their responses by natural priors. Namely, when the environment changed slowly, at a rate that is natural for these bats, they seemed to rely on reinforcement learning and their performance was nearly optimal; but when the experimental environment changed much faster than the natural environment, the bats stopped learning and switched to a random decision-making strategy. Together, these findings exemplify both the bats' decision-making plasticity and its natural limitations.
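The two candidate strategies compared in this study can be written down compactly. This is a generic sketch of each decision rule, not the authors' fitted model; the parameter values are illustrative:

```python
import random

def q_update(q, reward, alpha=0.1):
    """Rescorla-Wagner / Q-learning update for the chosen option:
    move the value estimate toward the observed outcome."""
    return q + alpha * (reward - q)

def wsls_choice(prev_choice, prev_reward, p_stay=1.0, rng=random):
    """Win-stay-lose-shift: repeat the last choice after a reward,
    switch after a non-reward (with lapses when p_stay < 1)."""
    stay = (prev_reward == 1)
    if rng.random() >= p_stay:  # occasional lapse flips the rule
        stay = not stay
    return prev_choice if stay else 1 - prev_choice
```

Model comparison then amounts to asking which rule assigns higher likelihood to the observed choice sequence of each animal.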
Collapse
Affiliation(s)
- Goni Naamani
- School of Zoology, Faculty of Life Sciences, Tel Aviv University, Tel Aviv, 6997801, Israel.
| | - Nitzan Shahar
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel
- The School of Psychological Sciences, Tel Aviv University, Tel Aviv, 6997801, Israel
| | - Yoav Ger
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel
| | - Yossi Yovel
- School of Zoology, Faculty of Life Sciences, Tel Aviv University, Tel Aviv, 6997801, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, 6997801, Israel
| |
Collapse
|
18
|
Shih WY, Yu HY, Lee CC, Chou CC, Chen C, Glimcher PW, Wu SW. Electrophysiological population dynamics reveal context dependencies during decision making in human frontal cortex. Nat Commun 2023; 14:7821. [PMID: 38016973 PMCID: PMC10684521 DOI: 10.1038/s41467-023-42092-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2022] [Accepted: 09/28/2023] [Indexed: 11/30/2023] Open
Abstract
Evidence from monkeys and humans suggests that the orbitofrontal cortex (OFC) encodes the subjective value of options under consideration during choice. Data from non-human primates suggest that these value signals are context-dependent, representing subjective value in a way influenced by the decision makers' recent experience. Using electrodes distributed throughout cortical and subcortical structures, human epilepsy patients performed an auction task in which they repeatedly reported the subjective values they placed on snack food items. High-gamma activity at many cortical and subcortical sites, including the OFC, positively correlated with subjective value. Other OFC sites showed signals contextually modulated by the subjective value of previously offered goods, a context dependency predicted by theory but not previously observed in humans. These results suggest that value and value-context signals are simultaneously present but separately represented in human frontal cortical activity.
Collapse
Affiliation(s)
- Wan-Yu Shih
- Institute of Neuroscience, College of Life Sciences, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC.
| | - Hsiang-Yu Yu
- College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
- Department of Epilepsy, Neurological Institute, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Brain Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
| | - Cheng-Chia Lee
- Department of Epilepsy, Neurological Institute, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Brain Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
- Department of Neurosurgery, Neurological Institute, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
| | - Chien-Chen Chou
- College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
- Department of Epilepsy, Neurological Institute, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Brain Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
| | - Chien Chen
- College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
- Department of Epilepsy, Neurological Institute, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Brain Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
| | - Paul W Glimcher
- Neuroscience Institute, NYU Grossman School of Medicine, New York, NY, USA.
- Department of Neuroscience and Physiology, NYU Grossman School of Medicine, New York, NY, USA.
| | - Shih-Wei Wu
- Institute of Neuroscience, College of Life Sciences, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC.
- Brain Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC.
| |
Collapse
|
19
|
Farries MA, Faust TW, Mohebi A, Berke JD. Selective encoding of reward predictions and prediction errors by globus pallidus subpopulations. Curr Biol 2023; 33:4124-4135.e5. [PMID: 37703876 PMCID: PMC10591972 DOI: 10.1016/j.cub.2023.08.042] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Revised: 07/04/2023] [Accepted: 08/15/2023] [Indexed: 09/15/2023]
Abstract
Basal ganglia (BG) circuits help guide and invigorate actions using predictions of future rewards (values). Within the BG, the globus pallidus pars externa (GPe) may play an essential role in aggregating and distributing value information. We recorded from the GPe in unrestrained rats performing both Pavlovian and instrumental tasks to obtain rewards and distinguished neuronal subtypes by their firing properties across the wake/sleep cycle and optogenetic tagging. In both tasks, the parvalbumin-positive (PV+), faster-firing "prototypical" neurons showed strong, sustained modulation by value, unlike other subtypes, including the "arkypallidal" cells that project back to striatum. Furthermore, we discovered that a distinct minority (7%) of GP cells display slower, pacemaker-like firing and encode reward prediction errors (RPEs) almost identically to midbrain dopamine neurons. These cell-specific forms of GPe value representation help define the circuit mechanisms by which the BG contribute to motivation and reinforcement learning.
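The dopamine-like reward prediction error reported for the slow pacemaker-like GP cells is conventionally formalized as the temporal-difference error δ = r + γV(s′) − V(s). A minimal sketch (the values and discount factor below are illustrative, not fitted quantities):

```python
def td_error(reward, v_next, v_current, gamma=0.95):
    """Temporal-difference reward prediction error:
    positive when the outcome is better than predicted,
    negative when it is worse."""
    return reward + gamma * v_next - v_current
```

An unexpected reward yields a positive error, an omitted expected reward a negative one, and a fully predicted reward yields no error at all, the signature behavior of midbrain dopamine neurons.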
Collapse
Affiliation(s)
- Michael A Farries
- Knoebel Institute for Healthy Aging, University of Denver, Denver, CO 80210, USA
| | - Thomas W Faust
- Department of Neurology, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Ali Mohebi
- Department of Neurology, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Joshua D Berke
- Department of Neurology, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Psychiatry and Behavioral Sciences, Neuroscience Graduate Program, Kavli Institute for Fundamental Neuroscience, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA.
| |
Collapse
|
20
|
Rajagopalan AE, Darshan R, Hibbard KL, Fitzgerald JE, Turner GC. Reward expectations direct learning and drive operant matching in Drosophila. Proc Natl Acad Sci U S A 2023; 120:e2221415120. [PMID: 37733736 PMCID: PMC10523640 DOI: 10.1073/pnas.2221415120] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2023] [Accepted: 08/11/2023] [Indexed: 09/23/2023] Open
Abstract
Foraging animals must use decision-making strategies that dynamically adapt to the changing availability of rewards in the environment. A wide diversity of animals do this by distributing their choices in proportion to the rewards received from each option, a behavior described by Herrnstein's operant matching law. Theoretical work suggests an elegant mechanistic explanation for this ubiquitous behavior, as operant matching follows automatically from simple synaptic plasticity rules acting within behaviorally relevant neural circuits. However, no past work has mapped operant matching onto plasticity mechanisms in the brain, leaving the biological relevance of the theory unclear. Here, we discovered operant matching in Drosophila and showed that it requires synaptic plasticity that acts in the mushroom body and incorporates the expectation of reward. We began by developing a dynamic foraging paradigm to measure choices from individual flies as they learn to associate odor cues with probabilistic rewards. We then built a model of the fly mushroom body to explain each fly's sequential choice behavior using a family of biologically realistic synaptic plasticity rules. As predicted by past theoretical work, we found that synaptic plasticity rules could explain fly matching behavior by incorporating stimulus expectations, reward expectations, or both. However, by optogenetically bypassing the representation of reward expectation, we abolished matching behavior and showed that the plasticity rule must specifically incorporate reward expectations. Altogether, these results reveal the first synapse-level mechanisms of operant matching and provide compelling evidence for the role of reward expectation signals in the fly brain.
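Operant matching states that choice ratios equal reward ratios, C₁/C₂ = R₁/R₂. In practice it is assessed by fitting the generalized matching law, log(C₁/C₂) = a·log(R₁/R₂) + log b, where a = b = 1 is perfect matching and a < 1 is undermatching. A minimal least-squares sketch on synthetic ratios (not the fly dataset):

```python
import math

def fit_generalized_matching(choice_ratios, reward_ratios):
    """Least-squares fit of the generalized matching law
    log(C1/C2) = a * log(R1/R2) + log(b).
    Returns (a, b); a = b = 1 is perfect matching,
    a < 1 indicates undermatching."""
    xs = [math.log(r) for r in reward_ratios]
    ys = [math.log(c) for c in choice_ratios]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, math.exp(my - a * mx)
```

Fitting in log-ratio space makes deviations from matching directly readable off the slope and intercept.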
Collapse
Affiliation(s)
- Adithya E. Rajagopalan
- Janelia Research Campus, HHMI, Ashburn, VA20147
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD21205
| | - Ran Darshan
- Janelia Research Campus, HHMI, Ashburn, VA20147
- Department of Physiology and Pharmacology, Sackler Faculty of Medicine, Sagol School of Neuroscience, The School of Physics and Astronomy, Tel Aviv University, Tel Aviv6997801, Israel
| | | | | | | |
Collapse
|
21
|
Le NM, Yildirim M, Wang Y, Sugihara H, Jazayeri M, Sur M. Mixtures of strategies underlie rodent behavior during reversal learning. PLoS Comput Biol 2023; 19:e1011430. [PMID: 37708113 PMCID: PMC10501641 DOI: 10.1371/journal.pcbi.1011430] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2023] [Accepted: 08/09/2023] [Indexed: 09/16/2023] Open
Abstract
In reversal learning tasks, the behavior of humans and animals is often assumed to be uniform within single experimental sessions to facilitate data analysis and model fitting. However, behavior of agents can display substantial variability in single experimental sessions, as they execute different blocks of trials with different transition dynamics. Here, we observed that in a deterministic reversal learning task, mice display noisy and sub-optimal choice transitions even at the expert stages of learning. We investigated two sources of the sub-optimality in the behavior. First, we found that mice exhibit a high lapse rate during task execution, as they reverted to unrewarded directions after choice transitions. Second, we unexpectedly found that a majority of mice did not execute a uniform strategy, but rather mixed between several behavioral modes with different transition dynamics. We quantified the use of such mixtures with a state-space model, block Hidden Markov Model (block HMM), to dissociate the mixtures of dynamic choice transitions in individual blocks of trials. Additionally, we found that block HMM transition modes in rodent behavior can be accounted for by two different types of behavioral algorithms, model-free or inference-based learning, that might be used to solve the task. Combining these approaches, we found that mice used a mixture of both exploratory, model-free strategies and deterministic, inference-based behavior in the task, explaining their overall noisy choice sequences. Together, our combined computational approach highlights intrinsic sources of noise in rodent reversal learning behavior and provides a richer description of behavior than conventional techniques, while uncovering the hidden states that underlie the block-by-block transitions.
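At its core, a block HMM assigns each block of trials to a hidden behavioral mode and scores observation sequences with standard HMM machinery. A minimal forward-algorithm sketch for a generic discrete HMM (toy parameters, not the authors' fitted model):

```python
def hmm_forward(obs, pi, A, B):
    """Forward algorithm: total likelihood of an observation sequence
    under a discrete hidden Markov model.
    pi: initial state probabilities; A: state-transition matrix;
    B: per-state emission probabilities."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        # propagate forward probabilities one step, then emit
        alpha = [B[s][o] * sum(alpha[sp] * A[sp][s] for sp in range(n))
                 for s in range(n)]
    return sum(alpha)
```

In the block-level variant, each hidden state corresponds to a behavioral mode with its own choice-transition dynamics, and the forward pass runs over blocks rather than individual trials.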
Collapse
Affiliation(s)
- Nhat Minh Le
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Murat Yildirim
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Department of Neurosciences, Cleveland Clinic Lerner Research Institute, Cleveland, Ohio, United States of America
| | - Yizhi Wang
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Hiroki Sugihara
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Mehrdad Jazayeri
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Mriganka Sur
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| |
Collapse
|
22
|
Woo JH, Aguirre CG, Bari BA, Tsutsui KI, Grabenhorst F, Cohen JY, Schultz W, Izquierdo A, Soltani A. Mechanisms of adjustments to different types of uncertainty in the reward environment across mice and monkeys. COGNITIVE, AFFECTIVE & BEHAVIORAL NEUROSCIENCE 2023; 23:600-619. [PMID: 36823249 PMCID: PMC10444905 DOI: 10.3758/s13415-022-01059-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Accepted: 12/22/2022] [Indexed: 02/25/2023]
Abstract
Despite being unpredictable and uncertain, reward environments often exhibit certain regularities, and animals navigating these environments try to detect and utilize such regularities to adapt their behavior. However, successful learning requires that animals also adjust to uncertainty associated with those regularities. Here, we analyzed choice data from two comparable dynamic foraging tasks in mice and monkeys to investigate mechanisms underlying adjustments to different types of uncertainty. In these tasks, animals selected between two choice options that delivered reward probabilistically, while baseline reward probabilities changed after a variable number of trials (a block) without any cues to the animals. To measure adjustments in behavior, we applied multiple metrics based on information theory that quantify consistency in behavior, and fit choice data using reinforcement learning models. We found that in both species, learning and choice were affected by uncertainty about reward outcomes (in terms of determining the better option) and by expectation about when the environment may change. However, these effects were mediated through different mechanisms. First, more uncertainty about the better option resulted in slower learning and forgetting in mice, whereas it had no significant effect in monkeys. Second, expectation of block switches accompanied slower learning, faster forgetting, and increased stochasticity in choice in mice, whereas it only reduced learning rates in monkeys. Overall, while demonstrating the usefulness of metrics based on information theory in examining adaptive behavior, our study provides evidence for multiple types of adjustments in learning and choice behavior according to uncertainty in the reward environment.
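One information-theoretic consistency metric of the kind described here is the conditional entropy of the stay/switch strategy given the previous outcome, H(strategy | win/lose): it is zero for a perfectly consistent reward-dependent strategy (e.g., strict win-stay-lose-shift) and approaches 1 bit as behavior becomes stochastic. A generic sketch, not the authors' exact estimator:

```python
import math
from collections import Counter

def cond_strategy_entropy(choices, rewards):
    """H(stay/switch | previous outcome), in bits.
    choices: sequence of option indices; rewards: 0/1 outcomes."""
    # Pair each trial's outcome with whether the next choice stayed.
    pairs = [(rewards[t], choices[t + 1] == choices[t])
             for t in range(len(choices) - 1)]
    n = len(pairs)
    h = 0.0
    for outcome in (0, 1):
        sub = [stay for r, stay in pairs if r == outcome]
        if not sub:
            continue
        p_outcome = len(sub) / n
        for count in Counter(sub).values():
            p = count / len(sub)
            h -= p_outcome * p * math.log2(p)
    return h
```

Low values indicate a tightly outcome-coupled strategy; values near 1 bit indicate choices that ignore the previous outcome.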
Affiliation(s)
- Jae Hyung Woo: Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
- Claudia G Aguirre: Department of Psychology, University of California, Los Angeles, Los Angeles, CA, USA
- Bilal A Bari: Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Ken-Ichiro Tsutsui: Department of Physiology, Development & Neuroscience, University of Cambridge, Cambridge, UK; Laboratory of Systems Neuroscience, Tohoku University Graduate School of Life Sciences, Sendai, Japan
- Fabian Grabenhorst: Department of Physiology, Development & Neuroscience, University of Cambridge, Cambridge, UK; Department of Experimental Psychology, University of Oxford, Oxford, UK
- Jeremiah Y Cohen: The Solomon H. Snyder Department of Neuroscience, Brain Science Institute, Kavli Neuroscience Discovery Institute, The Johns Hopkins University School of Medicine, Baltimore, MD, USA; Allen Institute for Neural Dynamics, Seattle, WA, USA
- Wolfram Schultz: Department of Physiology, Development & Neuroscience, University of Cambridge, Cambridge, UK
- Alicia Izquierdo: Department of Psychology, University of California, Los Angeles, Los Angeles, CA, USA; The Brain Research Institute, University of California, Los Angeles, Los Angeles, CA, USA
- Alireza Soltani: Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
23
Stepniewska I, Kahler-Quesada S, Kaas JH, Friedman RM. Functional imaging and anatomical connections in squirrel monkeys reveal parietal-frontal circuits underlying eye movements. Cereb Cortex 2023; 33:7258-7275. PMID: 36813296; PMCID: PMC10233296; DOI: 10.1093/cercor/bhad036
Abstract
The posterior parietal cortex (PPC) of squirrel monkeys contains subregions where long trains of intracortical microstimulation evoke complex, behaviorally meaningful movements. Recently, we showed that such stimulation of a part of the PPC in the caudal lateral sulcus (LS) elicits eye movements in these monkeys. Here, we studied the functional and anatomical connections of this oculomotor region, which we call the parietal eye field (PEF), with the frontal eye field (FEF) and other cortical regions in 2 squirrel monkeys. We demonstrated these connections with intrinsic optical imaging and injections of anatomical tracers. Optical imaging of frontal cortex during stimulation of the PEF evoked focal functional activation within FEF. Tracing studies confirmed the functional PEF-FEF connections. Moreover, tracer injections revealed PEF connections with other PPC regions on the dorsolateral and medial brain surface, cortex in the caudal LS, and visual and auditory cortical association areas. Subcortical projections of the PEF were primarily to the superior colliculus and pontine nuclei, as well as to nuclei of the dorsal posterior thalamus and the caudate. These findings suggest that the PEF in squirrel monkey is homologous to the lateral intraparietal (LIP) area of the macaque, supporting the notion that these brain circuits are organized similarly to mediate ethologically relevant oculomotor behaviors.
Affiliation(s)
- Iwona Stepniewska: Department of Psychology, Vanderbilt University, Nashville, TN 37240, USA
- Sofia Kahler-Quesada: Division of Neuroscience, Oregon National Primate Research Center, OHSU, Beaverton, OR 97006, USA
- Jon H Kaas: Department of Psychology, Vanderbilt University, Nashville, TN 37240, USA
- Robert M Friedman: Division of Neuroscience, Oregon National Primate Research Center, OHSU, Beaverton, OR 97006, USA
24
Klaes C, Pilacinski A, Kellis S, Aflalo T, Liu C, Andersen R. Neural representations of economic decision variables in human posterior parietal cortex. bioRxiv 2023:2023.05.18.541297. PMID: 37293079; PMCID: PMC10245787; DOI: 10.1101/2023.05.18.541297
Abstract
Decision making has been intensively studied at the single-neuron level in the posterior parietal cortex of non-human primates. In humans, decision making has mainly been studied with psychophysical tools or with fMRI. Here, we investigated how single neurons from human posterior parietal cortex represent numeric values informing future decisions during a complex two-player game. The tetraplegic study participant was implanted with a Utah electrode array in the anterior intraparietal area (AIP). We played a simplified variant of Black Jack with the participant while neuronal data were recorded. During the game, two players are presented with numbers which are added up. Each time a number is presented, the player has to decide whether to proceed or to stop. Once the first player stops or the score reaches a limit, the turn passes to the second player, who tries to beat the score of the first player. Whoever is closer to the limit (without overshooting) wins the game. We found that many AIP neurons selectively responded to the face value of the presented number. Other neurons tracked the cumulative score or were selectively active for the upcoming decision of the study participant. Interestingly, some cells also kept track of the opponent's score. Our findings show that parietal regions engaged in hand action control also represent numbers and their complex transformations. This is also the first demonstration that complex economic decisions can be tracked in single-neuron activity in human AIP. Our findings show how tight the links are between parietal neural circuits underlying hand control, numerical cognition, and complex decision-making.
25
Wang Z, Nan T, Goerlich KS, Li Y, Aleman A, Luo Y, Xu P. Neurocomputational mechanisms underlying fear-biased adaptation learning in changing environments. PLoS Biol 2023; 21:e3001724. PMID: 37126501; PMCID: PMC10174591; DOI: 10.1371/journal.pbio.3001724
Abstract
Humans are able to adapt to the fast-changing world by estimating statistical regularities of the environment. Although fear can profoundly impact adaptive behaviors, the computational and neural mechanisms underlying this phenomenon remain elusive. Here, we conducted a behavioral experiment (n = 21) and a functional magnetic resonance imaging experiment (n = 37) with a novel cue-biased adaptation learning task, during which we simultaneously manipulated emotional valence (fearful/neutral expressions of the cue) and environmental volatility (frequent/infrequent reversals of reward probabilities). Across the 2 experiments, computational modeling consistently revealed a higher learning rate for the environment with frequent versus infrequent reversals following neutral cues. In contrast, this flexible adjustment was absent in the environment with fearful cues, suggesting a suppressive role of fear in adaptation to environmental volatility. This suppressive effect was underpinned by activity of the ventral striatum, hippocampus, and dorsal anterior cingulate cortex (dACC), as well as increased functional connectivity between the dACC and temporal-parietal junction (TPJ) for fear with environmental volatility. Dynamic causal modeling identified that the driving effect was located in the TPJ and was associated with dACC activation, suggesting that the suppression of fear on adaptive behaviors occurs at the early stage of bottom-up processing. These findings provide a neuro-computational account of how fear interferes with adaptation to volatility in dynamic environments.
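The learning-rate effect at the heart of this modeling can be illustrated with a toy Rescorla-Wagner update: a higher learning rate tracks a reversal in reward probability faster. This is a hypothetical sketch, not the study's model:

```python
def rescorla_wagner(outcomes, alpha):
    """Track an option's reward probability with learning rate alpha."""
    v, trace = 0.5, []
    for o in outcomes:
        v += alpha * (o - v)      # prediction-error update
        trace.append(v)
    return trace

# A reversal: 10 rewarded trials followed by 10 unrewarded trials.
outcomes = [1] * 10 + [0] * 10
fast = rescorla_wagner(outcomes, alpha=0.5)   # flexible learner
slow = rescorla_wagner(outcomes, alpha=0.1)   # rigid learner
# The high-alpha learner ends closer to the new contingency (0).
```

In the study's terms, the absence of this kind of learning-rate increase under volatility following fearful cues is what marks the suppressive effect of fear.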
Affiliation(s)
- Zhihao Wang: Beijing Key Laboratory of Applied Experimental Psychology, National Demonstration Center for Experimental Psychology Education (BNU), Faculty of Psychology, Beijing Normal University, Beijing, China; CNRS-Centre d'Economie de la Sorbonne, Panthéon-Sorbonne University, France
- Tian Nan: School of Psychology, Sichuan Center of Applied Psychology, Chengdu Medical College, Chengdu, China
- Katharina S Goerlich: Department of Biomedical Sciences of Cells & Systems, Section Cognitive Neuroscience, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Yiman Li: Shenzhen Key Laboratory of Affective and Social Neuroscience, Magnetic Resonance Imaging, Center for Brain Disorders and Cognitive Sciences, Shenzhen University, Shenzhen, China
- André Aleman: Department of Biomedical Sciences of Cells & Systems, Section Cognitive Neuroscience, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Yuejia Luo: School of Psychology, Sichuan Center of Applied Psychology, Chengdu Medical College, Chengdu, China; Shenzhen Key Laboratory of Affective and Social Neuroscience, Magnetic Resonance Imaging, Center for Brain Disorders and Cognitive Sciences, Shenzhen University, Shenzhen, China; The State Key Lab of Cognitive and Learning, Faculty of Psychology, Beijing Normal University, Beijing, China
- Pengfei Xu: Beijing Key Laboratory of Applied Experimental Psychology, National Demonstration Center for Experimental Psychology Education (BNU), Faculty of Psychology, Beijing Normal University, Beijing, China; Center for Neuroimaging, Shenzhen Institute of Neuroscience, Shenzhen, China
26
Cazettes F, Mazzucato L, Murakami M, Morais JP, Augusto E, Renart A, Mainen ZF. A reservoir of foraging decision variables in the mouse brain. Nat Neurosci 2023; 26:840-849. PMID: 37055628; PMCID: PMC10280691; DOI: 10.1038/s41593-023-01305-8
Abstract
In any given situation, the environment can be parsed in different ways to yield decision variables (DVs) defining strategies useful for different tasks. It is generally presumed that the brain only computes a single DV defining the current behavioral strategy. Here to test this assumption, we recorded neural ensembles in the frontal cortex of mice performing a foraging task admitting multiple DVs. Methods developed to uncover the currently employed DV revealed the use of multiple strategies and occasional switches in strategy within sessions. Optogenetic manipulations showed that the secondary motor cortex (M2) is needed for mice to use the different DVs in the task. Surprisingly, we found that regardless of which DV best explained the current behavior, M2 activity concurrently encoded a full basis set of computations defining a reservoir of DVs appropriate for alternative tasks. This form of neural multiplexing may confer considerable advantages for learning and adaptive behavior.
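As a rough illustration of how several candidate DVs can be computed in parallel from the same choice-reward history, consider this hypothetical sketch (the variable names are ours; the paper's DVs differ in detail):

```python
import math

def decision_variables(choices, rewards, tau=5.0):
    """Compute several candidate foraging DVs trial by trial.

    Returns lists of: a leaky reward integrator, the count of
    consecutive unrewarded trials, and trials since the last switch.
    """
    decay = math.exp(-1.0 / tau)          # leak of the reward integrator
    integ, fails, since_switch = 0.0, 0, 0
    out = {"reward_integrator": [], "consecutive_failures": [],
           "trials_since_switch": []}
    prev = None
    for c, r in zip(choices, rewards):
        integ = decay * integ + r                          # leaky integration
        fails = 0 if r else fails + 1                      # failure counter
        since_switch = 0 if (prev is not None and c != prev) else since_switch + 1
        prev = c
        out["reward_integrator"].append(integ)
        out["consecutive_failures"].append(fails)
        out["trials_since_switch"].append(since_switch)
    return out
```

Each of these traces defines a different strategy for deciding when to leave an option, which is what makes it possible to ask which DV best explains the animal's current behavior.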
Affiliation(s)
- Luca Mazzucato: Departments of Biology, Mathematics & Physics, Institute of Neuroscience, University of Oregon, Eugene, OR, USA
- Masayoshi Murakami: Champalimaud Foundation, Lisbon, Portugal; Department of Neurophysiology, University of Yamanashi, Yamanashi, Japan
27
Smoulder AL, Marino PJ, Oby ER, Snyder SE, Miyata H, Pavlovsky NP, Bishop WE, Yu BM, Chase SM, Batista AP. A neural basis of choking under pressure. bioRxiv 2023:2023.04.16.537007. PMID: 37090659; PMCID: PMC10120738; DOI: 10.1101/2023.04.16.537007
Abstract
Incentives tend to drive improvements in performance. But when incentives get too high, we can "choke under pressure" and underperform when it matters most. What neural processes might lead to choking under pressure? We studied rhesus monkeys performing a challenging reaching task in which they underperform when an unusually large "jackpot" reward is at stake. We observed a collapse in neural information about upcoming movements for jackpot rewards: in the motor cortex, neural planning signals became less distinguishable for different reach directions when a jackpot reward was made available. We conclude that neural signals of reward and motor planning interact in the motor cortex in a manner that can explain why we choke under pressure.
One-Sentence Summary: In response to exceptionally large reward cues, animals can "choke under pressure", and this corresponds to a collapse in the neural information about upcoming movements.
28
Xie T, Huang C, Zhang Y, Liu J, Yao H. Influence of Recent Trial History on Interval Timing. Neurosci Bull 2023; 39:559-575. PMID: 36209314; PMCID: PMC10073370; DOI: 10.1007/s12264-022-00954-2
Abstract
Interval timing is involved in a variety of cognitive behaviors such as associative learning and decision-making. While it has been shown that time estimation is adaptive to the temporal context, it remains unclear how interval timing behavior is influenced by recent trial history. Here we found that, in mice trained to perform a licking-based interval timing task, a decrease in the inter-reinforcement interval on the previous trial rapidly shifted the time of anticipatory licking earlier. Optogenetic inactivation of the anterior lateral motor cortex (ALM), but not the medial prefrontal cortex, for a short time before reward delivery caused a decrease in the peak time of anticipatory licking in the next trial. Electrophysiological recordings from the ALM showed that the response profiles preceded by short and long inter-reinforcement intervals exhibited task-engagement-dependent temporal scaling. Thus, interval timing is adaptive to recent experience of the temporal interval, and ALM activity during time estimation reflects recent experience of the interval.
Affiliation(s)
- Taorong Xie: Institute of Neuroscience, State Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China
- Can Huang: Institute of Neuroscience, State Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Yijie Zhang: Institute of Neuroscience, State Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Jing Liu: Institute of Neuroscience, State Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Haishan Yao: Institute of Neuroscience, State Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China; Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, Shanghai, 201210, China
29
Perisse E, Miranda M, Trouche S. Modulation of aversive value coding in the vertebrate and invertebrate brain. Curr Opin Neurobiol 2023; 79:102696. PMID: 36871400; DOI: 10.1016/j.conb.2023.102696
Abstract
Avoiding potentially dangerous situations is key for the survival of any organism. Throughout life, animals learn to avoid environments, stimuli, or actions that can lead to bodily harm. While the neural bases for appetitive learning, evaluation, and value-based decision-making have received much attention, recent studies have revealed more complex computations for aversive signals during learning and decision-making than previously thought. Furthermore, previous experience, internal state, and systems-level appetitive-aversive interactions seem crucial for learning specific aversive value signals and making appropriate choices. The emergence of novel methodologies (computational analysis coupled with large-scale neuronal recordings, and neuronal manipulations at unprecedented resolution offered by genetics, viral strategies, and connectomics) has helped to provide novel circuit-based models for aversive (and appetitive) valuation. In this review, we focus on recent vertebrate and invertebrate studies yielding strong evidence that aversive value information can be computed by a multitude of interacting brain regions, and that past experience can modulate future aversive learning and therefore influence value-based decisions.
Affiliation(s)
- Emmanuel Perisse: Institute of Functional Genomics, University of Montpellier, CNRS, Inserm, 141 rue de la Cardonille, 34094 Montpellier Cedex 5, France
- Magdalena Miranda: Institute of Functional Genomics, University of Montpellier, CNRS, Inserm, 141 rue de la Cardonille, 34094 Montpellier Cedex 5, France
- Stéphanie Trouche: Institute of Functional Genomics, University of Montpellier, CNRS, Inserm, 141 rue de la Cardonille, 34094 Montpellier Cedex 5, France
30
Ishino S, Kamada T, Sarpong GA, Kitano J, Tsukasa R, Mukohira H, Sun F, Li Y, Kobayashi K, Naoki H, Oishi N, Ogawa M. Dopamine error signal to actively cope with lack of expected reward. Sci Adv 2023; 9:eade5420. PMID: 36897945; PMCID: PMC10005178; DOI: 10.1126/sciadv.ade5420
Abstract
To obtain more of a particular uncertain reward, animals must learn to actively overcome the lack of reward and adjust behavior to obtain it again. The neural mechanisms underlying such coping with reward omission remain unclear. Here, we developed a task in rats to monitor active behavioral switching toward the next reward after no reward. We found that some dopamine neurons in the ventral tegmental area exhibited increased responses to unexpected reward omission and decreased responses to unexpected reward, opposite to the responses of the well-known dopamine neurons that signal reward prediction error (RPE). The corresponding dopamine increase in the nucleus accumbens correlated with behavioral adjustment to actively overcome unexpected reward omission. We propose that these responses signal error to actively cope with lack of expected reward. The dopamine error signal thus cooperates with the RPE signal, enabling adaptive and robust pursuit of uncertain reward to ultimately obtain more reward.
Affiliation(s)
- Seiya Ishino: Medical Innovation Center/SK Project, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan; Department of Neuroscience, Graduate School of Medicine, Kyoto University, Kyoto 606-8501, Japan; Department of Developmental Physiology, National Institute for Physiological Sciences, Okazaki, Aichi 444-8585, Japan
- Taisuke Kamada: Medical Innovation Center/SK Project, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan
- Gideon A. Sarpong: Medical Innovation Center/SK Project, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan
- Julia Kitano: Medical Innovation Center/SK Project, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan
- Reo Tsukasa: Medical Innovation Center/SK Project, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan
- Hisa Mukohira: Medical Innovation Center/SK Project, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan
- Fangmiao Sun: State Key Laboratory of Membrane Biology, Peking University School of Life Sciences, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Beijing 100871, China; PKU-IDG/McGovern Institute for Brain Research, Beijing 100871, China
- Yulong Li: State Key Laboratory of Membrane Biology, Peking University School of Life Sciences, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Beijing 100871, China; PKU-IDG/McGovern Institute for Brain Research, Beijing 100871, China
- Kenta Kobayashi: Section of Viral Vector Development, National Institute for Physiological Sciences, Okazaki, Aichi 444-8585, Japan; SOKENDAI (The Graduate University for Advanced Studies), Okazaki, Aichi 444-8585, Japan
- Honda Naoki: Laboratory of Data-driven Biology, Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Hiroshima 739-8526, Japan; Theoretical Biology Research Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences, Okazaki, Aichi 444-8787, Japan; Laboratory of Theoretical Biology, Graduate School of Biostudies, Kyoto University, Kyoto 606-8315, Japan; Kansei-Brain Informatics Group, Center for Brain, Mind and Kansei Sciences Research (BMK Center), Hiroshima University, Kasumi, Minami-ku, Hiroshima 734-8551, Japan
- Naoya Oishi: Medical Innovation Center/SK Project, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan
- Masaaki Ogawa: Medical Innovation Center/SK Project, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan; Department of Neuroscience, Graduate School of Medicine, Kyoto University, Kyoto 606-8501, Japan; Department of Developmental Physiology, National Institute for Physiological Sciences, Okazaki, Aichi 444-8585, Japan
31
Banerjee A, Wang BA, Teutsch J, Helmchen F, Pleger B. Analogous cognitive strategies for tactile learning in the rodent and human brain. Prog Neurobiol 2023; 222:102401. PMID: 36608783; DOI: 10.1016/j.pneurobio.2023.102401
Abstract
Evolution has molded individual species' sensory capacities and abilities. In rodents, which mostly inhabit dark tunnels and burrows, the whisker-based somatosensory system has developed as the dominant sensory modality, essential for environmental exploration and spatial navigation. In contrast, humans rely more on visual and auditory inputs when collecting information from their surrounding sensory space in everyday life. As a result of such species-specific differences in sensory dominance, cognitive relevance, and capacities, the evidence for analogous sensory-cognitive mechanisms across species remains sparse. However, recent research in rodents and humans has yielded surprisingly comparable processing rules for detecting tactile stimuli, integrating touch information into percepts, and goal-directed rule learning. Here, we review how the brain, across species, harnesses such processing rules to establish decision-making during tactile learning, following canonical circuits from the thalamus and the primary somatosensory cortex up to the frontal cortex. We discuss concordances between empirical and computational evidence from micro- and mesoscopic circuit studies in rodents and findings from macroscopic imaging in humans. Furthermore, we discuss the relevance and challenges of future cross-species research in addressing mutual context-dependent evaluation processes underpinning perceptual learning.
Affiliation(s)
- Abhishek Banerjee: Adaptive Decisions Lab, Biosciences Institute, Newcastle University, United Kingdom
- Bin A Wang: Department of Neurology, BG University Hospital Bergmannsheil, Ruhr University Bochum, Germany; Collaborative Research Centre 874 "Integration and Representation of Sensory Processes", Ruhr University Bochum, Germany
- Jasper Teutsch: Adaptive Decisions Lab, Biosciences Institute, Newcastle University, United Kingdom
- Fritjof Helmchen: Laboratory of Neural Circuit Dynamics, Brain Research Institute, University of Zürich, Switzerland
- Burkhard Pleger: Department of Neurology, BG University Hospital Bergmannsheil, Ruhr University Bochum, Germany; Collaborative Research Centre 874 "Integration and Representation of Sensory Processes", Ruhr University Bochum, Germany
32
Foster BL, Koslov SR, Aponik-Gremillion L, Monko ME, Hayden BY, Heilbronner SR. A tripartite view of the posterior cingulate cortex. Nat Rev Neurosci 2023; 24:173-189. PMID: 36456807; PMCID: PMC10041987; DOI: 10.1038/s41583-022-00661-x
Abstract
The posterior cingulate cortex (PCC) is one of the least understood regions of the cerebral cortex. By contrast, the anterior cingulate cortex has been the subject of intensive investigation in humans and model animal systems, leading to detailed behavioural and computational theoretical accounts of its function. The time is right for similar progress to be made in the PCC given its unique anatomical and physiological properties and demonstrably important contributions to higher cognitive functions and brain diseases. Here, we describe recent progress in understanding the PCC, with a focus on convergent findings across species and techniques that lay a foundation for establishing a formal theoretical account of its functions. Based on this converging evidence, we propose that the broader PCC region contains three major subregions - the dorsal PCC, ventral PCC and retrosplenial cortex - that respectively support the integration of executive, mnemonic and spatial processing systems. This tripartite subregional view reconciles inconsistencies in prior unitary theories of PCC function and offers promising new avenues for progress.
Affiliation(s)
- Brett L Foster: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Seth R Koslov: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Lyndsey Aponik-Gremillion: Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA; Department of Health Sciences, Dumke College for Health Professionals, Weber State University, Ogden, UT, USA
- Megan E Monko: Department of Neuroscience, University of Minnesota, Minneapolis, MN, USA
- Benjamin Y Hayden: Department of Neuroscience, University of Minnesota, Minneapolis, MN, USA; Center for Magnetic Resonance Research and Center for Neural Engineering, University of Minnesota, Minneapolis, MN, USA
33
Asahina T, Shimba K, Kotani K, Jimbo Y. Improving the accuracy of decoding monkey brain-machine interface data by estimating the state of unobserved cell assemblies. J Neurosci Methods 2023; 385:109764. PMID: 36476748; DOI: 10.1016/j.jneumeth.2022.109764
Abstract
BACKGROUND: The brain-machine interface is a technology used to improve the quality of life of individuals with physical disabilities, and also of healthy individuals. It is important to improve the methods used for decoding brain-machine interface data, as the accuracy and speed of movements achieved using existing technology are not comparable to those of the normal body. NEW METHOD: We incorporated a novel method of estimating cell assembly states from spike trains into the existing decoding method, which used only firing-rate data. Synaptic connectivity patterns were used as feature values in addition to firing rate. Publicly available monkey brain-machine interface datasets were used in the study. RESULTS: As long as the decoding was successful, the root mean square error of the proposed method was significantly smaller than that of the existing method. The artificial neural network-based decoding method resulted in more stable decoding and also improved decoding accuracy owing to the incorporation of synaptic connectivity patterns. COMPARISON WITH THE EXISTING METHOD: Decoding brain-machine interface data using the proposed method yielded improved accuracy compared to the existing method. CONCLUSIONS: The results demonstrate the usefulness of the cell assembly state estimation method for decoding brain-machine interface data.
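As a sketch of the kind of comparison reported here (decoding kinematics from neural features and scoring with root mean square error), a minimal single-feature least-squares decoder is shown below. This is illustrative only, not the paper's method, and the function names are ours:

```python
import math

def rmse(pred, target):
    """Root mean squared error between decoded and true kinematics."""
    n = len(pred)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)

def fit_linear_decoder(features, targets):
    """Closed-form ordinary least squares for a single neural feature."""
    n = len(features)
    mx, my = sum(features) / n, sum(targets) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(features, targets))
    var = sum((x - mx) ** 2 for x in features)
    w = cov / var            # decoding weight
    b = my - w * mx          # offset
    return lambda x: w * x + b
```

Comparing decoders then amounts to fitting each on training trials (e.g., with firing-rate features alone versus firing rate plus connectivity-derived features) and comparing their RMSE on held-out trials.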
Affiliation(s)
- Takahiro Asahina: School of Engineering, The University of Tokyo, Tokyo, Japan; Japan Society for the Promotion of Science, Japan
- Kenta Shimba: School of Engineering, The University of Tokyo, Tokyo, Japan
- Kiyoshi Kotani: Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
- Yasuhiko Jimbo: School of Engineering, The University of Tokyo, Tokyo, Japan
34
Undermatching Is a Consequence of Policy Compression. J Neurosci 2023; 43:447-457. PMID: 36639891; PMCID: PMC9864556; DOI: 10.1523/jneurosci.1003-22.2022
Abstract
The matching law describes the tendency of agents to match the ratio of choices allocated to the ratio of rewards received when choosing among multiple options (Herrnstein, 1961). Perfect matching, however, is infrequently observed. Instead, agents tend to undermatch, or bias choices toward the poorer option. Overmatching, or the tendency to bias choices toward the richer option, is rarely observed. Despite the ubiquity of undermatching, it has received an inadequate normative justification. Here, we assume agents not only seek to maximize reward, but also seek to minimize cognitive cost, which we formalize as policy complexity (the mutual information between actions and states of the environment). Policy complexity measures the extent to which the policy of an agent is state dependent. Our theory states that capacity-constrained agents (i.e., agents that must compress their policies to reduce complexity) can only undermatch or perfectly match, but not overmatch, consistent with the empirical evidence. Moreover, using mouse behavioral data (male), we validate a novel prediction about which task conditions exaggerate undermatching. Finally, in patients with Parkinson's disease (male and female), we argue that a reduction in undermatching with higher dopamine levels is consistent with an increased policy complexity.
SIGNIFICANCE STATEMENT: The matching law describes the tendency of agents to match the ratio of choices allocated to different options to the ratio of rewards received. For example, if option a yields twice as much reward as option b, matching states that agents will choose option a twice as much. However, agents typically undermatch: they choose the poorer option more frequently than expected. Here, we assume that agents seek to simultaneously maximize reward and minimize the complexity of their action policies. We show that this theory explains when and why undermatching occurs. Neurally, we show that policy complexity, and by extension undermatching, is controlled by tonic dopamine, consistent with other evidence that dopamine plays an important role in cognitive resource allocation.
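The policy-complexity measure described above, the mutual information between states and actions, can be estimated from choice data with a simple plug-in estimator. A minimal sketch (the function name is ours):

```python
import math
from collections import Counter

def policy_complexity(states, actions):
    """Mutual information I(S;A) in bits, estimated from paired samples.

    A state-independent policy has I = 0; a policy that maps each state
    to its own action approaches the entropy of the states.
    """
    n = len(states)
    ps, pa = Counter(states), Counter(actions)
    pj = Counter(zip(states, actions))
    mi = 0.0
    for (s, a), c in pj.items():
        # p(s,a) * log2( p(s,a) / (p(s) p(a)) ), with counts c, ps, pa
        mi += (c / n) * math.log2(c * n / (ps[s] * pa[a]))
    return mi
```

Under the theory, a capacity constraint caps this quantity, and compressing the policy below the capacity of the environment is what produces undermatching.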
35
A Reinforcement Meta-Learning framework of executive function and information demand. Neural Netw 2023; 157:103-113. DOI: 10.1016/j.neunet.2022.10.004
36
Korbisch CC, Apuan DR, Shadmehr R, Ahmed AA. Saccade vigor reflects the rise of decision variables during deliberation. Curr Biol 2022; 32:5374-5381.e4. [PMID: 36413989 PMCID: PMC9795813 DOI: 10.1016/j.cub.2022.10.053]
Abstract
During deliberation, as we quietly consider our options, the neural activities representing the decision variables that reflect the goodness of each option rise in various regions of the cerebral cortex [1-7]. If the options are depicted visually, we make saccades, focusing gaze on each option. Do the kinematics of these saccades reflect the state of the decision variables? To test this idea, we engaged human participants in a decision-making task in which they considered two effortful options that required walking across various distances and inclines. As they deliberated, they made saccades between the symbolic representations of their options. These deliberation-period saccades had no bearing on the effort they would later expend, yet saccade velocities increased gradually and differentially: the rate of rise was faster for saccades toward the option that they later indicated as their choice. Indeed, the rate of rise encoded the difference in the subjective value of the two options. Importantly, the participants did not reveal their choice at the conclusion of deliberation, but rather waited during a delay period, and finally expressed their choice by making another saccade. Remarkably, vigor for this saccade dropped to baseline and no longer encoded subjective value. Thus, saccade vigor appeared to provide a real-time window to the otherwise hidden process of option evaluation during deliberation.
Affiliation(s)
- Colin C Korbisch
- Mechanical Engineering and Biomedical Engineering, Neuromechanics Laboratory, University of Colorado Boulder, Boulder, CO 80309, USA
- Daniel R Apuan
- Mechanical Engineering and Biomedical Engineering, Neuromechanics Laboratory, University of Colorado Boulder, Boulder, CO 80309, USA
- Reza Shadmehr
- Department of Biomedical Engineering, Laboratory for Computational Motor Control, Johns Hopkins University, Baltimore, MD 21205, USA
- Alaa A Ahmed
- Mechanical Engineering and Biomedical Engineering, Neuromechanics Laboratory, University of Colorado Boulder, Boulder, CO 80309, USA
37
Neuronal Response to Reward and Luminance in Macaque LIP During Saccadic Choice. Neurosci Bull 2022; 39:14-28. [PMID: 36114983 PMCID: PMC9849667 DOI: 10.1007/s12264-022-00948-0]
Abstract
Recent work in decision neuroscience suggests that visual saliency can interact with reward-based choice, and the lateral intraparietal cortex (LIP) is implicated in this process. In this study, we recorded from LIP neurons while monkeys performed a two-alternative choice task in which the reward and luminance associated with each offer were varied independently. We found that the animals' choices were dictated by reward amount, whereas luminance had only a marginal effect. In the LIP, neuronal activity corresponded well with the animals' choice pattern, in that a majority of reward-modulated neurons encoded the reward amount in the neuron's preferred hemifield with a positive slope. In contrast, compared to their responses to low luminance, approximately equal proportions of luminance-sensitive neurons responded to high luminance with increased or decreased activity, leading to a much weaker population-level response. Meanwhile, in the non-preferred hemifield, the strength of encoding for reward amount and luminance was positively correlated, suggesting the integration of these two factors in the LIP. Moreover, neurons encoding reward and luminance were homogeneously distributed along the anterior-posterior axis of the LIP. Overall, our study provides further evidence supporting the neural instantiation of a priority map in the LIP in reward-based decisions.
38
Sylwestrak EL, Jo Y, Vesuna S, Wang X, Holcomb B, Tien RH, Kim DK, Fenno L, Ramakrishnan C, Allen WE, Chen R, Shenoy KV, Sussillo D, Deisseroth K. Cell-type-specific population dynamics of diverse reward computations. Cell 2022; 185:3568-3587.e27. [PMID: 36113428 PMCID: PMC10387374 DOI: 10.1016/j.cell.2022.08.019]
Abstract
Computational analysis of cellular activity has developed largely independently of modern transcriptomic cell typology, but integrating these approaches may be essential for full insight into cellular-level mechanisms underlying brain function and dysfunction. Applying this approach to the habenula (a structure with diverse, intermingled molecular, anatomical, and computational features), we identified encoding of reward-predictive cues and reward outcomes in distinct genetically defined neural populations, including TH+ cells and Tac1+ cells. Data from genetically targeted recordings were used to train an optimized nonlinear dynamical systems model and revealed activity dynamics consistent with a line attractor. High-density, cell-type-specific electrophysiological recordings and optogenetic perturbation provided supporting evidence for this model. Reverse-engineering predicted how Tac1+ cells might integrate reward history, which was complemented by in vivo experimentation. This integrated approach describes a process by which data-driven computational models of population activity can generate and frame actionable hypotheses for cell-type-specific investigation in biological systems.
Affiliation(s)
- Emily L Sylwestrak
- Department of Biology, University of Oregon, Eugene, OR 97403, USA; Department of Bioengineering, Stanford University, Stanford, CA 94305, USA; Institute of Neuroscience, University of Oregon, Eugene, OR 97403, USA
- YoungJu Jo
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA; Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
- Sam Vesuna
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA; Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA
- Xiao Wang
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Blake Holcomb
- Institute of Neuroscience, University of Oregon, Eugene, OR 97403, USA
- Rebecca H Tien
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Doo Kyung Kim
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Lief Fenno
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA; Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA
- Charu Ramakrishnan
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- William E Allen
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA; Neurosciences Interdepartmental Program, Stanford University, Stanford, CA 94303, USA
- Ritchie Chen
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Krishna V Shenoy
- Department of Neurobiology, Stanford University, Stanford, CA 94303, USA; Department of Electrical Engineering, Stanford University, Stanford, CA, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA; Howard Hughes Medical Institute, Stanford University, Stanford, CA 94305, USA
- David Sussillo
- Department of Electrical Engineering, Stanford University, Stanford, CA, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Karl Deisseroth
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA; Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA; Howard Hughes Medical Institute, Stanford University, Stanford, CA 94305, USA
39
Karin O, Alon U. The dopamine circuit as a reward-taxis navigation system. PLoS Comput Biol 2022; 18:e1010340. [PMID: 35877694 PMCID: PMC9352198 DOI: 10.1371/journal.pcbi.1010340]
Abstract
Studying the brain circuits that control behavior is challenging, since in addition to their structural complexity there are continuous feedback interactions between actions and sensed inputs from the environment. It is therefore important to identify mathematical principles that can be used to develop testable hypotheses. In this study, we use ideas and concepts from systems biology to study the dopamine system, which controls learning, motivation, and movement. Using data from neuronal recordings in behavioral experiments, we developed a mathematical model for dopamine responses and the effect of dopamine on movement. We show that the dopamine system shares core functional analogies with bacterial chemotaxis. Just as chemotaxis robustly climbs chemical attractant gradients, the dopamine circuit performs ‘reward-taxis’ where the attractant is the expected value of reward. The reward-taxis mechanism provides a simple explanation for scale-invariant dopaminergic responses and for matching in free operant settings, and makes testable quantitative predictions. We propose that reward-taxis is a simple and robust navigation strategy that complements other, more goal-directed navigation mechanisms.
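The scale-invariance claim above has a compact numerical illustration. Assuming, as a simplification of the model described in the abstract, that the dopamine-like response is the temporal log-derivative of expected reward, rescaling all rewards by a constant leaves the response unchanged. The reward profile below is an arbitrary stand-in of our own, not taken from the paper.

```python
import numpy as np

# Hypothetical reward profile: expected reward rising smoothly over time,
# e.g. as an agent climbs a reward gradient.
t = np.linspace(0.0, 1.0, 1001)
R = 1.0 + np.exp(3.0 * t)          # arbitrary positive reward trajectory
dt = t[1] - t[0]

def log_derivative(signal, dt):
    """Discrete d/dt log(signal): the proposed dopamine-like response."""
    return np.gradient(np.log(signal), dt)

resp = log_derivative(R, dt)
resp_scaled = log_derivative(10.0 * R, dt)   # multiply all rewards by 10
# Because log(10R) = log(10) + log(R), the derivative is identical:
# the response is invariant to the reward scale.
```

This mirrors the bacterial-chemotaxis analogy: chemotaxis responds to relative (logarithmic) changes in attractant concentration, and the reward-taxis proposal inherits the same scale invariance.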
Affiliation(s)
- Omer Karin
- Dept. of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Dept. of Applied Mathematics and Theoretical Physics, Centre for Mathematical Sciences, University of Cambridge, Cambridge, United Kingdom
- Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, United Kingdom
- Uri Alon
- Dept. of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
40
Suhaimi A, Lim AWH, Chia XW, Li C, Makino H. Representation learning in the artificial and biological neural networks underlying sensorimotor integration. Sci Adv 2022; 8:eabn0984. [PMID: 35658033 PMCID: PMC9166289 DOI: 10.1126/sciadv.abn0984]
Abstract
The integration of deep learning and theories of reinforcement learning (RL) is a promising avenue to explore novel hypotheses on reward-based learning and decision-making in humans and other animals. Here, we trained deep RL agents and mice in the same sensorimotor task with high-dimensional state and action space and studied representation learning in their respective neural networks. Evaluation of thousands of neural network models with extensive hyperparameter search revealed that learning-dependent enrichment of state-value and policy representations of the task-performance-optimized deep RL agent closely resembled neural activity of the posterior parietal cortex (PPC). These representations were critical for the task performance in both systems. PPC neurons also exhibited representations of the internally defined subgoal, a feature of deep RL algorithms postulated to improve sample efficiency. Such striking resemblance between the artificial and biological networks and their functional convergence in sensorimotor integration offers new opportunities to better understand respective intelligent systems.
41
Hogeveen J, Mullins TS, Romero JD, Eversole E, Rogge-Obando K, Mayer AR, Costa VD. The neurocomputational bases of explore-exploit decision-making. Neuron 2022; 110:1869-1879.e5. [PMID: 35390278 PMCID: PMC9167768 DOI: 10.1016/j.neuron.2022.03.014]
Abstract
Flexible decision-making requires animals to forego immediate rewards (exploitation) and try novel choice options (exploration) to discover if they are preferable to familiar alternatives. Using the same task and a partially observable Markov decision process (POMDP) model to quantify the value of choices, we first determined that the computational basis for managing explore-exploit tradeoffs is conserved across monkeys and humans. We then used fMRI to identify where in the human brain the immediate value of exploitative choices and relative uncertainty about the value of exploratory choices were encoded. Consistent with prior neurophysiological evidence in monkeys, we observed divergent encoding of reward value and uncertainty in prefrontal and parietal regions, including frontopolar cortex, and parallel encoding of these computations in motivational regions including the amygdala, ventral striatum, and orbitofrontal cortex. These results clarify the interplay between prefrontal and motivational circuits that supports adaptive explore-exploit decisions in humans and nonhuman primates.
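The POMDP valuation used in the study is not reproduced here, but the two quantities it separates — the immediate value of exploiting and the relative uncertainty that makes exploring attractive — can be sketched with a simple Bayesian bandit. The agent class, bonus weight, and reward probabilities below are all illustrative assumptions of ours.

```python
import numpy as np

rng = np.random.default_rng(0)

class UncertaintyBonusAgent:
    """Beta-Bernoulli bandit that adds an uncertainty bonus to posterior means,
    so uncertain (e.g. novel) options are sampled before being exploited or
    abandoned: a minimal form of directed exploration."""
    def __init__(self, n_arms, bonus=1.0):
        self.a = np.ones(n_arms)   # posterior Beta successes (uniform prior)
        self.b = np.ones(n_arms)   # posterior Beta failures
        self.bonus = bonus

    def values(self):
        n = self.a + self.b
        mean = self.a / n                           # exploitative (immediate) value
        std = np.sqrt(self.a * self.b / (n ** 2 * (n + 1)))
        return mean + self.bonus * std              # add relative-uncertainty bonus

    def choose(self):
        return int(np.argmax(self.values()))

    def update(self, arm, reward):
        self.a[arm] += reward
        self.b[arm] += 1.0 - reward

true_p = np.array([0.3, 0.6, 0.8])                  # hypothetical reward rates
agent = UncertaintyBonusAgent(len(true_p))
for _ in range(500):
    arm = agent.choose()
    agent.update(arm, float(rng.random() < true_p[arm]))

pulls = agent.a + agent.b - 2.0                     # choices allocated per arm
```

Early on, the bonus term dominates and all arms are sampled; as posteriors sharpen, the uncertainty bonus shrinks and choice concentrates on the best arm, which is the qualitative explore-to-exploit transition the abstract describes.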
Affiliation(s)
- Jeremy Hogeveen
- Department of Psychology, University of New Mexico, Albuquerque, NM 87131, USA; Psychology Clinical Neuroscience Center, University of New Mexico, Albuquerque, NM 87131, USA
- Teagan S Mullins
- Department of Psychology, University of New Mexico, Albuquerque, NM 87131, USA; Psychology Clinical Neuroscience Center, University of New Mexico, Albuquerque, NM 87131, USA
- John D Romero
- Department of Psychology, University of New Mexico, Albuquerque, NM 87131, USA; Psychology Clinical Neuroscience Center, University of New Mexico, Albuquerque, NM 87131, USA
- Elizabeth Eversole
- Department of Psychology, University of New Mexico, Albuquerque, NM 87131, USA; Psychology Clinical Neuroscience Center, University of New Mexico, Albuquerque, NM 87131, USA
- Kimberly Rogge-Obando
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235, USA
- Andrew R Mayer
- Department of Psychology, University of New Mexico, Albuquerque, NM 87131, USA; Department of Psychiatry & Behavioral Sciences, University of New Mexico School of Medicine, Albuquerque, NM 87131, USA; Department of Neurology, University of New Mexico School of Medicine, Albuquerque, NM 87131, USA; The Mind Research Network/Lovelace Biomedical Research Institute, Pete & Nancy Domenici Hall, Albuquerque, NM 87106, USA
- Vincent D Costa
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR 97239, USA; Division of Neuroscience, Oregon National Primate Research Center, Beaverton, OR 97006, USA
42
Efficiently irrational: deciphering the riddle of human choice. Trends Cogn Sci 2022; 26:669-687. [PMID: 35643845 PMCID: PMC9283329 DOI: 10.1016/j.tics.2022.04.007]
Abstract
For the past half-century, cognitive and social scientists have struggled with the irrationalities of human choice behavior; people consistently make choices that are logically inconsistent. Is human choice behavior evolutionarily adaptive or is it an inefficient patchwork of competing mechanisms? In this review, I present an interdisciplinary synthesis arguing for a novel interpretation: choice is efficiently irrational. Connecting findings across disciplines suggests that observed choice behavior reflects a precise optimization of the trade-off between the costs of increasing the precision of the choice mechanism and the declining benefits that come as precision increases. Under these constraints, a rationally imprecise strategy emerges that works toward optimal efficiency rather than toward optimal rationality. This approach rationalizes many of the puzzling inconsistencies of human choice behavior, explaining why these inconsistencies arise as an optimizing solution in biological choosers.
43
Bloem B, Huda R, Amemori KI, Abate AS, Krishna G, Wilson AL, Carter CW, Sur M, Graybiel AM. Multiplexed action-outcome representation by striatal striosome-matrix compartments detected with a mouse cost-benefit foraging task. Nat Commun 2022; 13:1541. [PMID: 35318343 PMCID: PMC8941061 DOI: 10.1038/s41467-022-28983-5]
Abstract
Learning about positive and negative outcomes of actions is crucial for survival and is underpinned by conserved circuits including the striatum. How associations between actions and outcomes are formed is not fully understood, particularly when the outcomes have mixed positive and negative features. We developed a novel foraging ('bandit') task requiring mice to maximize rewards while minimizing punishments. Using two-photon Ca2+ imaging, we monitored the activity of visually identified striosomal and matrix neurons in the anterodorsal striatum. We found that action-outcome associations for reward and punishment were encoded in parallel in partially overlapping populations. Single neurons could, for one action, encode outcomes of opposing valence. Striosome compartments consistently exhibited stronger representations of reinforcement outcomes than the matrix, especially for high reward or punishment prediction errors. These findings demonstrate multiplexing of action-outcome contingencies by single identified striatal neurons and suggest that striosomal neurons are particularly important in action-outcome learning.

The role that the striatum plays in tracking the association between actions and combinations of rewarding and aversive outcomes remains unclear. Here, by using both calcium imaging in mice and reinforcement learning models, the authors find that individual striatal neurons can encode associations between actions and multiple, sometimes conflicting, outcomes.
Affiliation(s)
- Bernard Bloem
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA, 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA, 02139, USA; Sinopia Biosciences, 600W Broadway, Suite 700, San Diego, CA, 92101, USA
- Rafiq Huda
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA, 02139, USA; Picower Institute for Learning and Memory, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA, 02139, USA; Department of Cell Biology and Neuroscience, WM Keck Center for Collaborative Neuroscience, Rutgers University, 604 Allison Rd, Piscataway, NJ, 08854, USA
- Ken-Ichi Amemori
- Institute for the Advanced Study of Human Biology, Kyoto University, Yoshida Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
- Alex S Abate
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA, 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA, 02139, USA
- Gayathri Krishna
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA, 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA, 02139, USA
- Anna L Wilson
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA, 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA, 02139, USA
- Cody W Carter
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA, 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA, 02139, USA
- Mriganka Sur
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA, 02139, USA; Picower Institute for Learning and Memory, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA, 02139, USA
- Ann M Graybiel
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA, 02139, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA, 02139, USA
44
Florence L, Lassi DLS, Kortas GT, Lima DR, de Azevedo-Marques Périco C, Andrade AG, Torales J, Ventriglio A, De Berardis D, De Aquino JP, Castaldelli-Maia JM. Brain Correlates of the Alcohol Use Disorder Pharmacotherapy Response: A Systematic Review of Neuroimaging Studies. Brain Sci 2022; 12:386. [PMID: 35326342 PMCID: PMC8946664 DOI: 10.3390/brainsci12030386]
Abstract
Background: Although Alcohol Use Disorder (AUD) is highly prevalent worldwide, treating this condition remains challenging. Further, potential treatments for AUD do not fully address alcohol-induced neuroadaptive changes. Understanding the effects of pharmacotherapies for AUD on the human brain may lead to tailored, more effective treatments and improved individual clinical outcomes. Objectives: We systematically reviewed the literature for studies investigating pharmacotherapies for AUD that included neuroimaging-based treatment outcomes, searching the PubMed, Scielo, and PsycINFO databases up to January 2021. Study eligibility criteria, participants, and interventions: Eligible studies were those investigating pharmacotherapies for AUD and employing functional magnetic resonance imaging (fMRI), positron emission tomography (PET), single-photon emission computed tomography (SPECT), and/or proton magnetic resonance spectroscopy (H-MRS). Study appraisal and synthesis methods: Two independent reviewers screened titles and abstracts for inclusion. Data extraction forms were shared among all the authors to standardize data collection. We gathered information on the following variables: sample size; mean age; sociodemographic and clinical characteristics; alcohol use status; study design and methodology; main neuroimaging findings and brain regions of interest (i.e., brain areas activated by alcohol use and possible pharmacological interactions); and limitations of each study. Results: Of the 177 studies screened, 20 provided relevant data for the research topic. Findings indicate that: (1) acamprosate and gabapentin may selectively modulate limbic regions and the anterior cingulate cortex; (2) naltrexone and disulfiram effects may involve prefrontal, premotor, and cerebellar regions; (3) pharmacotherapies acting on glutamate and GABA neurotransmission primarily involve areas underpinning reward and negative affective states; and (4) pharmacotherapies acting on opioid and dopamine systems may affect areas responsible for the cognitive and motor factors of AUD. Limitations: Most of the studies focused on naltrexone; few investigated disulfiram or gabapentin, and no neuroimaging studies investigated topiramate. In addition, the time between medication and neuroimaging scans varied widely across studies. Conclusions: We identified key brain regions modulated by the treatments available for AUD. Some of the regions modulated by naltrexone are not specific to the brain reward system, such as the parahippocampal gyrus (temporal lobe) and the parietal and occipital lobes. Other treatments likewise modulate regions outside the reward system that nonetheless play a role in addictive behavior, including the insula and dorsolateral prefrontal cortex. The role of these brain regions in mediating the AUD pharmacotherapy response warrants investigation in future research.
Affiliation(s)
- Luiza Florence
- Department of Neuroscience, FMABC University Center, Santo André 09060-870, SP, Brazil
- Dângela Layne Silva Lassi
- Department of Psychiatry, Medical School, University of São Paulo, São Paulo 05508-060, SP, Brazil
- Guilherme T. Kortas
- Department of Psychiatry, Medical School, University of São Paulo, São Paulo 05508-060, SP, Brazil
- Danielle R. Lima
- Department of Psychiatry, Medical School, University of São Paulo, São Paulo 05508-060, SP, Brazil
- Arthur G. Andrade
- Department of Neuroscience, FMABC University Center, Santo André 09060-870, SP, Brazil
- Department of Psychiatry, Medical School, University of São Paulo, São Paulo 05508-060, SP, Brazil
- Julio Torales
- Department of Psychiatry, National University of Asunción, San Lorenzo 2064, Paraguay
- Antonio Ventriglio
- Department of Clinical and Experimental Medicine, University of Foggia, 71122 Foggia, Italy
- Domenico De Berardis
- Mental Health Center of Giulianova, Asl Teramo, 64021 Giulianova, Italy
- Department of Neurosciences and Imaging, University “G. D’Annunzio” Chieti, 66100 Chieti, Italy
- International Centre for Education and Research in Neuropsychiatry, University of Samara, 443100 Samara, Russia
- João P. De Aquino
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT 06510, USA
- João M. Castaldelli-Maia
- Department of Neuroscience, FMABC University Center, Santo André 09060-870, SP, Brazil
- Department of Psychiatry, Medical School, University of São Paulo, São Paulo 05508-060, SP, Brazil
45
Wong AL, Green AL, Isaacs MW. Motor Plans under Uncertainty Reflect a Trade-Off between Maximizing Reward and Success. eNeuro 2022; 9:ENEURO.0503-21.2022. [PMID: 35346958 PMCID: PMC9007409 DOI: 10.1523/eneuro.0503-21.2022]
Abstract
When faced with multiple potential movement options, individuals either reach directly to one of the options, or initiate a reach intermediate between the options. It remains unclear why people generate these two types of behaviors. Using the go-before-you-know task (commonly used to study behavior under choice uncertainty) in humans, we examined two key questions. First, do these two types of responses actually reflect distinct movement strategies? If so, the relative desirability (i.e., weighing the success likelihood vs the attainable reward) of the two target options would not need to be computed identically for direct and intermediate reaches. We showed that indeed, when reward and success likelihood differed between the two options, reach direction was preferentially biased toward different directions for direct versus intermediate reaches. Importantly, this suggests that the computation of subjective values depends on the choice of movement strategy. Second, what drives individual differences in how people respond under uncertainty? We found that risk/reward-seeking individuals tended to generate more intermediate reaches and were more responsive to changes in reward, suggesting these movements may reflect a strategy to maximize reward versus success. Together, these findings suggest that when faced with choice uncertainty, individuals adopt movement strategies consistent with their risk/reward attitude, preferentially biasing behavior toward exogenous rewards or endogenous success and consequently modulating the relative desirability of the available options.
Affiliation(s)
- Aaron L Wong
- Moss Rehabilitation Research Institute, Elkins Park, PA 19027
- Audrey L Green
- Department of Neuroscience, Holy Family University, Philadelphia, PA 19114
46
Nguyen QN, Reinagel P. Different Forms of Variability Could Explain a Difference Between Human and Rat Decision Making. Front Neurosci 2022; 16:794681. [PMID: 35273473 PMCID: PMC8902138 DOI: 10.3389/fnins.2022.794681]
Abstract
When observers make rapid, difficult perceptual decisions, their response time is highly variable from trial to trial. In a visual motion discrimination task, it has been reported that human accuracy declines with increasing response time, whereas rat accuracy increases with response time. This is of interest because different mathematical theories of decision-making differ in their predictions regarding the correlation of accuracy with response time. On the premise that perceptual decision-making mechanisms are likely to be conserved among mammals, we seek to unify the rodent and primate results in a common theoretical framework. We show that a bounded drift diffusion model (DDM) can explain both effects with variable parameters: trial-to-trial variability in the starting point of the diffusion process produces the pattern typically observed in rats, whereas variability in the drift rate produces the pattern typically observed in humans. We further show that the same effects can be produced by deterministic biases, even in the absence of parameter stochasticity or parameter change within a trial.
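The two variability mechanisms contrasted in this abstract can be sketched with a bounded drift-diffusion simulation. The parameter values and the median-split analysis below are our own illustrative choices, not the authors' fits: starting-point variability produces fast errors, so accuracy rises with response time (the rat-like pattern), whereas drift-rate variability produces slow errors, so accuracy falls with response time (the human-like pattern).

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ddm(n_trials, drift=1.0, bound=1.0, noise=1.0, dt=1e-3,
                 sz=0.0, sv=0.0):
    """Bounded drift-diffusion model. sz = trial-to-trial SD of the starting
    point, sv = trial-to-trial SD of the drift rate. Upper bound = correct."""
    correct = np.empty(n_trials, dtype=bool)
    rt = np.empty(n_trials)
    for i in range(n_trials):
        x = rng.normal(0.0, sz)                 # variable starting point
        v = rng.normal(drift, sv)               # variable drift rate
        t = 0.0
        while abs(x) < bound:
            x += v * dt + noise * np.sqrt(dt) * rng.normal()
            t += dt
        correct[i] = x >= bound
        rt[i] = t
    return correct, rt

def accuracy_by_speed(correct, rt):
    """Accuracy on faster-than-median vs slower-than-median trials."""
    fast = rt < np.median(rt)
    return correct[fast].mean(), correct[~fast].mean()

c_sz, rt_sz = simulate_ddm(1000, sz=0.5)        # starting-point variability
c_sv, rt_sv = simulate_ddm(1000, sv=1.5)        # drift-rate variability

fast_sz, slow_sz = accuracy_by_speed(c_sz, rt_sz)   # expect fast < slow
fast_sv, slow_sv = accuracy_by_speed(c_sv, rt_sv)   # expect fast > slow
```

Intuitively, a trial that starts near the wrong bound terminates quickly and incorrectly (fast errors), while a trial that happens to draw a near-zero drift rate terminates slowly and near chance (slow errors); the two mechanisms therefore tilt the accuracy-versus-RT curve in opposite directions.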
Affiliation(s)
- Pamela Reinagel
- Section of Neurobiology, Division of Biological Sciences, University of California, San Diego, San Diego, CA, United States
47
Grossman CD, Bari BA, Cohen JY. Serotonin neurons modulate learning rate through uncertainty. Curr Biol 2022; 32:586-599.e7. [PMID: 34936883 PMCID: PMC8825708 DOI: 10.1016/j.cub.2021.12.006]
Abstract
Regulating how fast to learn is critical for flexible behavior. Learning about the consequences of actions should be slow in stable environments, but accelerate when that environment changes. Recognizing stability and detecting change are difficult in environments with noisy relationships between actions and outcomes. Under these conditions, theories propose that uncertainty can be used to modulate learning rates ("meta-learning"). We show that mice behaving in a dynamic foraging task exhibited choice behavior that varied as a function of two forms of uncertainty estimated from a meta-learning model. The activity of dorsal raphe serotonin neurons tracked both types of uncertainty in the foraging task as well as in a dynamic Pavlovian task. Reversible inhibition of serotonin neurons in the foraging task reproduced changes in learning predicted by a simulated lesion of meta-learning in the model. We thus provide a quantitative link between serotonin neuron activity, learning, and decision making.
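The core idea of uncertainty-modulated learning rates can be sketched with a generic Pearce-Hall-style learner (this is a textbook illustration, not the authors' meta-learning model; all parameters are invented for the example): the running magnitude of recent prediction errors serves as an uncertainty signal that speeds learning after an environmental change and slows it during stable stretches.

```python
import numpy as np

rng = np.random.default_rng(1)

def adaptive_learner(rewards, base_lr=0.1, gain=0.5):
    """Track reward probability with an uncertainty-scaled learning rate."""
    v, assoc = 0.5, 0.5          # value estimate and uncertainty ("associability")
    values, lrs = [], []
    for r in rewards:
        delta = r - v                        # reward prediction error
        lr = base_lr + gain * assoc          # uncertainty boosts the learning rate
        v += lr * delta
        assoc += 0.1 * (abs(delta) - assoc)  # slow update of the uncertainty signal
        values.append(v)
        lrs.append(lr)
    return np.array(values), np.array(lrs)

# Stable block (reward probability 0.8) followed by an unsignaled reversal (0.2)
rewards = np.concatenate([rng.random(200) < 0.8,
                          rng.random(200) < 0.2]).astype(float)
values, lrs = adaptive_learner(rewards)
```

After the reversal at trial 200, prediction errors grow, uncertainty rises, and the effective learning rate increases, so the value estimate re-converges faster than a fixed-rate learner would.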
Affiliation(s)
- Cooper D Grossman, Bilal A Bari, Jeremiah Y Cohen: The Solomon H. Snyder Department of Neuroscience, Brain Science Institute, Kavli Neuroscience Discovery Institute, The Johns Hopkins University School of Medicine, 725 N. Wolfe Street, Baltimore, MD 21205, USA

48
Liu Z, Liu S, Li S, Li L, Zheng L, Weng X, Guo X, Lu Y, Men W, Gao J, You X. Dissociating Value-Based Neurocomputation from Subsequent Selection-Related Activations in Human Decision-Making. Cereb Cortex 2022; 32:4141-4155. PMID: 35024797; DOI: 10.1093/cercor/bhab471. Received 03/01/2021; revised 11/17/2021; accepted 11/18/2021. Open access.
Abstract
Human decision-making requires the brain to compute benefit and risk and then select between options. It remains unclear how value-based neural computation and subsequent brain activity evolve to achieve a final decision, and which process is modulated by irrational factors. We adopted a sequential risk-taking task that asked participants, at each step of an eight-box trial, to decide whether or not to open a box carrying potential reward or punishment. With time-resolved multivariate pattern analyses, we decoded electroencephalography and magnetoencephalography responses to two successive low- and high-risk boxes before the open-box action. Taking the decoding-accuracy peak as a marker of first-stage processing completion, we used it to demarcate the neural time course of decision-making into valuation and selection stages. Hierarchical drift-diffusion modeling of behavior confirmed that the two stages process different information: the valuation stage was related to the drift rate of evidence accumulation, while the selection stage was related to the nondecision time spent producing a response. We further observed that the medial orbitofrontal cortex participated in the valuation stage, while the superior frontal gyrus engaged in the selection stage of irrational open-box decisions. Finally, we found that irrational factors influenced decision-making through the selection stage rather than the valuation stage.
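The two drift-diffusion parameters the abstract assigns to the two stages can be illustrated with a toy simulation (not the authors' hierarchical model; parameter values are arbitrary): drift rate governs how quickly evidence accumulates, while nondecision time shifts the whole response-time distribution without affecting accumulation.

```python
import numpy as np

rng = np.random.default_rng(2)

def ddm_rt(n, drift, ndt, bound=1.0, dt=0.001, noise=1.0, max_t=5.0):
    """Response time = nondecision time + evidence-accumulation time."""
    rts = []
    for _ in range(n):
        x, t = 0.0, 0.0
        while abs(x) < bound and t < max_t:
            x += drift * dt + noise * np.sqrt(dt) * rng.normal()
            t += dt
        rts.append(ndt + t)                  # nondecision time added outside the loop
    return np.array(rts)

# Same accumulation process, different nondecision time: the whole RT
# distribution shifts later while the decision-time component is unchanged.
baseline = ddm_rt(500, drift=2.0, ndt=0.3)
shifted  = ddm_rt(500, drift=2.0, ndt=0.6)
```

In a hierarchical DDM fit, group differences loading on `drift` versus `ndt` correspond to effects in the valuation versus selection stage, respectively, as described above.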
Affiliation(s)
- Zhiyuan Liu, Xuqun You: Shaanxi Key Laboratory of Behavior and Cognitive Neuroscience, School of Psychology, Shaanxi Normal University, Xi'an 710062, China
- Sijia Liu: Department of Psychology and Behavioral Sciences, Zhejiang University, Hangzhou 310007, China
- Shuang Li, Lin Li, Li Zheng, Xue Weng, Yang Lu: School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China
- Xiuyan Guo: Department of Psychology and Behavioral Sciences, Zhejiang University, Hangzhou 310007, China; Shanghai Key Laboratory of Magnetic Resonance, School of Physics and Materials Science, East China Normal University, Shanghai 200062, China
- Weiwei Men: Center for MRI Research, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100091, China; Beijing City Key Laboratory for Medical Physics and Engineering, Institute of Heavy Ion Physics, School of Physics, Peking University, Beijing 100091, China
- Jiahong Gao: Beijing City Key Laboratory for Medical Physics and Engineering, Institute of Heavy Ion Physics, School of Physics, Peking University, Beijing 100091, China; Center for MRI Research and McGovern Institute for Brain Research, Peking University, Beijing 100091, China

49
Averbeck B, O'Doherty JP. Reinforcement-learning in fronto-striatal circuits. Neuropsychopharmacology 2022; 47:147-162. PMID: 34354249; PMCID: PMC8616931; DOI: 10.1038/s41386-021-01108-0. Received 03/04/2021; revised 07/06/2021; accepted 07/09/2021.
Abstract
We review the current state of knowledge on the computational and neural mechanisms of reinforcement learning, with a particular focus on fronto-striatal circuits. We divide the literature in this area into five broad research themes: the target of learning, whether it be the value of stimuli or the value of actions; the nature and complexity of the algorithm used to drive learning and inference; how learned values are converted into choices and associated actions; and the nature of state representations, and of other cognitive machinery, that support the implementation of reinforcement-learning operations. An emerging fifth theme concerns how the brain allocates or arbitrates control over different reinforcement-learning sub-systems, or "experts". We outline what is known about the role of the prefrontal cortex and striatum in implementing each of these functions. We conclude by arguing that it will be necessary to build bridges from algorithmic-level descriptions of computational reinforcement learning to implementational-level models to better understand how reinforcement learning emerges from multiple distributed neural networks in the brain.
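Two of the review's themes, learning action values and converting them into choices, are captured by the standard Q-learning-with-softmax scheme. The sketch below is a generic two-armed bandit example with illustrative parameters, not a model from the review itself.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(q, beta=3.0):
    """Convert action values into choice probabilities (theme: value-to-choice)."""
    e = np.exp(beta * (q - q.max()))
    return e / e.sum()

p_reward = np.array([0.8, 0.2])   # true reward probabilities of the two actions
q = np.zeros(2)                   # learned action values (theme: target of learning)
alpha = 0.1                       # learning rate
choices = []
for _ in range(1000):
    a = rng.choice(2, p=softmax(q))        # sample an action from the policy
    r = float(rng.random() < p_reward[a])  # stochastic reward
    q[a] += alpha * (r - q[a])             # prediction-error update
    choices.append(a)
```

After learning, the value of the richer action dominates and the softmax policy preferentially selects it, illustrating how learned values drive choice.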
Affiliation(s)
- John P O'Doherty: Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA

50
Lyu N, Hu Y, Zhang J, Lloyd H, Sun YH, Tao Y. Switching costs in stochastic environments drive the emergence of matching behaviour in animal decision-making through the promotion of reward learning strategies. Sci Rep 2021; 11:23593. PMID: 34880339; PMCID: PMC8654859; DOI: 10.1038/s41598-021-02979-5. Received 06/25/2021; accepted 11/23/2021. Open access.
Abstract
A principle of choice in animal decision-making known as probability matching (PM) has long been observed in animals and can arise from different decision-making strategies. Little is known about how environmental stochasticity may influence when animals switch between these strategies. Here we address this problem using a combination of behavioral and theoretical approaches, and show that, although a simple win-stay-lose-shift (WSLS) strategy can generate PM in binary-choice tasks theoretically, budgerigars (Melopsittacus undulatus) actually applied a range of sub-tactics more often when they were expected to make more accurate decisions. Surprisingly, budgerigars did not obtain more rewards than would be predicted under a WSLS strategy, and their decisions also exhibited PM. Instead, budgerigars followed a learning strategy based on reward history, which potentially benefits individuals indirectly through lower switching costs. Furthermore, our data suggest that more stochastic environments may promote reward learning through significantly less switching. We suggest that switching costs driven by the stochasticity of an environmental niche can represent an important selection pressure on decision-making and may play a key role in driving the evolution of complex cognition in animals.
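The theoretical claim that WSLS generates probability matching can be checked with a short simulation (a generic illustration; the task parameters are invented, not the paper's). In a binary task where option A is correct with probability p on each trial, the WSLS Markov chain has stationary choice probability p for A, i.e. choices match reward probabilities.

```python
import numpy as np

rng = np.random.default_rng(4)

def wsls(n_trials, p_a=0.75):
    """Win-stay-lose-shift in a binary task where option A (coded 0) is
    correct with probability p_a on each trial, otherwise option B is."""
    choice, choices = 0, []
    for _ in range(n_trials):
        correct = 0 if rng.random() < p_a else 1
        choices.append(choice)
        if choice != correct:   # lose-shift; win-stay keeps the current choice
            choice = 1 - choice
    return np.array(choices)

choices = wsls(20000)
p_choose_a = np.mean(choices == 0)   # converges to ~0.75: probability matching
```

The stationary distribution follows from balance: the rate of leaving A, P(A)(1 - p), must equal the rate of entering A, P(B)p, which gives P(A) = p.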
Affiliation(s)
- Nan Lyu: Ministry of Education Key Laboratory for Biodiversity and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China; Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Yunbiao Hu, Jiahua Zhang, Yue-Hua Sun, Yi Tao: Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Huw Lloyd: Department of Natural Sciences, Faculty of Science and Engineering, Manchester Metropolitan University, Manchester, UK