1. Korbisch CC, Ahmed AA. Reaching vigor tracks learned prediction error. bioRxiv 2025:2025.03.24.645035. PMID: 40196502; PMCID: PMC11974846; DOI: 10.1101/2025.03.24.645035.
Abstract
Movement vigor across multiple modalities increases with reward, suggesting that the neural circuits that represent value influence the control of movement. Dopaminergic neuron (DAN) activity in the basal ganglia has been suggested as a potential mediator of this response. If DAN activity is the bridge between value and vigor, then vigor should track the canonical mediators of this activity, namely reward expectation and reward prediction error. Here we ask if a similar time-locked response is present in the vigor of reaching movements. We explore this link by leveraging the known phasic dopaminergic response to stochastic rewards, where activity is modulated by both reward expectation at cue and the prediction error at feedback. We used probabilistic rewards to create a reaching task rich in reward expectation, reward prediction error, and learning. In one experiment, target reward probabilities were explicitly stated; in the other, they were left unknown, to be learned by the participants. We included two stochastic rewards (probabilities 33% and 66%) and two deterministic ones (probabilities 100% and 0%). Outgoing peak velocity in both experiments increased with increasing reward expectation. Furthermore, we observed a short-latency response in the vigor of the ongoing movement that tracked reward prediction error, either invigorating or enervating velocity consistent with the sign and magnitude of the error. Reaching kinematics also revealed the value-update process in a trial-to-trial fashion, similar to the effect of prediction error signals typical of dopamine-mediated striatal phasic activity. Lastly, reach vigor increased with reward history over trials, mirroring the motivational effects often linked to fluctuating dopamine levels. Taken together, our results demonstrate an exquisite link between known short-latency reward signals and the invigoration of both discrete and ongoing movements.
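The learning signals this abstract invokes map onto a compact update rule. Below is a minimal sketch, not the authors' model: a Rescorla-Wagner-style value estimate per target, where reward expectation at cue scales outgoing vigor and the signed prediction error at feedback perturbs the ongoing movement. All names and constants (ALPHA, BASE_VIGOR, K_EXPECT, K_RPE) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative constants; none of these values come from the paper
ALPHA = 0.2        # value learning rate
BASE_VIGOR = 1.0   # baseline peak velocity (arbitrary units)
K_EXPECT = 0.1     # vigor gain on reward expectation at cue
K_RPE = 0.05       # gain of the within-movement response to prediction error

V = {0.00: 0.0, 0.33: 0.0, 0.66: 0.0, 1.00: 0.0}  # learned value per target

for trial in range(1000):
    p = rng.choice(list(V))              # cued target with reward probability p
    expectation = V[p]                   # learned reward expectation at cue
    peak_velocity = BASE_VIGOR + K_EXPECT * expectation
    reward = float(rng.random() < p)     # stochastic outcome at feedback
    rpe = reward - expectation           # signed reward prediction error
    online_adjust = K_RPE * rpe          # invigorate (+) or enervate (-) mid-reach
    V[p] = expectation + ALPHA * rpe     # trial-to-trial value update
```

After learning, V[p] converges toward p, so peak velocity orders the four targets by expectation, while the mid-reach adjustment flips sign between reward delivery and omission; this is the qualitative pattern the abstract reports.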
Affiliation(s)
- Colin C Korbisch: Department of Mechanical Engineering, University of Colorado Boulder
- Alaa A Ahmed: Department of Mechanical Engineering, University of Colorado Boulder; Biomedical Engineering Program, University of Colorado Boulder
2. Romero-Sosa JL, Yeghikian A, Wikenheiser AM, Blair HT, Izquierdo A. Neural coding of choice and outcome are modulated by uncertainty in orbitofrontal but not secondary motor cortex. bioRxiv 2025:2024.11.05.622092. PMID: 39574574; PMCID: PMC11580916; DOI: 10.1101/2024.11.05.622092.
Abstract
Orbitofrontal cortex (OFC) and secondary motor cortex (M2) are both implicated in flexible reward learning, but the conditions that differentially recruit these regions are not fully understood. We imaged calcium activity from single neurons in OFC or M2 during de novo learning of uncertain reward probability schedules. Choice could be decoded from M2 neurons with high accuracy under all certainty conditions, but was more accurately decoded from OFC neurons under greater uncertainty. In M2, the proportion of outcome-selective neurons decreased with uncertainty, whereas this proportion remained stable in OFC due to an increased recruitment of reward-selective neurons across levels of uncertainty. Decoding accuracy of both choice and outcome was predicted by indices of flexible strategy, such as Win-Stay and Lose-Shift, in OFC but not M2. When schedules were experienced in increasing and then decreasing uncertainty, chemogenetic perturbation of M2 and OFC neurons revealed opposing roles in certain and uncertain conditions, respectively. Our results indicate that M2 neurons are causally involved in learning under more certain conditions, whereas OFC neurons preferentially encode choices and outcomes that foster greater reliance on adaptive strategies under conditions of uncertainty. This reveals a novel functional heterogeneity within frontal cortex in support of flexible learning.
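For reference, the strategy indices mentioned here are typically computed directly from the choice and outcome sequence. The sketch below uses the conventional definitions (Win-Stay: probability of repeating a choice after a rewarded trial; Lose-Shift: probability of switching after an unrewarded one); the authors' exact formulations may differ.

```python
import numpy as np

def win_stay_lose_shift(choices, rewards):
    # Win-Stay:   P(repeat previous choice | previous trial rewarded)
    # Lose-Shift: P(switch choice          | previous trial unrewarded)
    choices = np.asarray(choices)
    rewards = np.asarray(rewards, dtype=bool)
    stay = choices[1:] == choices[:-1]   # did the animal repeat its choice?
    prev_win = rewards[:-1]              # was the previous trial rewarded?
    win_stay = stay[prev_win].mean() if prev_win.any() else np.nan
    lose_shift = (~stay[~prev_win]).mean() if (~prev_win).any() else np.nan
    return win_stay, lose_shift

# Example on a short two-choice session
ws, ls = win_stay_lose_shift([0, 0, 1, 1, 0, 1], [1, 1, 0, 1, 0, 1])
```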
3. Ben-Zion Z, Levy I. Representation of Anticipated Rewards and Punishments in the Human Brain. Annu Rev Psychol 2025;76:197-226. PMID: 39418537; PMCID: PMC11930275; DOI: 10.1146/annurev-psych-022324-042614.
Abstract
Subjective value is a core concept in neuroeconomics, serving as the basis for decision making. Despite the extensive literature on the neural encoding of subjective reward value in humans, the neural representation of punishment value remains relatively understudied. This review synthesizes current knowledge on the neural representation of reward value, including methodologies, involved brain regions, and the concept of a common currency representation of diverse reward types in decision-making and learning processes. We then critically examine existing research on the neural representation of punishment value, highlighting conceptual and methodological challenges in human studies and insights gained from animal research. Finally, we explore how individual differences in reward and punishment processing may be linked to various mental illnesses, with a focus on stress-related psychopathologies. This review advocates for the integration of both rewards and punishments within value-based decision-making and learning frameworks, leveraging insights from cross-species studies and utilizing ecological gamified paradigms to reflect real-life scenarios.
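As a purely illustrative companion to the common-currency idea this review builds on, the toy model below places anticipated rewards and punishments on one subjective-value scale and feeds both into the same softmax choice rule. The asymmetric loss weight is an assumption borrowed from prospect-theory-style models, not a formula from the review.

```python
import numpy as np

rng = np.random.default_rng(0)

def subjective_value(p, magnitude, loss_weight=2.0):
    # One common-currency scale for gains and losses; punishments
    # (negative magnitudes) are weighted more heavily. The loss_weight
    # of 2.0 is an illustrative prospect-theory-style assumption.
    weight = loss_weight if magnitude < 0 else 1.0
    return p * weight * magnitude

def choose(options, beta=1.0):
    # Softmax choice over options given as (probability, magnitude) pairs
    sv = np.array([subjective_value(p, m) for p, m in options])
    probs = np.exp(beta * sv)
    probs /= probs.sum()
    return rng.choice(len(options), p=probs)

# A probable gain vs. a probable loss, compared on the same scale
pick = choose([(0.5, 10.0), (0.8, -4.0)])
```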
Affiliation(s)
- Ziv Ben-Zion: Department of Psychiatry, Yale School of Medicine, Yale University, New Haven, Connecticut, USA; VA Connecticut Healthcare System, U.S. Department of Veterans Affairs, West Haven, Connecticut, USA; Department of Comparative Medicine, Yale School of Medicine, Yale University, New Haven, Connecticut, USA; Clinical Neuroscience Division, National Center for PTSD, U.S. Department of Veterans Affairs, Orange, Connecticut, USA
- Ifat Levy: Wu Tsai Institute, Yale University, New Haven, Connecticut, USA; Department of Neuroscience, Yale School of Medicine, Yale University, New Haven, Connecticut, USA; Department of Psychology, Yale University, New Haven, Connecticut, USA; Department of Comparative Medicine, Yale School of Medicine, Yale University, New Haven, Connecticut, USA
4. Woo JH, Costa VD, Taswell CA, Rothenhoefer KM, Averbeck BB, Soltani A. Contribution of amygdala to dynamic model arbitration under uncertainty. bioRxiv 2024:2024.09.13.612869. PMID: 39314420; PMCID: PMC11419134; DOI: 10.1101/2024.09.13.612869.
Abstract
Intrinsic uncertainty in the reward environment requires the brain to run multiple models simultaneously to predict outcomes based on preceding cues or actions, commonly referred to as stimulus-based and action-based learning. Ultimately, the brain must also adopt appropriate choice behavior based on the reliability of these models. Here, we combined multiple experimental and computational approaches to quantify concurrent learning in monkeys performing tasks with different levels of uncertainty about the model of the environment. By comparing behavior in control monkeys and monkeys with bilateral lesions of the amygdala or ventral striatum, we found evidence for a dynamic, competitive interaction between stimulus-based and action-based learning, and for a distinct role of the amygdala. Specifically, we demonstrate that the amygdala adjusts the initial balance between the two learning systems, thereby altering the interaction between arbitration and learning that shapes the time course of both learning and choice behaviors. This novel role of the amygdala can account for existing contradictory observations and provides testable predictions for future studies of circuit-level mechanisms of flexible learning and choice under uncertainty.
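To make the arbitration idea concrete, here is a minimal sketch, under assumed parameters and a reliability heuristic of our own, of two learners (stimulus-based and action-based) whose influence on choice is weighted by their running reliability. The initial balance of `rel` plays the role the abstract ascribes to the amygdala; this is not the authors' fitted model.

```python
import numpy as np

rng = np.random.default_rng(1)
ALPHA, ALPHA_REL, BETA = 0.3, 0.1, 5.0   # illustrative learning rates, choice temperature
P_REWARD = [0.8, 0.2]                    # here, reward depends on stimulus identity

V_stim = np.zeros(2)        # stimulus-based values
V_act = np.zeros(2)         # action-based values (left/right)
rel = np.array([0.5, 0.5])  # reliabilities [stimulus, action]; their initial balance
                            # is the quantity the abstract suggests the amygdala sets

for trial in range(500):
    left = rng.integers(2)                 # random stimulus-to-side mapping
    stims = [left, 1 - left]               # stimulus shown on each side
    w = rel[0] / rel.sum()                 # arbitration weight on the stimulus model
    q = np.array([w * V_stim[stims[a]] + (1 - w) * V_act[a] for a in range(2)])
    probs = np.exp(BETA * q)
    probs /= probs.sum()
    a = rng.choice(2, p=probs)             # chosen side
    s = stims[a]                           # chosen stimulus
    r = float(rng.random() < P_REWARD[s])
    pe_stim, pe_act = r - V_stim[s], r - V_act[a]
    # Reliability tracks each model's recent prediction accuracy
    rel[0] += ALPHA_REL * ((1 - abs(pe_stim)) - rel[0])
    rel[1] += ALPHA_REL * ((1 - abs(pe_act)) - rel[1])
    # Each model learns from its own prediction error
    V_stim[s] += ALPHA * pe_stim
    V_act[a] += ALPHA * pe_act
```

Because reward here follows stimulus identity, the stimulus model becomes more reliable over trials and gradually dominates choice; shifting the initial `rel` shifts how quickly that happens, which is the kind of effect the lesion comparison probes.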
5. Webb J, Steffan P, Hayden BY, Lee D, Kemere C, McGinley M. Foraging Under Uncertainty Follows the Marginal Value Theorem with Bayesian Updating of Environment Representations. bioRxiv 2024:2024.03.30.587253. PMID: 38585964; PMCID: PMC10996644; DOI: 10.1101/2024.03.30.587253.
Abstract
Foraging theory has been a remarkably successful approach to understanding the behavior of animals in many contexts. In patch-based foraging contexts, the marginal value theorem (MVT) shows that the optimal strategy is to leave a patch when the marginal rate of return declines to the average for the environment. However, the MVT is only valid in deterministic environments whose statistics are known to the forager; naturalistic environments seldom meet these strict requirements. As a result, the strategies used by foragers in naturalistic environments must be empirically investigated. We developed a novel behavioral task and a corresponding computational framework for studying patch-leaving decisions in head-fixed and freely moving mice. We varied between-patch travel time, as well as within-patch reward depletion rate, both deterministically and stochastically. We found that mice adopt patch residence times in a manner consistent with the MVT and not explainable by simple ethologically motivated heuristic strategies. Critically, behavior was best accounted for by a modified form of the MVT wherein environment representations were updated based on local variations in reward timing, captured by a Bayesian estimator and dynamic prior. Thus, we show that mice can strategically attend to, learn from, and exploit task structure on multiple timescales simultaneously, thereby efficiently foraging in volatile environments. The results provide a foundation for applying the systems neuroscience toolkit in freely moving and head-fixed mice to understand the neural basis of foraging under uncertainty.
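The MVT condition itself is compact: leave at the residence time t* where the instantaneous within-patch rate equals the average rate over a full patch-plus-travel cycle. Below is a numeric sketch under an assumed exponential depletion profile; the functional form and constants are illustrative, not the task's actual schedule.

```python
import numpy as np
from scipy.optimize import brentq

R0, LAM, TRAVEL = 10.0, 0.5, 4.0   # assumed: initial rate, depletion rate, travel time

def gained(t):
    # Cumulative reward after t seconds in a patch: integral of R0 * exp(-LAM * s)
    return R0 / LAM * (1.0 - np.exp(-LAM * t))

def mvt_residual(t):
    # Zero where the instantaneous rate equals the environment-average rate
    instantaneous = R0 * np.exp(-LAM * t)
    average = gained(t) / (t + TRAVEL)
    return instantaneous - average

t_star = brentq(mvt_residual, 1e-6, 100.0)   # optimal patch residence time
```

The paper's modified MVT goes one step further, replacing fixed environment statistics with a Bayesian estimate updated from local reward timing; that estimation layer is omitted from this sketch.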
Affiliation(s)
- James Webb: Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX, USA
- Paul Steffan: Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Benjamin Y. Hayden: Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Daeyeol Lee: The Zanvyl Krieger Mind/Brain Institute, The Solomon H Snyder Department of Neuroscience, Department of Psychological and Brain Sciences, Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, USA
- Caleb Kemere: Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA; Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Matthew McGinley: Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX, USA; Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA