1
Jin F, Li M, Yang L, Yang L, Shang Z. Exploring value learning in pigeons: the role of dual pathways in the basal ganglia and synaptic plasticity. J Exp Biol 2025; 228:jeb249507. [PMID: 40241515 DOI: 10.1242/jeb.249507] [Received: 09/03/2024] [Accepted: 04/11/2025] [Indexed: 04/18/2025]
Abstract
Understanding value learning in animals is a key focus in cognitive neuroscience. Current models used in research are often simple, and while more complex models have been proposed, it remains unclear which assumptions align with actual value-learning strategies of animals. This study investigated the computational mechanisms behind value learning in pigeons using a free-choice task. Three models were constructed based on different assumptions about the role of the basal ganglia's dual pathways and synaptic plasticity in value computation, followed by model comparison and neural correlation analysis. Among the three models tested, the dual-pathway reinforcement learning model with Hebbian rules most closely matched the pigeons' behavior. Furthermore, the striatal gamma band connectivity showed the highest correlation with the values estimated by this model. Additionally, enhanced beta band connectivity in the nidopallium caudolaterale supported value learning. This study provides valuable insights into reinforcement learning mechanisms in non-human animals.
Affiliation(s)
- Fuli Jin
- Zhengzhou University, School of Electrical and Information Engineering, Zhengzhou 450001, China
- Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
- Mengmeng Li
- Zhengzhou University, School of Electrical and Information Engineering, Zhengzhou 450001, China
- Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
- Long Yang
- Zhengzhou University, School of Electrical and Information Engineering, Zhengzhou 450001, China
- Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
- Lifang Yang
- Zhengzhou University, School of Electrical and Information Engineering, Zhengzhou 450001, China
- Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
- Zhigang Shang
- Zhengzhou University, School of Electrical and Information Engineering, Zhengzhou 450001, China
- Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou 450001, China
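The abstract does not state the equations of the winning dual-pathway model, so the following is only a minimal Python sketch of one widely used formulation of the idea, opponent actor learning (OpAL), and should not be read as the authors' model. A critic learns option values from the reward prediction error; a direct-pathway ("Go", `g`) and an indirect-pathway ("NoGo", `n`) weight per option move in opposite directions with that error, each change scaled by the weight itself (the Hebbian, activity-dependent part); choices follow a softmax over `g - n`. The two-option task, reward probabilities, learning rates, and the names `choice_probs`, `opal_update`, and `simulate_free_choice` are hypothetical.

```python
import math
import random


def choice_probs(g, n, beta_g=1.0, beta_n=1.0):
    """Softmax choice probabilities over actor values beta_g*G - beta_n*N."""
    acts = [beta_g * gi - beta_n * ni for gi, ni in zip(g, n)]
    top = max(acts)  # subtract the max for numerical stability
    exps = [math.exp(a - top) for a in acts]
    total = sum(exps)
    return [e / total for e in exps]


def opal_update(v, g, n, action, reward, alpha_c=0.1, alpha_g=0.1, alpha_n=0.1):
    """Apply one OpAL-style trial update in place and return the prediction error."""
    delta = reward - v[action]                   # reward prediction error from the critic
    v[action] += alpha_c * delta                 # critic value update
    g[action] += alpha_g * g[action] * delta     # Go weight: strengthened by positive errors
    n[action] += alpha_n * n[action] * (-delta)  # NoGo weight: strengthened by negative errors
    return delta


def simulate_free_choice(reward_probs=(0.8, 0.2), trials=500, seed=0):
    """Run the sketch on a hypothetical free-choice task with Bernoulli rewards."""
    rng = random.Random(seed)
    k = len(reward_probs)
    v, g, n = [0.0] * k, [1.0] * k, [1.0] * k
    choices = []
    for _ in range(trials):
        p = choice_probs(g, n)
        action = rng.choices(range(k), weights=p)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        opal_update(v, g, n, action, reward)
        choices.append(action)
    return v, g, n, choices
```

Because rewards are 0 or 1 and the critic values stay in [0, 1], each step multiplies a pathway weight by 1 ± αδ with |δ| ≤ 1, so the Go and NoGo weights remain positive for these learning rates.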
2
Gershman SJ, Assad JA, Datta SR, Linderman SW, Sabatini BL, Uchida N, Wilbrecht L. Explaining dopamine through prediction errors and beyond. Nat Neurosci 2024; 27:1645-1655. [PMID: 39054370 DOI: 10.1038/s41593-024-01705-4] [Received: 04/24/2023] [Accepted: 06/13/2024] [Indexed: 07/27/2024]
Abstract
The most influential account of phasic dopamine holds that it reports reward prediction errors (RPEs). The RPE-based interpretation of dopamine signaling is, in its original form, probably too simple and fails to explain all the properties of phasic dopamine observed in behaving animals. This Perspective helps to resolve some of the conflicting interpretations of dopamine that currently exist in the literature. We focus on the following three empirical challenges to the RPE theory of dopamine: why does dopamine (1) ramp up as animals approach rewards, (2) respond to sensory and motor features and (3) influence action selection? We argue that the prediction error concept, once it has been suitably modified and generalized based on an analysis of each computational problem, answers each challenge. Nonetheless, there are a number of additional empirical findings that appear to demand fundamentally different theoretical explanations beyond encoding RPE. Therefore, looking forward, we discuss the prospects for a unifying theory that respects the diversity of dopamine signaling and function as well as the complex circuitry that both underlies and responds to dopaminergic transmission.
Affiliation(s)
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, USA.
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA.
- John A Assad
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA
- Scott W Linderman
- Department of Statistics and Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Bernardo L Sabatini
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Naoshige Uchida
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Linda Wilbrecht
- Department of Psychology and Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
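For readers less familiar with the RPE account the Perspective starts from, a minimal Python sketch of textbook temporal-difference learning, TD(0), may help; its prediction error is δ = r + γV(s′) − V(s). The deterministic five-state track, the discount factor, the learning rate, and the name `td0_linear_track` are illustrative assumptions, not taken from the article.

```python
def td0_linear_track(n_states=5, gamma=0.9, alpha=0.1, episodes=2000):
    """TD(0) value learning on a deterministic track with reward at the end.

    States 0..n_states-1 are visited in order; leaving the last state
    delivers reward 1 and ends the episode (terminal value 0).
    """
    v = [0.0] * n_states
    for _ in range(episodes):
        for s in range(n_states):
            last = s == n_states - 1
            r = 1.0 if last else 0.0
            v_next = 0.0 if last else v[s + 1]
            delta = r + gamma * v_next - v[s]  # reward prediction error
            v[s] += alpha * delta              # move V(s) toward the TD target
    return v
```

At convergence each value approaches γ raised to the number of steps remaining before reward, so the learned values rise as the reward approaches, while the prediction errors along the fully learned track shrink toward zero.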
3
Beck DW, Heaton CN, Davila LD, Rakocevic LI, Drammis SM, Tyulmankov D, Vara P, Giri A, Umashankar Beck S, Zhang Q, Pokojovy M, Negishi K, Batson SA, Salcido AA, Reyes NF, Macias AY, Ibanez-Alcala RJ, Hossain SB, Waller GL, O'Dell LE, Moschak TM, Goosens KA, Friedman A. Model of a striatal circuit exploring biological mechanisms underlying decision-making during normal and disordered states. bioRxiv 2024:2024.07.29.605535. [PMID: 39211231 PMCID: PMC11361035 DOI: 10.1101/2024.07.29.605535] [Indexed: 09/04/2024]
Abstract
Decision-making requires continuous adaptation to internal and external contexts. Changes in decision-making are reliable transdiagnostic symptoms of neuropsychiatric disorders. We created a computational model demonstrating how the striosome compartment of the striatum constructs a mathematical space for decision-making computations depending on context, and how the matrix compartment defines action value depending on the space. The model explains multiple experimental results and unifies other theories like reward prediction error, roles of the direct versus indirect pathways, and roles of the striosome versus matrix, under one framework. We also found, through new analyses, that striosome and matrix neurons increase their synchrony during difficult tasks, caused by a necessary increase in dimensionality of the space. The model makes testable predictions about individual differences in disorder susceptibility, decision-making symptoms shared among neuropsychiatric disorders, and differences in neuropsychiatric disorder symptom presentation. The model reframes the role of the striosomal circuit in neuroeconomic and disorder-affected decision-making.
Highlights
- Striosomes prioritize decision-related data used by matrix to set action values.
- Striosomes and matrix have different roles in the direct and indirect pathways.
- Abnormal information organization/valuation alters disorder presentation.
- Variance in data prioritization may explain individual differences in disorders.
eTOC
Beck et al. developed a computational model of how a striatal circuit functions during decision-making. The model unifies and extends theories about the direct versus indirect pathways. It further suggests how aberrant circuit function underlies decision-making phenomena observed in neuropsychiatric disorders.
4
Wang Y, Lak A, Manohar SG, Bogacz R. Dopamine encoding of novelty facilitates efficient uncertainty-driven exploration. PLoS Comput Biol 2024; 20:e1011516. [PMID: 38626219 PMCID: PMC11051659 DOI: 10.1371/journal.pcbi.1011516] [Received: 09/15/2023] [Revised: 04/26/2024] [Accepted: 03/23/2024] [Indexed: 04/18/2024]
Abstract
When facing an unfamiliar environment, animals need to explore to gain new knowledge about which actions provide reward, but also put the newly acquired knowledge to use as quickly as possible. Optimal reinforcement learning strategies should therefore assess the uncertainties of these action-reward associations and utilise them to inform decision making. We propose a novel model whereby direct and indirect striatal pathways act together to estimate both the mean and variance of reward distributions, and mesolimbic dopaminergic neurons provide transient novelty signals, facilitating effective uncertainty-driven exploration. We utilised electrophysiological recording data to verify our model of the basal ganglia, and we fitted exploration strategies derived from the neural model to data from behavioural experiments. We also compared the performance of directed exploration strategies inspired by our basal ganglia model with other exploration algorithms including classic variants of upper confidence bound (UCB) strategy in simulation. The exploration strategies inspired by the basal ganglia model can achieve overall superior performance in simulation, and we found qualitatively similar results in fitting the model to behavioural data compared with the fitting of more idealised normative models with less implementation level detail. Overall, our results suggest that transient dopamine levels in the basal ganglia that encode novelty could contribute to an uncertainty representation which efficiently drives exploration in reinforcement learning.
Affiliation(s)
- Yuhao Wang
- MRC Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
- Armin Lak
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Sanjay G. Manohar
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Rafal Bogacz
- MRC Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
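The abstract benchmarks the basal-ganglia-inspired strategies against classic variants of the upper confidence bound (UCB) strategy. As a reference point only, here is a minimal Python sketch of the standard UCB1 rule on a hypothetical Bernoulli bandit: each arm's score is its empirical mean plus an uncertainty bonus c·√(ln t / n) that shrinks as the arm is sampled. It is not the authors' basal ganglia model, which estimates reward mean and variance through the direct and indirect pathways and adds dopaminergic novelty signals; the reward probabilities, horizon, and the name `ucb1` are assumptions.

```python
import math
import random


def ucb1(reward_probs, steps=2000, c=math.sqrt(2), seed=0):
    """UCB1 on a Bernoulli bandit: pick the arm maximising mean + c*sqrt(ln t / n)."""
    rng = random.Random(seed)
    k = len(reward_probs)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, steps + 1):
        if t <= k:
            arm = t - 1  # play each arm once so every count is nonzero
        else:
            arm = max(
                range(k),
                key=lambda a: means[a] + c * math.sqrt(math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental sample mean
    return counts, means
```

With c = √2 the bonus equals the usual UCB1 term √(2 ln t / n); a larger c makes the rule explore more before settling on the best arm.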
5
Houston AI, Rosenström TH. A critical review of risk-sensitive foraging. Biol Rev Camb Philos Soc 2024; 99:478-495. [PMID: 37987237 DOI: 10.1111/brv.13031] [Received: 02/28/2023] [Revised: 10/31/2023] [Accepted: 11/01/2023] [Indexed: 11/22/2023]
Abstract
Foraging is risk sensitive if choices depend on the variability of returns from the options as well as their mean return. Risk-sensitive foraging is important in behavioural ecology, psychology and neurophysiology. It has been explained both in terms of mechanisms and in terms of evolutionary advantage. We provide a critical review, evaluating both mechanistic and evolutionary accounts. Some derivations of risk sensitivity from mechanistic models based on psychophysics are not convincing because they depend on an inappropriate use of Jensen's inequality. Attempts have been made to link risk sensitivity to the ecology of a species, but again these are not convincing. The field of risk-sensitive foraging has provided a focus for theoretical and empirical work and has yielded important insights, but we lack a simple and empirically defendable general account of it in either mechanistic or evolutionary terms. However, empirical analysis of choice sequences under theoretically motivated experimental designs and environmental settings appears a promising avenue for mapping the scope and relative merits of existing theories. Simply put, the devil is in the sequence.
Affiliation(s)
- Alasdair I Houston
- School of Biological Sciences, University of Bristol, 24 Tyndall Avenue, Bristol, BS8 1TQ, UK
- Tom H Rosenström
- Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, PL 21 (Haartmaninkatu 3), 00014, Helsinki, Finland
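Because the review's criticism turns on Jensen's inequality, a small numeric check of the inequality itself may help: for a concave function u, E[u(X)] ≤ u(E[X]), and the inequality reverses for convex u. The two options, their payoffs, and the transforms √x and x² are hypothetical choices for illustration; the sketch does not reproduce any derivation discussed in the review.

```python
import math


def expected(values, probs, f=lambda x: x):
    """Expectation of f(X) for a discrete distribution over values."""
    return sum(p * f(x) for x, p in zip(values, probs))


def square(x):
    """An assumed convex transform, used for contrast."""
    return x * x


# Hypothetical options with the same mean payoff of 4 units.
sure = ([4.0], [1.0])
risky = ([0.0, 8.0], [0.5, 0.5])
concave = math.sqrt  # an assumed concave transform

print("means:", expected(*sure), expected(*risky))
print("concave:", expected(*sure, f=concave), expected(*risky, f=concave))
print("convex:", expected(*sure, f=square), expected(*risky, f=square))
```

Both options have the same mean, yet the concave transform ranks the sure option higher and the convex one ranks the risky option higher, so any preference predicted this way depends entirely on the curvature assumed.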
6
Millidge B, Tang M, Osanlouy M, Harper NS, Bogacz R. Predictive coding networks for temporal prediction. PLoS Comput Biol 2024; 20:e1011183. [PMID: 38557984 PMCID: PMC11008833 DOI: 10.1371/journal.pcbi.1011183] [Received: 05/15/2023] [Revised: 04/11/2024] [Accepted: 03/12/2024] [Indexed: 04/04/2024]
Abstract
One of the key problems the brain faces is inferring the state of the world from a sequence of dynamically changing stimuli, and it is not yet clear how the sensory system achieves this task. A well-established computational framework for describing perceptual processes in the brain is provided by the theory of predictive coding. Although the original proposals of predictive coding have discussed temporal prediction, later work developing this theory mostly focused on static stimuli, and key questions on neural implementation and computational properties of temporal predictive coding networks remain open. Here, we address these questions and present a formulation of the temporal predictive coding model that can be naturally implemented in recurrent networks, in which activity dynamics rely only on local inputs to the neurons, and learning only utilises local Hebbian plasticity. Additionally, we show that temporal predictive coding networks can approximate the performance of the Kalman filter in predicting the behaviour of linear systems, and behave as a variant of a Kalman filter which does not track its own subjective posterior variance. Importantly, temporal predictive coding networks can achieve similar accuracy to the Kalman filter without performing complex mathematical operations, but just employing simple computations that can be implemented by biological networks. Moreover, when trained with natural dynamic inputs, we found that temporal predictive coding can produce Gabor-like, motion-sensitive receptive fields resembling those observed in real neurons in visual areas. In addition, we demonstrate how the model can be effectively generalized to nonlinear systems. Overall, models presented in this paper show how biologically plausible circuits can predict future stimuli and may guide research on understanding specific neural circuits in brain areas involved in temporal prediction.
Affiliation(s)
- Beren Millidge
- MRC Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
- Mufeng Tang
- MRC Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
- Mahyar Osanlouy
- Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- Nicol S. Harper
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Rafal Bogacz
- MRC Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
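The abstract uses the Kalman filter as the benchmark for temporal predictive coding networks on linear systems. For reference, here is a minimal scalar Kalman filter in Python; it is the benchmark, not the predictive coding network. The system x_t = a·x_{t−1} + w_t, y_t = x_t + v_t, its noise variances `q` and `r`, and the names `kalman_1d` and `simulate_random_walk` are assumptions for illustration. The variable `p` holds the posterior variance that, according to the abstract, the predictive coding networks do not track.

```python
import math
import random


def kalman_1d(observations, a=1.0, q=0.01, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = a*x_{t-1} + w_t, y_t = x_t + v_t.

    q and r are the process and observation noise variances.
    Returns the filtered means and posterior variances.
    """
    x, p = x0, p0
    means, variances = [], []
    for y in observations:
        # Predict: propagate the mean and variance through the dynamics.
        x = a * x
        p = a * a * p + q
        # Update: blend prediction and observation using the Kalman gain k.
        k = p / (p + r)
        x = x + k * (y - x)
        p = (1.0 - k) * p
        means.append(x)
        variances.append(p)
    return means, variances


def simulate_random_walk(steps=500, a=1.0, q=0.01, r=1.0, seed=0):
    """Generate hypothetical latent states and noisy observations."""
    rng = random.Random(seed)
    x, xs, ys = 0.0, [], []
    for _ in range(steps):
        x = a * x + rng.gauss(0.0, math.sqrt(q))
        xs.append(x)
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))
    return xs, ys
```

For a = 1 the posterior variance settles at P⁻ − q, where P⁻ = (q + √(q² + 4qr))/2 is the steady-state prior variance; it depends only on q and r, not on the observed data.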