1
Morita K, Shimomura K, Kawaguchi Y. Opponent Learning with Different Representations in the Cortico-Basal Ganglia Circuits. eNeuro 2023; 10:ENEURO.0422-22.2023. [PMID: 36653187 PMCID: PMC9884109 DOI: 10.1523/eneuro.0422-22.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Revised: 12/06/2022] [Accepted: 01/03/2023] [Indexed: 01/20/2023] Open
Abstract
The direct and indirect pathways of the basal ganglia (BG) have been suggested to learn mainly from positive and negative feedback, respectively. Since these pathways unevenly receive inputs from different cortical neuron types and/or regions, they may preferentially use different state/action representations. We explored whether such a combined use of different representations, coupled with different learning rates from positive and negative reward prediction errors (RPEs), has computational benefits. We modeled an animal as an agent equipped with two learning systems, each of which adopted either an individual representation (IR) or a successor representation (SR) of states. By varying the combination of IR and SR, as well as the learning rates from positive and negative RPEs in each system, we examined how the agent performed in a dynamic reward navigation task. We found that a combination of an SR-based system learning mainly from positive RPEs and an IR-based system learning mainly from negative RPEs could achieve good performance in the task compared with other combinations. In such a combination of appetitive SR-based and aversive IR-based systems, both systems show activities of comparable magnitudes with opposite signs, consistent with the suggested profiles of the two BG pathways. Moreover, the architecture of such a combination provides a novel coherent explanation for the functional significance and underlying mechanism of diverse findings about the cortico-BG circuits. These results suggest that, in particular, combining different representations with appetitive and aversive learning could be an effective learning strategy in certain dynamic environments, and that it might actually be implemented in the cortico-BG circuits.
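As a rough illustration of the asymmetric learning described in the abstract above, the following Python sketch updates a single scalar value with separate learning rates for positive and negative RPEs. It does not implement the IR or SR state representations or the navigation task; the function name `rpe_update`, the reward sequence, and the learning-rate values are all hypothetical choices made for this example, not parameters from the paper.

```python
def rpe_update(value, reward, alpha_pos, alpha_neg):
    """One update of a scalar value with sign-dependent learning rates."""
    delta = reward - value  # reward prediction error (no successor state in this sketch)
    alpha = alpha_pos if delta > 0 else alpha_neg
    return value + alpha * delta


# Hypothetical "appetitive" system: learns mainly from positive RPEs.
# Hypothetical "aversive" system: learns mainly from negative RPEs.
appetitive = 0.0
aversive = 0.0
for reward in [1.0, 1.0, 0.0, 0.0]:  # assumed reward sequence
    appetitive = rpe_update(appetitive, reward, alpha_pos=0.5, alpha_neg=0.1)
    aversive = rpe_update(aversive, reward, alpha_pos=0.1, alpha_neg=0.5)
```

With these assumed rates, the appetitive system ends with the higher value because it retains gains from rewarded trials and discounts reward omissions only slowly.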
Affiliation(s)
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo 113-0033, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, Tokyo 113-0033, Japan
- Kanji Shimomura
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo 113-0033, Japan
- Department of Behavioral Medicine, National Institute of Mental Health, National Center of Neurology and Psychiatry, Kodaira 187-8551, Japan
- Yasuo Kawaguchi
- Brain Science Institute, Tamagawa University, Machida 194-8610, Japan
- National Institute for Physiological Sciences (NIPS), Okazaki 444-8787, Japan
2
Hirschberg S, Dvorzhak A, Rasooli-Nejad SMA, Angelov S, Kirchner M, Mertins P, Lättig-Tünnemann G, Harms C, Schmitz D, Grantyn R. Uncoupling the Excitatory Amino Acid Transporter 2 From Its C-Terminal Interactome Restores Synaptic Glutamate Clearance at Corticostriatal Synapses and Alleviates Mutant Huntingtin-Induced Hypokinesia. Front Cell Neurosci 2022; 15:792652. [PMID: 35173582 PMCID: PMC8841566 DOI: 10.3389/fncel.2021.792652] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/10/2021] [Accepted: 12/21/2021] [Indexed: 02/05/2023] Open
Abstract
Rapid removal of glutamate from the sites of glutamate release is an essential step in excitatory synaptic transmission. However, despite many years of research, the molecular mechanisms underlying the intracellular regulation of glutamate transport at tripartite synapses have not been fully uncovered. This limits the options for pharmacological treatment of glutamate-related motor disorders, including Huntington’s disease (HD). We therefore investigated the possible binding partners of transgenic EAAT2 and their alterations under the influence of mutant huntingtin (mHTT). Mass spectrometry analysis after pull-down of striatal YFP-EAAT2 from wild-type (WT) mice and heterozygote (HET) Q175 mHTT-knock-in mice identified a total of 148 significant (FDR < 0.05) binders to full-length EAAT2. Of these, 58 proteins exhibited mHTT-related differences. Most importantly, in 26 of the 58 mHTT-sensitive cases, protein abundance changed back toward WT levels when the mice expressed a C-terminal-truncated instead of full-length variant of EAAT2. These findings motivated new attempts to clarify the role of astrocytic EAAT2 regulation in cortico-basal movement control. Striatal astrocytes of Q175 HET mice were targeted by a PHP.B vector encoding EAAT2 with different degrees of C-terminal modification, i.e., EAAT2-S506X (truncation at S506), EAAT2-4KR (4 lysine to arginine substitutions) or EAAT2 (full-length). The results were compared to HET and WT injected with a tag-only vector (CTRL). It was found that the presence of a C-terminal-modified EAAT2 transgene (i) increased the level of native EAAT2 protein in striatal lysates and perisynaptic astrocyte processes, (ii) enhanced the glutamate uptake of transduced astrocytes, (iii) stimulated glutamate clearance at individual corticostriatal synapses, (iv) increased the glutamate uptake of striatal astrocytes and (v) alleviated the mHTT-related hypokinesia (open field indicators of movement initiation).
In contrast, over-expression of full-length EAAT2 neither facilitated glutamate uptake nor locomotion. Together, our results support the new hypothesis that preventing abnormal protein-protein interactions at the C-terminal of EAAT2 could eliminate the mHTT-related deficits in corticostriatal synaptic glutamate clearance and movement initiation.
Affiliation(s)
- Stefan Hirschberg
- Synaptic Dysfunction Lab, Neuroscience Research Center, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, Berlin, Germany
- Anton Dvorzhak
- Synaptic Dysfunction Lab, Neuroscience Research Center, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, Berlin, Germany
- Seyed M. A. Rasooli-Nejad
- Synaptic Dysfunction Lab, Neuroscience Research Center, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, Berlin, Germany
- Svilen Angelov
- Synaptic Dysfunction Lab, Neuroscience Research Center, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, Berlin, Germany
- Marieluise Kirchner
- Proteomics Platform, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Berlin Institute of Health (BIH), Berlin, Germany
- Philipp Mertins
- Proteomics Platform, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Berlin Institute of Health (BIH), Berlin, Germany
- Gilla Lättig-Tünnemann
- Department of Experimental Neurology, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Center for Stroke Research Berlin, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Christoph Harms
- Department of Experimental Neurology, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Center for Stroke Research Berlin, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Dietmar Schmitz
- German Center for Neurodegenerative Diseases (DZNE), Berlin, Germany
- Cluster of Excellence NeuroCure, Berlin, Germany
- Einstein Center for Neurosciences Berlin, Berlin, Germany
- Rosemarie Grantyn
- Synaptic Dysfunction Lab, Neuroscience Research Center, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, Berlin, Germany
- Department of Experimental Neurology, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Cluster of Excellence NeuroCure, Berlin, Germany
- Einstein Center for Neurosciences Berlin, Berlin, Germany
- *Correspondence: Rosemarie Grantyn,
3
Feng Z, Nagase AM, Morita K. A Reinforcement Learning Approach to Understanding Procrastination: Does Inaccurate Value Approximation Cause Irrational Postponing of a Task? Front Neurosci 2021; 15:660595. [PMID: 34602962 PMCID: PMC8481628 DOI: 10.3389/fnins.2021.660595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2021] [Accepted: 08/16/2021] [Indexed: 11/27/2022] Open
Abstract
Procrastination is the voluntary but irrational postponing of a task despite being aware that the delay can lead to worse consequences. It has been extensively studied in the psychological field, from contributing factors to theoretical models. From the perspective of value-based decision making and reinforcement learning (RL), procrastination has been suggested to be caused by non-optimal choice resulting from cognitive limitations. Exactly what sort of cognitive limitations are involved, however, remains elusive. In the current study, we examined if a particular type of cognitive limitation, namely, inaccurate valuation resulting from inadequate state representation, would cause procrastination. Recent work has suggested that humans may adopt a particular type of state representation called the successor representation (SR) and that humans can learn to represent states by relatively low-dimensional features. Combining these suggestions, we assumed a dimension-reduced version of SR. We modeled a series of behaviors of a "student" doing assignments during the school term, when putting off doing the assignments (i.e., procrastination) is not allowed, and during the vacation, when whether to procrastinate or not can be freely chosen. We assumed that the "student" had acquired a rigid reduced SR of each state, corresponding to each step in completing an assignment, under the policy without procrastination. The "student" learned the approximated value of each state which was computed as a linear function of features of the states in the rigid reduced SR, through temporal-difference (TD) learning. During the vacation, the "student" made decisions at each time-step whether to procrastinate based on these approximated values. Simulation results showed that the reduced SR-based RL model generated procrastination behavior, which worsened across episodes.
According to the values approximated by the "student," to procrastinate was the better choice, whereas not to procrastinate was mostly better according to the true values. Thus, the current model generated procrastination behavior caused by inaccurate value approximation, which resulted from the adoption of the reduced SR as state representation. These findings indicate that the reduced SR, or more generally, the dimension reduction in state representation, can be a potential form of cognitive limitation that leads to procrastination.
Affiliation(s)
- Zheyu Feng
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Asako Mitsuto Nagase
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Division of Neurology, Department of Brain and Neurosciences, Faculty of Medicine, Tottori University, Yonago, Japan
- Research Fellowship for Young Scientists, Japan Society for the Promotion of Science, Tokyo, Japan
- Department of Neurology, Faculty of Medicine, Shimane University, Izumo, Japan
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, Tokyo, Japan
4
Bonnavion P, Fernández EP, Varin C, de Kerchove d’Exaerde A. It takes two to tango: Dorsal direct and indirect pathways orchestration of motor learning and behavioral flexibility. Neurochem Int 2019; 124:200-214. [DOI: 10.1016/j.neuint.2019.01.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/18/2018] [Revised: 12/12/2018] [Accepted: 01/08/2019] [Indexed: 12/27/2022]
5
Morita K, Kawaguchi Y. A Dual Role Hypothesis of the Cortico-Basal-Ganglia Pathways: Opponency and Temporal Difference Through Dopamine and Adenosine. Front Neural Circuits 2019; 12:111. [PMID: 30687019 PMCID: PMC6338031 DOI: 10.3389/fncir.2018.00111] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/31/2018] [Accepted: 11/29/2018] [Indexed: 01/07/2023] Open
Abstract
The hypothesis that the basal-ganglia direct and indirect pathways represent goodness (or benefit) and badness (or cost) of options, respectively, explains a wide range of phenomena. However, this hypothesis, named Opponent Actor Learning (OpAL), still has limitations. Structurally, the OpAL model does not incorporate differentiation of the two types of cortical inputs to the basal-ganglia pathways received from intratelencephalic (IT) and pyramidal-tract (PT) neurons. Functionally, the OpAL model does not describe the temporal-difference (TD)-type reward-prediction-error (RPE), nor does it explain how RPE is calculated in the circuitry connecting to the DA neurons. In fact, there is a different hypothesis on the basal-ganglia pathways and DA, named the Cortico-Striatal-Temporal-Difference (CS-TD) model. The CS-TD model differentiates the IT and PT inputs, describes the TD-type RPE, and explains how TD-RPE is calculated. However, a critical difficulty in this model lies in its assumption that DA induces the same direction of plasticity in both direct and indirect pathways, which apparently contradicts the experimentally observed opposite effects of DA on these pathways. Here, we propose a new hypothesis that integrates the OpAL and CS-TD models. Specifically, we propose that the IT-basal-ganglia pathways represent goodness/badness of current options while the PT-indirect pathway represents the overall value of the previously chosen option, and both of these influence the DA neurons, through the basal-ganglia output, so that a variant of TD-RPE is calculated. A key assumption is that opposite directions of plasticity are induced upon phasic activation of DA neurons in the IT-indirect pathway and PT-indirect pathway because of different profiles of IT and PT inputs.
Specifically, at PT→indirect-pathway-medium-spiny-neuron (iMSN) synapses, sustained glutamatergic inputs generate rich adenosine, which allosterically prevents DA-D2 receptor signaling and instead favors adenosine-A2A receptor signaling. Then, phasic DA-induced phasic adenosine, which reflects TD-RPE, causes long-term synaptic potentiation. In contrast, at IT→iMSN synapses where adenosine is scarce, phasic DA causes long-term synaptic depression via D2 receptor signaling. This new Opponency and Temporal-Difference (OTD) model provides unique predictions, part of which is potentially in line with recently reported activity patterns of neurons in the globus pallidus externus on the indirect pathway.
Affiliation(s)
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo Institutes for Advanced Study, Tokyo, Japan
- Yasuo Kawaguchi
- Division of Cerebral Circuitry, National Institute for Physiological Sciences, Okazaki, Japan
- Department of Physiological Sciences, Graduate University for Advanced Studies, Okazaki, Japan
6
Kawaguchi Y. Pyramidal Cell Subtypes and Their Synaptic Connections in Layer 5 of Rat Frontal Cortex. Cereb Cortex 2018; 27:5755-5771. [PMID: 29028949 DOI: 10.1093/cercor/bhx252] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Received: 06/29/2017] [Accepted: 09/06/2017] [Indexed: 12/31/2022] Open
Abstract
The frontal cortical areas make a coordinated response that generates appropriate behavior commands, using individual local circuits with corticostriatal and corticocortical connections on longer time scales than sensory areas. In secondary motor cortex (M2), situated between the prefrontal and primary motor areas, major subtypes of layer 5 corticostriatal cells are crossed-corticostriatal (CCS) cells innervating both sides of striatum, and corticopontine (CPn) cells projecting to the ipsilateral striatum and pontine nuclei. CCS cells innervate CPn cells unidirectionally: the former are therefore hierarchically higher than the latter among L5 corticostriatal cells. CCS cells project directly to both frontal and nonfrontal areas. On the other hand, CPn cells innervate the thalamus and layer 1a of frontal areas, where thalamic fibers relaying basal ganglia outputs are distributed. Thus, CCS cells can bring activities of frontal areas into concert with those of nonfrontal areas using corticocortical loops, whereas CPn cells are more involved in closed corticostriatal loops than CCS cells. Since reciprocal connections between CPn cells with facilitatory synapses may be related to persistent activity, CPn cells play a key role in longer-time-constant processes in corticostriatal as well as in corticocortical loops between the frontal areas.
Affiliation(s)
- Yasuo Kawaguchi
- Division of Cerebral Circuitry, National Institute for Physiological Sciences, Okazaki 444-8787, Japan
- Department of Physiological Sciences, SOKENDAI (Graduate University for Advanced Studies), Okazaki, Japan
7
A Neural Circuit Mechanism for the Involvements of Dopamine in Effort-Related Choices: Decay of Learned Values, Secondary Effects of Depletion, and Calculation of Temporal Difference Error. eNeuro 2018; 5:eN-NWR-0021-18. [PMID: 29468191 PMCID: PMC5820541 DOI: 10.1523/eneuro.0021-18.2018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/04/2018] [Accepted: 01/11/2018] [Indexed: 12/17/2022] Open
Abstract
Dopamine has been suggested to be crucially involved in effort-related choices. Key findings are that dopamine depletion (i) changed preference for a high-cost, large-reward option to a low-cost, small-reward option, (ii) but not when the large-reward option was also low-cost or the small-reward option gave no reward, (iii) while increasing the latency in all the cases but only transiently, and (iv) that antagonism of either dopamine D1 or D2 receptors also specifically impaired selection of the high-cost, large-reward option. The underlying neural circuit mechanisms remain unclear. Here we show that findings i–iii can be explained by the dopaminergic representation of temporal-difference reward-prediction error (TD-RPE), whose mechanisms have now become clarified, if (1) the synaptic strengths storing the values of actions mildly decay in time and (2) the obtained-reward-representing excitatory input to dopamine neurons increases after dopamine depletion. The former is potentially caused by background neural activity–induced weak synaptic plasticity, and the latter is assumed to occur through post-depletion increase of neural activity in the pedunculopontine nucleus, where neurons representing obtained reward exist and presumably send excitatory projections to dopamine neurons. We further show that finding iv, which is nontrivial given the suggested distinct functions of the D1 and D2 corticostriatal pathways, can also be explained if we additionally assume a proposed mechanism of TD-RPE calculation, in which the D1 and D2 pathways encode the values of actions with a temporal difference. These results suggest a possible circuit mechanism for the involvements of dopamine in effort-related choices and, simultaneously, provide implications for the mechanisms of TD-RPE calculation.
8
Ueno T, Nishijima H, Ueno S, Tomiyama M. Spine Enlargement of Pyramidal Tract-Type Neurons in the Motor Cortex of a Rat Model of Levodopa-Induced Dyskinesia. Front Neurosci 2017; 11:206. [PMID: 28450828 PMCID: PMC5390020 DOI: 10.3389/fnins.2017.00206] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 02/03/2017] [Accepted: 03/27/2017] [Indexed: 01/16/2023] Open
Abstract
Growing evidence suggests that abnormal synaptic plasticity of cortical neurons underlies levodopa-induced dyskinesia (LID) in Parkinson's disease (PD). Spine morphology reflects synaptic plasticity resulting from glutamatergic transmission. We previously reported that enlargement of the dendritic spines of intratelencephalic-type (IT) neurons in the primary motor cortex (M1) is linked to the development of LID. However, the relevance of another M1 neuron type, pyramidal-tract (PT) neurons, to LID remains unknown. We examined the morphological changes of the dendritic spines of M1 PT neurons in a rat model of LID. We quantified the density and size of these spines in 6-hydroxydopamine-lesioned rats (a model of PD), 6-hydroxydopamine-lesioned rats chronically treated with levodopa (a model of LID), and control rats chronically treated with levodopa. Dopaminergic denervation alone had no effect on spine density and head area. However, the LID model showed significant increases in spine density and spine head area and the development of dyskinetic movements. In contrast, levodopa treatment of normal rats increased spine density alone. Although chronic levodopa treatment increases PT neuron spine density, with or without dopaminergic denervation, enlargement of PT neuron spines appears to be a specific feature of LID. This finding suggests that PT neurons become hyperexcited in the LID model, in parallel with the enlargement of spines. Thus, spine enlargement, and the resultant hyperexcitability of PT pyramidal neurons, in the M1 cortex might contribute to abnormal cortical neuronal plasticity in LID.
Affiliation(s)
- Tatsuya Ueno
- Department of Neurology, Aomori Prefectural Central Hospital, Aomori, Japan
- Department of Neurophysiology, Hirosaki University Graduate School of Medicine, Hirosaki, Japan
- Haruo Nishijima
- Department of Neurology, Aomori Prefectural Central Hospital, Aomori, Japan
- Department of Neurophysiology, Hirosaki University Graduate School of Medicine, Hirosaki, Japan
- Shinya Ueno
- Department of Neurophysiology, Hirosaki University Graduate School of Medicine, Hirosaki, Japan
- Masahiko Tomiyama
- Department of Neurology, Aomori Prefectural Central Hospital, Aomori, Japan
- Department of Neurophysiology, Hirosaki University Graduate School of Medicine, Hirosaki, Japan
9
Negwer M, Schubert D. Talking Convergence: Growing Evidence Links FOXP2 and Retinoic Acid in Shaping Speech-Related Motor Circuitry. Front Neurosci 2017; 11:19. [PMID: 28179876 PMCID: PMC5263127 DOI: 10.3389/fnins.2017.00019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/27/2016] [Accepted: 01/10/2017] [Indexed: 01/30/2023] Open
Affiliation(s)
- Moritz Negwer
- Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
- Department of Cognitive Neuroscience, Radboud University Medical Center, Donders Institute for Brain, Cognition, and Behaviour, Nijmegen, Netherlands
- Dirk Schubert
- Department of Cognitive Neuroscience, Radboud University Medical Center, Donders Institute for Brain, Cognition, and Behaviour, Nijmegen, Netherlands
- *Correspondence: Dirk Schubert
10
Kato A, Morita K. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation. PLoS Comput Biol 2016; 12:e1005145. [PMID: 27736881 PMCID: PMC5063413 DOI: 10.1371/journal.pcbi.1005145] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 05/11/2016] [Accepted: 09/14/2016] [Indexed: 12/12/2022] Open
Abstract
It has been suggested that dopamine (DA) represents reward-prediction-error (RPE) defined in reinforcement learning and therefore DA responds to unpredicted but not predicted reward. However, recent studies have found DA response sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can sustain if there is decay/forgetting of learned-values, which can be implemented as decay of synaptic strengths storing learned-values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of the value-decay in self-paced approach behavior, modeled as a series of ‘Go’ or ‘No-Go’ selections towards a goal. Through simulations, we found that the value-decay can enhance motivation, specifically, facilitate fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of ‘Go’ values towards a goal, and (2) value-contrasts between ‘Go’ and ‘No-Go’ are generated because while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account for the DA's roles in value-learning and motivation. 
Our results also suggest that when biological systems for value-learning are active even though learning has apparently converged, the systems might be in a state of dynamic equilibrium, where learning and forgetting are balanced. Dopamine (DA) has been suggested to have two reward-related roles: (1) representing reward-prediction-error (RPE), and (2) providing motivational drive. Role(1) is based on the physiological results that DA responds to unpredicted but not predicted reward, whereas role(2) is supported by the pharmacological results that blockade of DA signaling causes motivational impairments such as slowdown of self-paced behavior. So far, these two roles are considered to be played by two different temporal patterns of DA signals: role(1) by phasic signals and role(2) by tonic/sustained signals. However, recent studies have found sustained DA signals with features indicative of both roles (1) and (2), complicating this picture. Meanwhile, whereas synaptic/circuit mechanisms for role(1), i.e., how RPE is calculated in the upstream of DA neurons and how RPE-dependent update of learned-values occurs through DA-dependent synaptic plasticity, have now become clarified, mechanisms for role(2) remain unclear. In this work, we modeled self-paced behavior by a series of ‘Go’ or ‘No-Go’ selections in the framework of reinforcement-learning assuming DA's role(1), and demonstrated that incorporation of decay/forgetting of learned-values, which is presumably implemented as decay of synaptic strengths storing learned-values, provides a potential unified mechanistic account for the DA's two roles, together with its various temporal patterns.
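The value-decay idea in the abstract above can be sketched as follows, under simplifying assumptions: a single decision point with 'go' and 'no_go' values, a fixed reward of 1 for 'go', and the names `step_with_decay`, `alpha`, and `decay`, all of which are hypothetical and not taken from the paper. The chosen value is updated by the RPE and every stored value then decays, so the RPE does not vanish and the unchosen value shrinks toward zero.

```python
def step_with_decay(values, chosen, reward, alpha=0.5, decay=0.9):
    """Update the chosen value by its RPE, then decay every stored value."""
    updated = dict(values)
    rpe = reward - updated[chosen]  # reward prediction error for the chosen action
    updated[chosen] += alpha * rpe
    # Forgetting: all values, chosen or not, shrink by the same assumed factor.
    return {action: decay * v for action, v in updated.items()}


values = {"go": 0.0, "no_go": 0.5}  # assumed initial values
for _ in range(50):
    values = step_with_decay(values, chosen="go", reward=1.0)

sustained_rpe = 1.0 - values["go"]  # stays positive at equilibrium
```

In this sketch the 'go' value settles at about 0.82, below the reward of 1, so a positive RPE persists even after learning has apparently converged, while the never-chosen 'no_go' value decays away and a contrast between the two values emerges.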
Affiliation(s)
- Ayaka Kato
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
11
Corticostriatal circuit mechanisms of value-based action selection: Implementation of reinforcement learning algorithms and beyond. Behav Brain Res 2016; 311:110-121. [DOI: 10.1016/j.bbr.2016.05.017] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 11/24/2015] [Revised: 05/02/2016] [Accepted: 05/06/2016] [Indexed: 01/20/2023]
12
Abstract
Unidirectional connections from the cortex to the matrix of the corpus striatum initiate the cortico-basal ganglia (BG)-thalamocortical loop, thought to be important in momentary action selection and in longer-term fine tuning of behavioural repertoire; a discrete set of striatal compartments, striosomes, has the complementary role of registering or anticipating reward that shapes corticostriatal plasticity. Re-entrant signals traversing the cortico-BG loop impact predominantly frontal cortices, conveyed through topographically ordered output channels; by contrast, striatal input signals originate from a far broader span of cortex, and are far more divergent in their termination. The term 'disclosed loop' is introduced to describe this organisation: a closed circuit that is open to outside influence at the initial stage of cortical input. The closed circuit component of corticostriatal afferents is newly dubbed 'operative', as it is proposed to establish the bid for action selection on the part of an incipient cortical action plan; the broader set of converging corticostriatal afferents is described as contextual. A corollary of this proposal is that every unit of the striatal volume, including the long, C-shaped tail of the caudate nucleus, should receive a mandatory component of operative input, and hence include at least one area of BG-recipient cortex amongst the sources of its corticostriatal afferents. Individual operative afferents contact twin classes of GABAergic striatal projection neuron (SPN), distinguished by their neurochemical character, and onward circuitry. This is the basis of the classic direct and indirect pathway model of the cortico-BG loop. Each pathway utilises a serial chain of inhibition, with two such links, or three, providing positive and negative feedback, respectively. Operative co-activation of direct and indirect SPNs is, therefore, pictured to simultaneously promote action, and to restrain it. 
The balance of this rival activity is determined by the contextual inputs, which summarise the external and internal sensory environment, and the state of ongoing behavioural priorities. Notably, the distributed sources of contextual convergence upon a striatal locus mirror the transcortical network harnessed by the origin of the operative input to that locus, thereby capturing a similar set of contingencies relevant to determining action. The disclosed loop formulation of corticostriatal and subsequent BG loop circuitry, as advanced here, refines the operating rationale of the classic model and allows the integration of more recent anatomical and physiological data, some of which can appear at variance with the classic model. Equally, it provides a lucid functional context for continuing cellular studies of SPN biophysics and mechanisms of synaptic plasticity.
|
13
|
Morita K, Kawaguchi Y. Computing reward-prediction error: an integrated account of cortical timing and basal-ganglia pathways for appetitive and aversive learning. Eur J Neurosci 2015; 42:2003-21. [PMID: 26095906 PMCID: PMC5034842 DOI: 10.1111/ejn.12994] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 04/21/2015] [Revised: 06/11/2015] [Accepted: 06/17/2015] [Indexed: 12/12/2022]
Abstract
There are two prevailing notions regarding the involvement of the corticobasal ganglia system in value‐based learning: (i) the direct and indirect pathways of the basal ganglia are crucial for appetitive and aversive learning, respectively, and (ii) the activity of midbrain dopamine neurons represents reward‐prediction error. Although (ii) constitutes a critical assumption of (i), it remains elusive how (ii) holds given (i), with the basal‐ganglia influence on the dopamine neurons. Here we present a computational neural‐circuit model that potentially resolves this issue. Based on the latest analyses of the heterogeneous corticostriatal neurons and connections, our model posits that the direct and indirect pathways, respectively, represent the values of upcoming and previous actions, and up‐regulate and down‐regulate the dopamine neurons via the basal‐ganglia output nuclei. This explains how the difference between the upcoming and previous values, which constitutes the core of reward‐prediction error, is calculated. Simultaneously, it predicts that blockade of the direct/indirect pathway causes a negative/positive shift of reward‐prediction error and thereby impairs learning from positive/negative error, i.e. appetitive/aversive learning. Through simulation of reward‐reversal learning and punishment‐avoidance learning, we show that our model could indeed account for the experimentally observed features that are suggested to support notion (i) and could also provide predictions on neural activity. We also present a behavioral prediction of our model, through simulation of inter‐temporal choice, on how the balance between the two pathways relates to the subject's time preference. These results indicate that our model, incorporating the heterogeneity of the cortical influence on the basal ganglia, is expected to provide a closed‐circuit mechanistic understanding of appetitive/aversive learning.
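The core computation described in this abstract can be illustrated with a minimal Python sketch. It assumes a standard temporal-difference form in which the direct pathway contributes the discounted value of the upcoming action and the indirect pathway subtracts the value of the previous action. The function name, the gain parameters used to mimic pathway blockade, the discount factor `gamma`, and all numeric values are illustrative assumptions, not taken from the paper's circuit model.

```python
def reward_prediction_error(reward, upcoming_value, previous_value,
                            gamma=0.9, direct_gain=1.0, indirect_gain=1.0):
    """Toy RPE: reward + direct-pathway drive - indirect-pathway drive.

    direct_gain and indirect_gain are hypothetical knobs; setting one of
    them to 0.0 mimics blockade of that pathway.
    """
    direct_drive = direct_gain * gamma * upcoming_value  # up-regulates dopamine
    indirect_drive = indirect_gain * previous_value      # down-regulates dopamine
    return reward + direct_drive - indirect_drive


# Illustrative values only (not from the paper).
intact = reward_prediction_error(0.0, upcoming_value=1.0, previous_value=0.8)
direct_blocked = reward_prediction_error(0.0, 1.0, 0.8, direct_gain=0.0)
indirect_blocked = reward_prediction_error(0.0, 1.0, 0.8, indirect_gain=0.0)
print(intact, direct_blocked, indirect_blocked)
```

Under these assumptions, removing the direct-pathway term shifts the error in the negative direction and removing the indirect-pathway term shifts it in the positive direction, which is the direction of the shift the abstract predicts for blockade of each pathway.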
Affiliation(s)
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Yasuo Kawaguchi
- Division of Cerebral Circuitry, National Institute for Physiological Sciences, Okazaki, Japan; Department of Physiological Sciences, SOKENDAI (The Graduate University for Advanced Studies), Okazaki, Japan; Japan Science and Technology Agency, Core Research for Evolutional Science and Technology, Tokyo, Japan
|
14
|
Deng Y, Lanciego J, Kerkerian-Le-Goff L, Coulon P, Salin P, Kachidian P, Lei W, Del Mar N, Reiner A. Differential organization of cortical inputs to striatal projection neurons of the matrix compartment in rats. Front Syst Neurosci 2015; 9:51. [PMID: 25926776 PMCID: PMC4396197 DOI: 10.3389/fnsys.2015.00051] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 12/04/2014] [Accepted: 03/12/2015] [Indexed: 11/13/2022] Open
Abstract
In prior studies, we described the differential organization of corticostriatal and thalamostriatal inputs to the spines of direct pathway (dSPNs) and indirect pathway striatal projection neurons (iSPNs) of the matrix compartment. In the present electron microscopic (EM) analysis, we have refined understanding of the relative amounts of cortical axospinous vs. axodendritic input to the two types of SPNs. Of note, we found that individual dSPNs receive about twice as many axospinous synaptic terminals from IT-type (intratelencephalically projecting) cortical neurons as they do from PT-type (pyramidal tract projecting) cortical neurons. We also found that PT-type axospinous synaptic terminals were about 1.5 times as common on individual iSPNs as IT-type axospinous synaptic terminals. Overall, a higher percentage of IT-type terminals contacted dSPN than iSPN spines, while a higher percentage of PT-type terminals contacted iSPN than dSPN spines. Notably, IT-type axospinous synaptic terminals were significantly larger on iSPN spines than on dSPN spines. By contrast to axospinous input, the axodendritic PT-type input to dSPNs was more substantial than that to iSPNs, and the axodendritic IT-type input appeared to be meager and comparable for both SPN types. The prominent axodendritic PT-type input to dSPNs may accentuate their PT-type responsiveness, and the large size of axospinous IT-type terminals on iSPNs may accentuate their IT-type responsiveness. Using transneuronal labeling with rabies virus to selectively label the cortical neurons with direct input to the dSPNs projecting to the substantia nigra pars reticulata, we found that the input predominantly arose from neurons in the upper layers of motor cortices, in which IT-type perikarya predominate. The differential cortical input to SPNs is likely to play key roles in motor control and motor learning.
Affiliation(s)
- Yunping Deng
- Department of Anatomy and Neurobiology, The University of Tennessee Health Science Center, Memphis, TN, USA
- Jose Lanciego
- Neurosciences Division, Center for Applied Medical Research (CIMA), Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas (CIBERNED), and Instituto de Investigación Sanitaria de Navarra (IdiSNA), University of Navarra Medical College, Pamplona, Spain
- Patrice Coulon
- Aix Marseille Université, CNRS, INT UMR 7289, Marseille, France
- Pascal Salin
- Aix-Marseille Université, CNRS, IBDM UMR 7288, Marseille, France
- Wanlong Lei
- Department of Anatomy and Neurobiology, The University of Tennessee Health Science Center, Memphis, TN, USA; Department of Anatomy, Zhongshan Medical School of Sun Yat-Sen University, Guangzhou, China
- Nobel Del Mar
- Department of Anatomy and Neurobiology, The University of Tennessee Health Science Center, Memphis, TN, USA
- Anton Reiner
- Department of Anatomy and Neurobiology, The University of Tennessee Health Science Center, Memphis, TN, USA
|
15
|
Morita K, Kato A. Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits. Front Neural Circuits 2014; 8:36. [PMID: 24782717 PMCID: PMC3988379 DOI: 10.3389/fncir.2014.00036] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 01/17/2014] [Accepted: 03/24/2014] [Indexed: 11/13/2022] Open
Abstract
It has been suggested that the midbrain dopamine (DA) neurons, receiving inputs from the cortico-basal ganglia (CBG) circuits and the brainstem, compute reward prediction error (RPE), the difference between reward obtained or expected to be obtained and reward that had been expected to be obtained. These reward expectations are suggested to be stored in the CBG synapses and updated according to RPE through synaptic plasticity, which is induced by released DA. These together constitute the "DA=RPE" hypothesis, which describes the mutual interaction between DA and the CBG circuits and serves as the primary working hypothesis in studying reward learning and value-based decision-making. However, recent work has revealed a new type of DA signal that appears not to represent RPE. Specifically, it has been found in a reward-associated maze task that striatal DA concentration primarily shows a gradual increase toward the goal. We explored whether such ramping DA could be explained by extending the "DA=RPE" hypothesis by taking into account biological properties of the CBG circuits. In particular, we examined effects of possible time-dependent decay of DA-dependent plastic changes of synaptic strengths by incorporating decay of learned values into the RPE-based reinforcement learning model and simulating reward learning tasks. We then found that incorporation of such a decay dramatically changes the model's behavior, causing gradual ramping of RPE. Moreover, we further incorporated magnitude-dependence of the rate of decay, which could potentially be in accord with some past observations, and found that near-sigmoidal ramping of RPE, resembling the observed DA ramping, could then occur. Given that synaptic decay can be useful for flexibly reversing and updating the learned reward associations, especially in case the baseline DA is low and encoding of negative RPE by DA is limited, the observed DA ramping would be indicative of the operation of such flexible reward learning.
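The effect of value decay on RPE described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, assuming a linear track with a reward at the final state, standard TD(0) updates, and a constant (not magnitude-dependent) decay applied to all learned values after every time step. The function name, the track length, and every parameter value are assumptions chosen for illustration, not the settings used in the paper.

```python
def simulate_rpe(n_states=7, n_trials=500, alpha=0.5, gamma=0.97,
                 decay=0.01, reward=1.0):
    """Return the RPE at each state on the final trial of a linear-track task.

    All parameter defaults are hypothetical. decay=0.0 gives plain TD(0).
    """
    values = [0.0] * n_states
    rpes = [0.0] * n_states
    for _ in range(n_trials):
        for s in range(n_states):
            r = reward if s == n_states - 1 else 0.0            # reward only at the goal
            next_value = values[s + 1] if s + 1 < n_states else 0.0
            delta = r + gamma * next_value - values[s]          # reward prediction error
            values[s] += alpha * delta                          # TD(0) value update
            rpes[s] = delta
            # Constant-rate decay (forgetting) of every learned value.
            values = [(1.0 - decay) * v for v in values]
    return rpes


no_decay = simulate_rpe(decay=0.0)
with_decay = simulate_rpe(decay=0.01)
print("RPE without decay:", [round(d, 4) for d in no_decay])
print("RPE with decay:   ", [round(d, 4) for d in with_decay])
```

In this sketch the RPE vanishes at every state once learning has converged without decay, whereas with decay it stays positive and grows toward the goal. The near-sigmoidal ramp mentioned in the abstract relies on magnitude-dependent decay, which this simplified version does not include.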
Affiliation(s)
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Ayaka Kato
- Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan
|