1. Wärnberg E, Kumar A. Feasibility of dopamine as a vector-valued feedback signal in the basal ganglia. Proc Natl Acad Sci U S A 2023; 120:e2221994120. [PMID: 37527344] [PMCID: PMC10410740] [DOI: 10.1073/pnas.2221994120]
Abstract
It is well established that midbrain dopaminergic neurons support reinforcement learning (RL) in the basal ganglia by transmitting a reward prediction error (RPE) to the striatum. In particular, different computational models and experiments have shown that a striatum-wide RPE signal can support RL over a small discrete set of actions (e.g., go/no-go, choose left/right). However, there is accumulating evidence that the basal ganglia function not as a selector between predefined actions but rather as a dynamical system with graded, continuous outputs. To reconcile this view with RL, there is a need to explain how dopamine could support learning of continuous outputs, rather than discrete action values. Inspired by recent observations that, besides RPE, the firing rates of midbrain dopaminergic neurons correlate with motor and cognitive variables, we propose a model in which the dopamine signal in the striatum carries a vector-valued error feedback signal (a loss gradient) instead of a homogeneous scalar error (a loss). We implement a local, "three-factor" corticostriatal plasticity rule involving the presynaptic firing rate, a postsynaptic factor, and the unique dopamine concentration perceived by each striatal neuron. With this learning rule, we show that such a vector-valued feedback signal results in an increased capacity to learn a multidimensional series of real-valued outputs. Crucially, we demonstrate that this plasticity rule does not require precise nigrostriatal synapses but remains compatible with experimental observations of random placement of varicosities and diffuse volume transmission of dopamine.
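The three-factor, vector-valued feedback idea can be illustrated with a minimal sketch: a linear striatal readout in which each neuron's weight update is the product of the presynaptic rate, a postsynaptic factor, and that neuron's own dopamine-like error component. All names, sizes, and the error-to-dopamine mapping here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cortex, n_striatum = 50, 20
W = rng.normal(0.0, 0.1, (n_striatum, n_cortex))  # corticostriatal weights
eta = 0.05

def three_factor_update(W, pre_rate, post_factor, dopamine):
    """One plasticity step: each weight changes with the product of the
    presynaptic rate, a postsynaptic factor, and the dopamine
    concentration seen locally by that striatal neuron."""
    return W + eta * np.outer(post_factor * dopamine, pre_rate)

# Toy episode: drive the striatal output toward a real-valued target vector.
x = rng.random(n_cortex)            # cortical input rates
target = rng.random(n_striatum)     # desired multidimensional output
for _ in range(200):
    y = W @ x                       # linear striatal readout
    error = target - y              # per-neuron signed error
    # Vector-valued dopamine: one error component per neuron, standing in
    # for a local dopamine concentration (an assumption for illustration).
    W = three_factor_update(W, x, np.ones(n_striatum), error)

assert np.allclose(W @ x, target, atol=1e-3)
```

With a scalar (striatum-wide) dopamine signal, every neuron would receive the same third factor and the rule could not assign credit per output dimension; the vector-valued signal is what lets each output channel converge independently.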
Affiliation(s)
- Emil Wärnberg
- Department of Neuroscience, Karolinska Institutet, 171 77 Stockholm, Sweden
- Division of Computational Science and Technology, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, 114 28 Stockholm, Sweden
- Arvind Kumar
- Division of Computational Science and Technology, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, 114 28 Stockholm, Sweden
2. Phasic Dopamine Changes and Hebbian Mechanisms during Probabilistic Reversal Learning in Striatal Circuits: A Computational Study. Int J Mol Sci 2022; 23:3452. [PMID: 35408811] [PMCID: PMC8998230] [DOI: 10.3390/ijms23073452]
Abstract
Cognitive flexibility is essential for modifying our behavior in a non-stationary environment and is often explored with reversal learning tasks. The basal ganglia (BG) dopaminergic system, under top-down control of the prefrontal cortex, is known to be involved in flexible action selection through reinforcement learning. However, how adaptive dopamine changes regulate this process, and which learning mechanisms train the striatal synapses, remain open questions. The current study uses a neurocomputational model of the BG, based on dopamine-dependent direct (Go) and indirect (NoGo) pathways, to investigate reinforcement learning in a probabilistic environment through a task that associates different stimuli with different actions. Here, we investigated the efficacy of several versions of the Hebb rule, based on covariance between pre- and postsynaptic neurons, as well as the control of phasic dopamine changes required to achieve proper reversal learning. Furthermore, an original mechanism for modulating the phasic dopamine changes is proposed, assuming that the expected reward probability is coded by the activity of the winning Go neuron before a reward/punishment takes place. Simulations show that this formulation of automatic phasic dopamine control achieves good, flexible reversal learning even in difficult conditions. The current outcomes may contribute to understanding the mechanisms for active control of dopamine changes during flexible behavior. In perspective, this work may be applied to neuropsychiatric or neurological disorders, such as Parkinson's disease or schizophrenia, in which reinforcement learning is impaired.
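A dopamine-gated covariance Hebb rule of the kind explored above might be sketched as follows; the specific factors, means, and learning rate are assumptions for illustration, not the study's exact formulation.

```python
def covariance_hebb(w, pre, post, pre_mean, post_mean, dopamine, eta=0.01):
    # Weight grows when pre- and postsynaptic activity fluctuate together
    # and phasic dopamine is positive (reward burst); it shrinks when the
    # same coactivity coincides with a dopamine dip (punishment).
    return w + eta * dopamine * (pre - pre_mean) * (post - post_mean)

w = 0.5
# Rewarded trial: a coactive pair is strengthened.
w = covariance_hebb(w, pre=1.0, post=1.0, pre_mean=0.2, post_mean=0.2, dopamine=1.0)
# Punished trial: the same coactivity with a dopamine dip weakens it back.
w = covariance_hebb(w, pre=1.0, post=1.0, pre_mean=0.2, post_mean=0.2, dopamine=-1.0)
assert abs(w - 0.5) < 1e-12
```

Because the rule is signed by the deviation from mean activity, below-average postsynaptic firing during a reward can also depress a synapse, which is what makes covariance rules suited to unlearning old contingencies during reversal.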
3. Grillner S, Robertson B, Kotaleski JH. Basal Ganglia—A Motion Perspective. Compr Physiol 2020; 10:1241-1275. [DOI: 10.1002/cphy.c190045]
4. Wessel JR, Waller DA, Greenlee JD. Non-selective inhibition of inappropriate motor-tendencies during response-conflict by a fronto-subthalamic mechanism. eLife 2019; 8:e42959. [PMID: 31063130] [PMCID: PMC6533064] [DOI: 10.7554/eLife.42959]
Abstract
To effectively interact with their environment, humans must often select actions from multiple incompatible options. Existing theories propose that during motoric response-conflict, inappropriate motor activity is actively (and perhaps non-selectively) suppressed by an inhibitory fronto-basal ganglia mechanism. We here tested this theory across three experiments. First, using scalp-EEG, we found that both outright action-stopping and response-conflict during action-selection invoke low-frequency activity of a common fronto-central source, whose activity relates to trial-by-trial behavioral indices of inhibition in both tasks. Second, using simultaneous intracranial recordings from the basal ganglia and motor cortex, we found that response-conflict increases the influence of the subthalamic nucleus on M1-representations of incorrect response-tendencies. Finally, using transcranial magnetic stimulation, we found that during the same time period when conflict-related STN-to-M1 communication is increased, cortico-spinal excitability is broadly suppressed. Together, these findings demonstrate that fronto-basal ganglia networks buttress action-selection under response-conflict by rapidly and non-selectively net-inhibiting inappropriate motor tendencies.
Affiliation(s)
- Jan R Wessel
- Department of Neurology, University of Iowa Hospitals and Clinics, Iowa City, United States; Department of Psychological and Brain Sciences, University of Iowa, Iowa City, United States
- Darcy A Waller
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City, United States
- Jeremy DW Greenlee
- Department of Neurosurgery, University of Iowa Hospitals and Clinics, Iowa City, United States
5. Héricé C, Khalil R, Moftah M, Boraud T, Guthrie M, Garenne A. Decision making under uncertainty in a spiking neural network model of the basal ganglia. J Integr Neurosci 2016; 15:515-538. [PMID: 28002987] [DOI: 10.1142/S021963521650028X]
Abstract
The mechanisms of decision-making and action selection are generally thought to be under the control of parallel cortico-subcortical loops, connecting back to distinct areas of cortex through the basal ganglia and processing the motor, cognitive, and limbic modalities of decision-making. We have used these properties to develop and extend a connectionist model at the spiking-neuron level, based on a previous rate-model approach. The model is demonstrated on decision-making tasks that have been studied in primates, where the electrophysiology has been interpreted to show that the decision is made in two steps. To model this, we have used two parallel loops, each of which performs decision-making based on interactions between positive and negative feedback pathways. This model is able to perform two-level decision-making as in primates. We show here that, before learning, synaptic noise is sufficient to drive the decision-making process and that, after learning, the decision is based on the choice that has proven most likely to be rewarded. The model is then submitted to lesion tests, reversal learning, and extinction protocols. We show that, under these conditions, it behaves in a consistent manner and provides predictions in accordance with observed experimental data.
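The core selection mechanism described above, competition shaped by positive feedback (self-excitation), negative feedback (cross-inhibition), and noise, can be caricatured in a few lines. This toy race model stands in for one loop only; all parameters and the rate-based dynamics are assumptions, not the spiking model itself.

```python
import numpy as np

rng = np.random.default_rng(3)

def select(values, noise=0.3, steps=2000, dt=0.01, threshold=1.0):
    """Noise-driven race between two options with self-excitation
    (positive feedback) and mutual inhibition (negative feedback).
    Returns the index of the first option to reach threshold."""
    x = np.zeros(2)
    for _ in range(steps):
        drive = values + 0.5 * x - 0.5 * x[::-1]   # self-excitation minus cross-inhibition
        x += dt * (-x + drive) + noise * np.sqrt(dt) * rng.normal(size=2)
        x = np.maximum(x, 0.0)                     # firing rates stay non-negative
        if x.max() > threshold:
            return int(np.argmax(x))
    return int(np.argmax(x))

# Before learning (equal values) noise alone breaks the tie; after
# learning, the option with the higher learned value usually wins.
wins = sum(select(np.array([0.6, 0.2])) == 0 for _ in range(200))
assert wins > 120
```

With `values = np.array([0.4, 0.4])`, the same circuit still commits to one option on every trial, which is the "noise is sufficient to drive the decision" regime the abstract describes.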
Affiliation(s)
- Charlotte Héricé
- Université de Bordeaux, Institut des Maladies Neurodégénératives, UMR 5293, 33000 Bordeaux, France; CNRS, Institut des Maladies Neurodégénératives, UMR 5293, 33000 Bordeaux, France
- Radwa Khalil
- CNRS, Institut des Maladies Neurodégénératives, UMR 5293, 33000 Bordeaux, France
- Thomas Boraud
- Université de Bordeaux, Institut des Maladies Neurodégénératives, UMR 5293, 33000 Bordeaux, France; CNRS, Institut des Maladies Neurodégénératives, UMR 5293, 33000 Bordeaux, France
- Martin Guthrie
- Université de Bordeaux, Institut des Maladies Neurodégénératives, UMR 5293, 33000 Bordeaux, France; CNRS, Institut des Maladies Neurodégénératives, UMR 5293, 33000 Bordeaux, France
- André Garenne
- Université de Bordeaux, Institut des Maladies Neurodégénératives, UMR 5293, 33000 Bordeaux, France; CNRS, Institut des Maladies Neurodégénératives, UMR 5293, 33000 Bordeaux, France
6. Kurzawa N, Summerfield C, Bogacz R. Neural Circuits Trained with Standard Reinforcement Learning Can Accumulate Probabilistic Information during Decision Making. Neural Comput 2016; 29:368-393. [PMID: 27870610] [DOI: 10.1162/neco_a_00917]
Abstract
Much experimental evidence suggests that during decision making, neural circuits accumulate evidence supporting alternative options. A computational model that describes this accumulation well for choices between two options assumes that the brain integrates the log ratios of the likelihoods of the sensory inputs given the two options. Several models have been proposed for how neural circuits could learn these log-likelihood ratios from experience, but all of them introduced novel and specially dedicated synaptic plasticity rules. Here we show that, for a certain wide class of tasks, the log-likelihood ratios are approximately linearly proportional to the expected rewards for selecting actions. Therefore, a simple model based on standard reinforcement learning rules is able to estimate the log-likelihood ratios from experience, and on each trial accumulate the log-likelihood ratios associated with presented stimuli while selecting an action. Simulations of the model replicate experimental data on both behavior and neural activity in tasks requiring accumulation of probabilistic cues. Our results suggest that there is no need for the brain to support dedicated plasticity rules, as the standard mechanisms proposed to describe reinforcement learning can enable neural circuits to perform efficient probabilistic inference.
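The key observation, that standard RL value estimates track the log-likelihood ratios carried by probabilistic cues, can be checked numerically with a plain delta rule. The cue probabilities, learning rate, and ±1 reward coding below are arbitrary choices for illustration, not the paper's task.

```python
import numpy as np

rng = np.random.default_rng(1)

# Each cue i predicts reward with probability p[i]; the normative
# evidence it carries is the log-likelihood ratio (LLR).
p = np.array([0.9, 0.7, 0.4, 0.2])
llr = np.log(p / (1 - p))

# Standard delta-rule value learning per cue (a stand-in for the
# paper's RL circuit), estimating expected reward coded in [-1, 1].
q = np.zeros(len(p))
alpha = 0.01
for _ in range(40000):
    i = rng.integers(len(p))
    r = 1.0 if rng.random() < p[i] else -1.0
    q[i] += alpha * (r - q[i])

# q converges to 2p - 1, which is roughly linear in the LLR for
# moderate probabilities, so summing q across presented cues
# approximates accumulating log-likelihood ratios.
assert np.allclose(q, 2 * p - 1, atol=0.25)
```

Since both `q` and `llr` increase monotonically with `p`, a circuit that sums learned values over presented cues performs approximate probabilistic inference without any dedicated plasticity rule.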
Affiliation(s)
- Nils Kurzawa
- Medical Research Council Brain Network Dynamics Unit, University of Oxford, Oxford OX1 3QT, U.K.; Institute of Pharmacy and Molecular Biotechnology, University of Heidelberg, D-69120 Heidelberg, Germany
- Rafal Bogacz
- Medical Research Council Brain Network Dynamics Unit, University of Oxford, Oxford OX1 3UD, U.K.; Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX1 3UD, U.K.
7. Kato A, Morita K. Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation. PLoS Comput Biol 2016; 12:e1005145. [PMID: 27736881] [PMCID: PMC5063413] [DOI: 10.1371/journal.pcbi.1005145]
Abstract
It has been suggested that dopamine (DA) represents the reward-prediction error (RPE) defined in reinforcement learning, and that DA therefore responds to unpredicted but not predicted reward. However, recent studies have found DA responses sustained towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can be sustained if there is decay/forgetting of learned values, which can be implemented as decay of the synaptic strengths storing the learned values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of value decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that the value decay can enhance motivation, specifically facilitating fast goal-reaching, albeit counterintuitively. Mathematical analyses revealed that the underlying potential mechanisms are twofold: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards the goal, and (2) value contrasts between 'Go' and 'No-Go' are generated because, while chosen values are continually updated, unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value decay, or forgetting, provides a parsimonious mechanistic account of DA's roles in value learning and motivation. Our results also suggest that when biological systems for value learning are active even though learning has apparently converged, the systems might be in a state of dynamic equilibrium in which learning and forgetting are balanced.
Dopamine (DA) has been suggested to have two reward-related roles: (1) representing the reward-prediction error (RPE), and (2) providing motivational drive. Role (1) is based on physiological results showing that DA responds to unpredicted but not predicted reward, whereas role (2) is supported by pharmacological results showing that blockade of DA signaling causes motivational impairments such as slowdown of self-paced behavior. So far, these two roles have been considered to be played by two different temporal patterns of DA signals: role (1) by phasic signals and role (2) by tonic/sustained signals. However, recent studies have found sustained DA signals with features indicative of both roles (1) and (2), complicating this picture. Meanwhile, whereas synaptic/circuit mechanisms for role (1), i.e., how RPE is calculated upstream of DA neurons and how RPE-dependent updates of learned values occur through DA-dependent synaptic plasticity, have now become clarified, mechanisms for role (2) remain unclear. In this work, we modeled self-paced behavior as a series of 'Go' or 'No-Go' selections in the framework of reinforcement learning assuming DA's role (1), and demonstrated that incorporating decay/forgetting of learned values, presumably implemented as decay of the synaptic strengths storing the learned values, provides a potential unified mechanistic account of DA's two roles, together with its various temporal patterns.
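The value-decay mechanism can be illustrated with a minimal temporal-difference (TD) learning sketch on a linear track of 'Go' steps toward a rewarded goal. The track length, decay rate, and other parameters are assumptions for illustration; the point is that decay keeps the RPE positive at equilibrium and maintains a value gradient toward the goal.

```python
import numpy as np

gamma, alpha, phi = 0.97, 0.5, 0.02   # discount, learning rate, decay rate

# A linear track of "Go" steps toward a goal that pays reward 1.
n_states = 7
v = np.zeros(n_states + 1)            # v[n_states] is the terminal state

for _ in range(500):                  # repeated goal-directed runs
    for s in range(n_states):
        r = 1.0 if s == n_states - 1 else 0.0
        rpe = r + gamma * v[s + 1] - v[s]    # TD reward-prediction error
        v[s] += alpha * rpe
    v *= (1 - phi)                    # forgetting: all learned values decay

# Without decay, values would converge and the RPE at a fully predicted
# reward would vanish. With decay, values settle at a dynamic equilibrium
# below that ceiling, so a sustained positive RPE (a DA-like signal)
# persists along the whole approach, and values rise toward the goal.
rpes = [1.0 * (s == n_states - 1) + gamma * v[s + 1] - v[s] for s in range(n_states)]
assert all(e > 0.0 for e in rpes)
assert np.all(np.diff(v[:n_states]) > 0)   # value gradient rises toward goal
```

Setting `phi = 0.0` and re-running makes all the equilibrium RPEs shrink toward zero, which is the standard no-forgetting TD picture the abstract contrasts with.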
Affiliation(s)
- Ayaka Kato
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
8. Berthet P, Lindahl M, Tully PJ, Hellgren-Kotaleski J, Lansner A. Functional Relevance of Different Basal Ganglia Pathways Investigated in a Spiking Model with Reward Dependent Plasticity. Front Neural Circuits 2016; 10:53. [PMID: 27493625] [PMCID: PMC4954853] [DOI: 10.3389/fncir.2016.00053]
Abstract
The brain enables animals to adapt their behavior in order to survive in a complex and dynamic environment, but how reward-oriented behaviors are achieved and computed by the underlying neural circuitry is an open question. To address this question, we have developed a spiking model of the basal ganglia (BG) that learns to disinhibit the action leading to a reward despite ongoing changes in the reward schedule. The architecture of the network features the two pathways commonly described in the BG, the direct (denoted D1) and the indirect (denoted D2) pathway, as well as a loop involving the striatum and the dopaminergic system. The activity of these dopaminergic neurons conveys the reward prediction error (RPE), which determines the magnitude of synaptic plasticity within the different pathways. All plastic connections implement a versatile four-factor learning rule derived from Bayesian inference that depends upon pre- and postsynaptic activity, receptor type, and dopamine level. Synaptic weight updates occur in the D1 or D2 pathway depending on the sign of the RPE, and an efference copy informs upstream nuclei about the action selected. We demonstrate successful performance of the system in a multiple-choice learning task with a transiently changing reward schedule. We simulate lesioning of the various pathways and show that a condition without the D2 pathway fares worse than one without D1. Additionally, we simulate the degeneration observed in Parkinson's disease (PD) by decreasing the number of dopaminergic neurons during learning. The results suggest that D1-pathway impairment in PD might have been overlooked. Furthermore, an analysis of the alterations in the synaptic weights shows that using the absolute reward value instead of the RPE leads to a larger change in D1.
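The sign-routing of plasticity between the two pathways can be sketched in a simplified form: positive RPE potentiates the coactive D1 (Go) synapse, negative RPE potentiates D2 (NoGo). This is a stand-in for the paper's Bayesian four-factor rule, with names and the learning rate chosen for illustration.

```python
def update_pathways(wd1, wd2, pre, post, rpe, eta=0.1):
    """Route plasticity by RPE sign: positive RPE strengthens the
    coactive D1 (Go) synapse, negative RPE strengthens D2 (NoGo)."""
    hebb = pre * post          # coactivity of the pre/post pair
    if rpe > 0:
        wd1 += eta * rpe * hebb
    else:
        wd2 += eta * (-rpe) * hebb
    return wd1, wd2

wd1, wd2 = 0.0, 0.0
wd1, wd2 = update_pathways(wd1, wd2, pre=1.0, post=1.0, rpe=+0.5)  # rewarded trial
wd1, wd2 = update_pathways(wd1, wd2, pre=1.0, post=1.0, rpe=-0.5)  # omitted reward
assert abs(wd1 - 0.05) < 1e-12 and abs(wd2 - 0.05) < 1e-12
```

Opposing updates in the two pathways mean that both reward and punishment are informative: the Go weight promotes the rewarded action while the NoGo weight learns to suppress the punished one.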
Affiliation(s)
- Pierre Berthet
- Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden
- Mikael Lindahl
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden
- Philip J. Tully
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden
- Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, Edinburgh, UK
- Jeanette Hellgren-Kotaleski
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden
- Department of Neuroscience, Karolinska Institute, Stockholm, Sweden
- Anders Lansner
- Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden
9. Knight JC, Tully PJ, Kaplan BA, Lansner A, Furber SB. Large-Scale Simulations of Plastic Neural Networks on Neuromorphic Hardware. Front Neuroanat 2016; 10:37. [PMID: 27092061] [PMCID: PMC4823276] [DOI: 10.3389/fnana.2016.00037]
Abstract
SpiNNaker is a digital, neuromorphic architecture designed for simulating large-scale spiking neural networks at speeds close to biological real-time. Rather than using bespoke analog or digital hardware, the basic computational unit of a SpiNNaker system is a general-purpose ARM processor, allowing it to be programmed to simulate a wide variety of neuron and synapse models. This flexibility is particularly valuable in the study of biological plasticity phenomena. A recently proposed learning rule based on the Bayesian Confidence Propagation Neural Network (BCPNN) paradigm offers a generic framework for modeling the interaction of different plasticity mechanisms using spiking neurons. However, it can be computationally expensive to simulate large networks with BCPNN learning, since the rule requires multiple state variables for each synapse, each of which needs to be updated every simulation time-step. We discuss the trade-offs in efficiency and accuracy involved in developing an event-based BCPNN implementation for SpiNNaker based on an analytical solution to the BCPNN equations, and detail the steps taken to fit this within the limited computational and memory resources of the SpiNNaker architecture. We demonstrate this learning rule by learning temporal sequences of neural activity within a recurrent attractor network, which we simulate at scales of up to 2.0 × 10⁴ neurons and 5.1 × 10⁷ plastic synapses: the largest plastic neural network ever to be simulated on neuromorphic hardware. We also run a comparable simulation on a Cray XC-30 supercomputer and find that, to match the run-time of our SpiNNaker simulation, the supercomputer uses approximately 45× more power. This suggests that cheaper, more power-efficient neuromorphic systems are becoming useful discovery tools in the study of plasticity in large-scale brain models.
Affiliation(s)
- James C Knight
- Advanced Processor Technologies Group, School of Computer Science, University of Manchester, Manchester, UK
- Philip J Tully
- Department of Computational Biology, Royal Institute of Technology, Stockholm, Sweden; Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden; Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, Edinburgh, UK
- Bernhard A Kaplan
- Department of Visualization and Data Analysis, Zuse Institute Berlin, Berlin, Germany
- Anders Lansner
- Department of Computational Biology, Royal Institute of Technology, Stockholm, Sweden; Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden; Department of Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden
- Steve B Furber
- Advanced Processor Technologies Group, School of Computer Science, University of Manchester, Manchester, UK
10. Vogginger B, Schüffny R, Lansner A, Cederström L, Partzsch J, Höppner S. Reducing the computational footprint for real-time BCPNN learning. Front Neurosci 2015; 9:2. [PMID: 25657618] [PMCID: PMC4302947] [DOI: 10.3389/fnins.2015.00002]
Abstract
The implementation of synaptic plasticity in neural simulation or neuromorphic hardware is usually very resource-intensive, often requiring a compromise between efficiency and flexibility. A versatile but computationally expensive plasticity mechanism is provided by the Bayesian Confidence Propagation Neural Network (BCPNN) paradigm. Building upon Bayesian statistics, and having clear links to biological plasticity processes, the BCPNN learning rule has been applied in many fields, ranging from data classification, associative memory, reward-based learning, and probabilistic inference to cortical attractor memory networks. In the spike-based version of this learning rule, the presynaptic, postsynaptic, and coincident activity is traced in three low-pass-filtering stages, requiring a total of eight state variables, whose dynamics are typically simulated with the fixed-step-size Euler method. We derive analytic solutions allowing an efficient event-driven implementation of this learning rule. Further speedup is achieved, first, by rewriting the model to halve the number of basic arithmetic operations per update, and second, by using look-up tables for the frequently calculated exponential decay. Ultimately, in a typical use case, the simulation using our approach is more than one order of magnitude faster than with the fixed-step-size Euler method. Aiming for a small memory footprint per BCPNN synapse, we also evaluate the use of fixed-point numbers for the state variables, and assess the number of bits required to achieve the same or better accuracy than the conventional explicit Euler method. All of this will allow real-time simulation of a reduced cortex model based on BCPNN in high-performance computing. More importantly, with the analytic solution at hand and due to the reduced memory bandwidth, the learning rule can be efficiently implemented in dedicated or existing digital neuromorphic hardware.
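The core of the event-driven speedup can be shown on a single decaying trace: instead of stepping dz/dt = -z/τ with small Euler steps between spikes, the analytic solution is evaluated once per inter-event gap. The time constants and gap below are illustrative choices, not values from the paper.

```python
import math

def decayed(z, dt, tau):
    # Analytic solution of dz/dt = -z/tau over a gap of length dt:
    # one exponential per event instead of one update per time-step.
    return z * math.exp(-dt / tau)

# Compare against fixed-step explicit Euler across a 50 ms spike-free gap.
tau, dt_step, gap = 10.0, 0.1, 50.0
z_euler = 1.0
for _ in range(round(gap / dt_step)):
    z_euler += dt_step * (-z_euler / tau)     # 500 small updates
z_event = decayed(1.0, gap, tau)              # 1 update

assert abs(z_event - math.exp(-5.0)) < 1e-12
assert abs(z_euler - z_event) < 1e-3          # Euler only approximates the exact decay
```

The exponential's argument depends only on dt/τ, which is why a look-up table over quantized gap lengths (as the paper proposes) can replace the `exp` call entirely in hardware.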
Affiliation(s)
- Bernhard Vogginger
- Department of Electrical Engineering and Information Technology, Technische Universität Dresden, Dresden, Germany
- René Schüffny
- Department of Electrical Engineering and Information Technology, Technische Universität Dresden, Dresden, Germany
- Anders Lansner
- Department of Computational Biology, School of Computer Science and Communication, Royal Institute of Technology (KTH), Stockholm, Sweden; Department of Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden
- Love Cederström
- Department of Electrical Engineering and Information Technology, Technische Universität Dresden, Dresden, Germany
- Johannes Partzsch
- Department of Electrical Engineering and Information Technology, Technische Universität Dresden, Dresden, Germany
- Sebastian Höppner
- Department of Electrical Engineering and Information Technology, Technische Universität Dresden, Dresden, Germany
11. Tully PJ, Hennig MH, Lansner A. Synaptic and nonsynaptic plasticity approximating probabilistic inference. Front Synaptic Neurosci 2014; 6:8. [PMID: 24782758] [PMCID: PMC3986567] [DOI: 10.3389/fnsyn.2014.00008]
Abstract
Learning and memory operations in neural circuits are believed to involve molecular cascades of synaptic and nonsynaptic changes that lead to a diverse repertoire of dynamical phenomena at higher levels of processing. Hebbian and homeostatic plasticity, neuromodulation, and intrinsic excitability all conspire to form and maintain memories. But it is still unclear how these seemingly redundant mechanisms could jointly orchestrate learning in a more unified system. To this end, a Hebbian learning rule for spiking neurons inspired by Bayesian statistics is proposed. In this model, synaptic weights and intrinsic currents are adapted on-line upon arrival of single spikes, which initiate a cascade of temporally interacting memory traces that locally estimate probabilities associated with relative neuronal activation levels. Trace dynamics enable synaptic learning to readily demonstrate a spike-timing dependence, stably return to a set-point over long time scales, and remain competitive despite this stability. Beyond unsupervised learning, linking the traces with an external plasticity-modulating signal enables spike-based reinforcement learning. At the postsynaptic neuron, the traces are represented by an activity-dependent ion channel that is shown to regulate the input received by a postsynaptic cell and generate intrinsic graded persistent firing levels. We show how spike-based Hebbian-Bayesian learning can be performed in a simulated inference task using integrate-and-fire (IAF) neurons that are Poisson-firing and background-driven, similar to the preferred regime of cortical neurons. Our results support the view that neurons can represent information in the form of probability distributions, and that probabilistic inference could be a functional by-product of coupled synaptic and nonsynaptic mechanisms operating over several timescales. The model provides a biophysical realization of Bayesian computation by reconciling several observed neural phenomena whose functional effects are only partially understood in concert.
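The probabilistic quantity at the heart of BCPNN-style Hebbian-Bayesian learning is the weight w = log(p_ij / (p_i · p_j)), where the probabilities are estimated from activity traces. The sketch below estimates them from batch averages of binary activity rather than the paper's online low-pass traces; the activity statistics and floor constant are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
eps = 1e-4                      # floor keeping the log arguments positive

# Binary pre/post activity over many samples; post mostly follows pre,
# so the pair is positively correlated.
pre = rng.random(10000) < 0.3
post = np.where(rng.random(10000) < 0.8, pre, ~pre)

p_i = max(pre.mean(), eps)            # P(pre active)
p_j = max(post.mean(), eps)           # P(post active)
p_ij = max((pre & post).mean(), eps)  # P(both active)

# Bayesian weight: positive for correlated units, zero for independent
# ones, negative for anti-correlated ones.
w = np.log(p_ij / (p_i * p_j))
assert w > 0.0        # correlated pair -> excitatory Bayesian weight
```

Replacing `post` with an independent random vector drives `p_ij` toward `p_i * p_j` and the weight toward zero, which is the sense in which the rule "locally estimates probabilities associated with relative neuronal activation levels."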
Affiliation(s)
- Philip J Tully
- Department of Computational Biology, Royal Institute of Technology (KTH), Stockholm, Sweden; Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden; Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, Edinburgh, UK
- Matthias H Hennig
- Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, Edinburgh, UK
- Anders Lansner
- Department of Computational Biology, Royal Institute of Technology (KTH), Stockholm, Sweden; Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden; Department of Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden
12. Berthet P, Lansner A. Optogenetic stimulation in a computational model of the basal ganglia biases action selection and reward prediction error. PLoS One 2014; 9:e90578. [PMID: 24614169] [PMCID: PMC3948624] [DOI: 10.1371/journal.pone.0090578]
Abstract
Optogenetic stimulation of specific types of medium spiny neurons (MSNs) in the striatum has been shown to bias the choices of mice in a two-choice task. This shift depends on the localisation and intensity of the stimulation, but also on the recent reward history. We have implemented a way to simulate the increased activity produced by the optical flash in our computational model of the basal ganglia (BG). This abstract model features the direct and indirect pathways commonly described in biology, and a reward prediction pathway (RP). The framework is similar to Actor-Critic methods and to the ventral/dorsal distinction in the striatum. We thus investigated the impact on selection of an added stimulation in each of the three pathways. We were able to reproduce in our model the bias in action selection observed in mice. Our results also showed that biasing the reward prediction is sufficient to create a modification in action selection. However, we had to increase the percentage of trials with stimulation relative to that in the experiments in order to impact the selection. We found that increasing only the reward prediction had a different effect depending on whether the stimulation in RP was action-dependent (only for a specific action) or not. We further looked at the evolution of the change in the weights depending on the stage of learning within a block. A bias in RP impacts the plasticity differently depending on that stage, but also on the outcome. It remains to be tested experimentally how the dopaminergic neurons are affected by specific stimulation of neurons in the striatum, and to relate the data to the predictions of our model.
Affiliation(s)
- Pierre Berthet
- Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden
- Anders Lansner
- Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Stockholm Brain Institute, Karolinska Institute, Stockholm, Sweden
13. Parallel dopamine D1 receptor activity dependence of L-Dopa-induced normal movement and dyskinesia in mice. Neuroscience 2013; 236:66-76. [PMID: 23357114] [DOI: 10.1016/j.neuroscience.2012.12.065]
Abstract
L-3,4-Dihydroxyphenylalanine (L-Dopa)-induced dyskinesia (LID) in Parkinson's disease (PD) is a major clinical problem. The prevailing view is that in PD patients and animal PD models, dyskinesia develops after repeated L-Dopa use or priming, independent of L-Dopa's anti-PD therapeutic effect that occurs immediately. Here we show that in mice with severe and consistent dopamine (DA) loss in the dorsal striatum, rendered by transcription factor Pitx3 null mutation, the very first injection of L-Dopa or the D1-like agonist SKF81297 induced both normal ambulatory and dyskinetic movements. Furthermore, the robust stimulating effects on normal and dyskinetic movements had an identical time course and parallel dose-response curves. In contrast, the D2-like agonist ropinirole stimulated normal and dyskinetic movements relatively modestly. These results demonstrate that severe DA loss in the dorsal striatum sets the stage for dyskinesia to occur on the first exposure to L-Dopa or a D1 agonist without any priming. These results also indicate that L-Dopa stimulated both normal and dyskinetic movements primarily via D1 receptor activation, and that proper D1 agonism is potentially an efficacious therapy for PD motor deficits.