1. Wallace CW, Holleran KM, Slinkard CY, Centanni SW, Lapish CC, Jones SR. Kappa opioid receptors diminish spontaneous dopamine signals in awake mice through multiple mechanisms. Neuropharmacology 2025; 273:110458. PMID: 40204058; DOI: 10.1016/j.neuropharm.2025.110458.
Abstract
The role of the dynorphin/kappa opioid receptor (KOR) system in dopamine (DA) regulation has been extensively investigated. KOR activation reduces extracellular DA concentrations, but the exact mechanisms through which this occurs have not been fully elucidated. To explore KOR influences on real-time DA fluctuations, we used the photosensor dLight1.2 with fiber photometry in the nucleus accumbens (NAc) core of freely moving male and female C57BL/6J mice. First, we established that the rise and fall of spontaneously arising DA signals were due to DA release and reuptake, respectively. Next, mice were systemically administered the KOR agonist U50,488H in the presence or absence of the KOR antagonist aticaprant. U50,488H reduced both the amplitude and width of spontaneous signals in both sexes. Further, the slope of the correlation between amplitude and width increased, indicating that DA uptake rates were elevated. U50,488H also reduced the frequency of signal occurrence in males and females. The effects of KOR activation were stronger in males, while the effects of KOR antagonism were stronger in females. Overall, KORs exerted significant inhibitory control over spontaneous DA signaling, acting through at least three mechanisms: inhibiting DA release, promoting DA transporter-mediated uptake, and reducing the frequency of signals.
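To make the amplitude-width slope analysis above concrete, here is a minimal sketch of how such a comparison might be run on synthetic transients. The data, slope values, and use of np.polyfit are our illustrative assumptions, not the authors' published pipeline.

```python
import numpy as np

# Minimal sketch of an amplitude-vs-width slope comparison (synthetic data;
# all parameter values are illustrative assumptions, not the study's pipeline).
rng = np.random.default_rng(0)
width = rng.uniform(0.2, 1.0, 200)                   # transient widths (s)
amp_baseline = 2.0 * width + rng.normal(0, 0.1, 200)
amp_agonist = 3.0 * width + rng.normal(0, 0.1, 200)  # narrower signals per unit
                                                     # amplitude = faster uptake
slope_baseline = np.polyfit(width, amp_baseline, 1)[0]
slope_agonist = np.polyfit(width, amp_agonist, 1)[0]
print(slope_baseline, slope_agonist)  # steeper slope read as increased uptake
```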
Affiliation(s)
- Conner W Wallace
- Department of Translational Neuroscience, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Katherine M Holleran
- Department of Translational Neuroscience, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Clare Y Slinkard
- Department of Translational Neuroscience, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Samuel W Centanni
- Department of Translational Neuroscience, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Christopher C Lapish
- Department of Anatomy, Cell Biology, and Physiology, Indiana University School of Medicine, Indianapolis, IN, USA
- Sara R Jones
- Department of Translational Neuroscience, Wake Forest University School of Medicine, Winston-Salem, NC, USA.
2. Shin S, Oh J, Kim SK, Lee YS, Kim SJ. Quantitative dynamics of neural uncertainty in sensory processing and decision-making during discriminative learning. Exp Mol Med 2025; online ahead of print. PMID: 40335633; DOI: 10.1038/s12276-025-01456-7.
Abstract
Uncertainty is crucial in sensory processing, necessitating further quantitative research on its neural representation in the sensory cortex. Here, to address this need, we used a deep learning approach to quantify uncertainties in neural activity from the forelimb area of the primary somatosensory cortex (fS1) during a vibration frequency discrimination task, introducing a transformer model designed to decode neural data that are not consistently tracked over time. Our model shows that the neural representation in fS1 encodes uncertainties arising not only from vibratory stimuli but also from decision-making processes, emphasizing its crucial role across various biological contexts. We confirmed that uncertainty decreases as learning progresses and increases when learning is interrupted. In line with predictions from previous studies, we also observed that uncertainty is high at psychometric thresholds. Furthermore, high uncertainty correlates with incorrect decisions, and we identified uncertainty dynamics between previous and current trials. These findings underscore the evolving role of fS1 in assessing uncertainty for the brain's downstream areas as learning progresses.
Affiliation(s)
- Soonho Shin
- Department of Physiology, Seoul National University College of Medicine, Seoul, Republic of Korea
- Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea
- Memory Network Medical Research Center, Neuroscience Research Institute, Wide River Institute of Immunology, Seoul National University College of Medicine, Seoul, Republic of Korea
- Joonsu Oh
- Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada
- Sun Kwang Kim
- Department of Physiology, College of Korean Medicine, Kyung Hee University, Seoul, Republic of Korea
- Yong-Seok Lee
- Department of Physiology, Seoul National University College of Medicine, Seoul, Republic of Korea
- Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea
- Memory Network Medical Research Center, Neuroscience Research Institute, Wide River Institute of Immunology, Seoul National University College of Medicine, Seoul, Republic of Korea
- Sang Jeong Kim
- Department of Physiology, Seoul National University College of Medicine, Seoul, Republic of Korea.
- Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea.
- Memory Network Medical Research Center, Neuroscience Research Institute, Wide River Institute of Immunology, Seoul National University College of Medicine, Seoul, Republic of Korea.
3. Zhang Z, Costa KM, Langdon AJ, Schoenbaum G. The devilish details affecting TDRL models in dopamine research. Trends Cogn Sci 2025; 29:434-447. PMID: 40016003; PMCID: PMC12058390; DOI: 10.1016/j.tics.2025.02.001.
Abstract
Over recent decades, temporal difference reinforcement learning (TDRL) models have successfully explained much dopamine (DA) activity. This success has invited heightened scrutiny of late, with many studies challenging the validity of TDRL models of DA function. Yet, when evaluating the validity of these models, the devil is truly in the details. TDRL is a broad class of algorithms sharing core ideas but differing greatly in implementation and predictions. Thus, it is important to identify the defining aspects of the TDRL framework being tested and to use state spaces and model architectures that capture the known complexity of the behavioral representations and neural systems involved. Here, we discuss several examples that illustrate the importance of these considerations.
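For readers outside the modeling literature, the core TD(0) update shared by this family of algorithms can be written in a few lines. The sketch below is a generic textbook illustration under assumed parameters, not any specific architecture discussed in the article.

```python
import numpy as np

def td_update(V, s, r, s_next, alpha=0.1, gamma=0.95):
    """One TD(0) update; delta is the reward prediction error (RPE)
    that TDRL models map onto phasic dopamine activity."""
    delta = r + gamma * V[s_next] - V[s]  # received vs. expected outcome
    V[s] += alpha * delta                 # nudge the value estimate
    return delta

# Toy cue -> delay -> reward chain: value propagates backward to the cue.
V = np.zeros(3)
for _ in range(200):
    td_update(V, 0, 0.0, 1)  # cue step, no reward yet
    td_update(V, 1, 1.0, 2)  # reward arrives entering the terminal state
print(np.round(V, 2))        # V[1] -> 1.0, V[0] -> gamma * V[1]
```

As the article stresses, conclusions drawn from such a model hinge on the assumed state space; even this toy example behaves differently if the chain is given more or fewer states.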
Affiliation(s)
- Zhewei Zhang
- National Institute on Drug Abuse Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA.
- Kauê M Costa
- Department of Psychology, University of Alabama at Birmingham, Birmingham, AL 35233, USA
- Angela J Langdon
- National Institute of Mental Health Intramural Research Program, National Institutes of Health, Bethesda, MD 20892, USA
- Geoffrey Schoenbaum
- National Institute on Drug Abuse Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA.
4. Bornhoft KN, Prohofsky J, O’Neal TJ, Wolff AR, Saunders BT. Striatal dopamine represents valence on dynamic regional scales. J Neurosci 2025; 45:e1551242025. PMID: 40097183; PMCID: PMC12019117; DOI: 10.1523/jneurosci.1551-24.2025.
Abstract
Adaptive decision making relies on dynamic updating of learned associations where environmental cues come to predict valenced stimuli, such as food or threat. Cue-guided behavior depends on a network of brain systems, including dopaminergic projections to the striatum. Critically, it remains unclear how dopamine signaling across the striatum encodes multi-valent, dynamic learning contexts, where positive and negative associations must be rapidly disambiguated. To understand this, we employed a Pavlovian discrimination paradigm, where cues predicting food or threat were intermingled during conditioning sessions, and their meaning was serially reversed across training. We found that male and female rats readily distinguished these cues and updated their behavior rapidly upon valence reversal. Using fiber photometry, we recorded dopamine signaling in three major striatal subregions (the dorsolateral striatum (DLS), the nucleus accumbens (NAc) core, and the nucleus accumbens medial shell), finding that valence was represented uniquely across all three regions, indicative of local signals biased for value and salience. Further, ambiguity introduced by cue reversals reshaped striatal dopamine on different timelines: nucleus accumbens signals updated more readily than those in the DLS. Together, these results indicate that striatal dopamine flexibly encodes stimulus valence according to region-specific rules, and these signals are dynamically modulated by changing contingencies in the resolution of ambiguity about the meaning of environmental cues.

Significance Statement: Adaptive decision making relies on updating learned associations to disambiguate predictions of reward or threat. This cue-guided behavior depends on striatal dopamine, but it remains unclear how dopamine signaling encodes multi-valent, dynamic learning contexts. Here, we employed a paradigm where cues predicting positive and negative outcomes were intermingled, and their meaning was serially reversed across time. We recorded dopamine signaling, finding heterogeneous patterns of valence encoding across striatal subregions, and cue reversal reshaped subregional signals on different timelines. Our results suggest that dopamine flexibly encodes dynamic learning contexts to resolve ambiguity about the meaning of environmental cues.
Affiliation(s)
- Kaisa N. Bornhoft
- Department of Neuroscience, University of Minnesota, Minneapolis, Minnesota 55455
- Medical Discovery Team on Addiction, University of Minnesota, Minneapolis, Minnesota 55455
- Julianna Prohofsky
- Department of Neuroscience, University of Minnesota, Minneapolis, Minnesota 55455
- Medical Discovery Team on Addiction, University of Minnesota, Minneapolis, Minnesota 55455
- Timothy J. O’Neal
- Department of Neuroscience, University of Minnesota, Minneapolis, Minnesota 55455
- Medical Discovery Team on Addiction, University of Minnesota, Minneapolis, Minnesota 55455
- Amy R. Wolff
- Department of Neuroscience, University of Minnesota, Minneapolis, Minnesota 55455
- Medical Discovery Team on Addiction, University of Minnesota, Minneapolis, Minnesota 55455
- Benjamin T. Saunders
- Department of Neuroscience, University of Minnesota, Minneapolis, Minnesota 55455
- Medical Discovery Team on Addiction, University of Minnesota, Minneapolis, Minnesota 55455
5. Wallace CW, Slinkard CY, Shaughnessy R, Holleran KM, Centanni SW, Lapish CC, Jones SR. Fiber photometry analysis of spontaneous dopamine signals: The z-scored data are not the data. bioRxiv 2025:2025.02.19.639080 [Preprint]. PMID: 40060421; PMCID: PMC11888193; DOI: 10.1101/2025.02.19.639080.
Abstract
Fluorescent sensors have revolutionized the measurement of molecules in the brain, and the dLight dopamine sensor has been used extensively to examine reward- and cue-evoked dopamine release, but only recently has the field turned its attention to spontaneous release events. Analysis of spontaneous events typically requires evaluation of hundreds of events over minutes to hours, and the most common method of analysis, z-scoring, was not designed for this purpose. Here, we compare the accuracy and reliability of three different analysis methods to identify pharmacologically induced changes in dopamine release and uptake in freely moving C57BL/6J mice. The D1-like receptor antagonist SCH23390 was used to prevent dLight sensors from interacting with dopamine in the extracellular space, while cocaine was used to inhibit uptake and raclopride to increase release of dopamine in the nucleus accumbens. We examined peak-to-peak frequency, peak amplitude, and width (the time spent above an established cutoff). The three methods were 1) the widely used "Z-Score Method", which automatically smooths baseline drift and normalizes recordings using signal-to-noise ratios, 2) a "Manual Method", in which local baselines were adjusted manually and individual cutoffs were determined for each subject, and 3) the "Prominence Method", which combines z-scoring with prominence assessment to tag individual peaks, then returns to the preprocessed data for kinetic analysis. First, SCH23390 drastically reduced the number of detected signals, as expected, but only the Manual Method identified this change. Z-scoring failed to identify any changes, due to its amplification of noise when signals were diminished. Cocaine increased signal width as expected using the Manual and Prominence Methods, but not the Z-Score Method. Finally, raclopride-induced increases in amplitude were correctly identified by the Manual and Prominence Methods. The Z-Score Method failed to identify any of the changes in dopamine release and uptake kinetics. Thus, analysis of spontaneous dopamine signals requires assessment of the %ΔF/F values, ideally using the Manual Method; z-scoring is not appropriate.
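As a rough illustration of the distinction the authors draw, the sketch below tags peaks on a z-scored trace but measures kinetics on the %ΔF/F signal itself, in the spirit of the "Prominence Method". The prominence value, width definition, and synthetic data are our assumptions, not the published parameters.

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths

def analyze_transients(dff, fs, prominence=1.0):
    """Tag peaks on a z-scored copy, then measure kinetics on %dF/F.
    All thresholds here are illustrative assumptions."""
    z = (dff - dff.mean()) / dff.std()        # z-score used only for detection
    peaks, _ = find_peaks(z, prominence=prominence)
    widths = peak_widths(dff, peaks, rel_height=0.5)[0] / fs  # seconds
    amplitudes = dff[peaks]                   # amplitudes from %dF/F itself
    frequency = len(peaks) / (dff.size / fs)  # events per second
    return amplitudes, widths, frequency

# Synthetic demo: 60 s of noise at 100 Hz with sparse transients.
rng = np.random.default_rng(0)
dff = rng.normal(0, 0.2, 6000)
dff[100::500] += 3.0
amp, w, f = analyze_transients(dff, fs=100)
```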
Affiliation(s)
- Conner W Wallace
- Department of Translational Neuroscience, Wake Forest University School of Medicine, Winston-Salem, NC
- Clare Y Slinkard
- Department of Translational Neuroscience, Wake Forest University School of Medicine, Winston-Salem, NC
- Katherine M Holleran
- Department of Translational Neuroscience, Wake Forest University School of Medicine, Winston-Salem, NC
- Samuel W Centanni
- Department of Translational Neuroscience, Wake Forest University School of Medicine, Winston-Salem, NC
- Christopher C Lapish
- Department of Anatomy, Cell Biology, and Physiology, Indiana University School of Medicine, Indianapolis, IN
- Sara R Jones
- Department of Translational Neuroscience, Wake Forest University School of Medicine, Winston-Salem, NC
6. Patel D, Siegelmann HT. Navigating the unknown: Leveraging self-information and diversity in partially observable environments. Biochem Biophys Res Commun 2024; 741:150923. PMID: 39579529; DOI: 10.1016/j.bbrc.2024.150923.
Abstract
Reinforcement learning algorithms often struggle to learn in partially observable environments, where different states of the environment may appear identical. However, not all partially observable environments pose the same level of difficulty for learning. This work introduces the concept of dissonance distance, a metric that can estimate the difficulty of learning in such environments. We demonstrate that self-information, such as internal oscillations or memory of previous actions, can increase the dissonance distance and make learning easier in partially observable environments. Additionally, sensory occlusion may occur after learning is complete, leading to a lack of sufficient information and catastrophic failure. To address this, we propose a spatially layered architecture (SLA) inspired by the brain, which trains multiple policies in parallel for the same task. SLA can change the amount of external information processed at each timestep, providing an adaptive approach to handle changing information in the environment state space. We evaluate the effectiveness of our SLA method, showing learnability and robustness against realistic noise and occlusion in sensory inputs for the partially observable Continuous Mountain Car environment. We hypothesize that multi-policy approaches like SLA might explain the complex dopamine dynamics in the brain that cannot be explained with the state-of-the-art scalar temporal difference error.
Affiliation(s)
- Devdhar Patel
- Manning College of Information and Computer Science, University of Massachusetts, Amherst, MA, 01003, USA.
- Hava T Siegelmann
- Manning College of Information and Computer Science, University of Massachusetts, Amherst, MA, 01003, USA
7. Ursino M, Pelle S, Nekka F, Robaey P, Schirru M. Valence-dependent dopaminergic modulation during reversal learning in Parkinson's disease: A neurocomputational approach. Neurobiol Learn Mem 2024; 215:107985. PMID: 39270814; DOI: 10.1016/j.nlm.2024.107985.
Abstract
Reinforcement learning, crucial for behavior in dynamic environments, is driven by rewards and punishments, modulated by changes in dopamine (DA). This study explores the dopaminergic system's influence on learning, particularly in Parkinson's disease (PD), where medication leads to impaired adaptability. Highlighting the role of tonic DA in signaling the valence of actions, this research investigates how DA affects response vigor and decision-making in PD. DA not only influences reward and punishment learning but also signals the level of cognitive effort and the risk propensity of actions, which are essential for understanding and managing PD symptoms. In this work, we adapt our existing neurocomputational model of the basal ganglia (BG) to simulate two reversal learning tasks proposed by Cools et al. We first optimized a Hebb rule for both probabilistic and deterministic reversal learning, conducted a sensitivity analysis (SA) on parameters related to the DA effect, and compared performance across three groups: PD-ON, PD-OFF, and control subjects. In our deterministic task simulation, we explored switch error rates after unexpected task switches and found a U-shaped relationship between tonic DA levels and switch error frequency. Through the SA, we classified these three groups. Then, assuming that the valence of the stimulus affects tonic DA levels, we were able to reproduce the results of Cools et al. As for the probabilistic task simulation, our results are in line with clinical data, showing similar trends for PD-ON, characterized by higher tonic DA levels that correlate with increased difficulty in both acquisition and reversal tasks. Our study proposes a new hypothesis: valence, signaled by tonic DA levels, influences learning in PD, confirming that phasic and tonic DA changes are uncorrelated. This hypothesis challenges existing paradigms and opens new avenues for understanding cognitive processes in PD, particularly in reversal learning tasks.
Affiliation(s)
- Mauro Ursino
- Department of Electrical, Electronic and Information Engineering Guglielmo Marconi, University of Bologna, Campus of Cesena, I 47521 Cesena, Italy.
- Silvana Pelle
- Department of Electrical, Electronic and Information Engineering Guglielmo Marconi, University of Bologna, Campus of Cesena, I 47521 Cesena, Italy.
- Fahima Nekka
- Faculté de Pharmacie, Université de Montréal, Montreal, Quebec H3T 1J4, Canada; Centre de recherches mathématiques, Université de Montréal, Montreal, Quebec H3T 1J4, Canada; Centre for Applied Mathematics in Bioscience and Medicine (CAMBAM), McGill University, Montreal, Quebec H3G 1Y6, Canada.
- Philippe Robaey
- Children's Hospital of Eastern Ontario, University of Ottawa, Ottawa, ON, Canada.
- Miriam Schirru
- Department of Electrical, Electronic and Information Engineering Guglielmo Marconi, University of Bologna, Campus of Cesena, I 47521 Cesena, Italy; Faculté de Pharmacie, Université de Montréal, Montreal, Quebec H3T 1J4, Canada.
8. Kim MJ, Gibson DJ, Hu D, Yoshida T, Hueske E, Matsushima A, Mahar A, Schofield CJ, Sompolpong P, Tran KT, Tian L, Graybiel AM. Dopamine release plateau and outcome signals in dorsal striatum contrast with classic reinforcement learning formulations. Nat Commun 2024; 15:8856. PMID: 39402067; PMCID: PMC11473536; DOI: 10.1038/s41467-024-53176-7.
Abstract
We recorded dopamine release signals in centromedial and centrolateral sectors of the striatum as mice learned consecutive versions of visual cue-outcome conditioning tasks. Dopamine release responses differed for the centromedial and centrolateral sites. In neither sector could these be accounted for by classic reinforcement learning alone as classically applied to the activity of nigral dopamine-containing neurons. Medially, cue responses ranged from initial sharp peaks to modulated plateau responses; outcome (reward) responses during cue conditioning were minimal or, initially, negative. At centrolateral sites, by contrast, strong, transient dopamine release responses occurred at both cue and outcome. Prolonged, plateau release responses to cues emerged in both regions when discriminative behavioral responses became required. At most sites, we found no evidence for a transition from outcome signaling to cue signaling, a hallmark of temporal difference reinforcement learning as applied to midbrain dopaminergic neuronal activity. These findings delineate a reshaping of striatal dopamine release activity during learning and suggest that current views of reward prediction error encoding need review to accommodate distinct learning-related spatial and temporal patterns of striatal dopamine release in the dorsal striatum.
Affiliation(s)
- Min Jung Kim
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Advanced Imaging Research Center, University of Texas, Southwestern Medical Center, Dallas, TX, 75390, USA
- Daniel J Gibson
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Dan Hu
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Tomoko Yoshida
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Emily Hueske
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Ayano Matsushima
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Ara Mahar
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Cynthia J Schofield
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Patlapa Sompolpong
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Kathy T Tran
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA
- Lin Tian
- Max Planck Florida Institute for Neuroscience, Jupiter, FL, 33458, USA
- Ann M Graybiel
- McGovern Institute for Brain Research and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 43 Vassar St., Cambridge, MA, 02139, USA.
9. Cazalé-Debat L, Scheunemann L, Day M, Fernandez-D V Alquicira T, Dimtsi A, Zhang Y, Blackburn LA, Ballardini C, Greenin-Whitehead K, Reynolds E, Lin AC, Owald D, Rezaval C. Mating proximity blinds threat perception. Nature 2024; 634:635-643. PMID: 39198656; PMCID: PMC11485238; DOI: 10.1038/s41586-024-07890-3.
Abstract
Romantic engagement can bias sensory perception. This 'love blindness' reflects a common behavioural principle across organisms: favouring pursuit of a coveted reward over potential risks [1]. In the case of animal courtship, such sensory biases may support reproductive success but can also expose individuals to danger, such as predation [2,3]. However, how neural networks balance the trade-off between risk and reward is unknown. Here we discover a dopamine-governed filter mechanism in male Drosophila that reduces threat perception as courtship progresses. We show that during early courtship stages, threat-activated visual neurons inhibit central courtship nodes via specific serotonergic neurons. This serotonergic inhibition prompts flies to abort courtship when they see imminent danger. However, as flies advance in the courtship process, the dopaminergic filter system reduces visual threat responses, shifting the balance from survival to mating. By recording neural activity from males as they approach mating, we demonstrate that progress in courtship is registered as dopaminergic activity levels ramping up. This dopamine signalling inhibits the visual threat detection pathway via Dop2R receptors, allowing male flies to focus on courtship when they are close to copulation. Thus, dopamine signalling biases sensory perception based on perceived goal proximity, to prioritize between competing behaviours.
Affiliation(s)
- Laurie Cazalé-Debat
- School of Biosciences, University of Birmingham, Birmingham, UK
- Birmingham Centre for Neurogenetics, University of Birmingham, Birmingham, UK
- Lisa Scheunemann
- Freie Universität Berlin, Institute of Biology, Berlin, Germany
- Institut für Neurophysiologie and NeuroCure Cluster of Excellence, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Megan Day
- School of Biosciences, University of Birmingham, Birmingham, UK
- Birmingham Centre for Neurogenetics, University of Birmingham, Birmingham, UK
- Tania Fernandez-D V Alquicira
- Institut für Neurophysiologie and NeuroCure Cluster of Excellence, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Anna Dimtsi
- School of Biosciences, University of Birmingham, Birmingham, UK
- Birmingham Centre for Neurogenetics, University of Birmingham, Birmingham, UK
- Biosciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Youchong Zhang
- School of Biosciences, University of Birmingham, Birmingham, UK
- Birmingham Centre for Neurogenetics, University of Birmingham, Birmingham, UK
- Centre for Neural Circuits and Behaviour, University of Oxford, Oxford, UK
- Lauren A Blackburn
- School of Biosciences, University of Birmingham, Birmingham, UK
- Birmingham Centre for Neurogenetics, University of Birmingham, Birmingham, UK
- School of Science and the Environment, University of Worcester, Worcester, UK
- Charles Ballardini
- School of Biosciences, University of Birmingham, Birmingham, UK
- Birmingham Centre for Neurogenetics, University of Birmingham, Birmingham, UK
- Katie Greenin-Whitehead
- School of Biosciences, University of Sheffield, Sheffield, UK
- Neuroscience Institute, University of Sheffield, Sheffield, UK
- Eric Reynolds
- Institut für Neurophysiologie and NeuroCure Cluster of Excellence, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Andrew C Lin
- School of Biosciences, University of Sheffield, Sheffield, UK
- Neuroscience Institute, University of Sheffield, Sheffield, UK
- David Owald
- Institut für Neurophysiologie and NeuroCure Cluster of Excellence, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Carolina Rezaval
- School of Biosciences, University of Birmingham, Birmingham, UK.
- Birmingham Centre for Neurogenetics, University of Birmingham, Birmingham, UK.
10. Gershman SJ, Assad JA, Datta SR, Linderman SW, Sabatini BL, Uchida N, Wilbrecht L. Explaining dopamine through prediction errors and beyond. Nat Neurosci 2024; 27:1645-1655. PMID: 39054370; DOI: 10.1038/s41593-024-01705-4.
Abstract
The most influential account of phasic dopamine holds that it reports reward prediction errors (RPEs). The RPE-based interpretation of dopamine signaling is, in its original form, probably too simple and fails to explain all the properties of phasic dopamine observed in behaving animals. This Perspective helps to resolve some of the conflicting interpretations of dopamine that currently exist in the literature. We focus on the following three empirical challenges to the RPE theory of dopamine: why does dopamine (1) ramp up as animals approach rewards, (2) respond to sensory and motor features and (3) influence action selection? We argue that the prediction error concept, once it has been suitably modified and generalized based on an analysis of each computational problem, answers each challenge. Nonetheless, there are a number of additional empirical findings that appear to demand fundamentally different theoretical explanations beyond encoding RPE. Therefore, looking forward, we discuss the prospects for a unifying theory that respects the diversity of dopamine signaling and function as well as the complex circuitry that both underlies and responds to dopaminergic transmission.
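One generalization the Perspective discusses, that ramps can fall out of a prediction-error signal when value estimates rise steeply toward reward, is easy to illustrate numerically. The convex value trajectory below is an assumed shape for illustration, not fitted data or the authors' model.

```python
import numpy as np

# Illustration only: a TD error computed over an assumed convex value
# trajectory ramps up as reward approaches (one proposed account of
# dopamine ramps). V is a made-up shape, not fitted data.
gamma, T = 0.98, 50
t = np.arange(T)
V = (t / T) ** 2                # convex rise in estimated value
V_next = np.append(V[1:], 0.0)  # terminal value is zero
r = np.zeros(T); r[-1] = 1.0    # reward only at the final step
delta = r + gamma * V_next - V  # TD error at each step
print(np.round(delta, 3))       # grows steadily toward reward: a "ramp"
```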
Affiliation(s)
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA, USA.
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA.
- John A Assad
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA
- Scott W Linderman
- Department of Statistics and Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Bernardo L Sabatini
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Naoshige Uchida
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Linda Wilbrecht
- Department of Psychology and Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
11. Basu A, Yang JH, Yu A, Glaeser-Khan S, Rondeau JA, Feng J, Krystal JH, Li Y, Kaye AP. Frontal Norepinephrine Represents a Threat Prediction Error Under Uncertainty. Biol Psychiatry 2024; 96:256-267. PMID: 38316333; PMCID: PMC11269024; DOI: 10.1016/j.biopsych.2024.01.025.
Abstract
Background: To adapt to threats in the environment, animals must predict them and engage in defensive behavior. While the representation of a prediction error signal for reward has been linked to dopamine, a neuromodulatory prediction error for aversive learning has not been identified.
Methods: We measured and manipulated norepinephrine release during threat learning using optogenetics and a novel fluorescent norepinephrine sensor.
Results: We found that the norepinephrine response to conditioned stimuli reflects aversive memory strength. When delays between auditory stimuli and footshock are introduced, norepinephrine acts as a prediction error signal. However, temporal difference prediction errors do not fully explain norepinephrine dynamics. To explain noradrenergic signaling, we used an updated reinforcement learning model with uncertainty about time and found that it explained norepinephrine dynamics across learning and across variations in temporal and auditory task structure.
Conclusions: Norepinephrine thus combines cognitive and affective information into a predictive signal and links time with the anticipation of danger.
Affiliation(s)
- Aakash Basu
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut; Interdepartmental Neuroscience Program, Yale School of Medicine, New Haven, Connecticut
- Jen-Hau Yang
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut
- Abigail Yu
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut
- Jocelyne A Rondeau
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut
- Jiesi Feng
- State Key Laboratory of Membrane Biology, Peking University School of Life Sciences, Beijing, China
- John H Krystal
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut; Clinical Neuroscience Division, Veterans Administration National Center for PTSD, West Haven, Connecticut
- Yulong Li
- State Key Laboratory of Membrane Biology, Peking University School of Life Sciences, Beijing, China; Peking University-IDG/McGovern Institute for Brain Research, Beijing, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China; Chinese Institute for Brain Research, Beijing, China
- Alfred P Kaye
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut; Clinical Neuroscience Division, Veterans Administration National Center for PTSD, West Haven, Connecticut; Wu Tsai Institute, Yale University, New Haven, Connecticut.
12. Lee RS, Sagiv Y, Engelhard B, Witten IB, Daw ND. A feature-specific prediction error model explains dopaminergic heterogeneity. Nat Neurosci 2024; 27:1574-1586. PMID: 38961229; DOI: 10.1038/s41593-024-01689-1.
Abstract
The hypothesis that midbrain dopamine (DA) neurons broadcast a reward prediction error (RPE) is among the great successes of computational neuroscience. However, recent results contradict a core aspect of this theory: specifically that the neurons convey a scalar, homogeneous signal. While the predominant family of extensions to the RPE model replicates the classic model in multiple parallel circuits, we argue that these models are ill suited to explain reports of heterogeneity in task variable encoding across DA neurons. Instead, we introduce a complementary 'feature-specific RPE' model, positing that individual ventral tegmental area DA neurons report RPEs for different aspects of an animal's moment-to-moment situation. Further, we show how our framework can be extended to explain patterns of heterogeneity in action responses reported among substantia nigra pars compacta DA neurons. This theory reconciles new observations of DA heterogeneity with classic ideas about RPE coding while also providing a new perspective of how the brain performs reinforcement learning in high-dimensional environments.
Affiliation(s)
- Rachel S Lee
- Princeton Neuroscience Institute, Princeton, NJ, USA
- Yotam Sagiv
- Princeton Neuroscience Institute, Princeton, NJ, USA
- Ben Engelhard
- Princeton Neuroscience Institute, Princeton, NJ, USA
- Nathaniel D Daw
- Princeton Neuroscience Institute, Princeton, NJ, USA.
- Department of Psychology, Princeton University, Princeton, NJ, USA.
13. Cone I, Clopath C, Shouval HZ. Learning to express reward prediction error-like dopaminergic activity requires plastic representations of time. Nat Commun 2024; 15:5856. PMID: 38997276; PMCID: PMC11245539; DOI: 10.1038/s41467-024-50205-3.
Abstract
The dominant theoretical framework to account for reinforcement learning in the brain is temporal difference (TD) learning, whereby certain units signal reward prediction errors (RPE). The TD algorithm has traditionally been mapped onto the dopaminergic system, as the firing properties of dopamine neurons can resemble RPEs. However, certain predictions of TD learning are inconsistent with experimental results, and previous implementations of the algorithm have made unscalable assumptions regarding stimulus-specific fixed temporal bases. We propose an alternate framework to describe dopamine signaling in the brain, FLEX (Flexibly Learned Errors in Expected Reward). In FLEX, dopamine release is similar, but not identical, to RPE, leading to predictions that contrast with those of TD. While FLEX itself is a general theoretical framework, we describe a specific, biophysically plausible implementation, the results of which are consistent with a preponderance of both existing and reanalyzed experimental data.
Affiliation(s)
- Ian Cone
- Department of Bioengineering, Imperial College London, London, UK
- Department of Neurobiology and Anatomy, University of Texas Medical School at Houston, Houston, TX, USA
- Applied Physics Program, Rice University, Houston, TX, USA
- Claudia Clopath
- Department of Bioengineering, Imperial College London, London, UK
- Harel Z Shouval
- Department of Neurobiology and Anatomy, University of Texas Medical School at Houston, Houston, TX, USA.
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA.
14. Augustat N, Endres D, Mueller EM. Uncertainty of treatment efficacy moderates placebo effects on reinforcement learning. Sci Rep 2024; 14:14421. PMID: 38909105; PMCID: PMC11193823; DOI: 10.1038/s41598-024-64240-z.
Abstract
The placebo-reward hypothesis postulates that positive effects of treatment expectations on health (i.e., placebo effects) and reward processing share common neural underpinnings. Moreover, experiments in humans and animals indicate that reward uncertainty increases striatal dopamine, which is presumably involved in placebo responses and reward learning. Therefore, treatment uncertainty, analogously to reward uncertainty, may affect updating from rewards after placebo treatment. Here, we address whether different degrees of uncertainty regarding the efficacy of a sham treatment affect reward sensitivity. In an online between-subjects experiment with N = 141 participants, we systematically varied the provided efficacy instructions before participants first received a sham treatment that consisted of listening to binaural beats and then performed a probabilistic reinforcement learning task. We fitted a Q-learning model, including two different learning rates for positive (gain) and negative (loss) reward prediction errors and an inverse gain parameter, to behavioral decision data from the reinforcement learning task. Our results yielded an inverted-U relationship between the provided treatment efficacy probability and the learning rate for gains, such that higher levels of treatment uncertainty, rather than of expected net efficacy, affect presumably dopamine-related reward learning. These findings support the placebo-reward hypothesis and suggest harnessing uncertainty in placebo treatment for recovering reward learning capabilities.
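A minimal sketch of the kind of model fitted here, Q-learning with separate learning rates for positive and negative prediction errors and a softmax choice rule, is shown below. Parameter names and values are our own illustrative choices, not the fitted estimates.

```python
import numpy as np

def q_update(q, choice, reward, alpha_gain=0.3, alpha_loss=0.1):
    """Dual-learning-rate Q-update: separate rates for positive (gain)
    and negative (loss) reward prediction errors. Values illustrative."""
    rpe = reward - q[choice]
    q[choice] += (alpha_gain if rpe > 0 else alpha_loss) * rpe
    return q

def softmax_choice(q, beta, rng):
    """Softmax choice; beta plays the role of the inverse of the
    'inverse gain' (temperature) parameter mentioned above."""
    p = np.exp(beta * q - np.max(beta * q))
    p /= p.sum()
    return rng.choice(len(q), p=p)

rng = np.random.default_rng(1)
q = np.zeros(2)
for _ in range(200):                  # toy 80/20 probabilistic task
    c = softmax_choice(q, beta=3.0, rng=rng)
    r = float(rng.random() < (0.8 if c == 0 else 0.2))
    q = q_update(q, c, r)
print(np.round(q, 2))                 # q[0] should exceed q[1]
```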
Affiliation(s)
- Nick Augustat
- Department of Psychology, University of Marburg, Marburg, Germany.
- Dominik Endres
- Department of Psychology, University of Marburg, Marburg, Germany
- Erik M Mueller
- Department of Psychology, University of Marburg, Marburg, Germany
15. Bornhoft KN, Prohofsky J, O'Neal TJ, Wolff AR, Saunders BT. Valence ambiguity dynamically shapes striatal dopamine heterogeneity. bioRxiv 2024:2024.05.17.594692 [Preprint]. PMID: 38798567; PMCID: PMC11118546; DOI: 10.1101/2024.05.17.594692.
Abstract
Adaptive decision making relies on dynamic updating of learned associations in which environmental cues come to predict positively and negatively valenced stimuli, such as food or threat. Flexible cue-guided behaviors depend on a network of brain systems, including dopamine signaling in the striatum, which is critical for the learning and maintenance of conditioned behaviors. Critically, it remains unclear how dopamine signaling encodes multi-valent, dynamic learning contexts, where positive and negative associations must be rapidly disambiguated. To understand this, we employed a Pavlovian discrimination paradigm, where cues predicting positive and negative outcomes were intermingled during conditioning sessions, and their meaning was serially reversed across training. We found that rats readily distinguished these cues and updated their behavior rapidly upon valence reversal. Using fiber photometry, we recorded dopamine signaling in three major striatal subregions (the dorsolateral striatum (DLS), the nucleus accumbens core, and the nucleus accumbens medial shell) and found heterogeneous responses to positive and negative conditioned cues and their predicted outcomes. Valence ambiguity introduced by cue reversal reshaped striatal dopamine on different timelines: nucleus accumbens core and shell signals updated more readily than those in the DLS. Together, these results suggest that striatal dopamine flexibly encodes multi-valent learning contexts, and these signals are dynamically modulated by changing contingencies to resolve ambiguity about the meaning of environmental cues.
16. Schultz W. A dopamine mechanism for reward maximization. Proc Natl Acad Sci U S A 2024; 121:e2316658121. PMID: 38717856; PMCID: PMC11098095; DOI: 10.1073/pnas.2316658121.
Abstract
Individual survival and evolutionary selection require biological organisms to maximize reward. Economic choice theories define the necessary and sufficient conditions, and neuronal signals of decision variables provide mechanistic explanations. Reinforcement learning (RL) formalisms use predictions, actions, and policies to maximize reward. Midbrain dopamine neurons code reward prediction errors (RPE) of subjective reward value suitable for RL. Electrical and optogenetic self-stimulation experiments demonstrate that monkeys and rodents repeat behaviors that result in dopamine excitation. Dopamine excitations reflect positive RPEs that increase reward predictions via RL; against increasing predictions, obtaining similar dopamine RPE signals again requires better rewards than before. The positive RPEs drive predictions higher again and thus advance a recursive reward-RPE-prediction iteration toward better and better rewards. Agents also avoid dopamine inhibitions that lower reward prediction via RL, which allows smaller rewards than before to elicit positive dopamine RPE signals and resume the iteration toward better rewards. In this way, dopamine RPE signals serve a causal mechanism that attracts agents via RL to the best rewards. The mechanism improves daily life and benefits evolutionary selection but may also induce restlessness and greed.
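The recursive reward-RPE-prediction iteration described above can be shown with a toy Rescorla-Wagner-style loop; the numbers below are ours, purely for illustration, and do not come from the paper.

```python
# Toy illustration of the recursive reward-RPE-prediction loop (our
# numbers, not the paper's): once the prediction catches up with the
# current reward, only a larger reward yields a positive RPE again.
alpha, prediction = 0.2, 0.0
for reward in [1.0] * 20 + [2.0] * 20:
    rpe = reward - prediction     # positive RPE = better than expected
    prediction += alpha * rpe     # prediction ratchets upward
print(round(prediction, 2))       # ends near 2.0; a reward of 1.0 would
                                  # now elicit a negative RPE
```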
Affiliation(s)
- Wolfram Schultz
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge CB2 3DY, United Kingdom
17. Floeder JR, Jeong H, Mohebi A, Namboodiri VMK. Mesolimbic dopamine ramps reflect environmental timescales. bioRxiv 2024:2024.03.27.587103 [Preprint]. PMID: 38659749; PMCID: PMC11042231; DOI: 10.1101/2024.03.27.587103.
Abstract
Mesolimbic dopamine activity occasionally exhibits ramping dynamics, reigniting debate on theories of dopamine signaling. This debate is ongoing partly because the experimental conditions under which dopamine ramps emerge remain poorly understood. Here, we show that during Pavlovian and instrumental conditioning, mesolimbic dopamine ramps are only observed when the inter-trial interval is short relative to the trial period. These results constrain theories of dopamine signaling and identify a critical variable determining the emergence of dopamine ramps.
Affiliation(s)
- Joseph R Floeder
- Neuroscience Graduate Program, University of California, San Francisco, CA, USA
- Huijeong Jeong
- Department of Neurology, University of California, San Francisco, CA, USA
- Ali Mohebi
- Department of Neurology, University of California, San Francisco, CA, USA
- Vijay Mohan K Namboodiri
- Neuroscience Graduate Program, University of California, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, CA, USA
- Weill Institute for Neurosciences, Kavli Institute for Fundamental Neuroscience, Center for Integrative Neuroscience, University of California, San Francisco, CA, USA
18. Cowan RL, Davis T, Kundu B, Rahimpour S, Rolston JD, Smith EH. More widespread and rigid neuronal representation of reward expectation underlies impulsive choices. bioRxiv 2024:2024.04.11.588637 [Preprint]. PMID: 38645037; PMCID: PMC11030340; DOI: 10.1101/2024.04.11.588637.
Abstract
Impulsive choices prioritize smaller, more immediate rewards over larger, delayed, or potentially uncertain rewards. Impulsive choices are a critical aspect of substance use disorders and maladaptive decision-making across the lifespan. Here, we sought to understand the neuronal underpinnings of expected reward and risk estimation on a trial-by-trial basis during impulsive choices. To do so, we acquired electrical recordings from the human brain while participants carried out a risky decision-making task designed to measure choice impulsivity. Behaviorally, we found a reward-accuracy tradeoff, whereby more impulsive choosers were more accurate at the task, opting for a more immediate reward while compromising overall task performance. We then examined how neuronal populations across frontal, temporal, and limbic brain regions parametrically encoded reinforcement learning model variables, namely reward and risk expectation and surprise, across trials. We found more widespread representations of reward value expectation and prediction error in more impulsive choosers, whereas less impulsive choosers preferentially represented risk expectation. A regional analysis of reward and risk encoding highlighted the anterior cingulate cortex for value expectation, the anterior insula for risk expectation and surprise, and distinct regional encoding between impulsivity groups. Beyond describing trial-by-trial population neuronal representations of reward and risk variables, these results suggest impaired inhibitory control and model-free learning underpinnings of impulsive choice. These findings shed light on neural processes underlying reinforcement learning and decision-making in uncertain environments and how these processes may function in psychiatric disorders.
Affiliation(s)
- Rhiannon L Cowan
- Department of Neurosurgery, University of Utah, Salt Lake City, UT 84132, USA
- Tyler Davis
- Department of Neurosurgery, University of Utah, Salt Lake City, UT 84132, USA
- Bornali Kundu
- Department of Neurosurgery, University of Missouri, Columbia, MO 65212, USA
- Shervin Rahimpour
- Department of Neurosurgery, University of Utah, Salt Lake City, UT 84132, USA
- John D Rolston
- Department of Neurosurgery, Brigham & Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Elliot H Smith
- Department of Neurosurgery, University of Utah, Salt Lake City, UT 84132, USA
19. Holly EN, Galanaugh J, Fuccillo MV. Local regulation of striatal dopamine: A diversity of circuit mechanisms for a diversity of behavioral functions? Curr Opin Neurobiol 2024; 85:102839. PMID: 38309106; PMCID: PMC11066854; DOI: 10.1016/j.conb.2024.102839.
Abstract
Striatal dopamine governs a wide range of behavioral functions, yet local dopamine concentrations can be dissociated from somatic activity. Here, we discuss how dopamine's diverse roles in behavior may be driven by local circuit mechanisms shaping dopamine release. We first review historical and recent work demonstrating that striatal circuits interact with dopaminergic terminals to either initiate dopamine release or modulate the release initiated by spiking in midbrain dopamine neurons, with particular attention to GABAergic and cholinergic local circuit mechanisms. We then discuss some of the first in vivo studies of acetylcholine-dopamine interactions in the striatum and outline the future work needed to understand the respective roles of midbrain versus striatal dopamine regulation.
Affiliation(s)
- Elizabeth N Holly
- Center for Molecular and Behavioral Neuroscience, Rutgers University, 197 University Ave, Newark, NJ 07102, USA. https://twitter.com/ENHolly
- Jamie Galanaugh
- Neuroscience Graduate Group, Perelman School of Medicine at the University of Pennsylvania, 415 Curie Blvd, Philadelphia, PA 19104, USA. https://twitter.com/jamie_galanaugh
- Marc V Fuccillo
- Department of Neuroscience, Perelman School of Medicine at the University of Pennsylvania, 415 Curie Blvd, Philadelphia, PA 19104, USA.
20. Amo R. Prediction error in dopamine neurons during associative learning. Neurosci Res 2024; 199:12-20. PMID: 37451506; DOI: 10.1016/j.neures.2023.07.003.
Abstract
Dopamine neurons have long been thought to facilitate learning by broadcasting a reward prediction error (RPE), a teaching signal used in machine learning, but more recent work has advanced alternative models of dopamine's computational role. Here, I revisit this critical issue and review new experimental evidence that tightens the link between dopamine activity and RPE. First, I introduce the recent observation of a gradual backward shift of dopamine activity that had eluded researchers for over a decade. I also discuss several other findings, such as dopamine ramping, that were initially interpreted as conflicting with RPE but were later found to be consistent with it. These findings improve our understanding of neural computation in dopamine neurons.
Affiliation(s)
- Ryunosuke Amo
- Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA 02138, USA.
21. Hassall CD, Yan Y, Hunt LT. The neural correlates of continuous feedback processing. Psychophysiology 2023; 60:e14399. PMID: 37485986; PMCID: PMC10851313; DOI: 10.1111/psyp.14399.
Abstract
Feedback processing is commonly studied by analyzing the brain's response to discrete rather than continuous events. Such studies have led to the hypothesis that rapid phasic midbrain dopaminergic activity tracks reward prediction errors (RPEs), the effects of which are measurable at the scalp via electroencephalography (EEG). Although studies using continuous feedback are sparse, recent animal work suggests that moment-to-moment changes in reward are tracked by slowly ramping midbrain dopaminergic activity. Some have argued that these ramping signals index state values rather than RPEs. Our goal here was to develop an EEG measure of continuous feedback processing in humans, then test whether its behavior could be accounted for by the RPE hypothesis. Participants completed a stimulus-response learning task in which a continuous reward cue gradually increased or decreased over time. A regression-based unmixing approach revealed EEG activity with a topography and time course consistent with the stimulus-preceding negativity (SPN), a scalp potential previously linked to reward anticipation and tonic dopamine release. Importantly, this reward-related activity depended on outcome expectancy: as predicted by the RPE hypothesis, activity for expected reward cues was reduced compared to unexpected reward cues. These results demonstrate the possibility of using human scalp-recorded potentials to track continuous feedback processing, and test candidate hypotheses of this activity.
Affiliation(s)
- Cameron D. Hassall
- Department of Psychiatry, University of Oxford, Oxford, UK
- Department of Psychology, MacEwan University, Edmonton, Alberta, Canada
- Yan Yan
- Department of Psychiatry, University of Oxford, Oxford, UK
- Department of Psychology, Stanford University, Stanford, California, USA
- Laurence T. Hunt
- Department of Psychiatry, University of Oxford, Oxford, UK
- Department of Experimental Psychology, University of Oxford, Oxford, UK
| |
22
Masset P, Tano P, Kim HR, Malik AN, Pouget A, Uchida N. Multi-timescale reinforcement learning in the brain. bioRxiv 2023:2023.11.12.566754. [PMID: 38014166 PMCID: PMC10680596 DOI: 10.1101/2023.11.12.566754] [Citation(s) in RCA: 1]
Abstract
To thrive in complex environments, animals and artificial agents must learn to act adaptively to maximize fitness and rewards. Such adaptive behavior can be learned through reinforcement learning [1], a class of algorithms that has been successful at training artificial agents [2-6] and at characterizing the firing of dopamine neurons in the midbrain [7-9]. In classical reinforcement learning, agents discount future rewards exponentially according to a single timescale, controlled by the discount factor. Here, we explore the presence of multiple timescales in biological reinforcement learning. We first show that reinforcement agents learning at a multitude of timescales possess distinct computational benefits. Next, we report that dopamine neurons in mice performing two behavioral tasks encode reward prediction error with a diversity of discount time constants. Our model explains the heterogeneity of temporal discounting in both cue-evoked transient responses and slower-timescale fluctuations known as dopamine ramps. Crucially, the measured discount factor of individual neurons is correlated across the two tasks, suggesting that it is a cell-specific property. Together, our results provide a new paradigm for understanding functional heterogeneity in dopamine neurons, a mechanistic basis for the empirical observation that humans and animals use non-exponential discounts in many situations [10-14], and open new avenues for the design of more efficient reinforcement learning algorithms.
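The core idea of learning at many timescales can be sketched in a few lines: a population of TD(0) learners shares the same experience, but each unit discounts with its own gamma. A toy model with illustrative parameters, not the paper's:

```python
import numpy as np

# A population of TD(0) value learners sharing the same experience but
# discounting future reward at different timescales (one gamma per "neuron").
gammas = np.array([0.5, 0.8, 0.9, 0.95, 0.99])
T, alpha = 15, 0.1          # cue at t=0, reward at t=T-1; illustrative
V = np.zeros((len(gammas), T))

for trial in range(500):
    for t in range(T):
        r = 1.0 if t == T - 1 else 0.0
        v_next = V[:, t + 1] if t + 1 < T else np.zeros(len(gammas))
        delta = r + gammas * v_next - V[:, t]   # one RPE per timescale
        V[:, t] += alpha * delta

# Cue responses fan out by timescale: short-horizon units barely respond
# to a distal reward, long-horizon units respond strongly.
print(np.round(V[:, 0], 3))
```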
Affiliation(s)
- Paul Masset
- Department of Molecular and Cellular Biology, Harvard University, USA
- Center for Brain Science, Harvard University, USA
- Pablo Tano
- Department of Basic Neuroscience, University of Geneva, Switzerland
- HyungGoo R Kim
- Department of Molecular and Cellular Biology, Harvard University, USA
- Center for Brain Science, Harvard University, USA
- Department of Biomedical Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Center for Neuroscience Imaging Research, Institute for Basic Science (IBS), Suwon 16419, Republic of Korea
- Athar N Malik
- Department of Molecular and Cellular Biology, Harvard University, USA
- Center for Brain Science, Harvard University, USA
- Department of Neurosurgery, Warren Alpert Medical School of Brown University, USA
- Norman Prince Neurosciences Institute, Rhode Island Hospital, USA
- Alexandre Pouget
- Department of Basic Neuroscience, University of Geneva, Switzerland
- Naoshige Uchida
- Department of Molecular and Cellular Biology, Harvard University, USA
- Center for Brain Science, Harvard University, USA
23
Krausz TA, Comrie AE, Kahn AE, Frank LM, Daw ND, Berke JD. Dual credit assignment processes underlie dopamine signals in a complex spatial environment. Neuron 2023; 111:3465-3478.e7. [PMID: 37611585 PMCID: PMC10841332 DOI: 10.1016/j.neuron.2023.07.017] [Citation(s) in RCA: 14]
Abstract
Animals frequently make decisions based on expectations of future reward ("values"). Values are updated by ongoing experience: places and choices that result in reward are assigned greater value. Yet, the specific algorithms used by the brain for such credit assignment remain unclear. We monitored accumbens dopamine as rats foraged for rewards in a complex, changing environment. We observed brief dopamine pulses both at reward receipt (scaling with prediction error) and at novel path opportunities. Dopamine also ramped up as rats ran toward reward ports, in proportion to the value at each location. By examining the evolution of these dopamine place-value signals, we found evidence for two distinct update processes: progressive propagation of value along taken paths, as in temporal difference learning, and inference of value throughout the maze, using internal models. Our results demonstrate that within rich, naturalistic environments dopamine conveys place values that are updated via multiple, complementary learning algorithms.
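The "progressive propagation of value along taken paths" has a compact temporal-difference reading; a toy sketch (a hypothetical maze corridor with illustrative parameters, not the study's analysis):

```python
# Value propagation along a taken path, temporal-difference style: each
# traversal pushes value one step further back from the reward port, so
# places near the port are learned first.
path = ["A", "B", "C", "D", "port"]
V = {s: 0.0 for s in path}
alpha, gamma = 0.3, 0.9

for run in range(1, 6):
    for s, s_next in zip(path[:-1], path[1:]):
        r = 1.0 if s_next == "port" else 0.0     # reward on reaching the port
        delta = r + gamma * V[s_next] - V[s]     # place-wise prediction error
        V[s] += alpha * delta
    print(run, {s: round(v, 2) for s, v in V.items()})
```

Model-based inference, the second update process reported here, would instead revalue places the animal never visited, which pure TD propagation cannot do.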
Affiliation(s)
- Timothy A Krausz
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA
- Alison E Comrie
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA
- Ari E Kahn
- Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Loren M Frank
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA; Department of Physiology, University of California, San Francisco, San Francisco, CA 94158, USA
- Nathaniel D Daw
- Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
- Joshua D Berke
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA; Kavli Institute for Fundamental Neuroscience and Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA; Department of Neurology and Department of Psychiatry and Behavioral Science, University of California, San Francisco, San Francisco, CA 94158, USA
24
Bech P, Crochet S, Dard R, Ghaderi P, Liu Y, Malekzadeh M, Petersen CCH, Pulin M, Renard A, Sourmpis C. Striatal Dopamine Signals and Reward Learning. Function 2023; 4:zqad056. [PMID: 37841525 PMCID: PMC10572094 DOI: 10.1093/function/zqad056] [Citation(s) in RCA: 0]
Abstract
We are constantly bombarded by sensory information and must constantly decide how to act. To adapt behavior optimally, we must judge which sequences of sensory inputs and actions lead to successful outcomes in specific circumstances. Neuronal circuits of the basal ganglia have been strongly implicated in action selection, as well as the learning and execution of goal-directed behaviors, with accumulating evidence supporting the hypothesis that midbrain dopamine neurons might encode a reward signal useful for learning. Here, we review evidence suggesting that midbrain dopaminergic neurons signal reward prediction error, driving synaptic plasticity in the striatum underlying learning. We focus on phasic increases in action potential firing of midbrain dopamine neurons in response to unexpected rewards. These dopamine neurons prominently innervate the dorsal and ventral striatum. In the striatum, the released dopamine binds to dopamine receptors, where it regulates the plasticity of glutamatergic synapses. The increase in striatal dopamine accompanying an unexpected reward activates dopamine type 1 receptors (D1Rs), initiating a signaling cascade that promotes long-term potentiation of recently active glutamatergic input onto striatonigral neurons. Sensorimotor-evoked glutamatergic input, which is active immediately before reward delivery, will thus be strengthened onto neurons in the striatum expressing D1Rs. In turn, these neurons cause disinhibition of brainstem motor centers and of the motor thalamus, thus promoting motor output to reinforce rewarded stimulus-action outcomes. Although many details of the hypothesis need further investigation, altogether it seems likely that dopamine signals in the striatum underlie important aspects of goal-directed reward-based learning.
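The plasticity cascade described here is often summarized as a three-factor rule: presynaptic eligibility, postsynaptic activity, and dopamine must coincide. A minimal sketch (variable names and constants are illustrative, not drawn from the review):

```python
import numpy as np

# Three-factor rule: a glutamatergic synapse is potentiated when (1) its
# input was recently active (eligibility trace), (2) the postsynaptic
# striatal neuron fired, and (3) dopamine rises above baseline (D1R drive).
rng = np.random.default_rng(0)
n_syn = 8
w = np.full(n_syn, 0.5)            # synaptic weights
elig = np.zeros(n_syn)             # presynaptic eligibility traces
tau, eta = 0.8, 0.2                # trace decay and learning rate

for step in range(100):
    pre = rng.random(n_syn) < 0.2  # sparse presynaptic activity
    elig = tau * elig + pre        # mark recently active inputs
    post = (w @ pre) > 0.6         # postsynaptic spike (crude threshold)
    dopamine = 1.0 if step % 25 == 24 else 0.0  # stands in for surprise reward
    w += eta * dopamine * post * elig    # potentiate eligible, active inputs
    w = np.clip(w, 0.0, 1.0)
print(np.round(w, 2))
```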
Affiliation(s)
- Pol Bech, Sylvain Crochet, Robin Dard, Parviz Ghaderi, Yanqi Liu, Meriam Malekzadeh, Carl C H Petersen, Mauro Pulin, Anthony Renard, Christos Sourmpis
- Laboratory of Sensory Processing, Brain Mind Institute, Faculty of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland
25
Cone I, Clopath C, Shouval HZ. Learning to Express Reward Prediction Error-like Dopaminergic Activity Requires Plastic Representations of Time. Research Square 2023:rs.3.rs-3289985. [PMID: 37790466 PMCID: PMC10543312 DOI: 10.21203/rs.3.rs-3289985/v1] [Citation(s) in RCA: 0]
Abstract
The dominant theoretical framework to account for reinforcement learning in the brain is temporal difference (TD) reinforcement learning. The TD framework predicts that some neuronal elements should represent the reward prediction error (RPE), meaning they signal the difference between expected future rewards and actual rewards. The prominence of the TD theory arises from the observation that the firing properties of dopaminergic neurons in the ventral tegmental area appear similar to those of RPE model-neurons in TD learning. Previous implementations of TD learning assume a fixed temporal basis for each stimulus that might eventually predict a reward. Here we show that such a fixed temporal basis is implausible and that certain predictions of TD learning are inconsistent with experiments. We propose instead an alternative theoretical framework, coined FLEX (Flexibly Learned Errors in Expected Reward). In FLEX, feature-specific representations of time are learned, allowing neural representations of stimuli to adjust their timing and relation to rewards in an online manner. In FLEX, dopamine acts as an instructive signal that helps build temporal models of the environment. FLEX is a general theoretical framework with many possible biophysical implementations. To show that FLEX is a feasible approach, we present a specific, biophysically plausible model that implements its principles. We show that this implementation can account for various reinforcement learning paradigms, and that its results and predictions are consistent with a preponderance of both existing and reanalyzed experimental data.
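For contrast, the "fixed temporal basis" assumption that the authors argue against can be written out directly; a sketch of TD learning over a complete-serial-compound basis (illustrative parameters, not the FLEX model itself):

```python
import numpy as np

# The fixed-basis assumption under critique: after the cue, a clock-like
# chain of substates tiles the cue-reward interval (a complete serial
# compound), and value is a learned weight vector over that fixed basis.
T = 12
X = np.eye(T)               # one basis unit per post-cue delay
w = np.zeros(T)             # value weights over the temporal basis
alpha, gamma = 0.2, 0.95

for trial in range(300):
    for t in range(T):
        r = 1.0 if t == T - 1 else 0.0
        v = X[t] @ w
        v_next = X[t + 1] @ w if t + 1 < T else 0.0
        w += alpha * (r + gamma * v_next - v) * X[t]

# If the true reward time now shifts, this basis cannot re-time itself;
# every delay weight must be relearned, the rigidity FLEX is meant to fix.
print(np.round(X @ w, 2))
```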
Affiliation(s)
- Ian Cone
- Department of Bioengineering, Imperial College London, London, United Kingdom
- Department of Neurobiology and Anatomy, University of Texas Medical School at Houston, Houston, TX
- Applied Physics Program, Rice University, Houston, TX
- Claudia Clopath
- Department of Bioengineering, Imperial College London, London, United Kingdom
- Harel Z Shouval
- Department of Neurobiology and Anatomy, University of Texas Medical School at Houston, Houston, TX
- Department of Electrical and Computer Engineering, Rice University, Houston, TX
26
Hennig JA, Romero Pinto SA, Yamaguchi T, Linderman SW, Uchida N, Gershman SJ. Emergence of belief-like representations through reinforcement learning. PLoS Comput Biol 2023; 19:e1011067. [PMID: 37695776 PMCID: PMC10513382 DOI: 10.1371/journal.pcbi.1011067] [Citation(s) in RCA: 3]
Abstract
To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming "beliefs": optimal Bayesian estimates of the hidden states in the task. Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN's learned representation encodes belief information, but only when the RNN's capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity.
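A compact way to see the idea: train only a linear value readout on top of a recurrent state using TD(0). The sketch below uses an echo-state simplification (fixed random recurrent weights), which is an assumption of convenience and not the paper's fully trained RNN:

```python
import numpy as np

# Echo-state simplification: a fixed random recurrent network unrolls a
# partially observable trial into a hidden state, and only a linear value
# readout is trained with semi-gradient TD(0). Parameters illustrative.
rng = np.random.default_rng(1)
n_h, T = 32, 15
W = rng.normal(0, 0.17, (n_h, n_h))   # fixed random recurrence
W_in = rng.normal(0, 1.0, n_h)        # input weights for the observation
w_out = np.zeros(n_h)                 # trained value readout
alpha, gamma = 0.05, 0.95

for trial in range(2000):
    h = np.zeros(n_h)
    omit = rng.random() < 0.1         # 10% of rewards silently omitted
    for t in range(T):
        obs = 1.0 if t == 0 else 0.0  # only the cue is observable
        h_next = np.tanh(W @ h + W_in * obs)
        r = 1.0 if (t == T - 1 and not omit) else 0.0
        v = w_out @ h
        v_next = w_out @ h_next if t < T - 1 else 0.0
        w_out += alpha * (r + gamma * v_next - v) * h   # TD on the readout
        h = h_next

print(round(float(w_out @ np.tanh(W_in)), 3))  # learned value just after cue
```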
Affiliation(s)
- Jay A. Hennig
- Department of Psychology, Harvard University, Cambridge, Massachusetts, USA
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, USA
- Sandra A. Romero Pinto
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, USA
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
- Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, Massachusetts, USA
- Takahiro Yamaguchi
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
- Future Research Department, Toyota Research Institute of North America, Toyota Motor North America, Ann Arbor, Michigan, USA
- Scott W. Linderman
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, USA
- Department of Statistics, Stanford University, Stanford, California, USA
- Naoshige Uchida
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, USA
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
- Samuel J. Gershman
- Department of Psychology, Harvard University, Cambridge, Massachusetts, USA
- Center for Brain Science, Harvard University, Cambridge, Massachusetts, USA
27
Tichelaar JG, Sayalı C, Helmich RC, Cools R. Impulse control disorder in Parkinson's disease is associated with abnormal frontal value signalling. Brain 2023; 146:3676-3689. [PMID: 37192341 PMCID: PMC10473575 DOI: 10.1093/brain/awad162] [Citation(s) in RCA: 3]
Abstract
Dopaminergic medication is well established to boost reward- versus punishment-based learning in Parkinson's disease. However, there is tremendous variability in dopaminergic medication effects across different individuals, with some patients exhibiting much greater cognitive sensitivity to medication than others. We aimed to unravel the mechanisms underlying this individual variability in a large heterogeneous sample of early-stage patients with Parkinson's disease as a function of comorbid neuropsychiatric symptomatology, in particular impulse control disorders and depression. One hundred and ninety-nine patients with Parkinson's disease (138 ON medication and 61 OFF medication) and 59 healthy controls were scanned with functional MRI while they performed an established probabilistic instrumental learning task. Reinforcement learning model-based analyses revealed medication group differences in learning from gains versus losses, but only in patients with impulse control disorders. Furthermore, expected-value related brain signalling in the ventromedial prefrontal cortex was increased in patients with impulse control disorders ON medication compared with those OFF medication, while striatal reward prediction error signalling remained unaltered. These data substantiate the hypothesis that dopamine's effects on reinforcement learning in Parkinson's disease vary with individual differences in comorbid impulse control disorder and suggest they reflect deficient computation of value in medial frontal cortex, rather than deficient reward prediction error signalling in striatum. See Michael Browning (https://doi.org/10.1093/brain/awad248) for a scientific commentary on this article.
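The model class behind such reinforcement-learning analyses fits separate learning rates for positive and negative prediction errors, the asymmetry that dopaminergic medication is thought to shift. A minimal sketch (a generic two-armed task with illustrative parameters, not the study's fitted model):

```python
import numpy as np

# Probabilistic instrumental learning with asymmetric learning rates for
# gains versus losses.
rng = np.random.default_rng(2)
p_reward = {"A": 0.8, "B": 0.2}                # toy two-armed task
Q = {"A": 0.0, "B": 0.0}
alpha_gain, alpha_loss, beta = 0.3, 0.1, 3.0   # illustrative values

for trial in range(200):
    qa, qb = Q["A"], Q["B"]
    p_choose_a = 1.0 / (1.0 + np.exp(-beta * (qa - qb)))  # softmax, 2 options
    choice = "A" if rng.random() < p_choose_a else "B"
    r = float(rng.random() < p_reward[choice])
    delta = r - Q[choice]                      # reward prediction error
    Q[choice] += (alpha_gain if delta > 0 else alpha_loss) * delta
print({k: round(v, 2) for k, v in Q.items()})
```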
Affiliation(s)
- Jorryt G Tichelaar
- Radboud University Medical Centre, Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, 6525EN Nijmegen, The Netherlands
- Radboud University Medical Center, Department of Neurology, Centre of Expertise for Parkinson and Movement Disorders, 6525GA Nijmegen, The Netherlands
- Ceyda Sayalı
- The Johns Hopkins University School of Medicine, Center for Psychedelic and Consciousness Research, Baltimore, MD 21224, USA
- Rick C Helmich
- Radboud University Medical Centre, Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, 6525EN Nijmegen, The Netherlands
- Radboud University Medical Center, Department of Neurology, Centre of Expertise for Parkinson and Movement Disorders, 6525GA Nijmegen, The Netherlands
- Roshan Cools
- Radboud University Medical Centre, Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, 6525EN Nijmegen, The Netherlands
- Radboud University Medical Center, Department of Psychiatry, 6525GA Nijmegen, The Netherlands
28
Li Y, Wang X, Xie X, Liu Q, Dong H, Hou Y, Xia Q, Zhao P. Enhanced locomotor behaviour is mediated by activation of tyrosine hydroxylase in the silkworm brain. Insect Mol Biol 2023; 32:251-262. [PMID: 36636859 DOI: 10.1111/imb.12828] [Citation(s) in RCA: 0]
Abstract
Animal behaviour regulation is a complex process involving many factors, among which the nervous system is essential. In many species, pathogens can alter host behaviour by affecting the host's nervous system. A striking example is the enhanced locomotor behaviour shown by silkworms infected with the nucleopolyhedrosis virus. In this study, we analysed the transcriptome of the silkworm brain at different time points after infection and found that the expression of various genes related to behaviour regulation changed after infection. In-depth analysis identified the tyrosine hydroxylase gene as a key candidate, and the content of dopamine, its downstream metabolite, increased significantly in the brains of virus-infected silkworms. After injection of a tyrosine hydroxylase inhibitor into infected silkworms, the dopamine content in the silkworm brain decreased and the virus-induced locomotor behaviour was successfully blocked. These results confirm that tyrosine hydroxylase is involved in regulating the enhanced locomotor behaviour of virus-infected silkworms. Furthermore, when the tyrosine hydroxylase gene was specifically overexpressed in the silkworm brain, the transgenic silkworms showed enhanced locomotor and foraging behaviour. These results suggest that the tyrosine hydroxylase gene plays a vital role in regulating insect behaviour.
Affiliation(s)
- Yi Li, Xin Wang, Xiaoqian Xie, Qingsong Liu, Haonan Dong, Yong Hou, Qingyou Xia, Ping Zhao
- State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing, China
- Biological Science Research Center, Southwest University, Chongqing, China
29
Alexander WH, Deraeve J, Vassena E. Dissociation and integration of outcome and state uncertainty signals in cognitive control. Cogn Affect Behav Neurosci 2023. [PMID: 37058212 PMCID: PMC10390360 DOI: 10.3758/s13415-023-01091-7] [Citation(s) in RCA: 0]
Abstract
Signals related to uncertainty are frequently observed in regions of the cognitive control network, including anterior cingulate/medial prefrontal cortex (ACC/mPFC), dorsolateral prefrontal cortex (dlPFC), and anterior insular cortex. Uncertainty generally refers to conditions in which decision variables may assume multiple possible values and can arise at multiple points in the perception-action cycle, including sensory input, inferred states of the environment, and the consequences of actions. These sources of uncertainty are frequently correlated: noisy input can lead to unreliable estimates of the state of the environment, with consequential influences on action selection. Given this correlation amongst various sources of uncertainty, dissociating the neural structures underlying their estimation presents an ongoing issue: a region associated with uncertainty related to outcomes may estimate outcome uncertainty itself, or it may reflect a cascade effect of state uncertainty on outcome estimates. In this study, we derive signals of state and outcome uncertainty from mathematical models of risk and observe regions in the cognitive control network whose activity is best explained by signals related to state uncertainty (anterior insula), outcome uncertainty (dlPFC), as well as regions that appear to integrate the two (ACC/mPFC).
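One concrete way to dissociate the two quantities: derive state uncertainty from the belief over hidden states and outcome uncertainty from the predicted outcome distribution. A toy construction (hypothetical numbers, not the study's risk models):

```python
import numpy as np

# An observation yields a belief over two hidden states; each state pays
# reward with its own probability. State uncertainty = entropy of the
# belief; outcome uncertainty = variance of the (Bernoulli) outcome.
def signals(belief_s1, p_reward=(0.9, 0.5)):
    b = np.array([belief_s1, 1.0 - belief_s1])
    state_H = -np.sum(b * np.log2(b + 1e-12))   # state uncertainty (bits)
    p_r = float(b @ np.array(p_reward))         # marginal reward probability
    outcome_var = p_r * (1.0 - p_r)             # outcome uncertainty
    return state_H, outcome_var

for b in (0.99, 0.5, 0.01):
    print(b, [round(x, 3) for x in signals(b)])
# Note the dissociation: being sure of state 2 (b near 0) gives low state
# uncertainty but, with p_reward = 0.5 there, maximal outcome uncertainty.
```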
Affiliation(s)
- William H Alexander
- Center for Complex Systems & Brain Sciences, Florida Atlantic University, Boca Raton, FL, USA
- Department of Psychology, Florida Atlantic University, Boca Raton, FL, USA
- The Brain Institute, Florida Atlantic University, Boca Raton, FL, USA
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
- James Deraeve
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Eliana Vassena
- Experimental Psychopathology and Treatment, Behavioural Science Institute, Radboud University, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, The Netherlands
30
Hennig JA, Pinto SAR, Yamaguchi T, Linderman SW, Uchida N, Gershman SJ. Emergence of belief-like representations through reinforcement learning. bioRxiv 2023:2023.04.04.535512. [PMID: 37066383 PMCID: PMC10104054 DOI: 10.1101/2023.04.04.535512] [Citation(s) in RCA: 2]
Abstract
To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming "beliefs": optimal Bayesian estimates of the hidden states in the task. Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN's learned representation encodes belief information, but only when the RNN's capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity.
Author summary
Natural environments are full of uncertainty. For example, just because my fridge had food in it yesterday does not mean it will have food today. Despite such uncertainty, animals can estimate which states and actions are the most valuable. Previous work suggests that animals estimate value using a brain area called the basal ganglia, using a process resembling a reinforcement learning algorithm called TD learning. However, traditional reinforcement learning algorithms cannot accurately estimate value in environments with state uncertainty (e.g., when my fridge's contents are unknown). One way around this problem is if agents form "beliefs", a probabilistic estimate of how likely each state is, given any observations so far. However, estimating beliefs is a demanding process that may not be possible for animals in more complex environments. Here we show that an artificial recurrent neural network (RNN) trained with TD learning can estimate value from observations, without explicitly estimating beliefs. The trained RNN's error signals resembled the neural activity of dopamine neurons measured during the same task. Importantly, the RNN's activity resembled beliefs, but only when the RNN had enough capacity. This work illustrates how animals could estimate value in uncertain environments without needing to first form beliefs, which may be useful in environments where computing the true beliefs is too costly.
Affiliation(s)
- Jay A. Hennig
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Sandra A. Romero Pinto
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Takahiro Yamaguchi
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Future Vehicle Research Department, Toyota Research Institute North America, Toyota Motor North America Inc., Ann Arbor, MI, USA
- Scott W. Linderman
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA
- Department of Statistics, Stanford University, Stanford, CA, USA
- Naoshige Uchida
- Center for Brain Science, Harvard University, Cambridge, MA, USA
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Samuel J. Gershman
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Center for Brain Science, Harvard University, Cambridge, MA, USA
31
Krausz TA, Comrie AE, Frank LM, Daw ND, Berke JD. Dual credit assignment processes underlie dopamine signals in a complex spatial environment. bioRxiv 2023:2023.02.15.528738. [PMID: 36993482 PMCID: PMC10054934 DOI: 10.1101/2023.02.15.528738] [Citation(s) in RCA: 2]
Abstract
Dopamine in the nucleus accumbens helps motivate behavior based on expectations of future reward ("values"). These values need to be updated by experience: after receiving reward, the choices that led to reward should be assigned greater value. There are multiple theoretical proposals for how this credit assignment could be achieved, but the specific algorithms that generate updated dopamine signals remain uncertain. We monitored accumbens dopamine as freely behaving rats foraged for rewards in a complex, changing environment. We observed brief pulses of dopamine both when rats received reward (scaling with prediction error) and when they encountered novel path opportunities. Furthermore, dopamine ramped up as rats ran towards reward ports, in proportion to the value at each location. By examining the evolution of these dopamine place-value signals, we found evidence for two distinct update processes: progressive propagation along taken paths, as in temporal-difference learning, and inference of value throughout the maze, using internal models. Our results demonstrate that within rich, naturalistic environments dopamine conveys place values that are updated via multiple, complementary learning algorithms.
Affiliation(s)
- Timothy A Krausz
- Neuroscience Graduate Program, University of California, San Francisco
- Alison E Comrie
- Neuroscience Graduate Program, University of California, San Francisco
- Loren M Frank
- Neuroscience Graduate Program, University of California, San Francisco
- Kavli Institute for Fundamental Neuroscience, and Weill Institute for Neurosciences, UCSF
- Howard Hughes Medical Institute
- Department of Physiology, UCSF
- Nathaniel D Daw
- Department of Psychology, and Princeton Neuroscience Institute, Princeton University, NJ
- Joshua D Berke
- Neuroscience Graduate Program, University of California, San Francisco
- Kavli Institute for Fundamental Neuroscience, and Weill Institute for Neurosciences, UCSF
- Department of Neurology, and Department of Psychiatry and Behavioral Science, UCSF
32
Morita K, Shimomura K, Kawaguchi Y. Opponent Learning with Different Representations in the Cortico-Basal Ganglia Circuits. eNeuro 2023; 10:ENEURO.0422-22.2023. [PMID: 36653187 PMCID: PMC9884109 DOI: 10.1523/eneuro.0422-22.2023] [Citation(s) in RCA: 0]
Abstract
The direct and indirect pathways of the basal ganglia (BG) have been suggested to learn mainly from positive and negative feedback, respectively. Since these pathways unevenly receive inputs from different cortical neuron types and/or regions, they may preferentially use different state/action representations. We explored whether such a combined use of different representations, coupled with different learning rates from positive and negative reward prediction errors (RPEs), has computational benefits. We modeled the animal as an agent equipped with two learning systems, each of which adopted an individual representation (IR) or successor representation (SR) of states. Varying the combination of IR or SR and the learning rates from positive and negative RPEs in each system, we examined how the agent performed in a dynamic reward navigation task. We found that the combination of an SR-based system learning mainly from positive RPEs and an IR-based system learning mainly from negative RPEs achieved good performance in the task compared with other combinations. In such a combination of appetitive SR-based and aversive IR-based systems, both systems show activities of comparable magnitudes with opposite signs, consistent with the suggested profiles of the two BG pathways. Moreover, the architecture of such a combination provides a novel, coherent explanation for the functional significance and underlying mechanisms of diverse findings about the cortico-BG circuits. These results suggest that combining different representations with appetitive and aversive learning could be an effective learning strategy in certain dynamic environments, and that it might actually be implemented in the cortico-BG circuits.
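The architecture can be sketched compactly: two TD learners over SR and one-hot (IR) features, each with asymmetric learning rates for positive versus negative RPEs, whose summed value drives behavior. A schematic sketch with illustrative parameters, not the paper's implementation:

```python
import numpy as np

# Opponent learners on a deterministic linear track: an appetitive system
# over successor-representation (SR) features and an aversive system over
# individual (one-hot, IR) features.
n, gamma = 8, 0.9
P = np.zeros((n, n))
for s in range(n - 1):
    P[s, s + 1] = 1.0                      # always step right toward reward
SR = np.linalg.inv(np.eye(n) - gamma * P)  # SR of the fixed policy
IR = np.eye(n)

w_sr, w_ir = np.zeros(n), np.zeros(n)
a_pos_sr, a_neg_sr = 0.10, 0.01            # SR system: mostly positive RPEs
a_pos_ir, a_neg_ir = 0.01, 0.10            # IR system: mostly negative RPEs

r_vec = np.zeros(n); r_vec[n - 1] = 1.0
for trial in range(300):
    if trial == 150:
        r_vec[n - 1] = 0.0                 # reward removed mid-task
    for s in range(n - 1):
        v = SR[s] @ w_sr + IR[s] @ w_ir
        v_next = SR[s + 1] @ w_sr + IR[s + 1] @ w_ir
        delta = r_vec[s + 1] + gamma * v_next - v
        w_sr += (a_pos_sr if delta > 0 else a_neg_sr) * delta * SR[s]
        w_ir += (a_pos_ir if delta > 0 else a_neg_ir) * delta * IR[s]
print(np.round(SR @ w_sr + IR @ w_ir, 2))  # combined place values
```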
Affiliation(s)
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo 113-0033, Japan
- International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, Tokyo 113-0033, Japan
- Kanji Shimomura
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo 113-0033, Japan
- Department of Behavioral Medicine, National Institute of Mental Health, National Center of Neurology and Psychiatry, Kodaira 187-8551, Japan
- Yasuo Kawaguchi
- Brain Science Institute, Tamagawa University, Machida 194-8610, Japan
- National Institute for Physiological Sciences (NIPS), Okazaki 444-8787, Japan
33
Coddington LT, Lindo SE, Dudman JT. Mesolimbic dopamine adapts the rate of learning from action. Nature 2023; 614:294-302.
Abstract
Recent success in training artificial agents and robots derives from a combination of direct learning of behavioural policies and indirect learning through value functions [1-3]. Policy learning and value learning use distinct algorithms that optimize behavioural performance and reward prediction, respectively. In animals, behavioural learning and the role of mesolimbic dopamine signalling have been extensively evaluated with respect to reward prediction [4]; however, so far there has been little consideration of how direct policy learning might inform our understanding [5]. Here we used a comprehensive dataset of orofacial and body movements to understand how behavioural policies evolved as naive, head-restrained mice learned a trace conditioning paradigm. Individual differences in initial dopaminergic reward responses correlated with the emergence of learned behavioural policy, but not the emergence of putative value encoding for a predictive cue. Likewise, physiologically calibrated manipulations of mesolimbic dopamine produced several effects inconsistent with value learning but predicted by a neural-network-based model that used dopamine signals to set an adaptive rate, not an error signal, for behavioural policy learning. This work provides strong evidence that phasic dopamine activity can regulate direct learning of behavioural policies, expanding the explanatory power of reinforcement learning models for animal learning [6].
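The contrast the abstract draws can be made concrete: in value learning, dopamine is the error term itself; in the alternative, it sets the step size of a direct policy update. A toy sketch (the task, names, and constants are hypothetical, not the study's model):

```python
import numpy as np

# Dopamine as the error term (value learning) versus as an adaptive
# learning RATE for policy learning, in a toy task where reward follows
# anticipatory licking.
rng = np.random.default_rng(3)
theta, V = 0.0, 0.0                    # policy logit and value estimate
for trial in range(500):
    p_lick = 1.0 / (1.0 + np.exp(-theta))
    lick = rng.random() < p_lick
    r = 1.0 if lick else 0.0           # toy contingency: licking pays off
    dopamine = r - V                   # phasic response shrinks as V grows
    V += 0.05 * dopamine               # (a) value learning: DA is the error
    grad = (1.0 if lick else 0.0) - p_lick     # REINFORCE-style gradient
    theta += max(dopamine, 0.0) * grad * r     # (b) DA sets the step size
print(round(p_lick, 2), round(V, 2))   # policy saturates as DA, hence the
                                       # policy learning rate, decays
```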
34
Jakob AMV, Mikhael JG, Hamilos AE, Assad JA, Gershman SJ. Dopamine mediates the bidirectional update of interval timing. Behav Neurosci 2022; 136:445-452. [PMID: 36222637 PMCID: PMC9725808 DOI: 10.1037/bne0000529] [Citation(s) in RCA: 1]
Abstract
The role of dopamine (DA) as a reward prediction error (RPE) signal in reinforcement learning (RL) tasks has been well-established over the past decades. Recent work has shown that the RPE interpretation can also account for the effects of DA on interval timing by controlling the speed of subjective time. According to this theory, the timing of the dopamine signal relative to reward delivery dictates whether subjective time speeds up or slows down: early DA signals speed up subjective time and late signals slow it down. To test this bidirectional prediction, we reanalyzed measurements of dopaminergic neurons in the substantia nigra pars compacta of mice performing a self-timed movement task. Using the slope of ramping dopamine activity as a readout of subjective time speed, we found that trial-by-trial changes in the slope could be predicted from the timing of dopamine activity on the previous trial. This result provides a key piece of evidence supporting a unified computational theory of RL and interval timing.
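The bidirectional prediction reduces to a one-line update rule; a toy sketch (the gain and update form are illustrative, not the paper's fitted model):

```python
# The timing of the dopamine signal relative to reward delivery nudges the
# speed of the subjective clock: early signals speed it up, late signals
# slow it down. Times and gain (kappa) are hypothetical.
speed, kappa = 1.0, 0.05
trials = [(0.8, 1.0), (0.8, 1.0), (1.3, 1.0), (1.3, 1.0)]  # (DA time, reward time)
for da_time, reward_time in trials:
    early = reward_time - da_time      # positive: DA arrived before reward
    speed += kappa * early             # early -> subjective time speeds up
    print(round(speed, 3))
```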
Affiliation(s)
- Anthony M V Jakob
- Section of Life Sciences Engineering, École Polytechnique Fédérale de Lausanne
- John A Assad
- Department of Neurobiology, Harvard Medical School
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University
35
van Elzelingen W, Goedhoop J, Warnaar P, Denys D, Arbab T, Willuhn I. A unidirectional but not uniform striatal landscape of dopamine signaling for motivational stimuli. Proc Natl Acad Sci U S A 2022; 119:e2117270119. [PMID: 35594399 PMCID: PMC9171911 DOI: 10.1073/pnas.2117270119] [Citation(s) in RCA: 21]
Abstract
Dopamine signals in the striatum are critical for motivated behavior. However, their regional specificity and precise information content are actively debated. Dopaminergic projections to the striatum are topographically organized. Thus, we quantified dopamine release in response to motivational stimuli and associated predictive cues in six principal striatal regions of unrestrained, behaving rats. Absolute signal size and its modulation by stimulus value and by the subjective state of the animal were interregionally heterogeneous along a medial-to-lateral gradient. In contrast, the direction of dopamine-concentration change was homogeneous across all regions: appetitive stimuli increased and aversive stimuli decreased dopamine concentration. Although cues predictive of such motivational stimuli acquired the same influence over dopamine homogeneously across all regions, dopamine-mediated prediction-error signals were restricted to the ventromedial, limbic striatum. Together, our findings demonstrate a nuanced striatal landscape of unidirectional but not uniform dopamine signals, topographically encoding distinct aspects of motivational stimuli and their prediction.
Affiliation(s)
- Wouter van Elzelingen, Jessica Goedhoop, Pascal Warnaar, Damiaan Denys, Tara Arbab, Ingo Willuhn
- Netherlands Institute for Neuroscience, Royal Netherlands Academy of Arts and Sciences, 1105 BA Amsterdam, The Netherlands
- Department of Psychiatry, Amsterdam University Medical Centers, University of Amsterdam, 1105 AZ Amsterdam, The Netherlands
36
Whittington JC, Behrens TE. Reinforcement learning: Dopamine ramps with fuzzy value estimates. Curr Biol 2022; 32:R213-R215. [DOI: 10.1016/j.cub.2022.01.070] [Citation(s) in RCA: 0]
37
Morita K, Kato A. Dopamine ramps for accurate value learning under uncertainty. Trends Neurosci 2022; 45:254-256. [PMID: 35181147 DOI: 10.1016/j.tins.2022.01.008] [Citation(s) in RCA: 0]
Abstract
Dopamine signals that ramp up toward the time of reward have been widely reported, but their functions remain elusive. Through modeling analyses and experiments in mice, a recent study by Mikhael, Kim et al. shows that such signals represent reward prediction errors used for accurate value learning in conditions where there is uncertainty about the upcoming state that is resolved by sensory feedback.
Affiliation(s)
- Kenji Morita
- Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan; International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, Tokyo, Japan
- Ayaka Kato
- Laboratory for Circuit Mechanisms of Sensory Perception, RIKEN Center for Brain Science, Wako, Japan; Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan