1
Medina-Coss y León R, Lezama E, Márquez I, Treviño M. Adrenergic Modulation of Cortical Gain and Sensory Processing in the Mouse Visual Cortex. Brain Sci 2025; 15:406. PMID: 40309887; PMCID: PMC12025498; DOI: 10.3390/brainsci15040406
Abstract
Background/Objectives: Sensory perception is influenced by internal neuronal variability and external noise. Neuromodulators such as norepinephrine (NE) regulate this variability by modulating excitation-inhibition balance, oscillatory dynamics, and interlaminar connectivity. While NE is known to modulate cortical gain, it remains unclear how it shapes sensory processing under noisy conditions. This study investigates how adrenergic modulation affects signal-to-noise processing and perceptual decision-making in the primary visual cortex (V1) of mice exposed to varying levels of visual noise. Methods: We performed in vivo local field potential (LFP) recordings from layers 2/3 and 4 of V1 in sedated mice to assess the impact of visual noise and systemic administration of atomoxetine, an NE reuptake inhibitor, on cortical signal processing. In a separate group of freely moving mice, we used a two-alternative forced-choice task to evaluate the behavioral effects of systemic and intracortical adrenergic manipulations on visual discrimination. Results: Moderate visual noise enhanced cortical signal processing and visual choices, consistent with stochastic resonance. High noise levels impaired both. Systemic atomoxetine administration flattened the cortical signal-to-noise ratio function, suggesting disrupted gain control. Behaviorally, clonidine impaired accuracy at moderate noise levels, while atomoxetine reduced discrimination performance and increased response variability. Intracortical NE infusions produced similar effects. Conclusions: Our findings demonstrate that NE regulates the balance between signal amplification and noise suppression in a noise- and context-dependent manner. These results extend existing models of neuromodulatory function by linking interlaminar communication and cortical variability to perceptual decision-making.
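The stochastic-resonance profile described above can be illustrated with a minimal simulation: a fixed-threshold observer detecting a weak, subthreshold signal performs near chance with little noise, best at intermediate noise, and poorly again once noise dominates. This is a generic sketch under arbitrary parameters (signal amplitude, threshold, noise levels), not the authors' analysis pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def accuracy(noise_sd, n_trials=20_000, signal=0.8, threshold=1.0):
    """Two-alternative detection by a fixed-threshold observer: half the
    trials contain a weak, subthreshold signal; accuracy averages hits and
    correct rejections."""
    noise = rng.normal(0.0, noise_sd, size=(2, n_trials))
    hits = np.mean(signal + noise[0] > threshold)        # signal-present trials
    correct_rejections = np.mean(noise[1] <= threshold)  # signal-absent trials
    return 0.5 * (hits + correct_rejections)

for sd in [0.05, 0.2, 0.5, 1.0, 3.0]:
    print(f"noise sd = {sd:4.2f}  ->  accuracy = {accuracy(sd):.3f}")
# Accuracy rises from chance, peaks at moderate noise (stochastic resonance),
# and falls back toward chance as noise overwhelms the signal.
```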
Affiliation(s)
- Ricardo Medina-Coss y León
- Laboratorio de Plasticidad Cortical y Aprendizaje Perceptual, Instituto de Neurociencias, Universidad de Guadalajara, Guadalajara 44130, Jalisco, Mexico
- School of Medicine, Southern Illinois University, Carbondale, IL 62901, USA
- Elí Lezama
- Laboratorio de Plasticidad Cortical y Aprendizaje Perceptual, Instituto de Neurociencias, Universidad de Guadalajara, Guadalajara 44130, Jalisco, Mexico
- Inmaculada Márquez
- Laboratorio de Plasticidad Cortical y Aprendizaje Perceptual, Instituto de Neurociencias, Universidad de Guadalajara, Guadalajara 44130, Jalisco, Mexico
- Departamento de Ciencias Médicas y de la Vida, Centro Universitario de la Ciénega, Universidad de Guadalajara, Ocotlán 47820, Jalisco, Mexico
- Departamento de Psicología, Centro Universitario de la Ciénega, Universidad de Guadalajara, Ocotlán 47820, Jalisco, Mexico
- Mario Treviño
- Laboratorio de Plasticidad Cortical y Aprendizaje Perceptual, Instituto de Neurociencias, Universidad de Guadalajara, Guadalajara 44130, Jalisco, Mexico
2
Cheng Y, Magnard R, Langdon AJ, Lee D, Janak PH. Chronic ethanol exposure produces sex-dependent impairments in value computations in the striatum. Sci Adv 2025; 11:eadt0200. PMID: 40173222; PMCID: PMC11963993; DOI: 10.1126/sciadv.adt0200
Abstract
Value-based decision-making relies on the striatum, where neural plasticity can be altered by chronic ethanol (EtOH) exposure, but the effects of such plasticity on striatal neural dynamics during decision-making remain unclear. This study investigated the long-term impacts of EtOH on reward-driven decision-making and striatal neurocomputations in male and female rats using a dynamic probabilistic reversal learning task. Following a prolonged withdrawal period, EtOH-exposed male rats exhibited deficits in adaptability and exploratory behavior, with aberrant outcome-driven value updating that heightened preference for chosen action. These behavioral changes were linked to altered neural activity in the dorsomedial striatum (DMS), where EtOH increased outcome-related encoding and decreased choice-related encoding. In contrast, female rats showed minimal behavioral changes with distinct EtOH-evoked alterations of neural activity, revealing significant sex differences in the impact of chronic EtOH. Our findings underscore the impact of chronic EtOH exposure on adaptive decision-making, revealing enduring changes in neurocomputational processes in the striatum underlying cognitive deficits that differ by sex.
Affiliation(s)
- Yifeng Cheng
- Department of Psychological and Brain Sciences, Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD, USA
- Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, USA
- Robin Magnard
- Department of Psychological and Brain Sciences, Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD, USA
- Angela J. Langdon
- Intramural Research Program, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA
- Daeyeol Lee
- Department of Psychological and Brain Sciences, Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD, USA
- Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, USA
- Zanvyl Krieger Mind/Brain Institute, Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Patricia H. Janak
- Department of Psychological and Brain Sciences, Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD, USA
- Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, USA
3
Cheng Y, Magnard R, Langdon AJ, Lee D, Janak PH. Chronic Ethanol Exposure Produces Persistent Impairment in Cognitive Flexibility and Decision Signals in the Striatum. bioRxiv [Preprint] 2025:2024.03.10.584332. PMID: 38585868; PMCID: PMC10996555; DOI: 10.1101/2024.03.10.584332
Abstract
Lack of cognitive flexibility is a hallmark of substance use disorders and has been associated with drug-induced synaptic plasticity in the dorsomedial striatum (DMS). Yet the possible impact of altered plasticity on real-time striatal neural dynamics during decision-making is unclear. Here, we identified persistent impairments induced by chronic ethanol (EtOH) exposure on cognitive flexibility and striatal decision signals. After a substantial withdrawal period from prior EtOH vapor exposure, male, but not female, rats exhibited reduced adaptability and exploratory behavior during a dynamic decision-making task. Reinforcement learning models showed that prior EtOH exposure enhanced learning from rewards over omissions. Notably, neural signals in the DMS related to the decision outcome were enhanced, while those related to choice and choice-outcome conjunction were reduced, in EtOH-treated rats compared to the controls. These findings highlight the profound impact of chronic EtOH exposure on adaptive decision-making, pinpointing specific changes in striatal representations of actions and outcomes as underlying mechanisms for cognitive deficits.
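A hedged sketch of the kind of value-updating rule such reinforcement-learning analyses typically fit: Q-learning with separate learning rates for rewarded and unrewarded (omission) outcomes, so that "learning from rewards over omissions" corresponds to alpha_reward > alpha_omission. Parameter values and the toy environment are illustrative assumptions, not the authors' fitted model.

```python
import numpy as np

rng = np.random.default_rng(1)

def update_q(q, choice, reward, alpha_reward=0.6, alpha_omission=0.2):
    """One trial of Q-learning with asymmetric learning rates: the prediction
    error is weighted more strongly after reward than after omission."""
    alpha = alpha_reward if reward == 1 else alpha_omission
    q = q.copy()
    q[choice] += alpha * (reward - q[choice])
    return q

def softmax_choice(q, beta=3.0):
    """Pick an action with probability proportional to exp(beta * Q)."""
    p = np.exp(beta * (q - q.max()))
    p /= p.sum()
    return rng.choice(len(q), p=p)

q = np.zeros(2)
for t in range(5):
    a = softmax_choice(q)
    r = 1 if a == 0 else 0        # toy environment: the left lever always pays
    q = update_q(q, a, r)
    print(t, a, r, q.round(3))
```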
Affiliation(s)
- Yifeng Cheng
- Department of Psychological and Brain Sciences, Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD
- Robin Magnard
- Department of Psychological and Brain Sciences, Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD
- Angela J Langdon
- Intramural Research Program, National Institute of Mental Health, National Institutes of Health, Bethesda, MD
- Daeyeol Lee
- Department of Psychological and Brain Sciences, Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD
- Zanvyl Krieger Mind/Brain Institute, Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD
- Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD
- Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD
- Patricia H Janak
- Department of Psychological and Brain Sciences, Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD
- Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD
- Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD
4
Iigaya K, Larsen T, Fong T, O'Doherty JP. Computational and Neural Evidence for Altered Fast and Slow Learning from Losses in Problem Gambling. J Neurosci 2025; 45:e0080242024. PMID: 39557579; DOI: 10.1523/jneurosci.0080-24.2024
Abstract
Learning occurs across multiple timescales, with fast learning crucial for adapting to sudden environmental changes, and slow learning beneficial for extracting robust knowledge from multiple events. Here, we asked if miscalibrated fast vs slow learning can lead to maladaptive decision-making in individuals with problem gambling. We recruited participants with problem gambling (PG; N = 20; 9 female and 11 male) and a recreational gambling control group without any symptoms associated with PG (N = 20; 10 female and 10 male) from the community in Los Angeles, CA. Participants performed a decision-making task involving reward-learning and loss-avoidance while being scanned with fMRI. Using computational model fitting, we found that individuals in the PG group showed evidence for an excessive dependence on slow timescales and a reduced reliance on fast timescales during learning. fMRI data implicated the putamen, an area associated with habit, and medial prefrontal cortex (PFC) in slow loss-value encoding, with significantly more robust encoding in medial PFC in the PG group compared to controls. The PG group also exhibited stronger loss prediction error encoding in the insular cortex. These findings suggest that individuals with PG have an impaired ability to adjust their predictions following losses, manifested by a stronger influence of slow value learning. This impairment could contribute to the behavioral inflexibility of problem gamblers, particularly the persistence in gambling behavior typically observed in those individuals after incurring loss outcomes.
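The fast-versus-slow learning idea can be sketched as two value estimates updated with different learning rates and mixed by a weight; over-reliance on the slow timescale corresponds to a mixing weight near 1. This is a generic two-timescale learner under assumed parameters, not the specific model fitted to participants.

```python
import numpy as np

def two_timescale_value(outcomes, alpha_fast=0.5, alpha_slow=0.02, w_slow=0.8):
    """Track one option's value with a fast and a slow learner and mix them;
    w_slow near 1 means behavior is dominated by the slow trace."""
    v_fast, v_slow, combined = 0.0, 0.0, []
    for r in outcomes:
        v_fast += alpha_fast * (r - v_fast)
        v_slow += alpha_slow * (r - v_slow)
        combined.append(w_slow * v_slow + (1.0 - w_slow) * v_fast)
    return np.array(combined)

# Outcomes switch from mostly losses (0) to mostly wins (1) halfway through:
outcomes = np.r_[np.zeros(50), np.ones(50)]
v = two_timescale_value(outcomes)
print(v[[48, 52, 60, 99]].round(3))
# A slow-dominated learner adjusts sluggishly after the switch, analogous to
# the impaired adjustment of predictions following losses described above.
```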
Affiliation(s)
- Kiyohito Iigaya
- Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, California 91125
- Department of Psychiatry, Columbia University Irving Medical Center, New York, New York 10032
- Center for Theoretical Neuroscience and Zuckerman Institute, Columbia University, New York, New York 10027
- New York State Psychiatric Institute, New York, New York 10032
- Tobias Larsen
- Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, California 91125
- Timothy Fong
- Semel Institute for Neuroscience and Human Behavior, UCLA, Los Angeles, California 90024
- John P O'Doherty
- Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, California 91125
5
Lavín C, García R, Fuentes M. Navigating Uncertainty: The Role of Mood and Confidence in Decision-Making Flexibility and Performance. Behav Sci (Basel) 2024; 14:1144. PMID: 39767285; PMCID: PMC11673058; DOI: 10.3390/bs14121144
Abstract
Dealing with uncertainty is a pivotal skill for adaptive decision-making across various real-life contexts. Cognitive models suggest that individuals continuously update their knowledge based on past choices and outcomes. Traditionally, uncertainty has been linked to negative states such as fear and anxiety. Recent evidence, however, highlights that uncertainty can also evoke positive emotions, such as surprise, interest, excitement, and enthusiasm, depending on one's task expectations. Despite this, the interplay between mood, confidence, and learning remains underexplored. Some studies indicate that self-reported mood does not always align with confidence, as these constructs evolve on different timescales. We propose that mood influences confidence, thereby enhancing decision flexibility-defined as the ability to switch effectively between exploration and exploitation. This increased flexibility is expected to improve task performance by increasing accuracy. Our findings support this hypothesis, revealing that confidence modulates exploration/exploitation strategies and learning rates, while mood affects reward perception and confidence levels. These findings indicate that metacognition entails a dynamic balance between exploration and exploitation, integrating mood states with high-level cognitive processes.
Affiliation(s)
- Claudio Lavín
- Departamento de Psicología, Universidad Autónoma de Chile, Región Metropolitana, Santiago 7500912, Chile
- Roberto García
- Facultad de Psicología, Universidad Diego Portales, Región Metropolitana, Santiago 8320000, Chile
- Miguel Fuentes
- Santa Fe Institute, Santa Fe, NM 87501, USA
- Instituto de Investigaciones Filosóficas—SADAF, Buenos Aires 1188, Argentina
- Instituto de Sistemas Complejos de Valparaíso, Artillería 470, Cerro Artillería, Valparaíso 2340000, Chile
6
Davidson AM, Hige T. Roles of feedback and feed-forward networks of dopamine subsystems: insights from Drosophila studies. Learn Mem 2024; 31:a053807. PMID: 38862171; PMCID: PMC11199952; DOI: 10.1101/lm.053807.123
Abstract
Across animal species, dopamine-operated memory systems comprise anatomically segregated, functionally diverse subsystems. Although individual subsystems could operate independently to support distinct types of memory, the logical interplay between subsystems is expected to enable more complex memory processing by allowing existing memory to influence future learning. Recent comprehensive ultrastructural analysis of the Drosophila mushroom body revealed intricate networks interconnecting the dopamine subsystems-the mushroom body compartments. Here, we review the functions of some of these connections that are beginning to be understood. Memory consolidation is mediated by two different forms of network: A recurrent feedback loop within a compartment maintains sustained dopamine activity required for consolidation, whereas feed-forward connections across compartments allow short-term memory formation in one compartment to open the gate for long-term memory formation in another compartment. Extinction and reversal of aversive memory rely on a similar feed-forward circuit motif that signals omission of punishment as a reward, which triggers plasticity that counteracts the original aversive memory trace. Finally, indirect feed-forward connections from a long-term memory compartment to short-term memory compartments mediate higher-order conditioning. Collectively, these emerging studies indicate that feedback control and hierarchical connectivity allow the dopamine subsystems to work cooperatively to support diverse and complex forms of learning.
Affiliation(s)
- Andrew M Davidson
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Integrative Program for Biological and Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Toshihide Hige
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Department of Cell Biology and Physiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Integrative Program for Biological and Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
7
Shahidi N, Franch M, Parajuli A, Schrater P, Wright A, Pitkow X, Dragoi V. Population coding of strategic variables during foraging in freely moving macaques. Nat Neurosci 2024; 27:772-781. PMID: 38443701; PMCID: PMC11001579; DOI: 10.1038/s41593-024-01575-w
Abstract
Until now, it has been difficult to examine the neural bases of foraging in naturalistic environments because previous approaches have relied on restrained animals performing trial-based foraging tasks. Here we allowed unrestrained monkeys to freely interact with concurrent reward options while we wirelessly recorded population activity in the dorsolateral prefrontal cortex. The animals decided when and where to forage based on whether their prediction of reward was fulfilled or violated. This prediction was not solely based on a history of reward delivery, but also on the understanding that waiting longer improves the chance of reward. The task variables were continuously represented in a subspace of the high-dimensional population activity, and this compressed representation predicted the animal's subsequent choices better than the true task variables and as well as the raw neural activity. Our results indicate that monkeys' foraging strategies are based on a cortical model of reward dynamics as animals freely explore their environment.
Affiliation(s)
- Neda Shahidi
- Department of Neurobiology and Anatomy, McGovern Medical School, University of Texas, Houston, Houston, TX, USA
- Georg-Elias-Müller-Institute for Psychology, Georg August-Universität, Göttingen, Germany
- Cognitive Neuroscience Laboratory, German Primate Center, Göttingen, Germany
- Melissa Franch
- Department of Neurobiology and Anatomy, McGovern Medical School, University of Texas, Houston, Houston, TX, USA
- Arun Parajuli
- Department of Neurobiology and Anatomy, McGovern Medical School, University of Texas, Houston, Houston, TX, USA
- Paul Schrater
- Department of Computer Science, University of Minnesota, Minneapolis, MN, USA
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
- Anthony Wright
- Department of Neurobiology and Anatomy, McGovern Medical School, University of Texas, Houston, Houston, TX, USA
- Xaq Pitkow
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA.
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA.
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA.
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA.
- Department of Machine Learning, Carnegie Mellon University, Pittsburgh, PA, USA.
- Valentin Dragoi
- Department of Neurobiology and Anatomy, McGovern Medical School, University of Texas, Houston, Houston, TX, USA.
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA.
- Neuroengineering Initiative, Rice University, Houston, TX, USA.
8
Mohebi A, Wei W, Pelattini L, Kim K, Berke JD. Dopamine transients follow a striatal gradient of reward time horizons. Nat Neurosci 2024; 27:737-746. PMID: 38321294; PMCID: PMC11001583; DOI: 10.1038/s41593-023-01566-3
Abstract
Animals make predictions to guide their behavior and update those predictions through experience. Transient increases in dopamine (DA) are thought to be critical signals for updating predictions. However, it is unclear how this mechanism handles a wide range of behavioral timescales-from seconds or less (for example, if singing a song) to potentially hours or more (for example, if hunting for food). Here we report that DA transients in distinct rat striatal subregions convey prediction errors based on distinct time horizons. DA dynamics systematically accelerated from ventral to dorsomedial to dorsolateral striatum, in the tempo of spontaneous fluctuations, the temporal integration of prior rewards and the discounting of future rewards. This spectrum of timescales for evaluative computations can help achieve efficient learning and adaptive motivation for a broad range of behaviors.
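The notion of a reward "time horizon" can be made concrete with exponential discounting: a value estimate that discounts future reward by γ per step effectively integrates over roughly 1/(1 − γ) steps, so faster and slower dopamine dynamics correspond, in this reading, to smaller and larger γ. A hedged illustration with arbitrary values, not the analysis used in the study:

```python
import numpy as np

def discounted_value(rewards, gamma):
    """Present value of a future reward stream under exponential discounting."""
    t = np.arange(len(rewards))
    return float(np.sum((gamma ** t) * rewards))

rewards = np.ones(500)             # one unit of reward per step
for gamma in (0.5, 0.9, 0.99):
    horizon = 1.0 / (1.0 - gamma)  # effective number of steps integrated
    print(f"gamma = {gamma:4.2f}  horizon ~ {horizon:6.1f} steps  "
          f"value = {discounted_value(rewards, gamma):7.2f}")
```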
Affiliation(s)
- Ali Mohebi
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Wei Wei
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Lilian Pelattini
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Kyoungjun Kim
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Joshua D Berke
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA.
- Department of Psychiatry and Behavioral Sciences, University of California San Francisco, San Francisco, CA, USA.
- Neuroscience Graduate Program, University of California San Francisco, San Francisco, CA, USA.
- Kavli Institute for Fundamental Neuroscience, University of California San Francisco, San Francisco, CA, USA.
- Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA, USA.
9
Van Allsburg J, Shahan TA. How do animals weigh conflicting information about reward sources over time? Comparing dynamic averaging models. Anim Cogn 2024; 27:11. PMID: 38429608; PMCID: PMC10907467; DOI: 10.1007/s10071-024-01840-2
Abstract
Optimal foraging theory suggests that animals make decisions which maximize their food intake per unit time when foraging, but the mechanisms animals use to track the value of behavioral alternatives and choose between them remain unclear. Several models for how animals integrate past experience have been suggested. However, these models make differential predictions for the occurrence of spontaneous recovery of choice: a behavioral phenomenon in which a hiatus from the experimental environment results in animals reverting to a behavioral allocation consistent with a reward distribution from the more distant past, rather than one consistent with their most recently experienced distribution. To explore this phenomenon and compare these models, three free-operant experiments with rats were conducted using a serial reversal design. In Phase 1, two responses (A and B) were baited with pellets on concurrent variable interval schedules, favoring option A. In Phase 2, lever baiting was reversed to favor option B. Rats then entered a delay period, where they were maintained at weight in their home cages and no experimental sessions took place. Following this delay, preference was assessed using initial responding in test sessions where levers were presented, but not baited. Models were compared in performance, including an exponentially weighted moving average, the Temporal Weighting Rule, and variants of these models. While the data provided strong evidence of spontaneous recovery of choice, the form and extent of recovery was inconsistent with the models under investigation. Potential interpretations are discussed in relation to both the decision rule and valuation functions employed.
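Two of the averaging schemes compared can be written compactly: an exponentially weighted moving average, which weights recent outcomes geometrically, and a Temporal Weighting Rule-style average, in which weights fall off with the age of each experience so that a long hiatus flattens the weights and older experience regains influence. The weighting exponent and session counts below are illustrative assumptions, not the fitted models.

```python
import numpy as np

def ewma(outcomes, alpha=0.2):
    """Exponentially weighted moving average: each new outcome pulls the
    running estimate toward itself by a fixed fraction alpha."""
    v = float(outcomes[0])
    for r in outcomes[1:]:
        v += alpha * (r - v)
    return v

def temporal_weighting(outcomes, age, power=1.0):
    """Temporal Weighting Rule-style average (sketch): outcomes are weighted
    in inverse proportion to how long ago they occurred."""
    w = 1.0 / (np.asarray(age, dtype=float) ** power)
    return float(np.sum(w * np.asarray(outcomes)) / np.sum(w))

outcomes = np.r_[np.ones(20), np.zeros(20)]   # option A paid off, then stopped
age_now = np.arange(len(outcomes), 0, -1)     # sessions since each outcome
age_after_delay = age_now + 30.0              # add a 30-session hiatus

print("EWMA (recency-dominated):", round(ewma(outcomes), 3))
print("TWR right after training:", round(temporal_weighting(outcomes, age_now), 3))
print("TWR after a long hiatus: ", round(temporal_weighting(outcomes, age_after_delay), 3))
# After the hiatus the TWR estimate drifts back toward the older (rewarded)
# history, the spontaneous-recovery pattern these experiments probe.
```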
Affiliation(s)
- Timothy A Shahan
- Department of Psychology, Utah State University, Logan, Utah, USA
10
Kubanek J. Matching provides efficient decisions. bioRxiv [Preprint] 2024:2024.02.15.580481. PMID: 38464109; PMCID: PMC10925186; DOI: 10.1101/2024.02.15.580481
Abstract
How humans and animals distribute their behavior across choice options has been of key interest to economics, psychology, ecology, and related fields. Neoclassical and behavioral economics have provided prescriptions for how decision-makers can maximize their reward or utility, but these formalisms are used by decision-makers rarely. Instead, individuals allocate their behavior in proportion to the worth of their options, a phenomenon captured by the generalized matching law. Why biological decision-makers adopt this strategy has been unclear. To provide insight into this issue, this article evaluates the performance of matching across a broad spectrum of decision situations, using simulations. Matching is found to attain a high or near-optimal gain, and the strategy achieves this level of performance following a single evaluation of the decision options. Thus, matching provides highly efficient decisions across a wide range of choice environments. This result offers a quantitative explanation for the broad adoption of matching by biological decision-makers.
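Matching can be evaluated in simulation along the lines described: allocate choices in proportion to the income each option has yielded and compare the harvested reward rate with the best fixed allocation. The sketch below uses a two-option baited (variable-interval-like) schedule, in which an armed reward persists until collected; the parameters are arbitrary choices, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def harvest(q, p=(0.3, 0.1), n=20_000):
    """Reward rate and income share for an agent choosing option A with
    probability q on a baited schedule: each option arms with probability p
    per step and stays armed until it is chosen."""
    armed = [False, False]
    earned = [0.0, 0.0]
    for _ in range(n):
        for i in range(2):
            armed[i] = armed[i] or (rng.random() < p[i])
        choice = 0 if rng.random() < q else 1
        if armed[choice]:
            earned[choice] += 1.0
            armed[choice] = False
    total = earned[0] + earned[1]
    return total / n, earned[0] / max(total, 1.0)

qs = np.linspace(0.05, 0.95, 19)
rates = np.array([harvest(q)[0] for q in qs])

q = 0.5                       # iterate until the choice fraction equals the
for _ in range(10):           # income fraction: the matching equilibrium
    _, income_share = harvest(q)
    q = income_share

print(f"best fixed allocation: q = {qs[np.argmax(rates)]:.2f}, rate = {rates.max():.3f}")
print(f"matching equilibrium : q = {q:.2f}, rate = {harvest(q)[0]:.3f}")
# The matching allocation harvests essentially the same rate as the best
# fixed allocation, illustrating the efficiency claim above.
```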
Affiliation(s)
- Jan Kubanek
- University of Utah, Salt Lake City, Utah, United States
11
Kubanek J. Matching provides efficient decisions. Research Square [Preprint] 2024:rs.3.rs-3949086. PMID: 38410437; PMCID: PMC10896367; DOI: 10.21203/rs.3.rs-3949086/v1
Abstract
How humans and animals distribute their behavior across choice options has been of key interest to economics, psychology, ecology, and related fields. Neoclassical and behavioral economics have provided prescriptions for how decision-makers can maximize their reward or utility, but these formalisms are used by decision-makers rarely. Instead, individuals allocate their behavior in proportion to the worth of their options, a phenomenon captured by the generalized matching law. Why biological decision-makers adopt this strategy has been unclear. To provide insight into this issue, this article evaluates the performance of matching across a broad spectrum of decision situations, using simulations. Matching is found to attain a high or near-optimal gain, and the strategy achieves this level of performance following a single evaluation of the decision options. Thus, matching provides highly efficient decisions across a wide range of choice environments. This result offers a quantitative explanation for the broad adoption of matching by biological decision-makers.
Affiliation(s)
- Jan Kubanek
- University of Utah, Salt Lake City, Utah, United States
12
Stern M, Istrate N, Mazzucato L. A reservoir of timescales emerges in recurrent circuits with heterogeneous neural assemblies. eLife 2023; 12:e86552. PMID: 38084779; PMCID: PMC10810607; DOI: 10.7554/elife.86552
Abstract
The temporal activity of many physical and biological systems, from complex networks to neural circuits, exhibits fluctuations simultaneously varying over a large range of timescales. Long-tailed distributions of intrinsic timescales have been observed across neurons simultaneously recorded within the same cortical circuit. The mechanisms leading to this striking temporal heterogeneity are yet unknown. Here, we show that neural circuits, endowed with heterogeneous neural assemblies of different sizes, naturally generate multiple timescales of activity spanning several orders of magnitude. We develop an analytical theory using rate networks, supported by simulations of spiking networks with cell-type specific connectivity, to explain how neural timescales depend on assembly size and show that our model can naturally explain the long-tailed timescale distribution observed in the awake primate cortex. When driving recurrent networks of heterogeneous neural assemblies by a time-dependent broadband input, we found that large and small assemblies preferentially entrain slow and fast spectral components of the input, respectively. Our results suggest that heterogeneous assemblies can provide a biologically plausible mechanism for neural circuits to demix complex temporal input signals by transforming temporal into spatial neural codes via frequency-selective neural assemblies.
Affiliation(s)
- Merav Stern
- Institute of Neuroscience, University of Oregon, Eugene, United States
- Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Nicolae Istrate
- Institute of Neuroscience, University of Oregon, Eugene, United States
- Departments of Physics, University of Oregon, Eugene, United States
- Luca Mazzucato
- Institute of Neuroscience, University of Oregon, Eugene, United States
- Departments of Physics, University of Oregon, Eugene, United States
- Mathematics and Biology, University of Oregon, Eugene, United States
13
Danskin BP, Hattori R, Zhang YE, Babic Z, Aoi M, Komiyama T. Exponential history integration with diverse temporal scales in retrosplenial cortex supports hyperbolic behavior. Sci Adv 2023; 9:eadj4897. PMID: 38019904; PMCID: PMC10686558; DOI: 10.1126/sciadv.adj4897
Abstract
Animals use past experience to guide future choices. The integration of experiences typically follows a hyperbolic, rather than exponential, decay pattern with a heavy tail for distant history. Hyperbolic integration affords sensitivity to both recent environmental dynamics and long-term trends. However, it is unknown how the brain implements hyperbolic integration. We found that mouse behavior in a foraging task showed hyperbolic decay of past experience, but the activity of cortical neurons showed exponential decay. We resolved this apparent mismatch by observing that cortical neurons encode history information with heterogeneous exponential time constants that vary across neurons. A model combining these diverse timescales recreated the heavy-tailed, hyperbolic history integration observed in behavior. In particular, the time constants of retrosplenial cortex (RSC) neurons best matched the behavior, and optogenetic inactivation of RSC uniquely reduced behavioral history dependence. These results indicate that behavior-relevant history information is maintained across multiple timescales in parallel and that RSC is a critical reservoir of information guiding decision-making.
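The central idea, that a mixture of exponential decays with heterogeneous time constants yields a heavy-tailed, approximately hyperbolic decay, can be checked in a few lines. The particular set of time constants below is an arbitrary illustration, not the distribution estimated from RSC neurons.

```python
import numpy as np

lags = np.arange(60)                            # trials into the past

single = np.exp(-lags / 5.0)                    # one exponential time constant
taus = np.logspace(0, 2, 25)                    # 1 to 100 trials (illustrative)
mixture = np.mean(np.exp(-lags[:, None] / taus[None, :]), axis=1)
hyperbolic = 1.0 / (1.0 + lags / 5.0)

for lag in (1, 5, 20, 50):
    print(f"lag {lag:2d}: single = {single[lag]:.3f}  "
          f"mixture = {mixture[lag]:.3f}  hyperbolic = {hyperbolic[lag]:.3f}")
# Unlike any single exponential, the mixture keeps a heavy tail at long lags,
# qualitatively matching the hyperbolic weighting seen in behavior.
```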
Affiliation(s)
- Bethanny P. Danskin
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Ryoma Hattori
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Yu E. Zhang
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Zeljana Babic
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Mikio Aoi
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Takaki Komiyama
- Department of Neurobiology, University of California San Diego, La Jolla, CA, USA
- Center for Neural Circuits and Behavior, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
14
Rajagopalan AE, Darshan R, Hibbard KL, Fitzgerald JE, Turner GC. Reward expectations direct learning and drive operant matching in Drosophila. Proc Natl Acad Sci U S A 2023; 120:e2221415120. PMID: 37733736; PMCID: PMC10523640; DOI: 10.1073/pnas.2221415120
Abstract
Foraging animals must use decision-making strategies that dynamically adapt to the changing availability of rewards in the environment. A wide diversity of animals do this by distributing their choices in proportion to the rewards received from each option, Herrnstein's operant matching law. Theoretical work suggests an elegant mechanistic explanation for this ubiquitous behavior, as operant matching follows automatically from simple synaptic plasticity rules acting within behaviorally relevant neural circuits. However, no past work has mapped operant matching onto plasticity mechanisms in the brain, leaving the biological relevance of the theory unclear. Here, we discovered operant matching in Drosophila and showed that it requires synaptic plasticity that acts in the mushroom body and incorporates the expectation of reward. We began by developing a dynamic foraging paradigm to measure choices from individual flies as they learn to associate odor cues with probabilistic rewards. We then built a model of the fly mushroom body to explain each fly's sequential choice behavior using a family of biologically realistic synaptic plasticity rules. As predicted by past theoretical work, we found that synaptic plasticity rules could explain fly matching behavior by incorporating stimulus expectations, reward expectations, or both. However, by optogenetically bypassing the representation of reward expectation, we abolished matching behavior and showed that the plasticity rule must specifically incorporate reward expectations. Altogether, these results reveal the first synapse-level mechanisms of operant matching and provide compelling evidence for the role of reward expectation signals in the fly brain.
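The theoretical backdrop referred to here, that matching can fall out of simple plasticity rules, is often stated with a covariance-style update in which the weight change tracks the deviation of reward from its expectation. A hedged, generic rendering of that class of rule, not the paper's specific mushroom-body model:

```latex
% Reward-expectation-dependent (covariance-like) plasticity, schematically:
% the weight change is proportional to presynaptic activity x_i times the
% deviation of reward R from its expected value.
\Delta w_i \;\propto\; x_i\,\bigl(R - \mathbb{E}[R]\bigr)
% At steady state \mathbb{E}\!\left[x_i\,(R - \mathbb{E}[R])\right] = 0, the
% covariance condition from which Herrnstein's matching of choice fractions
% to reward fractions follows in prior theoretical work.
```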
Affiliation(s)
- Adithya E. Rajagopalan
- Janelia Research Campus, HHMI, Ashburn, VA 20147
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD 21205
- Ran Darshan
- Janelia Research Campus, HHMI, Ashburn, VA 20147
- Department of Physiology and Pharmacology, Sackler Faculty of Medicine, Sagol School of Neuroscience, The School of Physics and Astronomy, Tel Aviv University, Tel Aviv 6997801, Israel
15
Abstract
A preliminary theory of a temporary increase in the rate of an operant response with the transition to extinction (i.e., the extinction burst) is proposed. The theory assumes reinforcers are events permitting access to some valuable activity, and that such activity can compete for allocation with the target response under some conditions (e.g., very high reinforcement rates). With the transition to extinction, elimination of this competition for allocation can produce an increase in the target response, but the increase is transient because the value of the target response decreases with exposure to extinction. The theory provides a way to understand why the extinction burst is not ubiquitous, seems more common following very small ratio schedules, occurs for a short period of time following the transition to extinction, and may be eliminated with the availability of alternative reinforcement. It appears to provide a reasonable starting point for a theory of the extinction burst that does not necessarily require inclusion of invigorating effects of frustration, and it is closely aligned with Resurgence as Choice theory. Additional research on factors modulating reinforcement-related activities and how they affect the extinction burst could help to further evaluate the theory.
Affiliation(s)
- Timothy A. Shahan
- Department of Psychology, Utah State University, 2810 Old Main Hill, Logan, UT 84321-2810 USA
16
Puelma Touzel M, Cisek P, Lajoie G. Performance-gated deliberation: A context-adapted strategy in which urgency is opportunity cost. PLoS Comput Biol 2022; 18:e1010080. PMID: 35617370; PMCID: PMC9176815; DOI: 10.1371/journal.pcbi.1010080
Abstract
Finding the right amount of deliberation, between insufficient and excessive, is a hard decision making problem that depends on the value we place on our time. Average-reward, putatively encoded by tonic dopamine, serves in existing reinforcement learning theory as the opportunity cost of time, including deliberation time. Importantly, this cost can itself vary with the environmental context and is not trivial to estimate. Here, we propose how the opportunity cost of deliberation can be estimated adaptively on multiple timescales to account for non-stationary contextual factors. We use it in a simple decision-making heuristic based on average-reward reinforcement learning (AR-RL) that we call Performance-Gated Deliberation (PGD). We propose PGD as a strategy used by animals wherein deliberation cost is implemented directly as urgency, a previously characterized neural signal effectively controlling the speed of the decision-making process. We show PGD outperforms AR-RL solutions in explaining behaviour and urgency of non-human primates in a context-varying random walk prediction task and is consistent with relative performance and urgency in a context-varying random dot motion task. We make readily testable predictions for both neural activity and behaviour.
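The core quantities can be sketched in a few lines: an opportunity cost of time estimated as a running average of reward on more than one timescale, and a deliberation rule that stops once the expected marginal gain of deliberating further drops below that cost (urgency). The mixing weights, gain curve, and stopping rule are assumptions for illustration, not the published PGD model.

```python
import numpy as np

def opportunity_cost(rewards, alpha_fast=0.2, alpha_slow=0.01, w=0.5):
    """Running estimate of the reward rate (opportunity cost of time) that
    mixes a fast and a slow average so it can track context changes."""
    fast = slow = 0.0
    for r in rewards:
        fast += alpha_fast * (r - fast)
        slow += alpha_slow * (r - slow)
    return w * fast + (1 - w) * slow

def deliberation_steps(marginal_gain, cost_per_step):
    """Deliberate while one more step is expected to add more reward than it
    costs in foregone opportunities."""
    for step, gain in enumerate(marginal_gain):
        if gain < cost_per_step:
            return step
    return len(marginal_gain)

gain = 1.0 / (1.0 + np.arange(20))     # diminishing returns to longer deliberation
rich = opportunity_cost(np.full(300, 0.5))
poor = opportunity_cost(np.full(300, 0.05))
print("steps deliberated, rich context:", deliberation_steps(gain, rich))
print("steps deliberated, poor context:", deliberation_steps(gain, poor))
# A richer context raises the opportunity cost of time and cuts deliberation
# short; a poorer context licenses longer deliberation.
```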
Affiliation(s)
- Maximilian Puelma Touzel
- Mila, Québec AI Institute, Montréal, Canada
- Department of Computer Science & Operations Research, Université de Montréal, Montréal, Canada
- Paul Cisek
- Department of Neuroscience, Université de Montréal, Montréal, Canada
- Guillaume Lajoie
- Mila, Québec AI Institute, Montréal, Canada
- Department of Mathematics & Statistics, Université de Montréal, Montréal, Canada
17
Lyu N, Hu Y, Zhang J, Lloyd H, Sun YH, Tao Y. Switching costs in stochastic environments drive the emergence of matching behaviour in animal decision-making through the promotion of reward learning strategies. Sci Rep 2021; 11:23593. PMID: 34880339; PMCID: PMC8654859; DOI: 10.1038/s41598-021-02979-5
Abstract
A principle of choice in animal decision-making named probability matching (PM) has long been detected in animals, and can arise from different decision-making strategies. Little is known about how environmental stochasticity may influence the switching time of these different decision-making strategies. Here we address this problem using a combination of behavioral and theoretical approaches, and show that, although a simple Win-Stay-Loss-Shift (WSLS) strategy can generate PM in binary-choice tasks theoretically, budgerigars (Melopsittacus undulatus) actually apply a range of sub-tactics more often when they are expected to make more accurate decisions. Surprisingly, budgerigars did not get more rewards than would be predicted when adopting a WSLS strategy, and their decisions also exhibited PM. Instead, budgerigars followed a learning strategy based on reward history, which potentially benefits individuals indirectly from paying lower switching costs. Furthermore, our data suggest that more stochastic environments may promote reward learning through significantly less switching. We suggest that switching costs driven by the stochasticity of an environmental niche can potentially represent an important selection pressure associated with decision-making that may play a key role in driving the evolution of complex cognition in animals.
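A minimal Win-Stay-Lose-Shift simulation illustrates how such a rule can produce probability matching: on a binary choice with complementary reward probabilities (p for one option, 1 − p for the other), the steady-state choice proportion equals p rather than converging on the better option. The complementary schedule is an assumption of this sketch, not necessarily the exact task used with budgerigars.

```python
import numpy as np

rng = np.random.default_rng(0)

def wsls(p_reward_a=0.75, n_trials=50_000):
    """Win-Stay-Lose-Shift on a binary choice where option A pays off with
    probability p and option B with probability 1 - p."""
    choice, n_a = 0, 0
    for _ in range(n_trials):
        n_a += (choice == 0)
        p = p_reward_a if choice == 0 else 1.0 - p_reward_a
        if rng.random() >= p:        # loss -> shift; win -> stay
            choice = 1 - choice
    return n_a / n_trials

for p in (0.6, 0.75, 0.9):
    print(f"P(reward | A) = {p:.2f}  ->  P(choose A) = {wsls(p):.3f}")
# Choice proportions track the reward probabilities (probability matching)
# instead of maximizing by always choosing the richer option.
```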
Affiliation(s)
- Nan Lyu
- Ministry of Education Key Laboratory for Biodiversity and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China.
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, People's Republic of China.
- Yunbiao Hu
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, People's Republic of China
- Jiahua Zhang
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, People's Republic of China
- Huw Lloyd
- Department of Natural Sciences, Faculty of Science and Engineering, Manchester Metropolitan University, Manchester, UK
- Yue-Hua Sun
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, People's Republic of China.
- Yi Tao
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, People's Republic of China.
18
Trepka E, Spitmaan M, Bari BA, Costa VD, Cohen JY, Soltani A. Entropy-based metrics for predicting choice behavior based on local response to reward. Nat Commun 2021; 12:6567. PMID: 34772943; PMCID: PMC8590026; DOI: 10.1038/s41467-021-26784-w
Abstract
For decades, behavioral scientists have used the matching law to quantify how animals distribute their choices between multiple options in response to reinforcement they receive. More recently, many reinforcement learning (RL) models have been developed to explain choice by integrating reward feedback over time. Despite reasonable success of RL models in capturing choice on a trial-by-trial basis, these models cannot capture variability in matching behavior. To address this, we developed metrics based on information theory and applied them to choice data from dynamic learning tasks in mice and monkeys. We found that a single entropy-based metric can explain 50% and 41% of variance in matching in mice and monkeys, respectively. We then used limitations of existing RL models in capturing entropy-based metrics to construct more accurate models of choice. Together, our entropy-based metrics provide a model-free tool to predict adaptive choice behavior and reveal underlying neural mechanisms.
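One way such an entropy-based summary of the local response to reward can be built (a sketch of the general idea, not necessarily the exact metric defined in the paper) is to take the Shannon entropy of the joint distribution over the previous outcome and the stay/switch decision: stereotyped win-stay/lose-switch behavior gives low entropy, unstructured behavior gives high entropy.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy_bits(p):
    """Shannon entropy (bits) of a discrete distribution, ignoring empty bins."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def strategy_entropy(choices, rewards):
    """Entropy of the joint distribution over (previous outcome, stay/switch)."""
    stay = (choices[1:] == choices[:-1]).astype(int)
    prev_reward = np.asarray(rewards[:-1], dtype=int)
    counts = np.zeros((2, 2))
    for r, s in zip(prev_reward, stay):
        counts[r, s] += 1
    return entropy_bits(counts.ravel() / counts.sum())

# Stereotyped win-stay/lose-switch behavior versus unstructured behavior:
c, wsls_choices, wsls_rewards = 0, [], []
for _ in range(2_000):
    wsls_choices.append(c)
    r = int(rng.random() < (0.7 if c == 0 else 0.3))
    wsls_rewards.append(r)
    if r == 0:
        c = 1 - c
random_choices = rng.integers(0, 2, size=2_000)
random_rewards = rng.integers(0, 2, size=2_000)

print("WSLS  :", round(strategy_entropy(np.array(wsls_choices), wsls_rewards), 3))
print("random:", round(strategy_entropy(random_choices, random_rewards), 3))
```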
Affiliation(s)
- Ethan Trepka
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
- Mehran Spitmaan
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
- Bilal A Bari
- The Solomon H. Snyder Department of Neuroscience, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Brain Science Institute, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kavli Neuroscience Discovery Institute, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Vincent D Costa
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
- Jeremiah Y Cohen
- The Solomon H. Snyder Department of Neuroscience, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Brain Science Institute, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kavli Neuroscience Discovery Institute, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Alireza Soltani
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA.
19
Wang S, Feng SF, Bornstein AM. Mixing memory and desire: How memory reactivation supports deliberative decision-making. Wiley Interdiscip Rev Cogn Sci 2021; 13:e1581. PMID: 34665529; DOI: 10.1002/wcs.1581
Abstract
Memories affect nearly every aspect of our mental life. They allow us to both resolve uncertainty in the present and to construct plans for the future. Recently, renewed interest in the role memory plays in adaptive behavior has led to new theoretical advances and empirical observations. We review key findings, with particular emphasis on how the retrieval of many kinds of memories affects deliberative action selection. These results are interpreted in a sequential inference framework, in which reinstatements from memory serve as "samples" of potential action outcomes. The resulting model suggests a central role for the dynamics of memory reactivation in determining the influence of different kinds of memory in decisions. We propose that representation-specific dynamics can implement a bottom-up "product of experts" rule that integrates multiple sets of action-outcome predictions weighted based on their uncertainty. We close by reviewing related findings and identifying areas for further research. This article is categorized under: Psychology > Reasoning and Decision Making; Neuroscience > Cognition; Neuroscience > Computation.
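The "product of experts" integration described above can be written schematically: predictive distributions contributed by different memory sources are multiplied, each raised to a weight that shrinks as that source becomes more uncertain. A generic rendering of the idea, not the authors' exact formulation:

```latex
% K memory sources each supply a predictive distribution P_k(o \mid a) over
% outcomes o of a candidate action a; the integrated belief is their
% weighted product, with w_k decreasing in the uncertainty of source k.
P(o \mid a) \;\propto\; \prod_{k=1}^{K} P_k(o \mid a)^{\,w_k}
```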
Affiliation(s)
- Shaoming Wang
- Department of Psychology, New York University, New York, New York, USA
- Samuel F Feng
- Department of Mathematics, Khalifa University of Science and Technology, Abu Dhabi, UAE
- Khalifa University Centre for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, UAE
- Aaron M Bornstein
- Department of Cognitive Sciences, University of California-Irvine, Irvine, California, USA
- Center for the Neurobiology of Learning & Memory, University of California-Irvine, Irvine, California, USA
- Institute for Mathematical Behavioral Sciences, University of California-Irvine, Irvine, California, USA
20
Choice history effects in mice and humans improve reward harvesting efficiency. PLoS Comput Biol 2021; 17:e1009452. PMID: 34606493; PMCID: PMC8516315; DOI: 10.1371/journal.pcbi.1009452
Abstract
Choice history effects describe how future choices depend on the history of past choices. In experimental tasks this is typically framed as a bias because it often diminishes the experienced reward rates. However, in natural habitats, choices made in the past constrain choices that can be made in the future. For foraging animals, the probability of earning a reward in a given patch depends on the degree to which the animals have exploited the patch in the past. One problem with many experimental tasks that show choice history effects is that such tasks artificially decouple choice history from its consequences on reward availability over time. To circumvent this, we use a variable interval (VI) reward schedule that reinstates a more natural contingency between past choices and future reward availability. By examining the behavior of optimal agents in the VI task we discover that choice history effects observed in animals serve to maximize reward harvesting efficiency. We further distil the function of choice history effects by manipulating first- and second-order statistics of the environment. We find that choice history effects primarily reflect the growth rate of the reward probability of the unchosen option, whereas reward history effects primarily reflect environmental volatility. Based on observed choice history effects in animals, we develop a reinforcement learning model that explicitly incorporates choice history over multiple time scales into the decision process, and we assess its predictive adequacy in accounting for the associated behavior. We show that this new variant, known as the double trace model, has a higher performance in predicting choice data, and shows near optimal reward harvesting efficiency in simulated environments. These results suggests that choice history effects may be adaptive for natural contingencies between consumption and reward availability. This concept lends credence to a normative account of choice history effects that extends beyond its description as a bias. Animals foraging for food in natural habitats compete to obtain better quality food patches. To achieve this goal, animals can rely on memory and choose the same patches that have provided higher quality of food in the past. However, in natural habitats simply identifying better food patches may not be sufficient to successfully compete with their conspecifics, as food resources can grow over time. Therefore, it makes sense to visit from time to time those patches that were associated with lower food quality in the past. This demands optimal foraging animals to keep in memory not only which food patches provided the best food quality, but also which food patches they visited recently. To see if animals track their history of visits and use it to maximize the food harvesting efficiency, we subjected them to experimental conditions that mimicked natural foraging behavior. In our behavioral tasks, we replaced food foraging behavior with a two choice task that provided rewards to mice and humans. By developing a new computational model and subjecting animals to various behavioral manipulations, we demonstrate that keeping a memory of past visits helps the animals to optimize the efficiency with which they can harvest rewards.
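The "double trace" structure, as described, augments a standard reward-learning rule with choice-history traces at two timescales that enter the decision alongside value. The sketch below is a hedged reading of that general structure with made-up parameters, not the authors' fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n_trials=500, alpha=0.3, decay_fast=0.5, decay_slow=0.95,
             w_fast=-1.0, w_slow=0.5, beta=3.0, p_reward=(0.4, 0.1)):
    """Q-learning plus fast and slow choice-history traces in the softmax.
    A negative fast weight produces short-term alternation; a positive slow
    weight produces longer-term perseveration on recently chosen options."""
    q = np.zeros(2)
    trace_fast = np.zeros(2)
    trace_slow = np.zeros(2)
    choices = []
    for _ in range(n_trials):
        logits = beta * q + w_fast * trace_fast + w_slow * trace_slow
        prob = np.exp(logits - logits.max())
        prob /= prob.sum()
        a = rng.choice(2, p=prob)
        r = float(rng.random() < p_reward[a])
        q[a] += alpha * (r - q[a])                 # value update
        trace_fast *= decay_fast; trace_fast[a] += 1.0
        trace_slow *= decay_slow; trace_slow[a] += 1.0
        choices.append(a)
    return np.array(choices)

choices = simulate()
print("fraction of choices to the richer option:", round((choices == 0).mean(), 3))
```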
21
Wojtak W, Ferreira F, Vicente P, Louro L, Bicho E, Erlhagen W. A neural integrator model for planning and value-based decision making of a robotics assistant. Neural Comput Appl 2021. DOI: 10.1007/s00521-020-05224-8
22
Mochol G, Kiani R, Moreno-Bote R. Prefrontal cortex represents heuristics that shape choice bias and its integration into future behavior. Curr Biol 2021; 31:1234-1244.e6. PMID: 33639107; PMCID: PMC8095400; DOI: 10.1016/j.cub.2021.01.068
Abstract
Goal-directed behavior requires integrating sensory information with prior knowledge about the environment. Behavioral biases that arise from these priors could increase positive outcomes when the priors match the true structure of the environment, but mismatches also happen frequently and could cause unfavorable outcomes. Biases that reduce gains and fail to vanish with training indicate fundamental suboptimalities arising from ingrained heuristics of the brain. Here, we report systematic, gain-reducing choice biases in highly trained monkeys performing a motion direction discrimination task where only the current stimulus is behaviorally relevant. The monkey's bias fluctuated at two distinct time scales: slow, spanning tens to hundreds of trials, and fast, arising from choices and outcomes of the most recent trials. Our findings enabled single trial prediction of biases, which influenced the choice especially on trials with weak stimuli. The pre-stimulus activity of neuronal ensembles in the monkey prearcuate gyrus represented these biases as an offset along the decision axis in the state space. This offset persisted throughout the stimulus viewing period, when sensory information was integrated, leading to a biased choice. The pre-stimulus representation of history-dependent bias was functionally indistinguishable from the neural representation of upcoming choice before stimulus onset, validating our model of single-trial biases and suggesting that pre-stimulus representation of choice could be fully defined by biases inferred from behavioral history. Our results indicate that the prearcuate gyrus reflects intrinsic heuristics that compute bias signals, as well as the mechanisms that integrate them into the oculomotor decision-making process.
Affiliation(s)
- Gabriela Mochol
- Center for Brain and Cognition and Department of Information and Communications Technologies, Pompeu Fabra University, Barcelona, Spain.
- Roozbeh Kiani
- Center for Neural Science, New York University, New York, NY 10003, USA; Neuroscience Institute, NYU Langone Medical Center, New York, NY 10016, USA; Department of Psychology, New York University, New York, NY 10003, USA
- Rubén Moreno-Bote
- Center for Brain and Cognition and Department of Information and Communications Technologies, Pompeu Fabra University, Barcelona, Spain
23
Houston AI, Trimmer PC, McNamara JM. Matching Behaviours and Rewards. Trends Cogn Sci 2021; 25:403-415. PMID: 33612384; DOI: 10.1016/j.tics.2021.01.011
Abstract
Matching describes how behaviour is related to rewards. The matching law holds when the ratio of an individual's behaviours equals the ratio of the rewards obtained. From its origins in the study of pigeons working for food in the laboratory, the law has been applied to a range of species, both in the laboratory and outside it (e.g., human sporting decisions). Probability matching occurs when the probability of a behaviour equals the probability of being rewarded. Input matching predicts the distribution of individuals across habitats. We evaluate the rationality of the matching law and probability matching, expose the logic of matching in real-world cases, review how recent neuroscience findings relate to matching, and suggest future research directions.
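In symbols, the strict matching law and probability matching summarized above can be written as below; the generalized form with sensitivity s and bias b is a standard extension included here for context rather than drawn from this review.

\frac{B_1}{B_2} = \frac{R_1}{R_2}, \qquad
\log\frac{B_1}{B_2} = s \,\log\frac{R_1}{R_2} + \log b, \qquad
P(\text{behaviour}) = P(\text{reward})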
Collapse
Affiliation(s)
- Alasdair I Houston
- School of Biological Sciences, University of Bristol, Life Sciences Building, 24 Tyndall Avenue, Bristol, BS8 1TQ, UK.
| | - Pete C Trimmer
- Department of Psychology, University of Warwick, Coventry, CV4 7AL, UK
| | - John M McNamara
- School of Mathematics, University of Bristol, Fry Building, Woodland Road, Bristol, BS8 1UG, UK
| |
Collapse
|
24
|
Monosov IE. How Outcome Uncertainty Mediates Attention, Learning, and Decision-Making. Trends Neurosci 2020; 43:795-809. [PMID: 32736849 PMCID: PMC8153236 DOI: 10.1016/j.tins.2020.06.009] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2020] [Revised: 06/16/2020] [Accepted: 06/24/2020] [Indexed: 01/24/2023]
Abstract
Animals and humans evolved sophisticated nervous systems that endowed them with the ability to form internal models or beliefs and make predictions about the future to survive and flourish in a world in which future outcomes are often uncertain. Crucial to this capacity is the ability to adjust behavioral and learning policies in response to the level of uncertainty. Until recently, the neuronal mechanisms that could underlie such uncertainty-guided control remained largely unknown. In this review, I discuss newly discovered neuronal circuits in primates that represent uncertainty about future rewards and propose how they guide information-seeking, attention, decision-making, and learning to help us survive in an uncertain world. Lastly, I discuss the possible relevance of these findings to learning in artificial systems.
Collapse
Affiliation(s)
- Ilya E Monosov
- Department of Neuroscience and Neurosurgery, Washington University School of Medicine in St. Louis, MO, USA; Department of Biomedical Engineering, Washington University School of Medicine in St. Louis, MO, USA; Washington University Pain Center, Washington University School of Medicine in St. Louis, MO, USA.
| |
Collapse
|
25
|
Collins AGE, Cockburn J. Beyond dichotomies in reinforcement learning. Nat Rev Neurosci 2020; 21:576-586. [PMID: 32873936 DOI: 10.1038/s41583-020-0355-6] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/20/2020] [Indexed: 11/09/2022]
Abstract
Reinforcement learning (RL) is a framework of particular importance to psychology, neuroscience and machine learning. Interactions between these fields, as promoted through the common hub of RL, have facilitated paradigm shifts that relate multiple levels of analysis within a single framework (for example, relating dopamine function to a computationally defined RL signal). Recently, more sophisticated RL algorithms have been proposed to better account for human learning, and in particular its oft-documented reliance on two separable systems: a model-based (MB) system and a model-free (MF) system. However, along with many benefits, this dichotomous lens can distort questions, and may contribute to an unnecessarily narrow perspective on learning and decision-making. Here, we outline some of the consequences that come from overconfidently mapping algorithms, such as MB versus MF RL, onto putative cognitive processes. We argue that the field is well positioned to move beyond simplistic dichotomies, and we propose a means of refocusing research questions towards the rich and complex components that comprise learning and decision-making.
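The model-based versus model-free distinction discussed above can be made concrete with a toy contrast; the sketch below is schematic (a cached temporal-difference learner versus value iteration on a learned transition and reward model) and is not the authors' formulation.

import numpy as np

# Toy contrast between model-free (MF) and model-based (MB) evaluation
# for a small discrete task with states s and actions a. Schematic only.
n_states, n_actions = 3, 2
alpha, gamma = 0.1, 0.95

# --- Model-free: cached Q-values updated by temporal-difference errors
Q_mf = np.zeros((n_states, n_actions))

def mf_update(s, a, r, s_next):
    td_error = r + gamma * Q_mf[s_next].max() - Q_mf[s, a]
    Q_mf[s, a] += alpha * td_error

# --- Model-based: learn transition counts and rewards, then plan
trans_counts = np.ones((n_states, n_actions, n_states))  # pseudocounts
R_hat = np.zeros((n_states, n_actions))

def mb_update(s, a, r, s_next):
    trans_counts[s, a, s_next] += 1
    R_hat[s, a] += alpha * (r - R_hat[s, a])

def mb_values(n_sweeps=50):
    # value iteration on the estimated transition and reward model
    T = trans_counts / trans_counts.sum(axis=2, keepdims=True)
    Q_mb = np.zeros((n_states, n_actions))
    for _ in range(n_sweeps):
        V = Q_mb.max(axis=1)
        Q_mb = R_hat + gamma * (T @ V)
    return Q_mb

The MF learner is cheap at decision time but slow to revalue outcomes, whereas the MB learner recomputes values from its model and adapts immediately when the model changes; the review argues that real learning rarely falls cleanly into either extreme.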
Collapse
Affiliation(s)
- Anne G E Collins
- Department of Psychology and the Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA.
| | - Jeffrey Cockburn
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
| |
Collapse
|
26
|
Stoilova VV, Knauer B, Berg S, Rieber E, Jäkel F, Stüttgen MC. Auditory cortex reflects goal-directed movement but is not necessary for behavioral adaptation in sound-cued reward tracking. J Neurophysiol 2020; 124:1056-1071. [PMID: 32845769 DOI: 10.1152/jn.00736.2019] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022] Open
Abstract
Mounting evidence suggests that the role of sensory cortices in perceptual decision making goes beyond the mere representation of the discriminative stimuli and additionally involves the representation of nonsensory variables such as reward expectation. However, the relevance of these representations for behavior is not clear. To address this issue, we trained rats to discriminate sounds in a single-interval forced-choice task and then confronted the animals with unsignaled blockwise changes of reward probabilities. We found that unequal reward probabilities for the two choice options led to substantial shifts in response bias without concomitant reduction in stimulus discrimination. Although decisional biases were on average less extreme than required to maximize overall reinforcement, a model-based analysis revealed that rats managed to harvest >97% of rewards. Neurons in auditory cortex recorded during task performance weakly differentiated the discriminative stimuli but more strongly reflected the subsequent goal-directed movement. Although 10-20% of units exhibited significantly different firing rates between task epochs with different response biases, control experiments showed this to result from inflated false positive rates due to unspecific temporal correlations of spiking activity rather than from changing reinforcement contingencies. Transient pharmacological inactivation of auditory cortex reduced sound discriminability without affecting other measures of performance, whereas inactivation of medial prefrontal cortex affected both discriminability and bias. Together, these results suggest that auditory cortex activity only weakly reflects decisional variables during flexible updating of stimulus-response-outcome contingencies and does not play a crucial role in sound-cued adaptive behavior, beyond the representation of the discriminative stimuli.NEW & NOTEWORTHY Recent evidence suggests that sensory cortex represents nonsensory variables such as reward expectation, but the relevance of these representations for behavior is not well understood. We show that rat auditory cortex (AC) is modulated during movement and reward anticipation in a sound-cued reward tracking task, whereas AC inactivation only impaired discrimination without affecting reward tracking, consistent with a predominantly sensory role of AC.
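The dissociation reported above between response bias and stimulus discrimination is conventionally quantified with signal detection theory; the snippet below shows the textbook computation of d' and criterion (with a simple correction for extreme rates), not the specific model-based analysis used in this study.

from scipy.stats import norm

# Standard signal-detection quantities separating discriminability (d')
# from response bias (criterion c). Textbook formulation, illustrative only.
def sdt(hits, misses, false_alarms, correct_rejections):
    # log-linear correction avoids infinities at hit/false-alarm rates of 0 or 1
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    d_prime = norm.ppf(hit_rate) - norm.ppf(fa_rate)
    criterion = -0.5 * (norm.ppf(hit_rate) + norm.ppf(fa_rate))
    return d_prime, criterion

# Example: a shift toward one response moves the criterion c while sensitivity
# d' stays roughly constant, the pattern described in the abstract above.
print(sdt(80, 20, 30, 70))   # relatively unbiased responding
print(sdt(95, 5, 60, 40))    # liberal criterion, similar sensitivity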
Collapse
Affiliation(s)
- Vanya V Stoilova
- Institute of Pathophysiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
| | - Beate Knauer
- Institute of Pathophysiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
| | - Stephanie Berg
- Institute of Pathophysiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
| | - Evelyn Rieber
- Institute of Pathophysiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
| | - Frank Jäkel
- Centre for Cognitive Science, Institute of Psychology, Technische Universität Darmstadt, Darmstadt, Germany
| | - Maik C Stüttgen
- Institute of Pathophysiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany.,Focus Program Translational Neurosciences, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
| |
Collapse
|
27
|
Does Brain Lateralization Affect the Performance in Binary Choice Tasks? A Study in the Animal Model Danio rerio. Symmetry (Basel) 2020. [DOI: 10.3390/sym12081294] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
Researchers in behavioral neuroscience commonly observe the behavior of animal subjects in the presence of two alternative stimuli. However, this type of binary choice introduces a potential confound related to side biases. Understanding whether subjects exhibit this bias, and its origin (pre-existing or acquired over the experimental sessions), is particularly important for interpreting the results. Here, we tested the hypothesis that brain lateralization may influence the emergence of side biases in a well-known model of neuroscience, the zebrafish. As a measure of lateralization, individuals were observed in their spontaneous tendency to monitor a potential predator with either the left or the right eye. Subjects also underwent an operant conditioning task requiring discrimination between two colors placed on the left–right axis. Although the low performance exhibited in the operant conditioning task prevents firm conclusions from being drawn, a positive correlation was found between the direction of lateralization and the tendency to select the stimulus presented on one specific side (e.g., the right). The choice of this preferred side did not change over the experimental sessions, meaning that the side bias was not the result of prolonged training. Overall, our study calls for a wider investigation of pre-existing lateralization biases in animal models, so that methodological counterstrategies can be devised for individuals that cannot be properly tested in a binary choice task with stimuli arranged on the left–right axis.
Collapse
|
28
|
Kourtzi Z, Welchman AE. Learning predictive structure without a teacher: decision strategies and brain routes. Curr Opin Neurobiol 2019; 58:130-134. [PMID: 31569060 DOI: 10.1016/j.conb.2019.09.014] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2019] [Revised: 09/03/2019] [Accepted: 09/12/2019] [Indexed: 11/17/2022]
Abstract
Extracting the structure of complex environments is at the core of our ability to interpret the present and predict the future. This skill is important for a range of behaviours, from navigating a new city to learning music and language. Classical approaches that investigate our ability to extract the principles of organisation governing complex environments focus on reward-based learning. Yet the human brain is shown to be expert at learning generative structure from mere exposure, without explicit reward. Individuals are shown to adapt, unbeknownst to them, to changes in the environment's temporal statistics and to predict future events. Further, we present evidence for a common brain architecture for unsupervised structure learning and reward-based learning, suggesting that the brain is built on the premise that 'learning is its own reward' to support adaptive behaviour.
Collapse
Affiliation(s)
- Zoe Kourtzi
- Department of Psychology, University of Cambridge, Cambridge, UK.
| | | |
Collapse
|