1
Mueller D, Giglio E, Chen CS, Holm A, Ebitz RB, Grissom NM. Touchscreen Response Precision Is Sensitive to the Explore/Exploit Trade-off. eNeuro 2025; 12:ENEURO.0538-24.2025. [PMID: 40246556 PMCID: PMC12061356 DOI: 10.1523/eneuro.0538-24.2025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2024] [Revised: 04/04/2025] [Accepted: 04/11/2025] [Indexed: 04/19/2025]
Abstract
The explore/exploit trade-off is a fundamental property of choice selection during reward-guided decision making, where the "same" choice can reflect either of these internal cognitive states. An unanswered question is whether the execution of a decision provides an underexplored measure of internal cognitive states. Touchscreens are increasingly used across species for cognitive testing and afford the ability to measure the precise location of choice touch responses. We examined how male and female mice in a restless bandit decision making task interacted with a touchscreen to determine if the explore/exploit trade-off, prior reward, and/or sex differences change the variability in the kinetics of touchscreen choices. During exploit states, successive touch responses are closer together than those made in an explore state, suggesting exploit states reflect periods of increased motor stereotypy. Although exploit decisions might be expected to be rewarded more frequently than explore decisions, we find that immediate past reward reduces choice variability independently of explore/exploit state. Male mice are more variable in their interactions with the touchscreen than females, even in low-variability trials such as exploit or following reward. These results suggest that as exploit behavior emerges in reward-guided decision making, all mice become less variable and more automated in both their choice and the actions taken to make that choice, but this occurs on a background of increased male variability. These data uncover the hidden potential for touchscreen decision making tasks to uncover the latent neural states that unite cognition and movement.
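The touch-precision measure described in this abstract can be illustrated with a minimal sketch. It is not the authors' analysis code: the function name, the `(x, y, state)` tuple layout, the pixel coordinates, and the choice to label each distance by the state of the later touch are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def successive_touch_distances(touches):
    """Mean Euclidean distance between successive touches, grouped by state.

    touches: list of (x, y, state) tuples in trial order (hypothetical layout).
    Each distance is attributed to the state label of the later touch.
    """
    by_state = defaultdict(list)
    for (x0, y0, _), (x1, y1, state) in zip(touches, touches[1:]):
        by_state[state].append(math.hypot(x1 - x0, y1 - y0))
    return {state: sum(d) / len(d) for state, d in by_state.items()}

# Made-up coordinates, for illustration only.
touches = [
    (100, 200, "explore"),
    (130, 240, "explore"),
    (133, 244, "exploit"),
    (133, 241, "exploit"),
]
result = successive_touch_distances(touches)
print(result)  # {'explore': 50.0, 'exploit': 4.0}
```

In this toy input the exploit-labelled touches land closer to their predecessors than the explore-labelled one, which is the direction of the effect the abstract reports.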
Affiliation(s)
- Dana Mueller
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455
- Erin Giglio
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455
- Cathy S Chen
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455
- Aspen Holm
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455
- R Becket Ebitz
  - Department of Neuroscience, University of Montreal, Montreal, Quebec H3T 1J4, Canada
- Nicola M Grissom
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455
2
Chen CS, Knep E, Laurie VJ, Calvin O, Ebitz RB, Fisher M, Schallmo MP, Sponheim SR, Chafee MV, Heilbronner SR, Grissom NM, Redish AD, MacDonald AW, Vinogradov S, Demro C. Beyond reward learning deficits: Exploration-exploitation instability reveals computational heterogeneity in value-based decision making in early psychosis. medRxiv 2025:2025.04.29.25326698. [PMID: 40343017 PMCID: PMC12060966 DOI: 10.1101/2025.04.29.25326698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2025]
Abstract
Psychosis spectrum illnesses are characterized by impaired goal-directed behavior and significant neurophysiological heterogeneity. To investigate the neurocomputational underpinnings of this heterogeneity, 75 participants with Early Psychosis (EP) and 68 controls completed a dynamic decision-making task. Consistent with prior studies, EP exhibited more choice switching, not explained by reward learning deficits, but instead by increased transition to exploration from exploitation. Bayesian modeling implicated elevated uncertainty intolerance and decision noise as independent contributors to suboptimal transition dynamics across individuals, which identified three computational subtypes with unique cognitive and symptom profiles. Replicating prior studies, a high decision-noise subtype emerged showing learning deficits and worse negative symptoms; our analyses further uncovered a normative subtype with worse mood symptoms and a novel uncertainty-intolerance subtype with higher hospitalization rates. These specific microcognitive disruptions underlying the distinct neurocomputational subtypes are individually measurable and may have the potential for targeted interventions.
Affiliation(s)
- Cathy S. Chen
  - Department of Psychiatry and Behavioral Sciences, University of Minnesota, Minneapolis, Minnesota, United States
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States
- Evan Knep
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States
- Olivia Calvin
  - Department of Neuroscience, University of Minnesota, Minneapolis, Minnesota, United States
- R. Becket Ebitz
  - Department of Neurosciences, University of Montréal, Québec, Canada
- Melissa Fisher
  - Department of Psychiatry and Behavioral Sciences, University of Minnesota, Minneapolis, Minnesota, United States
- Michael-Paul Schallmo
  - Department of Psychiatry and Behavioral Sciences, University of Minnesota, Minneapolis, Minnesota, United States
- Scott R. Sponheim
  - Department of Psychiatry and Behavioral Sciences, University of Minnesota, Minneapolis, Minnesota, United States
  - Minneapolis VA Health Care System, Minneapolis, MN, United States
- Matthew V. Chafee
  - Department of Neuroscience, University of Minnesota, Minneapolis, Minnesota, United States
- Sarah R. Heilbronner
  - Department of Neuroscience, University of Minnesota, Minneapolis, Minnesota, United States
  - Department of Neurosurgery, Baylor College of Medicine, Houston, Texas, United States
- Nicola M. Grissom
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States
- A. David Redish
  - Department of Neuroscience, University of Minnesota, Minneapolis, Minnesota, United States
- Angus W. MacDonald
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States
- Sophia Vinogradov
  - Department of Psychiatry and Behavioral Sciences, University of Minnesota, Minneapolis, Minnesota, United States
- Caroline Demro
  - Department of Psychiatry and Behavioral Sciences, University of Minnesota, Minneapolis, Minnesota, United States
3
Abbaszadeh M, Ozanick E, Magen N, Darrow D, Yan X, Grissom N, Herman AB, Ebitz BR. Individual differences in sequential decision-making. bioRxiv 2025:2025.04.04.647306. [PMID: 40236038 PMCID: PMC11996512 DOI: 10.1101/2025.04.04.647306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2025]
Abstract
People differ widely in how they make decisions in uncertain environments. While many studies leverage this variability to measure differences in specific cognitive processes and parameters, the key dimension(s) of individual variability in uncertain decision-making tasks has not been identified. Here, we analyzed behavioral data from 1001 participants performing a restless three-armed bandit task, where reward probabilities fluctuated unpredictably over time. Using a novel analytical approach that controlled for the stochasticity in this task, we identified a dominant nonlinear axis of individual variability. We found that this primary axis of variability was strongly and selectively correlated with the probability of exploration, as inferred by latent state modeling. This suggests that the major factor shaping individual differences in bandit task performance is the tendency to explore (versus exploit), rather than personality characteristics, reinforcement learning model parameters, or low-level strategies. Certain demographic characteristics also predicted variance along this principal axis: participants at the exploratory end tended to be younger than participants at the exploitative end, and self-identified men were overrepresented at both extremes. Together, these findings offer a principled framework for understanding individual differences in task behavior while highlighting the cognitive and demographic factors that shape individual differences in decision-making under uncertainty.
4
Yan X, König SD, Ebitz RB, Hayden BY, Darrow DP, Herman AB. Dynamic prefrontal coupling coordinates adaptive decision-making. Research Square 2025:rs.3.rs-6296852. [PMID: 40297698 PMCID: PMC12036449 DOI: 10.21203/rs.3.rs-6296852/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2025]
Abstract
Adaptive decision-making requires flexibly maintaining or changing behavior in response to uncertainty. While the dorsomedial (dmPFC) and dorsolateral (dlPFC) prefrontal cortex are each essential for this ability, how they coordinate to drive adaptation remains unknown. Using intracranial EEG recordings from human participants performing a dynamic reward task, we identified distinct, frequency-specific computations: dmPFC high-gamma activity encoded uncertainty before stay decisions but transitioned to prediction error before switches, while theta activity shifted from uncertainty to value representation. In contrast, dlPFC theta activity signaled both value and uncertainty before stays, but predominantly value before switches. Crucially, these regions coordinated through two temporally specific coupling mechanisms that predicted behavioral changes: theta-theta amplitude coupling during feedback processing and theta-gamma phase coupling before decisions. Both coupling mechanisms strengthened before switches, suggesting that changing behavior requires greater dmPFC-dlPFC integration than maintaining. These findings reveal how the dorsal prefrontal cortex employs frequency-specific computations and precise temporal coordination to guide adaptive behavior.
Affiliation(s)
- Xinyuan Yan
  - Department of Psychiatry, University of Minnesota; Minneapolis, MN, USA
- Seth D. König
  - Department of Psychiatry, University of Minnesota; Minneapolis, MN, USA
  - Department of Neurosurgery, University of Minnesota; Minneapolis, MN, USA
- R. Becket Ebitz
  - Department of Neuroscience, Universite de Montreal, Montreal, Quebec, Canada
- Benjamin Y. Hayden
  - Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- David P. Darrow
  - Department of Neurosurgery, University of Minnesota; Minneapolis, MN, USA
5
Glewwe N, Dastin-Van Rijn E, Chen CS, Giglio E, Knep E, Ebitz RB, Widge AS, Grissom NM. Sex-biased computations underlying differential set shift performance in mice. bioRxiv 2025:2025.04.01.646712. [PMID: 40236143 PMCID: PMC11996504 DOI: 10.1101/2025.04.01.646712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2025]
Abstract
Cognitive flexibility can be defined as the ability to adaptively shift between choices or strategies based on environmental feedback and it is disrupted in numerous neuropsychiatric conditions. Individual differences in the computations supporting cognitive flexibility are poised to reveal mechanisms of neuropsychiatric risk and resilience. One critical variable well known to influence individual differences in neuropsychiatric risk is sex. While previous research has identified sex differences in value based decision making in mice, whether sex reflects a major source of variation in cognitive flexibility remains unknown. To directly assess sex-biased individual differences in cognitive flexibility, we developed a novel touchscreen Set Shift task that permits robust and continuous testing in mice. Using this task, we discovered that female mice completed significantly more rule shifts with fewer errors than males. We next employed a suite of computational models that revealed sex-biased individual differences in the computations underlying cognitive flexibility. Overall, our results suggest that following rule shifts, female mice learn the new rule faster and commit to exploiting rule choices sooner compared to males - sometimes because they commit to multiple rules simultaneously. This suggests that increased choice stability in female rodents enhances commitment to a strategy during periods of uncertainty and directly contributes to increased rule shifting. This supports the counterintuitive conclusion that a high degree of stable choice is a strong requirement for enhanced cognitive flexibility in the Set Shift task, one of the gold standard cognitive flexibility tasks.
6
Fine JM, Chericoni A, Delgado G, Franch MC, Mickiewicz EA, Chavez AG, Bartoli E, Paulo D, Provenza NR, Watrous A, Yoo SBM, Sheth SA, Hayden BY. Complementary roles for hippocampus and anterior cingulate in composing continuous choice. bioRxiv 2025:2025.03.17.643774. [PMID: 40166150 PMCID: PMC11956977 DOI: 10.1101/2025.03.17.643774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
Naturalistic, goal directed behavior often requires continuous actions directed at dynamically changing goals. In this context, the closest analogue to choice is a strategic reweighting of multiple goal-specific control policies in response to shifting environmental pressures. To understand the algorithmic and neural bases of choice in continuous contexts, we examined behavior and brain activity in humans performing a continuous prey-pursuit task. Using a newly developed control-theoretic decomposition of behavior, we find pursuit strategies are well described by a meta-controller dictating a mixture of lower-level controllers, each linked to specific pursuit goals. Examining hippocampus and anterior cingulate cortex (ACC) population dynamics during goal switches revealed distinct roles for the two regions in parameterizing continuous controller mixing and meta-control. Hippocampal ensemble dynamics encoded the controller blending dynamics, suggesting it implements a mixing of goal-specific control policies. In contrast, ACC ensemble activity exhibited value-dependent ramping activity before goal switches, linking it to a meta-control process that accumulates evidence for switching goals. Our results suggest that hippocampus and ACC play complementary roles corresponding to a generalizable mixture controller and meta-controller that dictates value dependent changes in controller mixing.
7
Zid M, Laurie VJ, Ramírez-Ruiz J, Lavigne-Champagne A, Shourkeshti A, Harrell D, Herman AB, Ebitz RB. Humans forage for reward in reinforcement learning tasks. bioRxiv 2025:2024.07.08.602539. [PMID: 39026817 PMCID: PMC11257465 DOI: 10.1101/2024.07.08.602539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024]
Abstract
How do we make good decisions in uncertain environments? In psychology and neuroscience, the classic view is that we calculate the value of each option, compare them, and choose the most rewarding modulo exploratory noise. An ethologist, conversely, would argue that we commit to one option until its value drops below a threshold and then explore alternatives. Because the fields use incompatible methods, it remains unclear which view better describes human decision-making. Here, we found that humans use compare-to-threshold computations in classic compare-alternative tasks. Because compare-alternative computations are central to the reinforcement-learning (RL) models typically used in the cognitive and brain sciences, we developed a novel compare-to-threshold model ("foraging"). Compared to previous RL models, the foraging model better fit participant behavior, better predicted the tendency to repeat choices, and predicted held-out participants that were almost impossible under compare-alternative models. These results suggest that humans use compare-to-threshold computations in sequential decision-making.
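The contrast between the two decision rules named in this abstract can be sketched in a few lines. This is an illustrative toy, not the authors' foraging model: the softmax form of the compare-alternatives rule, the inverse temperature `beta`, and the fixed `threshold` are all assumptions.

```python
import math

def compare_alternatives(values, beta=3.0):
    """Compare-alternatives rule: softmax over the values of all options.

    beta (assumed parameter) sets how deterministic the choice is; lower
    beta means more exploratory noise.
    """
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    z = sum(exps)
    return [e / z for e in exps]

def compare_to_threshold(current_value, threshold):
    """Compare-to-threshold rule: keep the current option until its value
    drops below a threshold, then explore alternatives."""
    return "stay" if current_value >= threshold else "explore"

probs = compare_alternatives([0.8, 0.2])
decision_high = compare_to_threshold(current_value=0.6, threshold=0.5)
decision_low = compare_to_threshold(current_value=0.4, threshold=0.5)
```

The first rule needs a value for every option on every trial, while the second consults only the value of the option currently being exploited, which is the distinction the abstract draws between the two views.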
Affiliation(s)
- Meriam Zid
  - Department of Neuroscience, University of Montreal, Montreal, QC, H3T 1J4, Canada
- Veldon-James Laurie
  - Department of Neuroscience, University of Montreal, Montreal, QC, H3T 1J4, Canada
- Jorge Ramírez-Ruiz
  - Department of Neuroscience, University of Montreal, Montreal, QC, H3T 1J4, Canada
- Akram Shourkeshti
  - Department of Neuroscience, University of Montreal, Montreal, QC, H3T 1J4, Canada
- Dameon Harrell
  - Department of Psychiatry, University of Minnesota, Minneapolis, MN, 55455, USA
- Alexander B. Herman
  - Department of Psychiatry, University of Minnesota, Minneapolis, MN, 55455, USA
- R. Becket Ebitz
  - Department of Neuroscience, University of Montreal, Montreal, QC, H3T 1J4, Canada
8
Johnston WJ, Fine JM, Yoo SBM, Ebitz RB, Hayden BY. Semi-orthogonal subspaces for value mediate a binding and generalization trade-off. Nat Neurosci 2024; 27:2218-2230. [PMID: 39289564 PMCID: PMC12063212 DOI: 10.1038/s41593-024-01758-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 08/09/2024] [Indexed: 09/19/2024]
Abstract
When choosing between options, we must associate their values with the actions needed to select them. We hypothesize that the brain solves this binding problem through neural population subspaces. Here, in macaques performing a choice task, we show that neural populations in five reward-sensitive regions encode the values of offers presented on the left and right in distinct subspaces. This encoding is sufficient to bind offer values to their locations while preserving abstract value information. After offer presentation, all areas encode the value of the first and second offers in orthogonal subspaces; this orthogonalization also affords binding. Our binding-by-subspace hypothesis makes two new predictions confirmed by the data. First, behavioral errors should correlate with spatial, but not temporal, neural misbinding. Second, behavioral errors should increase when offers have low or high values, compared to medium values, even when controlling for value difference. Together, these results support the idea that the brain uses semi-orthogonal subspaces to bind features.
Affiliation(s)
- W Jeffrey Johnston
  - Center for Theoretical Neuroscience and Mortimer B. Zuckerman Mind, Brain, and Behavior Institute, Columbia University, New York, NY, USA.
- Justin M Fine
  - Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Seng Bum Michael Yoo
  - Department of Biomedical Engineering, Sungkyunkwan University, and Center for Neuroscience Imaging Research, Institute of Basic Sciences, Suwon, Republic of Korea
- R Becket Ebitz
  - Department of Neuroscience, Université de Montréal, Montreal, Quebec, Canada
- Benjamin Y Hayden
  - Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
9
Chen CS, Mueller D, Knep E, Ebitz RB, Grissom NM. Dopamine and Norepinephrine Differentially Mediate the Exploration-Exploitation Tradeoff. J Neurosci 2024; 44:e1194232024. [PMID: 39214707 PMCID: PMC11529815 DOI: 10.1523/jneurosci.1194-23.2024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 08/18/2024] [Accepted: 08/26/2024] [Indexed: 09/04/2024]
Abstract
Dopamine (DA) and norepinephrine (NE) have been repeatedly implicated in neuropsychiatric vulnerability, in part via their roles in mediating decision-making processes. Although the two neuromodulators share a synthesis pathway and are coactivated under states of arousal, they engage in distinct circuits and modulatory roles. However, the specific role of each neuromodulator in decision-making, in particular the exploration-exploitation tradeoff, remains unclear. Revealing how each neuromodulator contributes to the exploration-exploitation tradeoff is important in guiding mechanistic hypotheses emerging from computational psychiatric approaches. To understand the differences and overlaps of the roles of these two catecholamine systems in regulating exploration, a direct comparison using the same dynamic decision-making task is needed. Here, we ran male and female mice in a restless two-armed bandit task, which encourages both exploration and exploitation. We systemically administered a nonselective DA antagonist (flupenthixol), a nonselective DA agonist (apomorphine), a NE beta-receptor antagonist (propranolol), and a NE beta-receptor agonist (isoproterenol) and examined changes in exploration within subjects across sessions. We found a bidirectional modulatory effect of dopamine on exploration. Increasing dopamine activity decreased exploration and decreasing dopamine activity increased exploration. The modulatory effect of beta-noradrenergic receptor activity on exploration was mediated by sex. Reinforcement learning model parameters suggested that dopamine modulation affected exploration via decision noise and norepinephrine modulation affected exploration via sensitivity to outcome. Together, these findings suggested that the mechanisms that govern the exploration-exploitation transition are sensitive to changes in both catecholamine functions and revealed differential roles for NE and DA in mediating exploration.
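For readers unfamiliar with the task class, a restless two-armed bandit can be sketched as two reward probabilities that drift independently over trials. The sketch below is only an assumption-laden illustration: the step size, the per-trial step probability, the clamping to [0, 1], and the function name `restless_bandit` are hypothetical and are not taken from the paper.

```python
import random

def restless_bandit(n_trials, n_arms=2, step=0.1, p_step=0.1, seed=0):
    """Generate drifting reward probabilities for each arm.

    On every trial, each arm independently has probability p_step of moving
    up or down by `step` (assumed values), clamped to [0, 1].
    Returns a list with one [p_arm0, p_arm1, ...] row per trial.
    """
    rng = random.Random(seed)
    probs = [rng.random() for _ in range(n_arms)]
    history = []
    for _ in range(n_trials):
        history.append(list(probs))
        for i in range(n_arms):
            if rng.random() < p_step:
                delta = step if rng.random() < 0.5 else -step
                probs[i] = min(1.0, max(0.0, probs[i] + delta))
    return history

schedule = restless_bandit(n_trials=300)
```

Because the best arm changes unpredictably, an agent cannot settle on one option permanently, which is why tasks of this kind are said to encourage both exploration and exploitation.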
Affiliation(s)
- Cathy S Chen
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455
- Dana Mueller
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455
- Evan Knep
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455
- R Becket Ebitz
  - Department of Neurosciences, Université de Montréal, Montréal, Quebec H3T 1J4, Canada
- Nicola M Grissom
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455
10
Mueller D, Giglio E, Chen CS, Holm A, Ebitz RB, Grissom NM. Touchscreen response precision is sensitive to the explore/exploit tradeoff. bioRxiv 2024:2024.10.23.619903. [PMID: 39484597 PMCID: PMC11526980 DOI: 10.1101/2024.10.23.619903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2024]
Abstract
The explore/exploit tradeoff is a fundamental property of choice selection during reward-guided decision making. In perceptual decision making, higher certainty decisions are more motorically precise, even when the decision does not require motor accuracy. However, while we can parametrically control uncertainty in perceptual tasks, we do not know what variables - if any - shape motor precision and reflect subjective certainty during reward-guided decision making. Touchscreens are increasingly used across species to measure choice, but provide no tactile feedback on whether an action is precise or not, and therefore provide a valuable opportunity to determine whether actions differ in precision due to explore/exploit state, reward, or individual variables. We find all three of these factors exert independent drives towards increased precision. During exploit states, successive touches to the same choice are closer together than those made in an explore state, consistent with exploit states reflecting higher certainty and/or motor stereotypy in responding. However, exploit decisions might be expected to be rewarded more frequently than explore decisions. We find that exploit choice precision is increased independently of a separate increase in precision due to immediate past reward, suggesting multiple mechanisms regulating choice precision. Finally, we see evidence that male mice in general are less precise in their interactions with the touchscreen than females, even when exploiting a choice. These results suggest that as exploit behavior emerges in reward-guided decision making, individuals become more motorically precise reflecting increased certainty, even when decision choice does not require additional motor accuracy, but this is influenced by individual differences and prior reward. These data uncover the hidden potential for touchscreen tasks in any species to uncover the latent neural states that unite cognition and movement.
Affiliation(s)
- Dana Mueller
  - Department of Psychology, University of Minnesota, Minneapolis MN 55455
- Erin Giglio
  - Department of Psychology, University of Minnesota, Minneapolis MN 55455
- Cathy S Chen
  - Department of Psychology, University of Minnesota, Minneapolis MN 55455
- Aspen Holm
  - Department of Psychology, University of Minnesota, Minneapolis MN 55455
- R Becket Ebitz
  - Department of Neurosciences, Université de Montréal, Quebec, Canada
- Nicola M Grissom
  - Department of Psychology, University of Minnesota, Minneapolis MN 55455
11
Jurewicz K, Sleezer BJ, Mehta PS, Hayden BY, Ebitz RB. Irrational choices via a curvilinear representational geometry for value. Nat Commun 2024; 15:6424. [PMID: 39080250 PMCID: PMC11289086 DOI: 10.1038/s41467-024-49568-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 06/06/2024] [Indexed: 08/02/2024]
Abstract
We make decisions by comparing values, but it is not yet clear how value is represented in the brain. Many models assume, if only implicitly, that the representational geometry of value is linear. However, in part due to a historical focus on noisy single neurons, rather than neuronal populations, this hypothesis has not been rigorously tested. Here, we examine the representational geometry of value in the ventromedial prefrontal cortex (vmPFC), a part of the brain linked to economic decision-making, in two male rhesus macaques. We find that values are encoded along a curved manifold in vmPFC. This curvilinear geometry predicts a specific pattern of irrational decision-making: that decision-makers will make worse choices when an irrelevant, decoy option is worse in value, compared to when it is better. We observe this type of irrational choices in behavior. Together, these results not only suggest that the representational geometry of value is nonlinear, but that this nonlinearity could impose bounds on rational decision-making.
Affiliation(s)
- Katarzyna Jurewicz
  - Department of Neurosciences, Faculté de médecine, and Centre interdisciplinaire de recherche sur le cerveau et l'apprentissage, Université de Montréal, Montréal, QC, Canada
  - Department of Physiology, Faculty of Medicine and Health Sciences, McGill University, Montréal, QC, Canada
- Brianna J Sleezer
  - Department of Neuroscience, Center for Magnetic Resonance Research, and Center for Neuroengineering, University of Minnesota, Minneapolis, MN, USA
- Priyanka S Mehta
  - Department of Neuroscience, Center for Magnetic Resonance Research, and Center for Neuroengineering, University of Minnesota, Minneapolis, MN, USA
  - Psychology Program, Department of Human Behavior, Justice, and Diversity, University of Wisconsin, Superior, Superior, WI, USA
- Benjamin Y Hayden
  - Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- R Becket Ebitz
  - Department of Neurosciences, Faculté de médecine, and Centre interdisciplinaire de recherche sur le cerveau et l'apprentissage, Université de Montréal, Montréal, QC, Canada.
12
Mayne P, Das J, Zou S, Sullivan RKP, Burne THJ. Perineuronal nets are associated with decision making under conditions of uncertainty in female but not male mice. Behav Brain Res 2024; 461:114845. [PMID: 38184206 DOI: 10.1016/j.bbr.2024.114845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 12/21/2023] [Accepted: 01/02/2024] [Indexed: 01/08/2024]
Abstract
Biological sex influences decision-making processes in significant ways, differentiating the responses animals choose when faced with a range of stimuli. The neurobiological underpinnings that dictate sex differences in decision-making tasks remain an important open question, yet single-sex studies of males form most studies in behavioural neuroscience. Here we used female and male BALB/c mice on two spatial learning and memory tasks and examined the expression of perineuronal nets (PNNs) and parvalbumin interneurons (PV) in regions correlated with spatial memory. Mice underwent the aversive active place avoidance (APA) task or the appetitive trial-unique nonmatching-to-location (TUNL) touchscreen task. Mice in the APA cohort learnt to avoid the foot-shock and no differences were observed on key measures of the task nor in the number and intensity of PNNs and PV. On the delay but not separation manipulation in the TUNL task, females received more incorrect trials and fewer correct trials compared to males. Furthermore, females in this cohort exhibited higher intensity PNNs and PV cells in the agranular and granular retrosplenial cortex, compared to males. These data show that female and male mice perform similarly on spatial learning tasks. However, sex differences in neural circuitry may underlie differences in making decisions under conditions of uncertainty on an appetitive task. These data emphasise the importance of using mice of both sexes in studies of decision-making neuroscience.
Affiliation(s)
- Phoebe Mayne
  - Queensland Brain Institute, The University of Queensland, St Lucia, QLD 4072, Australia
- Joyosmita Das
  - Queensland Brain Institute, The University of Queensland, St Lucia, QLD 4072, Australia
- Simin Zou
  - Queensland Brain Institute, The University of Queensland, St Lucia, QLD 4072, Australia
- Robert K P Sullivan
  - Queensland Brain Institute, The University of Queensland, St Lucia, QLD 4072, Australia
- Thomas H J Burne
  - Queensland Brain Institute, The University of Queensland, St Lucia, QLD 4072, Australia; Queensland Centre for Mental Health Research, Wacol, QLD 4076, Australia.
13
Colas JT, O’Doherty JP, Grafton ST. Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts. PLoS Comput Biol 2024; 20:e1011950. [PMID: 38552190 PMCID: PMC10980507 DOI: 10.1371/journal.pcbi.1011950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 02/26/2024] [Indexed: 04/01/2024]
Abstract
Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of actions chosen previously? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting from multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions. In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.
Affiliation(s)
- Jaron T. Colas
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
  - Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
  - Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- John P. O’Doherty
  - Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America
  - Computation and Neural Systems Program, California Institute of Technology, Pasadena, California, United States of America
- Scott T. Grafton
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, United States of America
|
14
|
Maisson DJN, Cervera RL, Voloh B, Conover I, Zambre M, Zimmermann J, Hayden BY. Widespread coding of navigational variables in prefrontal cortex. Curr Biol 2023; 33:3478-3488.e3. [PMID: 37541250 PMCID: PMC10984098 DOI: 10.1016/j.cub.2023.07.024] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/09/2023] [Revised: 06/01/2023] [Accepted: 07/13/2023] [Indexed: 08/06/2023]
Abstract
To navigate effectively, we must represent information about our location in the environment. Traditional research highlights the role of the hippocampal complex in this process. Spurred by recent research highlighting the widespread cortical encoding of cognitive and motor variables previously thought to have localized function, we hypothesized that navigational variables would be likewise encoded widely, especially in the prefrontal cortex, which is associated with volitional behavior. We recorded neural activity from six prefrontal regions while macaques performed a foraging task in an open enclosure. In all regions, we found strong encoding of allocentric position, allocentric head direction, boundary distance, and linear and angular velocity. These encodings were not accounted for by distance, time to reward, or motor factors. The strength of coding of all variables increased along a ventral-to-dorsal gradient. Together, these results argue that encoding of navigational variables is not localized to the hippocampus and support the hypothesis that navigation is continuous with other forms of flexible cognition in the service of action.
Affiliation(s)
- David J-N Maisson
  - Department of Neuroscience, Center for Magnetic Resonance Research, Center for Neuroengineering, Department of Biomedical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Roberto Lopez Cervera
  - Department of Neuroscience, Center for Magnetic Resonance Research, Center for Neuroengineering, Department of Biomedical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Benjamin Voloh
  - Department of Neuroscience, Center for Magnetic Resonance Research, Center for Neuroengineering, Department of Biomedical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Indirah Conover
  - Department of Neuroscience, Center for Magnetic Resonance Research, Center for Neuroengineering, Department of Biomedical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Mrunal Zambre
  - Department of Neuroscience, Center for Magnetic Resonance Research, Center for Neuroengineering, Department of Biomedical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Jan Zimmermann
  - Department of Neuroscience, Center for Magnetic Resonance Research, Center for Neuroengineering, Department of Biomedical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Benjamin Y Hayden
  - Department of Neuroscience, Center for Magnetic Resonance Research, Center for Neuroengineering, Department of Biomedical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
|
15
|
Chen CS, Mueller D, Knep E, Ebitz RB, Grissom NM. Dopamine and norepinephrine differentially mediate the exploration-exploitation tradeoff. bioRxiv 2023:2023.01.09.523322. [PMID: 36711959 PMCID: PMC9881999 DOI: 10.1101/2023.01.09.523322] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/11/2023]
Abstract
The catecholamines dopamine (DA) and norepinephrine (NE) have been repeatedly implicated in neuropsychiatric vulnerability, in part via their roles in mediating the decision making processes. Although the two neuromodulators share a synthesis pathway and are co-activated under states of arousal, they engage in distinct circuits and roles in modulating neural activity across the brain. However, in the computational neuroscience literature, they have been assigned similar roles in modulating the latent cognitive processes of decision making, in particular the exploration-exploitation tradeoff. Revealing how each neuromodulator contributes to this explore-exploit process will be important in guiding mechanistic hypotheses emerging from computational psychiatric approaches. To understand the differences and overlaps of the roles of these two catecholamine systems in regulating exploration and exploitation, a direct comparison using the same dynamic decision making task is needed. Here, we ran mice in a restless two-armed bandit task, which encourages both exploration and exploitation. We systemically administered a nonselective DA receptor antagonist (flupenthixol), a nonselective DA receptor agonist (apomorphine), a NE beta-receptor antagonist (propranolol), and a NE beta-receptor agonist (isoproterenol), and examined changes in exploration within subjects across sessions. We found a bidirectional modulatory effect of dopamine receptor activity on the level of exploration. Increasing dopamine activity decreased exploration and decreasing dopamine activity increased exploration. Beta-noradrenergic receptor activity also modulated exploration, but the modulatory effect was mediated by sex. Reinforcement learning model parameters suggested that dopamine modulation affected exploration via decision noise and norepinephrine modulation affected exploration via outcome sensitivity. 
Together, these findings suggested that the mechanisms that govern the transition between exploration and exploitation are sensitive to changes in both catecholamine functions and revealed differential roles for NE and DA in mediating exploration.
|
16
|
Voloh B, Eisenreich BR, Maisson DJN, Ebitz RB, Park HS, Hayden BY, Zimmermann J. Hierarchical organization of rhesus macaque behavior. Oxford Open Neuroscience 2023; 2:kvad006. [PMID: 37577290 PMCID: PMC10421634 DOI: 10.1093/oons/kvad006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 06/11/2023] [Accepted: 06/12/2023] [Indexed: 08/15/2023]
Abstract
Primatologists, psychologists and neuroscientists have long hypothesized that primate behavior is highly structured. However, delineating that structure has been impossible due to the difficulties of precision behavioral tracking. Here we analyzed a dataset consisting of continuous measures of the 3D position of two male rhesus macaques (Macaca mulatta) performing three different tasks in a large unrestrained environment over several hours. Using an unsupervised embedding approach on the tracked joints, we identified commonly repeated pose patterns, which we call postures. We found that macaques' behavior is characterized by 49 distinct postures, lasting an average of 0.6 seconds. We found evidence that behavior is hierarchically organized, in that transitions between poses tend to occur within larger modules, which correspond to identifiable actions; these actions are further organized hierarchically. Our behavioral decomposition allows us to identify universal (cross-individual and cross-task) and unique (specific to each individual and task) principles of behavior. These results demonstrate the hierarchical nature of primate behavior, provide a method for the automated ethogramming of primate behavior, and provide important constraints on neural models of pose generation.
Affiliation(s)
- Benjamin Voloh
  - Department of Neuroscience, Center for Magnetic Resonance Research, Center for Neuroengineering, 1 Baylor Plaza, Houston, TX 77030
- Benjamin R Eisenreich
  - Department of Neuroscience, Center for Magnetic Resonance Research, Center for Neuroengineering, 1 Baylor Plaza, Houston, TX 77030
- David J-N Maisson
  - Department of Neuroscience, Center for Magnetic Resonance Research, Center for Neuroengineering, 1 Baylor Plaza, Houston, TX 77030
- R Becket Ebitz
  - Department of Neuroscience, Center for Magnetic Resonance Research, Center for Neuroengineering, 1 Baylor Plaza, Houston, TX 77030
- Hyun Soo Park
  - Department of Computer Science and Engineering, University of Minnesota, 40 Church St, Minneapolis, MN 55455, USA
- Benjamin Y Hayden
  - Department of Neuroscience, Center for Magnetic Resonance Research, Center for Neuroengineering, 1 Baylor Plaza, Houston, TX 77030
- Jan Zimmermann
  - Department of Neuroscience, Center for Magnetic Resonance Research, Center for Neuroengineering, 1 Baylor Plaza, Houston, TX 77030
|
17
|
Shourkeshti A, Marrocco G, Jurewicz K, Moore T, Ebitz RB. Pupil size predicts the onset of exploration in brain and behavior. bioRxiv 2023:2023.05.24.541981. [PMID: 37292773 PMCID: PMC10245915 DOI: 10.1101/2023.05.24.541981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
In uncertain environments, intelligent decision-makers exploit actions that have been rewarding in the past, but also explore actions that could be even better. Several neuromodulatory systems are implicated in exploration, based, in part, on work linking exploration to pupil size, a peripheral correlate of neuromodulatory tone and an index of arousal. However, pupil size could instead track variables that make exploration more likely, like volatility or reward, without directly predicting either exploration or its neural bases. Here, we simultaneously measured pupil size, exploration, and neural population activity in the prefrontal cortex while two rhesus macaques explored and exploited in a dynamic environment. We found that pupil size under constant luminance specifically predicted the onset of exploration, beyond what could be explained by reward history. Pupil size also predicted disorganized patterns of prefrontal neural activity at both the single neuron and population levels, even within periods of exploitation. Ultimately, our results support a model in which pupil-linked mechanisms promote the onset of exploration via driving the prefrontal cortex through a critical tipping point where prefrontal control dynamics become disorganized and exploratory decisions are possible.
Affiliation(s)
- Akram Shourkeshti
  - Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
- Gabriel Marrocco
  - Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
- Katarzyna Jurewicz
  - Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
  - Department of Physiology, McGill University, Montréal, QC, Canada
- Tirin Moore
  - Department of Neurobiology, Stanford University School of Medicine, Stanford, CA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- R. Becket Ebitz
  - Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
|
18
|
Barnett WH, Kuznetsov A, Lapish CC. Distinct cortico-striatal compartments drive competition between adaptive and automatized behavior. PLoS One 2023; 18:e0279841. [PMID: 36943842 PMCID: PMC10030038 DOI: 10.1371/journal.pone.0279841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Accepted: 12/15/2022] [Indexed: 03/23/2023] Open
Abstract
Cortical and basal ganglia circuits play a crucial role in the formation of goal-directed and habitual behaviors. In this study, we investigate the cortico-striatal circuitry involved in learning and the role of this circuitry in the emergence of inflexible behaviors such as those observed in addiction. Specifically, we develop a computational model of cortico-striatal interactions that performs concurrent goal-directed and habit learning. The model accomplishes this by distinguishing learning processes in the dorsomedial striatum (DMS) that rely on reward prediction error signals as distinct from the dorsolateral striatum (DLS) where learning is supported by salience signals. These striatal subregions each operate on unique cortical input: the DMS receives input from the prefrontal cortex (PFC) which represents outcomes, and the DLS receives input from the premotor cortex which determines action selection. Following an initial learning of a two-alternative forced choice task, we subjected the model to reversal learning, reward devaluation, and learning a punished outcome. Behavior driven by stimulus-response associations in the DLS resisted goal-directed learning of new reward feedback rules despite devaluation or punishment, indicating the expression of habit. We repeated these simulations after the impairment of executive control, which was implemented as poor outcome representation in the PFC. The degraded executive control reduced the efficacy of goal-directed learning, and stimulus-response associations in the DLS were even more resistant to the learning of new reward feedback rules. In summary, this model describes how circuits of the dorsal striatum are dynamically engaged to control behavior and how the impairment of executive control by the PFC enhances inflexible behavior.
Affiliation(s)
- William H. Barnett
  - Department of Psychology, Indiana University—Purdue University Indianapolis, Indianapolis, Indiana, United States of America
- Alexey Kuznetsov
  - Department of Mathematics, Indiana University—Purdue University Indianapolis, Indianapolis, Indiana, United States of America
- Christopher C. Lapish
  - Department of Psychology, Indiana University—Purdue University Indianapolis, Indianapolis, Indiana, United States of America
  - Stark Neurosciences Research Institute, Indiana University—Purdue University Indianapolis, Indianapolis, Indiana, United States of America
|
19
|
Pisupati S, Niv Y. The challenges of lifelong learning in biological and artificial systems. Trends Cogn Sci 2022; 26:1051-1053. [PMID: 36335012 PMCID: PMC9676180 DOI: 10.1016/j.tics.2022.09.022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/28/2022] [Accepted: 09/28/2022] [Indexed: 11/11/2022]
Abstract
How do biological systems learn continuously throughout their lifespans, adapting to change while retaining old knowledge, and how can these principles be applied to artificial learning systems? In this Forum article we outline challenges and strategies of 'lifelong learning' in biological and artificial systems, and argue that a collaborative study of each system's failure modes can benefit both.
Affiliation(s)
- Sashank Pisupati
  - Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Yael Niv
  - Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
|
20
|
Post RJ, Bulkin DA, Ebitz RB, Lee V, Han K, Warden MR. Tonic activity in lateral habenula neurons acts as a neutral valence brake on reward-seeking behavior. Curr Biol 2022; 32:4325-4336.e5. [PMID: 36049479 PMCID: PMC9613558 DOI: 10.1016/j.cub.2022.08.016] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/25/2021] [Revised: 12/16/2021] [Accepted: 08/09/2022] [Indexed: 11/16/2022]
Abstract
Survival requires both the ability to persistently pursue goals and the ability to determine when it is time to stop, an adaptive balance of perseverance and disengagement. Neural activity in the lateral habenula (LHb) has been linked to negative valence, but its role in regulating the balance between engaged reward seeking and disengaged behavioral states remains unclear. Here, we show that LHb neural activity is tonically elevated during minutes-long periods of disengagement from reward-seeking behavior, both when due to repeated reward omission (negative valence) and when sufficient reward has been consumed (positive valence). Furthermore, we show that LHb inhibition extends ongoing reward-seeking behavioral states but does not prompt task re-engagement. We find no evidence for similar tonic activity changes in ventral tegmental area dopamine neurons. Our findings support a framework in which tonic activity in LHb neurons suppresses engagement in reward-seeking behavior in response to both negatively and positively valenced factors.
Affiliation(s)
- Ryan J Post
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA; Cornell Neurotech, Cornell University, Ithaca, NY 14853, USA
- David A Bulkin
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA; Cornell Neurotech, Cornell University, Ithaca, NY 14853, USA
- R Becket Ebitz
  - Department of Neuroscience, Université de Montréal, Montréal, QC H3T 1J4, Canada
- Vladlena Lee
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA
- Kasey Han
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA
- Melissa R Warden
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14853, USA; Cornell Neurotech, Cornell University, Ithaca, NY 14853, USA
|
21
|
Rojas GR, Curry-Pochy LS, Chen CS, Heller AT, Grissom NM. Sequential delay and probability discounting tasks in mice reveal anchoring effects partially attributable to decision noise. Behav Brain Res 2022; 431:113951. [PMID: 35661751 PMCID: PMC9844124 DOI: 10.1016/j.bbr.2022.113951] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/08/2021] [Revised: 05/20/2022] [Accepted: 05/29/2022] [Indexed: 01/19/2023]
Abstract
Delay discounting and probability discounting decision making tasks in rodent models have high translational potential. However, it is unclear whether the discounted value of the large reward option is the main contributor to variability in animals' choices in either task, which may limit translation to humans. Male and female mice underwent sessions of delay and probability discounting in sequence to assess how choice behavior adapts over experience with each task. To control for "anchoring" (persistent choices based on the initial delay or probability), mice experienced "Worsening" schedules where the large reward was offered under initially favorable conditions that became less favorable during testing, followed by "Improving" schedules where the large reward was offered under initially unfavorable conditions that improved over a session. During delay discounting, both male and female mice showed elimination of anchoring effects over training. In probability discounting, both sexes of mice continued to show some anchoring even after months of training. One possibility is that "noisy", exploratory choices could contribute to these persistent anchoring effects, rather than constant fluctuations in value discounting. We fit choice behavior in individual animals using models that included both a value-based discounting parameter and a decision noise parameter that captured variability in choices deviating from value maximization. Changes in anchoring behavior over time were tracked by changes in both the value and decision noise parameters in delay discounting, but by the decision noise parameter in probability discounting. Exploratory decision making was also reflected in choice response times that tracked the degree of conflict caused by both uncertainty and temporal cost, but was not linked with differences in locomotor activity reflecting chamber exploration. 
Thus, variable discounting behavior in mice can result from changes in exploration of the decision options rather than changes in reward valuation.
|
22
|
Bari BA, Moerke MJ, Jedema HP, Effinger DP, Cohen JY, Bradberry CW. Reinforcement learning modeling reveals a reward-history-dependent strategy underlying reversal learning in squirrel monkeys. Behav Neurosci 2022; 136:46-60. [PMID: 34570556 PMCID: PMC8863624 DOI: 10.1037/bne0000492] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
Insight into psychiatric disease and development of therapeutics relies on behavioral tasks that study similar cognitive constructs in multiple species. The reversal learning task is one popular paradigm that probes flexible behavior, aberrations of which are thought to be important in a number of disease states. Despite widespread use, there is a need for a high-throughput primate model that can bridge the genetic, anatomic, and behavioral gap between rodents and humans. Here, we trained squirrel monkeys, a promising preclinical model, on an image-guided deterministic reversal learning task. We found that squirrel monkeys exhibited two key hallmarks of behavior found in other species: integration of reward history over many trials and a side-specific bias. We adapted a reinforcement learning model and demonstrated that it could simulate squirrel monkey-like behavior, capture training-related trajectories, and provide insight into the strategies animals employed. These results validate squirrel monkeys as a model in which to study behavioral flexibility. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
Affiliation(s)
- Bilal A. Bari
  - The Solomon H. Snyder Department of Neuroscience, Brain Science Institute, Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD
- Megan J. Moerke
  - NIDA Intramural Research Program, 251 Bayview Blvd, Suite 200, Baltimore, MD 21224, USA
- Hank P. Jedema
  - NIDA Intramural Research Program, 251 Bayview Blvd, Suite 200, Baltimore, MD 21224, USA
- Devin P. Effinger
  - Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Jeremiah Y. Cohen
  - The Solomon H. Snyder Department of Neuroscience, Brain Science Institute, Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD
- Charles W. Bradberry
  - NIDA Intramural Research Program, 251 Bayview Blvd, Suite 200, Baltimore, MD 21224, USA
|
23
|
Ceceli AO, Bradberry CW, Goldstein RZ. The neurobiology of drug addiction: cross-species insights into the dysfunction and recovery of the prefrontal cortex. Neuropsychopharmacology 2022; 47:276-291. [PMID: 34408275 PMCID: PMC8617203 DOI: 10.1038/s41386-021-01153-9] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Received: 04/09/2021] [Revised: 08/02/2021] [Accepted: 08/06/2021] [Indexed: 01/03/2023]
Abstract
A growing preclinical and clinical body of work on the effects of chronic drug use and drug addiction has extended the scope of inquiry from the putative reward-related subcortical mechanisms to higher-order executive functions as regulated by the prefrontal cortex. Here we review the neuroimaging evidence in humans and non-human primates to demonstrate the involvement of the prefrontal cortex in emotional, cognitive, and behavioral alterations in drug addiction, with particular attention to the impaired response inhibition and salience attribution (iRISA) framework. In support of iRISA, functional and structural neuroimaging studies document a role for the prefrontal cortex in assigning excessive salience to drug over non-drug-related processes with concomitant lapses in self-control, and deficits in reward-related decision-making and insight into illness. Importantly, converging insights from human and non-human primate studies suggest a causal relationship between drug addiction and prefrontal insult, indicating that chronic drug use causes the prefrontal cortex damage that underlies iRISA while changes with abstinence and recovery with treatment suggest plasticity of these same brain regions and functions. We further dissect the overlapping and distinct characteristics of drug classes, potential biomarkers that inform vulnerability and resilience, and advancements in cutting-edge psychological and neuromodulatory treatment strategies, providing a comprehensive landscape of the human and non-human primate drug addiction literature as it relates to the prefrontal cortex.
Affiliation(s)
- Ahmet O Ceceli
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Rita Z Goldstein
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
|
24
|
Chen CS, Knep E, Han A, Ebitz RB, Grissom N. Sex differences in learning from exploration. eLife 2021; 10:69748. [PMID: 34796870 PMCID: PMC8794469 DOI: 10.7554/elife.69748] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 04/24/2021] [Accepted: 11/18/2021] [Indexed: 11/13/2022] Open
Abstract
Sex-based modulation of cognitive processes could set the stage for individual differences in vulnerability to neuropsychiatric disorders. While value-based decision making processes in particular have been proposed to be influenced by sex differences, the overall correct performance in decision making tasks often shows variable or minimal differences across sexes. Computational tools allow us to uncover latent variables that define different decision making approaches, even in animals with similar correct performance. Here, we quantify sex differences in mice in the latent variables underlying behavior in a classic value-based decision making task: a restless 2-armed bandit. While male and female mice had similar accuracy, they achieved this performance via different patterns of exploration. Male mice tended to make more exploratory choices overall, largely because they appeared to get 'stuck' in exploration once they had started. Female mice tended to explore less but learned more quickly during exploration. Together, these results suggest that sex exerts stronger influences on decision making during periods of learning and exploration than during stable choices. Exploration during decision making is altered in people diagnosed with addictions, depression, and neurodevelopmental disabilities, pinpointing the neural mechanisms of exploration as a highly translational avenue for conferring sex-modulated vulnerability to neuropsychiatric diagnoses.
Affiliation(s)
- Cathy S Chen
  - University of Minnesota, Minneapolis, United States
- Evan Knep
  - University of Minnesota, Minneapolis, United States
- Autumn Han
  - University of Minnesota, Minneapolis, United States
- R Becket Ebitz
  - Department of Neurosciences, Princeton University, Princeton, United States
|
25
|
Ebitz RB, Hayden BY. The population doctrine in cognitive neuroscience. Neuron 2021; 109:3055-3068. [PMID: 34416170 PMCID: PMC8725976 DOI: 10.1016/j.neuron.2021.07.011] [Citation(s) in RCA: 103] [Impact Index Per Article: 25.8] [Received: 04/08/2021] [Revised: 07/02/2021] [Accepted: 07/13/2021] [Indexed: 01/08/2023]
Abstract
A major shift is happening within neurophysiology: a population doctrine is drawing level with the single-neuron doctrine that has long dominated the field. Population-level ideas have so far had their greatest impact in motor neuroscience, but they hold great promise for resolving open questions in cognition as well. Here, we codify the population doctrine and survey recent work that leverages this view to specifically probe cognition. Our discussion is organized around five core concepts that provide a foundation for population-level thinking: (1) state spaces, (2) manifolds, (3) coding dimensions, (4) subspaces, and (5) dynamics. The work we review illustrates the progress and promise that population-level thinking holds for cognitive neuroscience: for delivering new insight into attention, working memory, decision-making, executive function, learning, and reward processing.
Affiliation(s)
- R Becket Ebitz
  - Department of Neurosciences, Faculté de médecine, Université de Montréal, Montréal, QC, Canada
- Benjamin Y Hayden
  - Department of Neuroscience, Center for Magnetic Resonance Research, and Center for Neuroengineering, University of Minnesota, Minneapolis, MN, USA
|
26
|
Barceló F. A Predictive Processing Account of Card Sorting: Fast Proactive and Reactive Frontoparietal Cortical Dynamics during Inference and Learning of Perceptual Categories. J Cogn Neurosci 2021; 33:1636-1656. [PMID: 34375413 DOI: 10.1162/jocn_a_01662] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/04/2022]
Abstract
For decades, a common assumption in cognitive neuroscience has been that prefrontal executive control is mainly engaged during target detection [Posner, M. I., & Petersen, S. E. The attention system of the human brain. Annual Review of Neuroscience, 13, 25-42, 1990]. More recently, predictive processing theories of frontal function under the Bayesian brain hypothesis emphasize a key role of proactive control for anticipatory action selection (i.e., planning as active inference). Here, we review evidence of fast and widespread EEG and magnetoencephalographic fronto-temporo-parietal cortical activations elicited by feedback cues and target cards in the Wisconsin Card Sorting Test. This evidence is best interpreted when considering negative and positive feedback as predictive cues (i.e., sensory outcomes) for proactively updating beliefs about unknown perceptual categories. Such predictive cues inform posterior beliefs about high-level hidden categories governing subsequent response selection at target onset. Quite remarkably, these new views concur with Don Stuss' early findings concerning two broad classes of P300 cortical responses evoked by feedback cues and target cards in a computerized Wisconsin Card Sorting Test analogue. Stuss' discussion of those P300 responses, in terms of the resolution of uncertainty about response (policy) selection as well as the participants' expectancies for future perceptual or motor activities and their timing, was prescient of current predictive processing and active (Bayesian) inference theories. From these new premises, a domain-general frontoparietal cortical network is rapidly engaged during two temporally distinct stages of inference and learning of perceptual categories that underwrite goal-directed card sorting behavior, and they each engage prefrontal executive functions in fundamentally distinct ways.
|
27
|
Positional inference in rhesus macaques. Anim Cogn 2021; 25:73-93. [PMID: 34302565 DOI: 10.1007/s10071-021-01536-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/28/2020] [Revised: 07/12/2021] [Accepted: 07/19/2021] [Indexed: 10/20/2022]
Abstract
Understanding how organisms make transitive inferences is critical to understanding their general ability to learn serial relationships. In this context, transitive inference (TI) can be understood as a specific heuristic that applies broadly to many different serial learning tasks, which have been the focus of hundreds of studies involving dozens of species. In the present study, monkeys learned the order of 7-item lists of photographic stimuli by trial and error, and were then tested on "derived" lists. These derived test lists combined stimuli from multiple training lists in ambiguous ways, sometimes changing their order relative to training. We found that subjects displayed strong preferences when presented with novel test pairs, even when those pairs were drawn from different training lists. These preferences were helpful when test pairs had an ordering congruent with their ranks during training, but yielded consistently below-chance performance when pairs had an incongruent order relative to training. This behavior can be explained by the joint contributions of transitive inference and another heuristic that we refer to as "positional inference." Positional inferences play a complementary role to transitive inferences in facilitating choices between novel pairs of stimuli. The theoretical framework that best explains both transitive and positional inferences is a spatial model that represents both the position of each stimulus and its uncertainty. A computational implementation of this framework yields accurate predictions about both correct responses and errors on derived lists.
|
28
|
Wilson RC, Bonawitz E, Costa VD, Ebitz RB. Balancing exploration and exploitation with information and randomization. Curr Opin Behav Sci 2021; 38:49-56. [PMID: 33184605 PMCID: PMC7654823 DOI: 10.1016/j.cobeha.2020.10.001] [Citation(s) in RCA: 88] [Impact Index Per Article: 22.0] [Indexed: 12/28/2022]
Abstract
Explore-exploit decisions require us to trade off the benefits of exploring unknown options to learn more about them, with exploiting known options, for immediate reward. Such decisions are ubiquitous in nature, but from a computational perspective, they are notoriously hard. There is therefore much interest in how humans and animals make these decisions and recently there has been an explosion of research in this area. Here we provide a biased and incomplete snapshot of this field focusing on the major finding that many organisms use two distinct strategies to solve the explore-exploit dilemma: a bias for information ('directed exploration') and the randomization of choice ('random exploration'). We review evidence for the existence of these strategies, their computational properties, their neural implementations, as well as how directed and random exploration vary over the lifespan. We conclude by highlighting open questions in this field that are ripe to both explore and exploit.
Affiliation(s)
- Robert C. Wilson
  - Department of Psychology, University of Arizona, Tucson AZ USA
  - Cognitive Science Program, University of Arizona, Tucson AZ USA
  - Evelyn F. McKnight Brain Institute, University of Arizona, Tucson AZ USA
- Vincent D. Costa
  - Department of Behavioral Neuroscience, Oregon Health and Science University, Portland OR USA
- R. Becket Ebitz
  - Department of Neuroscience, University of Montréal, Montréal, Québec, Canada
|
29
|
Pisupati S, Chartarifsky-Lynn L, Khanal A, Churchland AK. Lapses in perceptual decisions reflect exploration. eLife 2021; 10:55490. [PMID: 33427198 PMCID: PMC7846276 DOI: 10.7554/elife.55490] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 01/26/2020] [Accepted: 01/10/2021] [Indexed: 12/17/2022] Open
Abstract
Perceptual decision-makers often display a constant rate of errors independent of evidence strength. These ‘lapses’ are treated as a nuisance arising from noise tangential to the decision, e.g. inattention or motor errors. Here, we use a multisensory decision task in rats to demonstrate that these explanations cannot account for lapses’ stimulus dependence. We propose a novel explanation: lapses reflect a strategic trade-off between exploiting known rewarding actions and exploring uncertain ones. We tested this model’s predictions by selectively manipulating one action’s reward magnitude or probability. As uniquely predicted by this model, changes were restricted to lapses associated with that action. Finally, we show that lapses are a powerful tool for assigning decision-related computations to neural structures based on disruption experiments (here, posterior striatum and secondary motor cortex). These results suggest that lapses reflect an integral component of decision-making and are informative about action values in normal and disrupted brain states.
Affiliation(s)
- Sashank Pisupati
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States
  - CSHL School of Biological Sciences, Cold Spring Harbor, New York, United States
- Lital Chartarifsky-Lynn
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States
  - CSHL School of Biological Sciences, Cold Spring Harbor, New York, United States
- Anup Khanal
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States
|
30
|
Groman SM, Hillmer AT, Heather L, Fowles K, Holden D, Morris ED, Lee D, Taylor JR. Dysregulation of Decision Making Related to Metabotropic Glutamate 5, but Not Midbrain D3, Receptor Availability Following Cocaine Self-administration in Rats. Biol Psychiatry 2020; 88:777-787. [PMID: 32826065 PMCID: PMC8935943 DOI: 10.1016/j.biopsych.2020.06.020] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 05/08/2020] [Revised: 06/05/2020] [Accepted: 06/19/2020] [Indexed: 12/17/2022]
Abstract
BACKGROUND: Compulsive patterns of drug use are thought to be the consequence of drug-induced adaptations in the neural mechanisms that enable behavior to be flexible. Neuroimaging studies have found evidence of robust alterations in glutamate and dopamine receptors within brain regions that are known to be critical for decision-making processes in cocaine-dependent individuals, and these changes have been argued to be the consequence of persistent drug use. The causal relationships among drug-induced alterations, cocaine taking, and maladaptive decision-making processes, however, are difficult to establish in humans.
METHODS: We assessed decision making in adult male rats using a probabilistic reversal learning task and used positron emission tomography with the [11C]-(+)-PHNO and [18F]FPEB radioligands to quantify regional dopamine D2/3 and metabotropic glutamate 5 (mGlu5) receptor availability, respectively, before and after 21 days of cocaine or saline self-administration. Tests of motivation and relapse-like behaviors were also conducted.
RESULTS: We found that self-administration of cocaine, but not of saline, disrupted behavior in the probabilistic reversal learning task measured by selective impairments in negative-outcome updating and also increased cortical mGlu5 receptor availability following 2 weeks of forced abstinence. D2/3 and, importantly, midbrain D3 receptor availability was not altered following 2 weeks of abstinence from cocaine. Notably, the degree of the cocaine-induced increase in cortical mGlu5 receptor availability was related to the degree of disruption in negative-outcome updating.
CONCLUSIONS: These findings suggest that cocaine-induced changes in mGlu5 signaling may be a mechanism by which disruptions in negative-outcome updating emerge in cocaine-dependent individuals.
Affiliation(s)
- Stephanie M. Groman
  - Department of Psychiatry, Yale University
  - Correspondence should be addressed to: Stephanie M. Groman, Ph.D., and Jane R. Taylor, Ph.D., 34 Park Street, New Haven CT 06515
- Ansel T. Hillmer
  - Department of Psychiatry, Yale University
  - Department of Radiology and Biomedical Imaging, Yale University
  - Department of Yale Positron Emission Tomography Center, Yale University
- Liu Heather
  - Department of Radiology and Biomedical Imaging, Yale University
- Krista Fowles
  - Department of Yale Positron Emission Tomography Center, Yale University
- Daniel Holden
  - Department of Yale Positron Emission Tomography Center, Yale University
- Evan D. Morris
  - Department of Radiology and Biomedical Imaging, Yale University
  - Department of Yale Positron Emission Tomography Center, Yale University
  - Invicro LLC
- Daeyeol Lee
  - The Zanvyl Krieger Mind/Brain Institute, The Solomon H Snyder Department of Neuroscience, Department of Psychological and Brain Sciences, Johns Hopkins University
- Jane R. Taylor
  - Department of Psychiatry, Yale University
  - Department of Neuroscience, Yale University
  - Correspondence should be addressed to: Stephanie M. Groman, Ph.D., and Jane R. Taylor, Ph.D., 34 Park Street, New Haven CT 06515
|
31
|
Ebitz RB, Tu JC, Hayden BY. Rules warp feature encoding in decision-making circuits. PLoS Biol 2020; 18:e3000951. [PMID: 33253163 PMCID: PMC7728226 DOI: 10.1371/journal.pbio.3000951] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 02/05/2020] [Revised: 12/10/2020] [Accepted: 11/02/2020] [Indexed: 01/22/2023] Open
Abstract
We have the capacity to follow arbitrary stimulus-response rules, meaning simple policies that guide our behavior. Rule identity is broadly encoded across decision-making circuits, but there are less data on how rules shape the computations that lead to choices. One idea is that rules could simplify these computations. When we follow a rule, there is no need to encode or compute information that is irrelevant to the current rule, which could reduce the metabolic or energetic demands of decision-making. However, it is not clear if the brain can actually take advantage of this computational simplicity. To test this idea, we recorded from neurons in 3 regions linked to decision-making, the orbitofrontal cortex (OFC), ventral striatum (VS), and dorsal striatum (DS), while macaques performed a rule-based decision-making task. Rule-based decisions were identified via modeling rules as the latent causes of decisions. This left us with a set of physically identical choices that maximized reward and information, but could not be explained by simple stimulus-response rules. Contrasting rule-based choices with these residual choices revealed that following rules (1) decreased the energetic cost of decision-making; and (2) expanded rule-relevant coding dimensions and compressed rule-irrelevant ones. Together, these results suggest that we use rules, in part, because they reduce the costs of decision-making through a distributed representational warping in decision-making circuits.
Affiliation(s)
- R. Becket Ebitz
  - Department of Neuroscience, Center for Magnetic Resonance Research, and Center for Neuroengineering, University of Minnesota, Minneapolis, Minnesota, United States of America
- Jiaxin Cindy Tu
  - Department of Neuroscience, Center for Magnetic Resonance Research, and Center for Neuroengineering, University of Minnesota, Minneapolis, Minnesota, United States of America
- Benjamin Y. Hayden
  - Department of Neuroscience, Center for Magnetic Resonance Research, and Center for Neuroengineering, University of Minnesota, Minneapolis, Minnesota, United States of America
|
32
|
Cash-Padgett T, Hayden B. Behavioural variability contributes to over-staying in patchy foraging. Biol Lett 2020; 16:20190915. [PMID: 32156171 DOI: 10.1098/rsbl.2019.0915] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022] Open
Abstract
Foragers often systematically deviate from rate-maximizing choices in two ways: accuracy and precision. That is, they use suboptimal threshold values and also show variability in their application of those thresholds. We hypothesized that these biases are related and, more specifically, that foragers' widely known accuracy bias--over-staying--could be explained, at least in part, by their imprecision. To test this hypothesis, we analysed choices made by three rhesus macaques in a computerized patch foraging task. Confirming previously observed findings, we found high levels of variability. We then showed, through simulations, that this variability changed optimal thresholds, meaning that a forager aware of its own variability should increase its leaving threshold (i.e. over-stay) to increase performance. All subjects showed thresholds that were biased in the predicted direction. These results indicate that over-staying in patches may reflect, in part, an adaptation to behavioural variability.
Affiliation(s)
- Tyler Cash-Padgett
  - Department of Neuroscience, Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN 55455, USA
  - Center for Neuroengineering, University of Minnesota, Minneapolis, MN 55455, USA
- Benjamin Hayden
  - Department of Neuroscience, Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN 55455, USA
  - Center for Neuroengineering, University of Minnesota, Minneapolis, MN 55455, USA
|