1. Pribut HJ, Kang N, Roesch MR. Prior cocaine self-administration does not impair the ability to delay gratification in rats during diminishing returns. Behav Pharmacol 2024; 35:147-155. PMID: 38651979. DOI: 10.1097/fbp.0000000000000771.
Abstract
Previous exposure to drugs of abuse produces impairments in reversal learning, delay discounting, and response-inhibition tasks. While these studies contribute to the understanding of normal decision-making and how it is impaired by drugs of abuse, they do not fully capture how decision-making impacts the ability to delay gratification for greater long-term benefit. To address this issue, we used a diminishing returns task to study decision-making in rats that had previously self-administered cocaine. This task was designed to test a rat's ability to delay gratification in the short term to obtain more reward over the course of the entire behavioral session. Rats were presented with two choices. One choice imposed a fixed time delay before reward [fixed delay (FD)], while the other had a progressive delay (PD) that started at 0 s and increased by 1 s each time the PD option was selected. During the 'reset' variation of the task, rats could choose the FD option to reset the time delay associated with the PD option. Consistent with previous results, we found that prior cocaine exposure reduced rats' overall preference for the PD option in post-task reversal testing during 'no-reset' sessions, suggesting that cocaine exposure made rats more sensitive to the increasing delay of the PD option. Surprisingly, however, rats that had self-administered cocaine 1 month prior adapted their behavior during 'reset' sessions, delaying gratification to obtain more reward in the long run, similar to control rats.
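To make the delay arithmetic concrete: with the PD option starting at 0 s and growing by 1 s per selection, the total wait across a run of consecutive PD choices grows quadratically. A worked equation under those stated parameters (ignoring reward-consumption time):

    % Total waiting time after n consecutive PD choices (0 s start, +1 s per choice):
    T_{\mathrm{PD}}(n) = \sum_{i=0}^{n-1} i = \frac{n(n-1)}{2}\ \mathrm{s}

For example, 20 consecutive PD choices cost 190 s of waiting, and each additional choice costs more than the last; this is why periodically accepting the fixed delay to reset the schedule can increase reward over a whole session.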
Affiliation(s)
- H J Pribut: Department of Psychology; Program in Neuroscience and Cognitive Science, University of Maryland, College Park, Maryland, USA
- Matthew R Roesch: Department of Psychology; Program in Neuroscience and Cognitive Science, University of Maryland, College Park, Maryland, USA
2. Schuweiler DR, Rao M, Pribut HJ, Roesch MR. Rats delay gratification during a time-based diminishing returns task. J Exp Psychol Anim Learn Cogn 2021; 47:420-428. PMID: 34472950. PMCID: PMC8639657. DOI: 10.1037/xan0000305.
Abstract
The rat is a common animal model used to uncover the neural underpinnings of decision making and their disruption in psychiatric illness. Here, we ask whether rats can perform a decision-making task that assesses self-control through delayed gratification in the context of diminishing returns. In this task, rats could choose to press one of two levers. One lever was associated with a fixed delay (FD) schedule that delivered reward after a fixed time delay (10 s). The other lever was associated with a progressive delay (PD) schedule; the delay increased by a fixed amount of time (1 s) after each PD lever press. Rats were tested under two conditions: a reset condition, in which rats could reset the PD schedule back to its initial 0-s delay by pressing the FD lever, and a no-reset condition, in which resetting the PD schedule was unavailable. We found that rats adapted behavior within reset sessions by delaying gratification to obtain more reward in the long run. That is, they selected the FD lever with the longer delay to reset the PD delay back to zero prior to the equality point, thus achieving more reward over the course of the session. These results are consistent with findings in other species, demonstrating that rats, too, can maximize the net rate of reward by selecting an option that is not immediately beneficial. Moreover, use of this task in rodents might provide insights into how the brain governs normal and abnormal behavior, as well as treatments that can improve self-control.
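A minimal sketch of the session-level arithmetic behind this task, assuming the abstract's parameters (FD = 10 s; PD starting at 0 s, +1 s per press; one equal-sized reward per press) and ignoring reward-consumption time. The cyclic policy "press PD k times, then press FD once to reset" is illustrative shorthand, not the authors' analysis:

    # Rewards per second for a cyclic policy: k PD presses (delays 0, 1, ..., k-1 s),
    # then one FD press (10 s) that resets the PD schedule to 0 s.
    def reward_rate(k, fd_delay=10.0, step=1.0):
        pd_wait = step * k * (k - 1) / 2       # cumulative PD waiting time in one cycle
        return (k + 1) / (pd_wait + fd_delay)  # k PD rewards + 1 FD reward per cycle

    rates = {k: reward_rate(k) for k in range(1, 21)}
    best_k = max(rates, key=rates.get)
    print(best_k, rates[best_k])  # optimum near k = 4 (~0.31 rewards/s)

Under these assumptions the rate-maximizing reset point falls well before the equality point (the 10th press, where the PD delay first matches the FD delay), consistent with the abstract's report that rats reset prior to the equality point.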
3. Biased belief updating and suboptimal choice in foraging decisions. Nat Commun 2020; 11:3417. PMID: 32647271. PMCID: PMC7347922. DOI: 10.1038/s41467-020-16964-5.
Abstract
Deciding which options to engage, and which to forego, requires developing accurate beliefs about the overall distribution of prospects. Here we adapt a classic prey selection task from foraging theory to examine how individuals keep track of an environment's reward rate and adjust choices in response to its fluctuations. Preference shifts were most pronounced when the environment improved compared to when it deteriorated. This is best explained by a trial-by-trial learning model in which participants estimate the reward rate with upward vs. downward changes controlled by separate learning rates. A failure to adjust expectations sufficiently when an environment becomes worse leads to suboptimal choices: options that are valuable given the environmental conditions are rejected in the false expectation that better options will materialize. These findings reveal, in the serial-choice setting, a previously unappreciated parallel to observations of asymmetric updating and the resulting biased (often overoptimistic) estimates in other domains. In some types of decision-making, people must accept or forego an option without knowing what prospects might later be available. Here, the authors reveal how a key bias, asymmetric learning from negative versus positive outcomes, emerges in this type of decision.
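A minimal sketch of the asymmetric-updating idea described above; the function names and learning-rate values are illustrative assumptions, not the paper's fitted model:

    # Delta-rule estimate of the environment's reward rate, with separate learning
    # rates for upward and downward revisions (the key asymmetry).
    def update_reward_rate(rho, observed_rate, alpha_up=0.30, alpha_down=0.10):
        delta = observed_rate - rho
        alpha = alpha_up if delta > 0 else alpha_down
        return rho + alpha * delta

    # Prey-selection rule: engage an encountered option only if its profitability
    # (reward per unit handling time) beats the estimated background reward rate.
    def engage(reward, handling_time, rho):
        return reward / handling_time > rho

With alpha_down < alpha_up, the estimate rho lags when the environment deteriorates, so the agent keeps rejecting options that are in fact worth engaging, which is the suboptimality the abstract describes.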
4. Pietras CJ, Cherek DR, Lane SD, Tcheremissine O. Risk reduction and resource pooling on a cooperation task. Psychol Rec 2017. DOI: 10.1007/bf03395557.
5. Learning the opportunity cost of time in a patch-foraging task. Cogn Affect Behav Neurosci 2015; 15:837-853. PMID: 25917000. DOI: 10.3758/s13415-015-0350-y.
Abstract
Although most decision research concerns choice between simultaneously presented options, in many situations options are encountered serially, and the decision is whether to exploit an option or search for a better one. Such problems have a rich history in animal foraging, but we know little about the psychological processes involved. In particular, it is unknown whether learning in these problems is supported by the well-studied neurocomputational mechanisms involved in more conventional tasks. We investigated how humans learn in a foraging task, which requires deciding whether to harvest a depleting resource or switch to a replenished one. The optimal choice (given by the marginal value theorem; MVT) requires comparing the immediate return from harvesting to the opportunity cost of time, which is given by the long-run average reward. In two experiments, we varied opportunity cost across blocks, and subjects adjusted their behavior to blockwise changes in environmental characteristics. We examined how subjects learned their choice strategies by comparing choice adjustments to a learning rule suggested by the MVT (in which the opportunity cost threshold is estimated as an average over previous rewards) and to the predominant incremental-learning theory in neuroscience, temporal-difference learning (TD). Trial-by-trial decisions were explained better by the MVT threshold-learning rule. These findings expand on the foraging literature, which has focused on steady-state behavior, by elucidating a computational mechanism for learning in switching tasks that is distinct from those used in traditional tasks, and suggest connections to research on average reward rates in other domains of neuroscience.
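A minimal sketch of the MVT threshold-learning rule the abstract contrasts with TD learning, under illustrative assumptions (names and parameter values are not the paper's):

    # The opportunity cost of time is the long-run average reward rate, estimated
    # here as a running average over recent outcomes.
    def update_opportunity_cost(rho, reward, dt, alpha=0.1):
        return rho + alpha * (reward / dt - rho)

    # Marginal value theorem: keep harvesting the current patch while the expected
    # return of one more harvest exceeds what that time is worth on average.
    def stay_in_patch(expected_harvest, harvest_time, rho):
        return expected_harvest > rho * harvest_time

Unlike TD learning, which incrementally propagates value estimates between states, this rule compares a single global average to the immediate return, which is why the two models make distinguishable trial-by-trial predictions.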
6. Fox AE, Pietras CJ. The effects of response-cost punishment on instructional control during a choice task. J Exp Anal Behav 2013; 99:346-361. DOI: 10.1002/jeab.20.
7. Lattal KA, Neef NA. Recent reinforcement-schedule research and applied behavior analysis. J Appl Behav Anal 1996; 29:213-230. PMID: 16795888. PMCID: PMC1279895. DOI: 10.1901/jaba.1996.29-213.
Abstract
Reinforcement schedules are considered in relation to applied behavior analysis by examining several recent laboratory experiments with humans and other animals. The experiments are drawn from three areas of contemporary schedule research: behavioral history effects on schedule performance, the role of instructions in schedule performance of humans, and dynamic schedules of reinforcement. All of the experiments are discussed in relation to the role of behavioral history in current schedule performance. The paper concludes by extracting from the experiments some more general issues concerning reinforcement schedules in applied research and practice.
8. Sakai Y, Fukai T. The actor-critic learning is behind the matching law: matching versus optimal behaviors. Neural Comput 2008; 20:227-251. PMID: 18045007. DOI: 10.1162/neco.2008.20.1.227.
Abstract
The ability to make a correct choice of behavior from various options is crucial for animals' survival. The neural basis for the choice of behavior has been attracting growing attention in research on biological and artificial neural systems. Alternative choice tasks with variable ratio (VR) and variable interval (VI) schedules of reinforcement have often been employed in studying decision making by animals and humans. In the VR schedule task, alternative choices are reinforced with different probabilities, and subjects learn to select the behavioral response rewarded more frequently. In the VI schedule task, alternative choices are reinforced at different average intervals independent of the choice frequencies, and the choice behavior follows the so-called matching law. The two policies appear robustly in subjects' choice of behavior, but the underlying neural mechanisms remain unknown. Here, we show that these seemingly different policies can emerge from a common computational algorithm known as actor-critic learning. We present experimentally testable variations of the VI schedule in which matching behavior gives only a suboptimal solution to decision making, and show that the actor-critic system exhibits matching behavior in the steady state of learning even when matching is suboptimal. However, matching behavior earns approximately the same reward as the optimal behavior in many practical situations.
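A minimal sketch of an actor-critic agent on concurrent VI schedules in the spirit of the abstract; the schedule rates, learning rates, and single-state simplification are illustrative assumptions, not the authors' exact formulation:

    import math
    import random

    def run_concurrent_vi(p_bait=(0.05, 0.10), alpha=0.02, beta=5.0, steps=100_000):
        w = [0.0, 0.0]                  # actor: preference for each alternative
        v = 0.0                         # critic: estimate of average payoff
        baited = [False, False]
        choices, payoffs = [0, 0], [0.0, 0.0]
        for _ in range(steps):
            for i in (0, 1):            # VI property: arms bait independently of choice
                baited[i] = baited[i] or random.random() < p_bait[i]
            p0 = 1.0 / (1.0 + math.exp(-beta * (w[0] - w[1])))  # softmax, 2 actions
            a = 0 if random.random() < p0 else 1
            r = 1.0 if baited[a] else 0.0
            baited[a] = False           # collecting clears the baited reward
            delta = r - v               # TD error in a one-state approximation
            v += alpha * delta          # critic update
            w[a] += alpha * delta       # actor update for the chosen action
            choices[a] += 1
            payoffs[a] += r
        return choices, payoffs

    choices, payoffs = run_concurrent_vi()
    # Matching law check: the choice ratio should approximate the obtained-reward ratio.
    print(choices[0] / sum(choices), payoffs[0] / sum(payoffs))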
Affiliation(s)
- Yutaka Sakai: Department of Intelligent Information Systems, Tamagawa University, Machida, Tokyo 194-8610, Japan
- Tomoki Fukai: Laboratory for Neural Circuit Theory, Brain Science Institute, RIKEN, Wako, Saitama 351-0198, Japan
9. Sakai Y, Okamoto H, Fukai T. Computational algorithms and neuronal network models underlying decision processes. Neural Netw 2006; 19:1091-1105. PMID: 16942856. DOI: 10.1016/j.neunet.2006.05.034.
Abstract
Animals and humans often encounter situations in which they must choose behavioral responses to be made in the near or distant future. Such decisions are made through continuous, bidirectional interactions between the environment surrounding the brain and the brain's internal state and dynamics. Decision making may therefore provide a unique field of research for studying information processing by the brain, a biological system open to information exchange with the external world. To make a decision, the brain must analyze externally given information, past experiences in similar situations, possible behavioral responses, and the predicted outcomes of the individual responses. In this article, we review recent experimental and theoretical studies of the neuronal substrates and computational algorithms underlying decision processes.
Affiliation(s)
- Yutaka Sakai: Department of Intelligent Information Systems, Tamagawa University, Tamagawa Gakuen 6-1-1, Machida, Tokyo, Japan
10. Pietras CJ, Locey ML, Hackenberg TD. Human risky choice under temporal constraints: tests of an energy-budget model. J Exp Anal Behav 2003; 80:59-75. PMID: 13677609. PMCID: PMC1284947. DOI: 10.1901/jeab.2003.80-59.
Abstract
Risk-sensitive foraging models predict that choice between fixed and variable food delays should be influenced by an organism's energy budget. To investigate whether the predictions of these models could be extended to choice in humans, risk sensitivity in 4 adults was investigated under laboratory conditions designed to model positive and negative energy budgets. Subjects chose between fixed and variable trial durations with the same mean value. An energy requirement was modeled by requiring that five trials be completed within a limited time period for points delivered at the end of the period (block of trials) to be exchanged later for money. Manipulating the duration of this time period generated positive and negative earnings budgets (or, alternatively, "time budgets"). Choices were consistent with the predictions of energy-budget models: The fixed-delay option was strongly preferred under positive earnings-budget conditions and the variable-delay option was strongly preferred under negative earnings-budget conditions. Within-block (or trial-by-trial) choices were also frequently consistent with the predictions of a dynamic optimization model, indicating that choice was simultaneously sensitive to the temporal requirements, delays associated with fixed and variable choices on the upcoming trial, cumulative delays within the block of trials, and trial position within a block.
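A minimal Monte Carlo sketch of the energy-budget (time-budget) logic; the delays, budgets, and requirement below are illustrative assumptions, not the paper's values:

    import random

    # Fixed option: 10 s every trial. Variable option: 1 s or 19 s with equal
    # probability (same 10-s mean). Requirement: finish 5 trials within the budget.
    def p_meets_budget(variable, budget_s, n_trials=5, sims=100_000):
        wins = 0
        for _ in range(sims):
            total = sum(random.choice((1.0, 19.0)) if variable else 10.0
                        for _ in range(n_trials))
            wins += total <= budget_s
        return wins / sims

    for budget in (55.0, 45.0):  # positive vs. negative time budget
        print(budget, p_meets_budget(False, budget), p_meets_budget(True, budget))

With the generous budget the fixed option succeeds with certainty, so risk aversion is optimal; with the tight budget the fixed option fails with certainty and only the variable option can meet the requirement, reproducing the risk-prone shift the abstract reports.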
11.
Abstract
Risky choice in 3 adult humans was investigated across procedural manipulations designed to model energy-budget manipulations conducted with nonhumans. Subjects were presented with repeated choices between a fixed and a variable number of points. An energy budget was simulated by use of an earnings budget, defined as the number of points needed within a block of trials for points to be exchanged for money. During positive earnings-budget conditions, exclusive preference for the fixed option met the earnings requirement. During negative earnings-budget conditions, exclusive preference for the fixed option did not meet the earnings requirement, but choosing the variable option met the requirement probabilistically. Choice was generally risk averse (the fixed option was preferred) when the earnings budget was positive and risk prone (the variable option was preferred) when the earnings budget was negative. Furthermore, choice was most risk prone during negative earnings-budget conditions in which the earnings requirement was most stringent. Local choice patterns were also frequently consistent with the predictions of a dynamic optimization model, indicating that choice was simultaneously sensitive to short-term choice contingencies, current point earnings, and the earnings requirement. Overall, these results show that the patterns of risky choice generated by energy-budget variables can also be produced by choice contingencies that do not involve immediate survival, and that risky choice in humans may be similar to that shown in nonhumans when choice is studied under analogous experimental conditions.
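The dynamic optimization idea mentioned above can be sketched as a small dynamic program; the point values and requirements are illustrative assumptions, not the paper's parameters:

    from functools import lru_cache

    # Fixed option: 5 points for certain. Variable option: 0 or 10 points with
    # equal probability. At each trial, choose whichever option maximizes the
    # probability of reaching the earnings requirement by the end of the block.
    @lru_cache(maxsize=None)
    def p_meet(trials_left, points_needed):
        if points_needed <= 0:
            return 1.0
        if trials_left == 0:
            return 0.0
        p_fixed = p_meet(trials_left - 1, points_needed - 5)
        p_variable = 0.5 * (p_meet(trials_left - 1, points_needed - 10)
                            + p_meet(trials_left - 1, points_needed))
        return max(p_fixed, p_variable)

    print(p_meet(5, 25))  # lenient requirement: fixed choices alone guarantee success
    print(p_meet(5, 40))  # stringent requirement: only risk-prone choices can succeed

Because the optimal action depends on trials remaining and points still needed, the policy shifts toward the variable option as the requirement becomes harder to meet, matching the pattern of local choices the abstract describes.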
Affiliation(s)
- C J Pietras: Graduate School at the University of Florida, Gainesville 32611-2250, USA
12. Leinenweber A, Nietzel SM, Baron A. Temporal control by progressive-interval schedules of reinforcement. J Exp Anal Behav 1996; 66:311-326. PMID: 8921613. PMCID: PMC1284573. DOI: 10.1901/jeab.1996.66-311.
Abstract
Progressive-interval performances are described using measures that have proven to be successful in the analysis of fixed-interval responding. Five rats were trained with schedules in which the durations of consecutive intervals increased arithmetically as each interval was completed (either 6-s or 12-s steps for different subjects). The response patterns that emerged with extended training (90 sessions) indicated that performances had come under temporal control. Postreinforcement pausing increased as a function of the interval duration, the pauses were proportional to the prevailing duration, and the likelihood of the first response within an interval increased as the interval elapsed. To assess the resistance of these patterns to disruption, subjects were trained with a schedule that generated high response rates and short pauses (variable ratio). When the progressive-interval schedule was reinstated, pausing was attenuated and rates were elevated, but performances reverted to earlier patterns with continued exposure. The results indicated that temporal control by progressive-interval schedules, although slow to develop, is similar in many respects to that for fixed-interval schedules.
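A minimal sketch of the arithmetic progressive-interval schedule described above, assuming the first interval equals the step size (the abstract does not state the starting value):

    # Arithmetic progressive-interval schedule: each completed interval lengthens
    # the next one by a fixed step (6 s or 12 s for different subjects here).
    def progressive_intervals(step_s=6.0, n_intervals=10):
        return [step_s * (i + 1) for i in range(n_intervals)]

    print(progressive_intervals())       # 6, 12, 18, ... seconds
    print(progressive_intervals(12.0))   # 12, 24, 36, ... seconds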