Cogn Affect Behav Neurosci. 2015 Dec;15(4):837-53. doi: 10.3758/s13415-015-0350-y.

Learning the opportunity cost of time in a patch-foraging task


Sara M Constantino et al. Cogn Affect Behav Neurosci. 2015 Dec.

Abstract

Although most decision research concerns choice between simultaneously presented options, in many situations options are encountered serially, and the decision is whether to exploit an option or search for a better one. Such problems have a rich history in animal foraging, but we know little about the psychological processes involved. In particular, it is unknown whether learning in these problems is supported by the well-studied neurocomputational mechanisms involved in more conventional tasks. We investigated how humans learn in a foraging task, which requires deciding whether to harvest a depleting resource or switch to a replenished one. The optimal choice (given by the marginal value theorem; MVT) requires comparing the immediate return from harvesting to the opportunity cost of time, which is given by the long-run average reward. In two experiments, we varied opportunity cost across blocks, and subjects adjusted their behavior to blockwise changes in environmental characteristics. We examined how subjects learned their choice strategies by comparing choice adjustments to a learning rule suggested by the MVT (in which the opportunity cost threshold is estimated as an average over previous rewards) and to the predominant incremental-learning theory in neuroscience, temporal-difference learning (TD). Trial-by-trial decisions were explained better by the MVT threshold-learning rule. These findings expand on the foraging literature, which has focused on steady-state behavior, by elucidating a computational mechanism for learning in switching tasks that is distinct from those used in traditional tasks, and suggest connections to research on average reward rates in other domains of neuroscience.
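To make the winning learning rule concrete, here is a minimal sketch (in Python) of an MVT-style threshold learner of the kind the abstract describes: the opportunity cost of time is estimated as a recency-weighted average of experienced reward rates, and the agent harvests only while the immediate return exceeds that cost. The learning rate, initial estimate, and duration-weighted form of the update are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def mvt_threshold_learner(rewards, durations, alpha=0.1, rho0=0.0):
    """Recency-weighted running estimate of the average reward rate
    (the opportunity cost of time). A sketch of an MVT-style
    threshold-learning rule; alpha, rho0, and the duration weighting
    are illustrative assumptions, not the paper's parameters."""
    rho = rho0                      # estimated reward per unit time
    trace = []
    for r, tau in zip(rewards, durations):
        # decay the old estimate over the tau time units the action took,
        # then mix in the action's own reward rate r / tau
        w = 1.0 - (1.0 - alpha) ** tau
        rho = (1.0 - w) * rho + w * (r / tau)
        trace.append(rho)
    return np.array(trace)

def choose(expected_harvest, rho, harvest_time=1.0):
    # MVT policy: harvest while the expected immediate return exceeds
    # the opportunity cost of the time the harvest would take
    return "harvest" if expected_harvest > rho * harvest_time else "leave"
```

Under such a rule the exit threshold tracks blockwise changes in environmental richness directly, because richer blocks push the running average upward; a temporal-difference learner, by contrast, must relearn the long-run values of harvesting and leaving from state-by-state prediction errors.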

Keywords: Computational model; Decision making; Dopamine; Patch foraging; Reinforcement learning; Reward.

PubMed Disclaimer

Figures

Figure 1. Task display
Subjects foraged for apples in four 14-minute virtual patch-foraging environments. On each trial they were presented with a tree and had to decide whether to harvest it for apples, incurring a short harvest delay, or move to a new tree, incurring a longer travel delay. Harvests at a tree earn apples at an exponentially decelerating rate, and each new tree's initial quality is drawn from a Gaussian distribution. Environmental richness (the opportunity cost of time) was varied across blocks by changing the travel time and/or the apple depletion rate. The tree's quality, the depletion rate, and the richness of the environment are a priori unknown to the subject (see Methods for a detailed explanation).
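For readers who want to simulate the task, the following is a minimal sketch of the tree dynamics the caption describes. The means, standard deviations, reward noise, and depletion constant are placeholder values, not the parameters used in the experiments (those are given in the paper's Methods).

```python
import numpy as np

rng = np.random.default_rng(0)

def new_tree(mean=10.0, sd=1.0):
    # a fresh tree's initial quality is drawn from a Gaussian
    # (placeholder mean/sd, not the experiment's values)
    return max(rng.normal(mean, sd), 0.0)

def harvest(quality, depletion=0.9, noise_sd=0.5):
    # each harvest yields roughly the tree's current quality and then
    # depletes it multiplicatively, i.e. exponentially across harvests
    reward = max(rng.normal(quality, noise_sd), 0.0)
    return reward, quality * depletion
```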
Figure 2. Foraging behavior
Behavioral results compared with optimal (ideal-observer) performance in the task. (Top) Experiments 1A (travel time; L = long, S = short) and 1B (depletion rate; t = steep, h = shallow); (bottom) Experiment 2 (Lh = long-shallow, Lt = long-steep, Sh = short-shallow, St = short-steep). (a, d) Example subject's tree-by-tree exit points over the course of the experiment. Colors indicate environments; grey lines indicate the optimal exit threshold. (b, e) Group performance by block. The height of the grey bars indicates the optimal thresholds. Open circles connected by grey lines are individual subjects' mean exit thresholds and their adjustment across environments. Filled diamonds are the mean exit thresholds with 95% confidence intervals. (c, f) Colored curves show the achievable average reward per period for any given threshold policy in the different environments. Pluses are individual subjects' mean exit thresholds. The dashed line is the MVT rule, the set of points where the average reward rate equals the expected reward; it intersects the colored curves at the optimal exit thresholds.
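In equation form, the curves in panels (c, f) plot the long-run reward rate attainable by a fixed exit threshold, and the dashed MVT line marks the self-consistency condition at the optimum. The notation below (threshold θ, rate ρ, time measured in harvest periods) is assumed for illustration, not taken from the paper:

```latex
% Average reward rate achieved by a fixed exit-threshold policy \theta:
\[
  \rho(\theta)
  = \frac{\mathbb{E}[\text{reward earned per tree} \mid \theta]}
         {\mathbb{E}[\text{time spent per tree, including travel} \mid \theta]}
\]
% MVT optimality: leave a tree exactly when the expected reward from one
% more harvest falls to the long-run average rate per period, so the
% optimal threshold satisfies
\[
  \theta^{*} = \rho(\theta^{*}),
\]
% which is where the dashed line intersects each colored curve.
```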
Figure 3. Model comparison
Approximate log Bayes factors (differences in BIC scores) favoring the MVT over the TD learning model, shown for each subject, separately for Experiments 1A and 1B (left) and Experiment 2 (right).
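As a reminder of how such scores are obtained: under the standard Schwarz approximation, the log Bayes factor is roughly half the BIC difference (some authors report the raw difference instead, which matters only by a factor of two). The function below is an illustrative sketch, not the paper's analysis code.

```python
import numpy as np

def approx_log_bayes_factor(bic_mvt, bic_td):
    # Schwarz approximation: ln BF(MVT over TD) ~= (BIC_TD - BIC_MVT) / 2,
    # so positive values favor the MVT model; conventions reporting the
    # raw BIC difference differ only by the factor of 2
    return 0.5 * (np.asarray(bic_td) - np.asarray(bic_mvt))

# e.g. approx_log_bayes_factor(bic_mvt=1200.0, bic_td=1260.0) -> 30.0
```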

