Cogn Affect Behav Neurosci. 2015 Dec;15(4):837-53. doi: 10.3758/s13415-015-0350-y.

Learning the opportunity cost of time in a patch-foraging task


Sara M Constantino et al. Cogn Affect Behav Neurosci. 2015 Dec.

Abstract

Although most decision research concerns choice between simultaneously presented options, in many situations options are encountered serially, and the decision is whether to exploit an option or search for a better one. Such problems have a rich history in animal foraging, but we know little about the psychological processes involved. In particular, it is unknown whether learning in these problems is supported by the well-studied neurocomputational mechanisms involved in more conventional tasks. We investigated how humans learn in a foraging task, which requires deciding whether to harvest a depleting resource or switch to a replenished one. The optimal choice (given by the marginal value theorem; MVT) requires comparing the immediate return from harvesting to the opportunity cost of time, which is given by the long-run average reward. In two experiments, we varied opportunity cost across blocks, and subjects adjusted their behavior to blockwise changes in environmental characteristics. We examined how subjects learned their choice strategies by comparing choice adjustments to a learning rule suggested by the MVT (in which the opportunity cost threshold is estimated as an average over previous rewards) and to the predominant incremental-learning theory in neuroscience, temporal-difference learning (TD). Trial-by-trial decisions were explained better by the MVT threshold-learning rule. These findings expand on the foraging literature, which has focused on steady-state behavior, by elucidating a computational mechanism for learning in switching tasks that is distinct from those used in traditional tasks, and suggest connections to research on average reward rates in other domains of neuroscience.
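To make the winning learning rule concrete, here is a minimal sketch (in Python) of an MVT-style threshold learner of the kind the abstract describes: the opportunity cost of time is estimated as a recency-weighted average of experienced reward rates, and the agent harvests only while the immediate return exceeds that cost. The learning rate, initial estimate, and duration-weighted form of the update are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def mvt_threshold_learner(rewards, durations, alpha=0.1, rho0=0.0):
    """Recency-weighted running estimate of the average reward rate
    (the opportunity cost of time). A sketch of an MVT-style
    threshold-learning rule; alpha, rho0, and the duration weighting
    are illustrative assumptions, not the paper's parameters."""
    rho = rho0                      # estimated reward per unit time
    trace = []
    for r, tau in zip(rewards, durations):
        # decay the old estimate over the tau time units the action took,
        # then mix in the action's own reward rate r / tau
        w = 1.0 - (1.0 - alpha) ** tau
        rho = (1.0 - w) * rho + w * (r / tau)
        trace.append(rho)
    return np.array(trace)

def choose(expected_harvest, rho, harvest_time=1.0):
    # MVT policy: harvest while the expected immediate return exceeds
    # the opportunity cost of the time the harvest would take
    return "harvest" if expected_harvest > rho * harvest_time else "leave"
```

Under such a rule the exit threshold tracks blockwise changes in environmental richness directly, because richer blocks push the running average upward; a temporal-difference learner, by contrast, must relearn the long-run values of harvesting and leaving from state-by-state prediction errors.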

Keywords: Computational model; Decision making; Dopamine; Patch foraging; Reinforcement learning; Reward.

PubMed Disclaimer

Figures

Figure 1. Task display
Subjects foraged for apples in four 14-minute virtual patch-foraging environments. On each trial they were presented with a tree and had to decide whether to harvest it for apples, incurring a short harvest delay, or move to a new tree, incurring a longer travel delay. Harvests at a tree earn apples at an exponentially decelerating rate, and each new tree's initial quality is drawn from a Gaussian distribution. Environmental richness (the opportunity cost of time) was varied across blocks by changing the travel time and/or the apple depletion rate. The tree's quality, the depletion rate, and the richness of the environment are a priori unknown to the subject (see Methods for a detailed explanation).
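For readers who want to simulate the task, the following is a minimal sketch of the tree dynamics the caption describes. The means, standard deviations, reward noise, and depletion constant are placeholder values, not the parameters used in the experiments (those are given in the paper's Methods).

```python
import numpy as np

rng = np.random.default_rng(0)

def new_tree(mean=10.0, sd=1.0):
    # a fresh tree's initial quality is drawn from a Gaussian
    # (placeholder mean/sd, not the experiment's values)
    return max(rng.normal(mean, sd), 0.0)

def harvest(quality, depletion=0.9, noise_sd=0.5):
    # each harvest yields roughly the tree's current quality and then
    # depletes it multiplicatively, i.e. exponentially across harvests
    reward = max(rng.normal(quality, noise_sd), 0.0)
    return reward, quality * depletion
```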
Figure 2. Foraging behavior
Behavioral results compared with optimal (ideal-observer) performance in the task. (Top) Experiments 1A (travel time; L = long, S = short) and 1B (depletion rate; t = steep, h = shallow); (bottom) Experiment 2 (Lh = long-shallow, Lt = long-steep, Sh = short-shallow, St = short-steep). (a, d) Example subject's tree-by-tree exit points over the course of the experiment. Colors indicate environments; grey lines indicate the optimal exit threshold. (b, e) Group performance by block. The height of the grey bars indicates the optimal thresholds. Open circles connected by grey lines are individual subjects' mean exit thresholds and their adjustment across environments. Filled diamonds are the mean exit thresholds with 95% confidence intervals. (c, f) Colored curves show the achievable average reward per period for any given threshold policy in the different environments. Pluses are individual subjects' mean exit thresholds. The dashed line is the MVT rule, the set of points where the average reward rate equals the expected reward; it intersects the colored curves at the optimal exit thresholds.
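In equation form, the curves in panels (c, f) plot the long-run reward rate attainable by a fixed exit threshold, and the dashed MVT line marks the self-consistency condition at the optimum. The notation below (threshold θ, rate ρ, time measured in harvest periods) is assumed for illustration, not taken from the paper:

```latex
% Average reward rate achieved by a fixed exit-threshold policy \theta:
\[
  \rho(\theta)
  = \frac{\mathbb{E}[\text{reward earned per tree} \mid \theta]}
         {\mathbb{E}[\text{time spent per tree, including travel} \mid \theta]}
\]
% MVT optimality: leave a tree exactly when the expected reward from one
% more harvest falls to the long-run average rate per period, so the
% optimal threshold satisfies
\[
  \theta^{*} = \rho(\theta^{*}),
\]
% which is where the dashed line intersects each colored curve.
```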
Figure 3. Model comparison
Approximate log Bayes factors (differences in BIC scores) favoring the MVT over the TD learning model, shown for each subject, separately for Experiments 1A and 1B (left) and Experiment 2 (right).
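As a reminder of how such scores are obtained: under the standard Schwarz approximation, the log Bayes factor is roughly half the BIC difference (some authors report the raw difference instead, which matters only by a factor of two). The function below is an illustrative sketch, not the paper's analysis code.

```python
import numpy as np

def approx_log_bayes_factor(bic_mvt, bic_td):
    # Schwarz approximation: ln BF(MVT over TD) ~= (BIC_TD - BIC_MVT) / 2,
    # so positive values favor the MVT model; conventions reporting the
    # raw BIC difference differ only by the factor of 2
    return 0.5 * (np.asarray(bic_td) - np.asarray(bic_mvt))

# e.g. approx_log_bayes_factor(bic_mvt=1200.0, bic_td=1260.0) -> 30.0
```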

