Temporal discounting correlates with directed exploration but not with random exploration

Hashem Sadeghiyeh¹,², Siyu Wang³, Maxwell R Alberhasky⁴, Hannah M Kyllo³, Amitai Shenhav⁵, Robert C Wilson³,⁶

Affiliations

¹ Department of Psychology, University of Arizona, Tucson, USA. sadeghiyeh@email.arizona.edu.
² Department of Psychological Science, Missouri University of Science and Technology, Rolla, USA. sadeghiyeh@email.arizona.edu.
³ Department of Psychology, University of Arizona, Tucson, USA.
⁴ McCombs School of Business, University of Texas at Austin, Austin, USA.
⁵ Department of Cognitive, Linguistic, & Psychological Sciences, Brown University, Providence, USA.
⁶ Cognitive Science Program, University of Arizona, Tucson, USA.

PMID: 32132573
PMCID: PMC7055215
DOI: 10.1038/s41598-020-60576-4

Hashem Sadeghiyeh et al. Sci Rep. 2020 Mar 4;10(1):4020. doi: 10.1038/s41598-020-60576-4.

Abstract

The explore-exploit dilemma describes the trade-off that occurs any time we must choose between exploring unknown options and exploiting options we know well. Implicit in this trade-off is how we value future rewards: exploiting is usually better in the short term, but in the longer term the benefits of exploration can be huge. Thus, in theory, there should be a tight connection between how much people value future rewards, i.e. how much they discount future rewards relative to immediate rewards, and how likely they are to explore, with less 'temporal discounting' associated with more exploration. By measuring individual differences in temporal discounting and correlating them with explore-exploit behavior, we tested whether this theoretical prediction holds in practice. We used the 27-item Delay-Discounting Questionnaire to estimate temporal discounting and the Horizon Task to quantify two strategies of explore-exploit behavior: directed exploration, where information drives exploration by choice, and random exploration, where behavioral variability drives exploration by chance. We find a clear correlation between temporal discounting and directed exploration, with more temporal discounting associated with less directed exploration. Conversely, we find no relationship between temporal discounting and random exploration. Unexpectedly, we find that the relationship with directed exploration appears to be driven by a correlation between temporal discounting and uncertainty seeking at short time horizons, rather than information seeking at long horizons. Taken together, our results suggest a nuanced relationship between temporal discounting and explore-exploit behavior that may be mediated by multiple factors.
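As a reading aid, the idea of discounting a future reward relative to an immediate one can be written down with the hyperbolic model that is commonly used to score delay-discounting questionnaires. This is an assumption for illustration: the abstract does not say which scoring model the authors applied, and the symbols $V$, $A$, $D$ and $k$ are introduced here rather than taken from the source.

```latex
% Hyperbolic discounting (illustrative assumption, not stated in the abstract):
%   V = present subjective value, A = delayed reward amount,
%   D = delay, k = individual discount rate (larger k = steeper discounting).
\[
  V = \frac{A}{1 + kD}
\]
% Worked example with made-up numbers: A = 100, D = 30 days, k = 0.01 per day.
\[
  V = \frac{100}{1 + 0.01 \times 30} = \frac{100}{1.3} \approx 76.9
\]
```

Under this sketch, a participant with a larger $k$ values the same delayed reward less, which corresponds to the "more temporal discounting" described in the abstract.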

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1**
(A) Horizon task: the four forced trials set up one of two information conditions (unequal [1 3] and equal [2 2] information) and one of two horizon conditions (1 vs 6) before participants make their first free choice. (B) The sequence of trials in the Horizon Task.

**Figure 2**
Histograms showing the distribution of Horizon Task parameters (p(high info) and p(low mean) in horizons 1 and 6, and directed and random exploration) in our sample of 82 participants. The y-axis is the frequency, i.e. the number of occurrences of each value on the x-axis.

**Figure 3**
The average of p(high info) (A) and p(low mean) (B) for 82 participants on each horizon condition. The increase in p(high info) and p(low mean) from horizon 1 to horizon 6 follows the typical pattern observed in our previous studies and shows the use of both directed and random exploration.

**Figure 4**
Scatter plots comparing task parameters (A) p(high info) and (B) p(low mean) for individual participants in horizon 1 and horizon 6. The dashed lines show equality. Cases above this line show the expected horizon behavior (where p(high info) h6 > p(high info) h1 and p(low mean) h6 > p(low mean) h1).

**Figure 5**
Histograms showing the distribution of temporal discounting measures in our sample of 82 participants. The y-axis is the frequency, i.e. the number of occurrences of each value on the x-axis.

**Figure 6**
Scatter plots of (A) p(high info) h1, (B) p(low mean) h1, (C) p(high info) h6, (D) p(low mean) h6, (E) directed exploration and (F) random exploration against a temporal discounting measure (# today items). The plots show that the negative correlation between temporal discounting and directed exploration is driven by a positive correlation between temporal discounting and p(high info) h1.
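The quantities named in the captions above can be related with a short sketch. It assumes that directed exploration is scored as the change in p(high info) from horizon 1 to horizon 6, and random exploration as the corresponding change in p(low mean); the captions are consistent with this but do not state it outright. The function names, dictionary keys and participant values are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def exploration_scores(p):
    """Return (directed, random) exploration for one participant.

    Assumption: each score is the horizon 6 value minus the horizon 1 value.
    """
    directed = p["p_high_info_h6"] - p["p_high_info_h1"]
    random_expl = p["p_low_mean_h6"] - p["p_low_mean_h1"]
    return directed, random_expl


# Made-up participants for illustration only (NOT data from the study).
# "n_today" stands in for the "# today items" discounting measure.
participants = [
    {"n_today": 5, "p_high_info_h1": 0.40, "p_high_info_h6": 0.70,
     "p_low_mean_h1": 0.10, "p_low_mean_h6": 0.25},
    {"n_today": 10, "p_high_info_h1": 0.45, "p_high_info_h6": 0.70,
     "p_low_mean_h1": 0.20, "p_low_mean_h6": 0.30},
    {"n_today": 15, "p_high_info_h1": 0.55, "p_high_info_h6": 0.70,
     "p_low_mean_h1": 0.15, "p_low_mean_h6": 0.32},
    {"n_today": 20, "p_high_info_h1": 0.60, "p_high_info_h6": 0.70,
     "p_low_mean_h1": 0.12, "p_low_mean_h6": 0.22},
]

discounting = [p["n_today"] for p in participants]
scores = [exploration_scores(p) for p in participants]
directed = [s[0] for s in scores]
random_expl = [s[1] for s in scores]

r_directed = pearson_r(discounting, directed)
r_random = pearson_r(discounting, random_expl)
print(f"discounting vs directed exploration: r = {r_directed:.2f}")
print(f"discounting vs random exploration:   r = {r_random:.2f}")
```

In this toy data p(high info) h6 is held constant, so the negative correlation with directed exploration comes entirely from p(high info) h1 rising with discounting, which mirrors the pattern the Figure 6 caption describes.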

Grants and funding

P20 GM103645/GM/NIGMS NIH HHS/United States