Hyperbolically discounted temporal difference learning
- PMID: 20100071
- PMCID: PMC3005720
- DOI: 10.1162/neco.2010.08-09-1080
Abstract
Hyperbolic discounting of future outcomes is widely observed to underlie choice behavior in animals. Additionally, recent studies (Kobayashi & Schultz, 2008) have reported that hyperbolic discounting is observed even in neural systems underlying choice. However, the most prevalent models of temporal discounting, such as temporal difference learning, assume that future outcomes are discounted exponentially. Exponential discounting has been preferred largely because it can be expressed recursively, whereas hyperbolic discounting has heretofore been thought not to have a recursive definition. In this letter, we define a learning algorithm, hyperbolically discounted temporal difference (HDTD) learning, which constitutes a recursive formulation of the hyperbolic model.
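The distinction the abstract draws can be made concrete in a few lines. The sketch below contrasts the two discount functions and shows the conventional exponentially discounted TD(0) update; it is an illustration of the background the letter assumes, not an implementation of the HDTD algorithm itself, and all names and parameter values (`gamma`, `k`, `alpha`) are chosen here for illustration. The key point is that the exponential discount shrinks by a constant factor per step, which is exactly what lets TD learning bootstrap recursively, whereas the hyperbolic discount's step-to-step ratio changes with delay.

```python
def exponential_discount(delay, gamma=0.9):
    """Exponential discount: gamma**delay.
    Recursive, since D(t+1) = gamma * D(t) for a constant gamma."""
    return gamma ** delay

def hyperbolic_discount(delay, k=0.1):
    """Hyperbolic discount: 1 / (1 + k*delay).
    The ratio D(t+1)/D(t) depends on t, so it has no single-constant
    recursion of the exponential form."""
    return 1.0 / (1.0 + k * delay)

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One step of conventional (exponentially discounted) TD(0).
    The bootstrap V(s) <- V(s) + alpha * (r + gamma*V(s') - V(s))
    relies on the exponential discount's recursive structure."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return V

# Exponential: the per-step discount ratio is constant.
ratios_exp = [exponential_discount(t + 1) / exponential_discount(t)
              for t in range(3)]

# Hyperbolic: the per-step discount ratio rises toward 1 with delay,
# producing the characteristic preference reversals.
ratios_hyp = [hyperbolic_discount(t + 1) / hyperbolic_discount(t)
              for t in range(3)]
```

Because `ratios_exp` is constant while `ratios_hyp` is not, a naive substitution of the hyperbolic discount into the TD recursion is not valid; this is the obstacle the HDTD formulation in the letter is designed to overcome.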
