Review

Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework

Samuel J Gershman et al. Annu Rev Psychol. 2017 Jan 3;68:101-128. doi: 10.1146/annurev-psych-122414-033625. Epub 2016 Sep 2.
Abstract

We review the psychology and neuroscience of reinforcement learning (RL), which has experienced significant progress in the past two decades, enabled by the comprehensive experimental study of simple learning and decision-making tasks. However, one challenge in the study of RL is computational: The simplicity of these tasks ignores important aspects of reinforcement learning in the real world: (a) State spaces are high-dimensional, continuous, and partially observable; this implies that (b) data are relatively sparse and, indeed, precisely the same situation may never be encountered twice; furthermore, (c) rewards depend on the long-term consequences of actions in ways that violate the classical assumptions that make RL tractable. A seemingly distinct challenge is that, cognitively, theories of RL have largely involved procedural and semantic memory, the way in which knowledge about action values or world models extracted gradually from many experiences can drive choice. This focus on semantic memory leaves out many aspects of memory, such as episodic memory, related to the traces of individual events. We suggest that these two challenges are related. The computational challenge can be dealt with, in part, by endowing RL systems with episodic memory, allowing them to (a) efficiently approximate value functions over complex state spaces, (b) learn with very little data, and (c) bridge long-term dependencies between actions and rewards. We review the computational theory underlying this proposal and the empirical evidence to support it. Our proposal suggests that the ubiquitous and diverse roles of memory in RL may function as part of an integrated learning system.
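
To make the contrast concrete, here is a minimal Python sketch of the "cached value" learning that the abstract contrasts with episodic memory: tabular TD(0) on a toy chain. The environment, the random policy, and all parameter values are illustrative assumptions, not materials from the review.

    import random

    # Tabular TD(0) on a 3-state chain; reaching state 2 yields reward 1
    # and ends the episode. The chain, ALPHA, and GAMMA are assumptions.
    ALPHA, GAMMA, EPISODES = 0.1, 0.95, 2000
    V = [0.0, 0.0, 0.0]  # cached values: the only thing kept in memory

    for _ in range(EPISODES):
        s = 0
        while s != 2:
            s_next = max(0, s + random.choice([-1, 1]))  # reflect at left wall
            r = 1.0 if s_next == 2 else 0.0
            target = r + (0.0 if s_next == 2 else GAMMA * V[s_next])
            # The prediction error is folded into the cached value;
            # the individual transition itself is then discarded.
            V[s] += ALPHA * (target - V[s])
            s = s_next

    print(V)  # values rise toward the rewarded end of the chain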

Keywords: decision making; memory; reinforcement learning.


Figures

Figure 1. Schematic of different approaches to value computation
(A) In model-free reinforcement learning, individual experiences are integrated into a cached value, which is then used to compute action values in a new state. Only cached values are stored in memory; individual experiences are discarded. Green triangle indicates the agent’s state, red crosses indicate rewards, and blue arrows indicate paths through the state space. (B) In episodic reinforcement learning, individual experiences, along with their associated returns, are retained in memory and retrieved at choice time. Each episodic trace is weighted by its similarity to the current state according to a kernel function. This kernel-weighted average implements a nonparametric value estimate.
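
A minimal Python sketch of the episodic scheme in panel B: stored (state, return) traces are combined by a kernel-weighted average at choice time. The Gaussian kernel, the bandwidth, and the toy traces are illustrative assumptions; the review does not commit to a particular kernel function.

    import numpy as np

    def episodic_value(query, traces, bandwidth=1.0):
        """Nonparametric value estimate: a kernel-weighted average of
        the returns stored with each episodic trace (cf. panel B)."""
        states = np.array([s for s, _ in traces])
        returns = np.array([g for _, g in traces])
        # Gaussian kernel (an illustrative choice of similarity function)
        sq_dists = np.sum((states - query) ** 2, axis=1)
        weights = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
        return weights @ returns / weights.sum()

    # Hypothetical traces: (state vector, return observed from that state)
    traces = [(np.array([0.0, 0.0]), 1.0),
              (np.array([1.0, 1.0]), 4.0),
              (np.array([2.0, 0.5]), 2.0)]
    # The most similar stored state receives the largest kernel weight
    print(episodic_value(np.array([0.9, 1.1]), traces))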
Figure 2. Comparison of the successor representation in different environments
Each graph shows the successor representation for the state indicated by the green triangle. The rewarded state is indicated by a red cross. (Left) An open field. (Right) Field with a barrier, indicated by the blue line. The top row shows the successor representation for an undirected or “random” walk induced by a policy that moves through the state space randomly. The bottom row shows the results for a directed policy that moves deterministically along the shortest path to the reward.
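
For reference, the successor representation plotted in each panel can be computed in closed form as M = (I − γP)⁻¹, where P is the state-transition matrix induced by the policy and M[s, s′] is the expected discounted number of future visits to s′ starting from s. Because the SR depends on P, it differs between the random-walk and directed policies, which is exactly what the figure compares. The small chain below is an illustrative assumption standing in for the figure's grid environments.

    import numpy as np

    def successor_representation(P, gamma=0.95):
        """Closed-form SR: M = (I - gamma * P)^(-1).
        Valid for gamma < 1, since gamma * P then has spectral radius < 1."""
        n = P.shape[0]
        return np.linalg.inv(np.eye(n) - gamma * P)

    # Example: a random walk on a 3-state chain (illustrative policy/P)
    P = np.array([[0.5, 0.5, 0.0],
                  [0.5, 0.0, 0.5],
                  [0.0, 0.5, 0.5]])
    print(successor_representation(P))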
