Review. Psychol Bull. 2014 Mar;140(2):466-86. doi: 10.1037/a0033455. Epub 2013 Jul 8.

Navigating complex decision spaces: Problems and paradigms in sequential choice

Matthew M Walsh et al. Psychol Bull. 2014 Mar.

Abstract

To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult when the consequences of an action follow a delay. This introduces the problem of temporal credit assignment. When feedback follows a sequence of decisions, how should the individual assign credit to the intermediate actions that comprise the sequence? Research in reinforcement learning provides 2 general solutions to this problem: model-free reinforcement learning and model-based reinforcement learning. In this review, we examine connections between stimulus-response and cognitive learning theories, habitual and goal-directed control, and model-free and model-based reinforcement learning. We then consider a range of problems related to temporal credit assignment. These include second-order conditioning and secondary reinforcers, latent learning and detour behavior, partially observable Markov decision processes, actions with distributed outcomes, and hierarchical learning. We ask whether humans and animals, when faced with these problems, behave in a manner consistent with reinforcement learning techniques. Throughout, we seek to identify neural substrates of model-free and model-based reinforcement learning. The former class of techniques is understood in terms of the neurotransmitter dopamine and its effects in the basal ganglia. The latter is understood in terms of a distributed network of regions including the prefrontal cortex, medial temporal lobes, cerebellum, and basal ganglia. Not only do reinforcement learning techniques have a natural interpretation in terms of human and animal behavior but they also provide a useful framework for understanding neural reward valuation and action selection.
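The model-free answer to temporal credit assignment sketched in the abstract is temporal-difference learning: bootstrapped value updates propagate credit backward from a delayed reward to the intermediate actions that preceded it. The following is a minimal illustrative sketch, not taken from the review; the three-state chain task and all parameter values are assumptions chosen for clarity.

```python
import random

random.seed(0)

# Illustrative 3-state chain: action 1 ("advance") moves right; only the final
# transition pays off. TD bootstrapping propagates credit back to earlier states.
N_STATES = 3
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]; 0 = stay, 1 = advance

def step(state, action):
    if action == 1:
        if state == N_STATES - 1:
            return None, 1.0               # terminal transition carries the only reward
        return state + 1, 0.0
    return state, 0.0                      # staying earns nothing

for _ in range(2000):
    s = 0
    while s is not None:
        # epsilon-greedy action selection
        a = random.randrange(2) if random.random() < EPS else max((0, 1), key=lambda x: Q[s][x])
        s2, r = step(s, a)
        target = r if s2 is None else r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])   # TD error drives the update
        s = s2

# Advancing should be preferred in every state, even though reward is delayed.
print([q[1] > q[0] for q in Q])
```

Note that no explicit model of the task is learned: credit reaches state 0 only because its value estimate bootstraps on state 1's, which in turn bootstraps on state 2's. Model-based techniques instead plan over a learned transition model.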


Figures

Figure 1
Actor/critic architecture. The actor records preferences for actions in each state. The critic combines information about immediate reward and the expected value of the subsequent state to compute reward prediction errors (δ). The actor uses reward prediction errors to update action preferences, p(s, a), and the critic uses reward prediction errors to update state values, V(s).
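The architecture in Figure 1 can be sketched in code: the critic computes a reward prediction error δ = r + γV(s′) − V(s) and uses it to update V(s), while the actor uses the same δ to update its preferences p(s, a). The two-state environment, softmax action rule, and parameters below are illustrative assumptions, not the paper's.

```python
import math
import random

random.seed(1)
N_STATES, N_ACTIONS = 2, 2
ALPHA_V, ALPHA_P, GAMMA = 0.1, 0.1, 0.9
V = [0.0] * N_STATES                               # critic: state values V(s)
p = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # actor: action preferences p(s, a)

def softmax_choice(prefs):
    """Sample an action with probability proportional to exp(preference)."""
    exps = [math.exp(x) for x in prefs]
    r, acc = random.random() * sum(exps), 0.0
    for a, e in enumerate(exps):
        acc += e
        if r <= acc:
            return a
    return len(prefs) - 1

# Illustrative task: in either state, action 1 yields reward and switches state.
for _ in range(1000):
    s = random.randrange(N_STATES)
    a = softmax_choice(p[s])
    r_t = 1.0 if a == 1 else 0.0
    s2 = 1 - s if a == 1 else s
    delta = r_t + GAMMA * V[s2] - V[s]   # critic: reward prediction error (delta)
    V[s] += ALPHA_V * delta              # critic update of V(s)
    p[s][a] += ALPHA_P * delta           # actor update of p(s, a)

print([prefs[1] > prefs[0] for prefs in p])
```

The single prediction-error signal doing double duty, updating both the policy and the value function, is the feature the review links to dopaminergic signaling in the basal ganglia.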
Figure 2
Transition structure in the sequential choice task (Daw et al., 2011). The first selection led to one of two intermediate states with fixed probabilities, and the second selection was rewarded probabilistically.
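The transition structure of this two-step task can be sketched as follows. The 0.7/0.3 common/rare split matches the design of Daw et al. (2011); the second-stage reward probabilities here are placeholders, since the actual task drifted them over time.

```python
import random

random.seed(0)
COMMON = 0.7   # probability of the common first-stage transition

def first_stage(choice):
    """First selection (0 or 1) leads to intermediate state 0 or 1 with fixed probabilities."""
    if random.random() < COMMON:
        return choice        # common transition
    return 1 - choice        # rare transition

def second_stage(state, reward_probs=(0.4, 0.6)):
    """Second selection is rewarded probabilistically; probabilities are placeholders."""
    return 1.0 if random.random() < reward_probs[state] else 0.0

# Empirically, choice 0 should reach state 0 on roughly 70% of trials.
n = 10_000
frac = sum(first_stage(0) == 0 for _ in range(n)) / n
print(round(frac, 2))
```

The task dissociates the two learner classes: after a rare transition, a model-based learner credits the unchosen first-stage action, whereas a model-free learner credits whatever it actually did.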
Figure 3
Experiment interface (left) and maze structure with the correct path in gray (right) (Fu & Anderson, 2006). To exit the maze, participants needed to select the correct cues in Rooms 1, 2, and 3.
Figure 4
Maze used to assess detour behavior in rats (Tolman & Honzik, 1930). In different trials, detours were placed at points A and B.
Figure 5
Delayed reward task (Tanaka et al., 2009). Some rewards were delivered immediately (trial t + 1), and some rewards were delivered after a delay (trial t + 3).
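The delay manipulation in Figure 5 connects to temporal discounting: with discount factor γ, a reward delivered k trials in the future is worth γ^k now. The γ value below is illustrative, not taken from Tanaka et al. (2009).

```python
GAMMA = 0.9                 # illustrative discount factor

immediate = GAMMA ** 1      # present value of a reward at trial t + 1
delayed = GAMMA ** 3        # present value of a reward at trial t + 3

print(round(immediate, 3), round(delayed, 3))   # 0.9 0.729
```

Learners with small γ undervalue the delayed outcomes, which is one way such tasks expose individual differences in how credit spans a delay.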
Figure 6
Harvard Game payoff functions (Tunney & Shanks, 2002). Payoff for meliorating (choose left) and maximizing (choose right) as a function of the percentage of maximizing responses during the previous ten trials.
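The melioration trap depicted in Figure 6 can be illustrated with placeholder linear payoff functions (the actual curves in Tunney & Shanks, 2002, differ in scale and form): the meliorating option pays more at every fixed choice history, yet a history of maximizing responses raises both curves, so pure maximization earns more in the long run.

```python
# h = fraction of maximizing responses over the previous ten trials (0..1).
# Placeholder payoffs: meliorating beats maximizing at any fixed h,
# but both payoffs rise with h.
def payoff_meliorate(h):
    return 2 + 8 * h      # the higher curve at every h

def payoff_maximize(h):
    return 8 * h          # the lower curve, but choosing it drives h upward

# Steady states: always meliorating drives h to 0; always maximizing drives h to 1.
print(payoff_meliorate(0.0))   # long-run per-trial payoff of pure melioration
print(payoff_maximize(1.0))    # long-run per-trial payoff of pure maximization
```

A learner that compares momentary payoffs (melioration) gets locked into the locally better option; recognizing the globally better policy requires credit assignment across the ten-trial history.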
