PLoS Comput Biol. 2013;9(12):e1003387.

doi: 10.1371/journal.pcbi.1003387. Epub 2013 Dec 5.

Cortical and hippocampal correlates of deliberation during model-based decisions for rewards in humans

Aaron M Bornstein¹, Nathaniel D Daw

PMID: 24339770
PMCID: PMC3854511
DOI: 10.1371/journal.pcbi.1003387

Aaron M Bornstein et al. PLoS Comput Biol. 2013.

Affiliation

¹ Department of Psychology, Program in Cognition and Perception, New York University, New York, New York, United States of America.

Abstract

How do we use our memories of the past to guide decisions we've never had to make before? Although extensive work describes how the brain learns to repeat rewarded actions, decisions can also be influenced by associations between stimuli or events not directly involving reward - such as when planning routes using a cognitive map or chess moves using predicted countermoves - and these sorts of associations are critical when deciding among novel options. This process is known as model-based decision making. While the learning of environmental relations that might support model-based decisions is well studied, and separately this sort of information has been inferred to impact decisions, there is little evidence concerning the full cycle by which such associations are acquired and drive choices. Of particular interest is whether decisions are directly supported by the same mnemonic systems characterized for relational learning more generally, or instead rely on other, specialized representations. Here, building on our previous work, which isolated dual representations underlying sequential predictive learning, we directly demonstrate that one such representation, encoded by the hippocampal memory system and adjacent cortical structures, supports goal-directed decisions. Using interleaved learning and decision tasks, we monitor predictive learning directly and also trace its influence on decisions for reward. We quantitatively compare the learning processes underlying multiple behavioral and fMRI observables using computational model fits. Across both tasks, a quantitatively consistent learning process explains reaction times, choices, and both expectation- and surprise-related neural activity. The same hippocampal and ventral stream regions engaged in anticipating stimuli during learning are also engaged in proportion to the difficulty of decisions. 
These results support a role for predictive associations learned by the hippocampal memory system to be recalled during choice formation.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Figure 1. Serial reaction time task.**
Images were presented one at a time for a fixed 3000 ms, in a sequence generated by a first-order Markov transition process (i.e., a matrix of conditional probabilities). The conditional probabilities were changed abruptly at three points during the task, unaligned to rest periods and with no visual or other notification. (Images shown here are not those used in the study, but public domain stand-ins from clker.com that reflect the category of the photographs used during the experiment.)
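
The sequence structure described in this caption can be illustrated with a minimal sketch. It assumes a first-order chain (each image depends only on the one before it) and represents each abrupt change as a switch to a new transition matrix. The function name, the number of images, the matrices, and the change-point trial numbers are all hypothetical; the caption does not give these values.

```python
import random


def sample_sequence(transition_matrices, change_points, n_trials, seed=0):
    """Sample image indices from a first-order Markov chain whose
    transition matrix switches abruptly at the given trial indices.

    transition_matrices: list of square row-stochastic matrices (lists of rows).
    change_points: sorted trial indices at which the next matrix takes over
                   (hypothetical values, not taken from the study).
    """
    rng = random.Random(seed)
    n_images = len(transition_matrices[0])
    current = rng.randrange(n_images)  # arbitrary starting image
    sequence = [current]
    regime = 0
    for t in range(1, n_trials):
        # Move to the next transition matrix once a change point is reached.
        while regime < len(change_points) and t >= change_points[regime]:
            regime += 1
        row = transition_matrices[regime][current]
        # Draw the next image according to the conditional probabilities.
        current = rng.choices(range(n_images), weights=row, k=1)[0]
        sequence.append(current)
    return sequence
```

With three change points, as in the caption, the caller would pass four matrices, for example `sample_sequence([m0, m1, m2, m3], [100, 250, 400], 600)` with made-up trial numbers.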

**Figure 2. Choice task.**
Participants were asked to use their knowledge of the sequential transition structure to make decisions for reward. Choice rounds consisted of three steps. First, participants observed the reward amount and target image for one second. Next, they were given five seconds to choose one of two images to start the sequence from again. This choice was of varying difficulty, depending on how likely it was for each choice image to be followed by the reward image. For the next several presentations after choice, each observation of the valued image was accompanied by reward. (Images shown here are not those used in the study, but public domain stand-ins from clker.com that reflect the category of the photographs used during the experiment.)

**Figure 3. Behavioral analyses.**
a. Reaction time on the image identification task decreases as the ‘ground-truth’ probability of that image appearing, conditional on the previous image, increases. The ground-truth probabilities are those generated by the task program and were not disclosed to the participant. Here, for each participant, RTs were first corrected for their mean and for a number of nuisance effects, estimated using a linear regression containing only these effects as explanatory variables. b. Across subjects, the fitted learning rate values that best explain behavior. For reaction times, the best-fitting model contained two learning rates (one ‘slow’, the other ‘fast’), whose estimates were combined linearly according to a fitted weighting parameter. For choice behavior, the best-fitting model contained one learning rate, statistically indistinguishable from the slow rate fit to reaction times, but significantly different from the fast rate.
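
As a rough illustration of the two-rate model in panel b, the sketch below keeps two running estimates of the transition probabilities, one updated with a slow learning rate and one with a fast learning rate, and mixes them linearly. The delta-rule update is an assumption made for illustration, since the caption does not state the update rule, and every name and parameter value here is hypothetical.

```python
def update_row(row, observed, alpha):
    """Delta-rule update: move each probability toward 1 for the observed
    next image and toward 0 for the others. The row keeps summing to 1."""
    return [p + alpha * ((1.0 if k == observed else 0.0) - p)
            for k, p in enumerate(row)]


def mixed_estimates(sequence, n_images, alpha_slow, alpha_fast, weight_slow):
    """Return the mixed probability assigned to each observed transition
    before it was seen, plus the final slow and fast estimate matrices."""
    uniform = [1.0 / n_images] * n_images
    slow = [uniform[:] for _ in range(n_images)]
    fast = [uniform[:] for _ in range(n_images)]
    predictions = []
    for prev, nxt in zip(sequence, sequence[1:]):
        # Linear combination of the two estimates (weight is a free parameter).
        p = weight_slow * slow[prev][nxt] + (1.0 - weight_slow) * fast[prev][nxt]
        predictions.append(p)
        slow[prev] = update_row(slow[prev], nxt, alpha_slow)
        fast[prev] = update_row(fast[prev], nxt, alpha_fast)
    return predictions, slow, fast
```

In a fitting procedure, the per-trial mixed probabilities would serve as the explanatory variable for reaction times, with the two learning rates and the weight adjusted to best match the data.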

**Figure 4. BOLD signal reflecting anticipation of the next stimulus.**
a. BOLD signal correlated with forward entropy in the fast process. Activity in the dorsal caudate was significant after correction over an anatomically-defined mask of bilateral caudate. b. BOLD signal correlated with forward entropy in the slow process. Activity in the anterior hippocampus was significant after correction over an anatomically-defined mask of left hippocampus. Both a and b displayed at , uncorrected.
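
The caption does not define forward entropy. A minimal sketch, assuming it refers to the Shannon entropy of the estimated distribution over the next image given the current one, would be the following (the function name and the use of bits are illustrative choices):

```python
import math


def forward_entropy(next_image_probs):
    """Shannon entropy (in bits) of the predicted next-image distribution.
    Zero-probability entries contribute nothing to the sum."""
    return -sum(p * math.log2(p) for p in next_image_probs if p > 0)
```

Under this reading, the regressor is high when the learner is uncertain about which image comes next and low when one image is strongly expected. Computing it separately from the fast and the slow estimates would give the two regressors named in panels a and b.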

**Figure 5. Learning rate computed from BOLD signal.**
Learning rates computed from each of our regions of interest, overlaid on the learning rates fit to reaction time behavior. The best-fitting learning rates are displayed for each type of trial: sequential image-identification trials, decision trials, and choice outcome trials. For learning trials in hippocampus and caudate, learning rates are computed using the forward entropy regressor. For learning trials in face- and house-selective cortex, learning rates are computed using the estimated probability of the image appearing on the next trial. For decision trials in hippocampus, learning rate is computed using the choice difficulty regressor. For decision trials in face- and house-selective cortex, learning rates are computed using the portion of the choice difficulty regressor specific to that image. For outcome trials in nucleus accumbens, learning rate is computed using the reward prediction error regressor. Error bars: 1 SEM.

**Figure 6. BOLD signal during choices and outcomes.**
During deliberation periods after choice options were presented, we observed activity in a. posterior cingulate (−2, −18, 32), anterior mPFC (4, 64, −2) and b. left hippocampus (peak −24, −10, −18), all significantly correlated with choice difficulty in the slow process. c. BOLD signal at outcome. A cluster in the nucleus accumbens (peak 10, 12, −2) correlated with reward prediction error as computed using the expectations derived from the slow process. All activations displayed at , uncorrected.
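
A reward prediction error is the difference between the reward received and the reward expected. The sketch below assumes the expectation is the reward amount scaled by the learner's estimated probability that the rewarded image appears next; that form, and all names used, are illustrative assumptions rather than the model reported in the paper.

```python
def reward_prediction_error(reward_received, reward_amount, p_target_next):
    """Received reward minus expected reward.

    p_target_next: estimated probability (e.g., from the slow process) that
    the rewarded image is the next one shown. Hypothetical formulation.
    """
    expected = reward_amount * p_target_next
    return reward_received - expected
```

For example, if the rewarded image is worth 10 and is expected with probability 0.25, seeing it yields an error of 7.5, and not seeing it yields an error of -2.5.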

**Figure 7. Image-selective regions.**
The regions defined by the in-task localizer contrasts house > face and face > house are colored yellow (left: face, right: house). The face localizer yielded the largest cluster of activation in a region of right fusiform gyrus. The house localizer yielded the largest cluster of activation in a region stretching from posterior parahippocampal gyrus to the occipital lobe. Regions selectively sensitive to the estimated probability of an image appearing next (on sequential response trials) are colored blue. Regions selectively sensitive to the difficulty of deciding whether a particular image would lead to reward are colored red. Displayed at , uncorrected.
