Review

Neural Netw. 2009 Apr;22(3):294-304.

doi: 10.1016/j.neunet.2009.03.010. Epub 2009 Mar 29.

Valuation of uncertain and delayed rewards in primate prefrontal cortex

Soyoun Kim¹, Jaewon Hwang, Hyojung Seo, Daeyeol Lee

Affiliation

¹ Department of Neurobiology, Yale University School of Medicine, New Haven, CT 06510, USA.

PMID: 19375276
PMCID: PMC2693219
DOI: 10.1016/j.neunet.2009.03.010

Abstract

Humans and animals often must choose between rewards that differ in their qualities, magnitudes, immediacy, and likelihood, and must estimate these multiple reward parameters from their experience. However, the neural basis for such complex decision making is not well understood. To understand the role of the primate prefrontal cortex in determining the subjective value of delayed or uncertain rewards, we examined the activity of individual prefrontal neurons during an inter-temporal choice task and a computer-simulated competitive game. Consistent with the findings from previous studies in humans and other animals, the monkeys' behavior during inter-temporal choice was well accounted for by a hyperbolic discount function. In addition, the activity of many neurons in the lateral prefrontal cortex reflected signals related to the magnitude and delay of the reward expected from a particular action, and often encoded the difference in temporally discounted values that predicted the animal's choice. During a computerized matching pennies game, the animals approximated the optimal strategy, known as the Nash equilibrium, using a reinforcement learning algorithm. We also found that many neurons in the lateral prefrontal cortex conveyed signals related to the animal's previous choices and their outcomes, suggesting that this cortical area might play an important role in forming associations between actions and their outcomes. These results suggest that the primate lateral prefrontal cortex plays a central role in estimating the values of alternative actions based on multiple sources of information.
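Why is the Nash equilibrium of matching pennies a 50/50 mixture? The minimal sketch below assumes hypothetical payoffs: a reward of 1 when the animal's choice matches the computer's and 0 otherwise. It checks which choices are best responses when the computer picks the left target with probability `p_left`. Away from 1/2, one target is strictly better, so any predictable bias can be exploited. At exactly 1/2 the animal is indifferent between the targets, which is the defining property of the mixed-strategy equilibrium. `expected_reward` and `best_responses` are illustrative names, not taken from the paper.

```python
from fractions import Fraction

CHOICES = ("left", "right")

def expected_reward(choice: str, p_left: Fraction) -> Fraction:
    """Expected reward of a choice when the opponent picks "left" with
    probability p_left. Hypothetical payoffs: 1 for a match, 0 otherwise."""
    return p_left if choice == "left" else 1 - p_left

def best_responses(p_left: Fraction) -> set:
    """Return every choice that maximizes the animal's expected reward."""
    values = {c: expected_reward(c, p_left) for c in CHOICES}
    best = max(values.values())
    return {c for c, v in values.items() if v == best}

# A biased opponent can be exploited by always picking one target.
print(best_responses(Fraction(7, 10)))          # {'left'}
# At the equilibrium mixture (1/2, 1/2) neither target is better.
print(sorted(best_responses(Fraction(1, 2))))   # ['left', 'right']
```

Exact `Fraction` arithmetic avoids floating-point ties at p = 1/2.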


Figures

**Figure 1**
Spatio-temporal sequence of the inter-temporal choice task. In this example, the delays for the rewards from the small-reward (green) and large-reward (red) targets were 2 and 6 s, respectively, and the animal chose the large-reward target.

**Figure 2**
Probability of choosing the small-reward target, P(T_S), during the inter-temporal choice task, plotted for various combinations of delays for the small-reward (T_S) and large-reward (T_L) targets. These results were obtained in separate behavioral experiments conducted before the neurophysiological experiments. Dots correspond to the data, whereas lines are the predictions from the best-fitting hyperbolic discount function.
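A hyperbolic discount function is commonly written DV = A / (1 + kD), where A is the reward magnitude, D is the delay, and k is a discount parameter. The sketch below assumes that form. It also assumes a logistic choice rule with a hypothetical inverse temperature `beta` that converts the difference in discounted values into P(T_S). The parameter values are illustrative, not fitted values from the study.

```python
import math

def hyperbolic_value(amount: float, delay: float, k: float) -> float:
    """Temporally discounted value DV = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)

def p_choose_small(a_small: float, d_small: float,
                   a_large: float, d_large: float,
                   k: float, beta: float) -> float:
    """Logistic probability of choosing the small-reward target T_S,
    driven by the difference in discounted values DV_S - DV_L."""
    dv_small = hyperbolic_value(a_small, d_small, k)
    dv_large = hyperbolic_value(a_large, d_large, k)
    return 1.0 / (1.0 + math.exp(-beta * (dv_small - dv_large)))

# Hypothetical parameters: reward sizes 1 vs 2, k = 0.25 per s, beta = 5.
for d_large in (0, 2, 4, 6, 8):
    p = p_choose_small(1.0, 0.0, 2.0, float(d_large), k=0.25, beta=5.0)
    print(f"D_L = {d_large} s -> P(T_S) = {p:.2f}")
```

With these values, the two targets are equally valued when D_L = 4 s, so P(T_S) = 0.5 there. Preference shifts toward the small, immediate reward as the large reward's delay grows.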

**Figure 3**
Probability of choosing the small-reward target, P(T_S), in an example daily session during the neurophysiological experiments. Solid and dotted lines correspond to the predictions from the best-fitting hyperbolic and exponential discount functions, respectively. T_S (T_L) refers to the small-reward (large-reward) target.
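One way to compare hyperbolic and exponential fits is to fit each model by maximum likelihood and compare the negative log-likelihoods (NLL). Both models below have the same two free parameters (k, beta), so their raw likelihoods are directly comparable. The logistic choice rule, brute-force grid search, parameter ranges, and simulated choice counts are all hypothetical. This is a sketch of the technique, not the paper's fitting procedure or data.

```python
import math
from itertools import product

def hyperbolic(a, d, k):
    return a / (1.0 + k * d)          # DV = A / (1 + kD)

def exponential(a, d, k):
    return a * math.exp(-k * d)       # DV = A * exp(-kD)

def p_small(discount, cond, k, beta):
    """Logistic P(T_S) for a condition (A_S, D_S, A_L, D_L)."""
    a_s, d_s, a_l, d_l = cond
    diff = discount(a_s, d_s, k) - discount(a_l, d_l, k)
    return 1.0 / (1.0 + math.exp(-beta * diff))

def neg_log_lik(discount, data, k, beta):
    """Bernoulli negative log-likelihood; data = [(cond, n_small, n_total)]."""
    nll = 0.0
    for cond, n_small, n_total in data:
        p = min(max(p_small(discount, cond, k, beta), 1e-12), 1.0 - 1e-12)
        nll -= n_small * math.log(p) + (n_total - n_small) * math.log(1.0 - p)
    return nll

def fit(discount, data, ks, betas):
    """Brute-force grid search; returns (best NLL, k, beta)."""
    return min((neg_log_lik(discount, data, k, b), k, b)
               for k, b in product(ks, betas))

# Simulated (not recorded) data: expected choice counts from a hyperbolic
# observer with k = 0.25 and beta = 5, 100 trials per condition.
conditions = [(1.0, d_s, 2.0, d_l) for d_s in (0.0, 2.0) for d_l in (2.0, 4.0, 6.0, 8.0)]
data = [(c, 100 * p_small(hyperbolic, c, 0.25, 5.0), 100) for c in conditions]

ks = [i / 20 for i in range(1, 41)]        # k from 0.05 to 2.0
betas = [float(b) for b in range(1, 11)]   # beta from 1 to 10
hyp = fit(hyperbolic, data, ks, betas)
exp_ = fit(exponential, data, ks, betas)
print("hyperbolic  NLL=%.2f k=%.2f beta=%.0f" % hyp)
print("exponential NLL=%.2f k=%.2f beta=%.0f" % exp_)
```

On data generated by a hyperbolic observer, the hyperbolic model reaches the lower NLL. Real analyses would use a continuous optimizer and, for models with different parameter counts, a penalized criterion such as AIC or BIC.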

**Figure 4**
Activity of two example neurons in the DLPFC (a and b) during the inter-temporal choice (left) and control (right) tasks. The top two rows show the average spike density functions for the trials in which the animal chose the right-ward and left-ward targets, respectively. Activity was averaged separately according to the difference in the temporally discounted values for the two targets, and the magnitudes and delays of the rewards for the two targets are indicated by the legend on the right. For example, S(0):L(0) indicates that the left-ward target delivered a small reward and that the delays for both rewards were 0 s, whereas L(4):S(2) indicates that the left-ward target gave a large reward and that the delays for the small and large rewards were 2 and 4 s, respectively. The third row shows the average spike rates during the cue period plotted according to the difference in the temporally discounted values (DV) and fictitious discounted values (FDV) for the inter-temporal choice and control tasks, respectively. Finally, the bottom row shows the average spike rates during the cue period plotted as a function of the DV or FDV for the target chosen by the animal in the inter-temporal choice or control task. Filled and empty symbols indicate the results for the trials in which the animals chose the right-ward and left-ward targets, respectively.

**Figure 5**
Population summary of neural activity in the DLPFC during the inter-temporal choice task. a. Fraction of neurons that significantly modulated their activity during the cue period according to the animal’s choice (Cho), the difference in the temporally discounted values (ΔDV=DV_L−DV_R), and the temporally discounted value of the chosen target (DV_cho). b. Fraction of neurons in the DLPFC that show significant modulations in their activity related to the same variables estimated from a sliding-window regression analysis. c. Coefficient of partial determination (CPD) for the same variables estimated by the sliding-window regression analysis. Shaded areas correspond to the mean±SEM. Gray background in b and c corresponds to the cue period.
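The coefficient of partial determination for a regressor X_i is commonly defined as CPD_i = [SSE(X_{-i}) − SSE(X)] / SSE(X_{-i}). It is the fraction of the variance left unexplained by the other regressors that X_i accounts for. The standard-library sketch below computes it with ordinary least squares. The regressors (ΔDV and a ±1 chosen-side code) and the firing rates are made up for illustration. In a sliding-window analysis, the same calculation would be repeated for each time window.

```python
def solve(A, c):
    """Solve A x = c by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [ci] for row, ci in zip(A, c)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols_sse(X, y):
    """Sum of squared errors of an OLS fit via the normal equations."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    b = solve(XtX, Xty)
    return sum((t - sum(bi * xi for bi, xi in zip(b, r))) ** 2 for r, t in zip(X, y))

def cpd(X, y, i):
    """CPD of column i: [SSE(without i) - SSE(full)] / SSE(without i)."""
    sse_full = ols_sse(X, y)
    sse_reduced = ols_sse([[v for j, v in enumerate(r) if j != i] for r in X], y)
    return (sse_reduced - sse_full) / sse_reduced

# Hypothetical trials: columns are intercept, Delta-DV, and chosen side (+/-1).
delta_dv = [0, 1, 2, 3, 4, 5, 6, 7]
side = [1, -1, 1, -1, -1, 1, -1, 1]
noise = [0.1, -0.1, -0.1, 0.1, 0.05, -0.05, 0.05, -0.05]
X = [[1.0, float(d), float(s)] for d, s in zip(delta_dv, side)]
rate = [1.0 + 2.0 * d + e for d, e in zip(delta_dv, noise)]  # spikes/s, made up
print(f"CPD(Delta-DV) = {cpd(X, rate, 1):.4f}")
print(f"CPD(side)     = {cpd(X, rate, 2):.4f}")
```

Because the made-up rates depend almost entirely on ΔDV, its CPD is close to 1. The side regressor explains only a small share of the residual noise.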

**Figure 6**
Sequence of visual stimuli (a) and performance (b) during the matching pennies task. The line plots in (b) show the average regression coefficients determined by a logistic regression model related to the animal’s previous choices (left) or the choices of the computer opponent (right). Small dots indicate the results from individual animals, whereas the large black disks indicate that the overall average values were significantly different from zero (t-test, p<0.05). The gray histograms in (b) show the proportion of the daily sessions in which the corresponding regression coefficient was significantly positive (upper half) or negative (lower half).
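A logistic regression of the animal's choice on past choices can be sketched as follows. Several details are assumptions: choices are coded +1 (right) / -1 (left), the model uses two lags of the animal's and the computer's choices, and it is fitted by plain batch gradient ascent. The model used in the study may include other terms. The simulated "animal," which copies the computer's previous choice on 80% of trials, is hypothetical. It serves only to show that the fit recovers a positive coefficient for the computer's lag-1 choice.

```python
import math
import random

def lagged_design(animal, computer, n_lags):
    """Regressors per trial t: intercept, animal's choices at t-1..t-n_lags,
    then the computer's choices at t-1..t-n_lags (True = right -> +1)."""
    X, y = [], []
    for t in range(n_lags, len(animal)):
        row = [1.0]
        row += [1.0 if animal[t - j] else -1.0 for j in range(1, n_lags + 1)]
        row += [1.0 if computer[t - j] else -1.0 for j in range(1, n_lags + 1)]
        X.append(row)
        y.append(1.0 if animal[t] else 0.0)   # 1 = rightward choice
    return X, y

def fit_logistic(X, y, lr=1.0, n_iter=200):
    """Maximum-likelihood logistic regression by batch gradient ascent."""
    w = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, row))
            err = target - 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(row):
                grad[i] += err * xi
        w = [wi + lr * g / len(X) for wi, g in zip(w, grad)]
    return w

# Simulated session: the computer picks randomly; the hypothetical animal
# repeats the computer's previous choice on 80% of trials.
rng = random.Random(0)
computer = [rng.random() < 0.5 for _ in range(1000)]
animal = [rng.random() < 0.5]
for t in range(1, 1000):
    copy = rng.random() < 0.8
    animal.append(computer[t - 1] if copy else not computer[t - 1])

X, y = lagged_design(animal, computer, n_lags=2)
w = fit_logistic(X, y)
labels = ["bias", "animal t-1", "animal t-2", "computer t-1", "computer t-2"]
for name, coef in zip(labels, w):
    print(f"{name:>13}: {coef:+.2f}")
```

The recovered coefficient for "computer t-1" should be near log(0.8/0.2) ≈ 1.39. The other coefficients should be near zero.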

**Figure 7**
Activity of an example neuron in the DLPFC showing significant modulation according to the difference in the value functions during the matching pennies task (left). The activity of the same neuron was not significantly related to the sum of the value functions (right). The empty and filled circles indicate the average activity in each decile of trials sorted according to the difference in the value functions (left) or the sum of the value functions (right), for the trials in which the animal chose the left-ward and right-ward targets, respectively. Error bars, SEM. Histograms show the distribution of trials along the same variables for the trials in which the animal chose the left-ward (gray) or right-ward (black) target.
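The value functions mentioned here can be illustrated with a simple reinforcement learning model. The sketch below assumes a Q-learning (Rescorla-Wagner) update with a hypothetical learning rate `alpha`. It also assumes a logistic choice rule with inverse temperature `beta`. The model actually fitted to the animals' matching pennies behavior may differ, for example in how values decay or how unrewarded outcomes are weighted. The code computes both quantities plotted in this figure: the difference and the sum of the two value functions.

```python
import math

def update(values: dict, choice: str, reward: float, alpha: float) -> dict:
    """Move the chosen target's value toward the obtained reward."""
    new = dict(values)
    new[choice] += alpha * (reward - new[choice])
    return new

def value_difference(values: dict) -> float:
    return values["left"] - values["right"]

def value_sum(values: dict) -> float:
    return values["left"] + values["right"]

def p_left(values: dict, beta: float) -> float:
    """Logistic choice rule driven only by the value difference."""
    return 1.0 / (1.0 + math.exp(-beta * value_difference(values)))

v = {"left": 0.5, "right": 0.5}
v = update(v, "left", reward=1.0, alpha=0.5)    # rewarded left choice
v = update(v, "right", reward=0.0, alpha=0.5)   # unrewarded right choice
print(v, value_difference(v), value_sum(v), round(p_left(v, beta=2.0), 3))
# {'left': 0.75, 'right': 0.25} 0.5 1.0 0.731
```

Under this choice rule, only the value difference affects the choice probability, while the sum does not. A neuron tracking the difference, as in the left panel, carries the decision-relevant signal.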

**Figure 8**
Activity of an example DLPFC neuron (the same neuron shown in Figure 7) during the matching pennies task. Each pair of small panels displays the spike density functions estimated relative to the time of target onset (left panels) or feedback onset (right panels), separately according to the animal’s choice (top), the choice of the computer opponent (middle), or the animal’s choice outcome (bottom) in the current (Trial Lag = 0) or previous (Trial Lag = 1 to 3) trials. Gray (black) lines correspond to the activity associated with right-ward (left-ward) choices in the top two panels or rewarded (unrewarded) trials in the bottom panels. Circles show the regression coefficients from a multiple linear regression model, which was fitted separately for a series of 0.5-s windows. Large circles indicate the coefficients significantly different from zero (t-test, p<0.05). Gray background indicates the delay (left panels) or feedback (right panels) periods.
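A window-by-window regression first needs spike counts in each 0.5-s window aligned to an event such as target or feedback onset. The helper below is a minimal sketch that uses half-open windows [t, t + width). By default it assumes non-overlapping windows (`step` equal to `width`). The step used in the study's sliding-window analysis is not stated here, so `step` is a hypothetical parameter. Each window's counts would then be regressed on the animal's choice, the opponent's choice, and the reward.

```python
def window_counts(spike_times, event_time, start, stop, width=0.5, step=0.5):
    """Count spikes in windows [t, t + width) for t = start, start + step, ...,
    with t measured relative to event_time (all times in seconds)."""
    # Integer window index avoids accumulating floating-point error in t.
    n_windows = int(round((stop - start - width) / step)) + 1
    counts = []
    for i in range(n_windows):
        lo = event_time + start + i * step
        hi = lo + width
        counts.append((start + i * step, sum(lo <= s < hi for s in spike_times)))
    return counts

# Hypothetical spike train (s) and an event at t = 10 s.
spikes = [10.1, 10.2, 10.6, 11.2, 12.0]
for t, n in window_counts(spikes, event_time=10.0, start=-0.5, stop=1.5):
    print(f"window starting {t:+.1f} s: {n} spikes")
```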

**Figure 9**
Time course of DLPFC activity related to the animal’s choice (top), the choice of the computer opponent (middle), and the animal’s choice outcome (reward, bottom). Each symbol indicates the fraction of the neurons that displayed significant modulations in their activity according to the corresponding variable in the same regression analysis used in Figure 8. Large circles indicate that the percentage of neurons was significantly higher (binomial test, p<0.05) than the significance level used in the regression analysis (0.05). Gray background indicates the delay (left panels) or feedback (right panels) periods.
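The comparison with the regression's significance level works as follows. If no neuron truly encoded a variable, each neuron would still reach p < 0.05 by chance about 5% of the time. A one-sided binomial test asks whether the observed count of significant neurons exceeds that false-positive rate. The sketch below assumes an upper-tail test. The population size of 100 neurons is hypothetical, not the study's count.

```python
from math import comb

def binomial_sf(k: int, n: int, p: float) -> float:
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def above_chance(n_significant: int, n_neurons: int,
                 alpha_regression: float = 0.05, alpha_test: float = 0.05) -> bool:
    """True if more neurons are significant than expected from false positives."""
    return binomial_sf(n_significant, n_neurons, alpha_regression) < alpha_test

# Hypothetical population of 100 neurons (not the study's count).
for k in (6, 10, 15):
    print(k, round(binomial_sf(k, 100, 0.05), 4), above_chance(k, 100))
```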


Cited by

From habits to self-regulation: how do we change?
Gianessi CA. Yale J Biol Med. 2012 Jun;85(2):293-9. Epub 2012 Jun 25. PMID: 22737058. Free PMC article.
Heterogeneous coding of temporally discounted values in the dorsal and ventral striatum during intertemporal choice.
Cai X, Kim S, Lee D. Neuron. 2011 Jan 13;69(1):170-82. doi: 10.1016/j.neuron.2010.11.041. PMID: 21220107. Free PMC article.
Selective chemogenetic inactivation of corticoaccumbal projections disrupts trait choice impulsivity.
Wenzel JM, Zlebnik NE, Patton MH, Smethells JR, Ayvazian VM, Dantrassy HM, Zhang LY, Mathur BN, Cheer JF. Neuropsychopharmacology. 2023 Nov;48(12):1821-1831. doi: 10.1038/s41386-023-01604-5. Epub 2023 May 19. PMID: 37208501. Free PMC article.
Lateral intraparietal cortex and reinforcement learning during a mixed-strategy game.
Seo H, Barraclough DJ, Lee D. J Neurosci. 2009 Jun 3;29(22):7278-89. doi: 10.1523/JNEUROSCI.1479-09.2009. PMID: 19494150. Free PMC article.
Early commitment facilitates optimal choice by pigeons.
Zentall TR, Case JP, Berry JR. Psychon Bull Rev. 2017 Jun;24(3):957-963. doi: 10.3758/s13423-016-1173-8. PMID: 27743217.

