Review
Neural Netw. 2009 Apr;22(3):294-304.
doi: 10.1016/j.neunet.2009.03.010. Epub 2009 Mar 29.

Valuation of uncertain and delayed rewards in primate prefrontal cortex

Soyoun Kim et al. Neural Netw. 2009 Apr.

Abstract

Humans and animals often must choose between rewards that differ in their qualities, magnitudes, immediacy, and likelihood, and must estimate these multiple reward parameters from their experience. However, the neural basis for such complex decision making is not well understood. To understand the role of the primate prefrontal cortex in determining the subjective value of delayed or uncertain reward, we examined the activity of individual prefrontal neurons during an inter-temporal choice task and a computer-simulated competitive game. Consistent with the findings from previous studies in humans and other animals, the monkeys' behavior during inter-temporal choice was well accounted for by a hyperbolic discount function. In addition, the activity of many neurons in the lateral prefrontal cortex reflected signals related to the magnitude and delay of the reward expected from a particular action, and often encoded the difference in temporally discounted values that predicted the animal's choice. During a computerized matching pennies game, the animals approximated the optimal strategy, known as the Nash equilibrium, using a reinforcement learning algorithm. We also found that many neurons in the lateral prefrontal cortex conveyed signals related to the animal's previous choices and their outcomes, suggesting that this cortical area might play an important role in forming associations between actions and their outcomes. These results show that the primate lateral prefrontal cortex plays a central role in estimating the values of alternative actions based on multiple sources of information.
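The hyperbolic discount function mentioned above is commonly written as DV = A / (1 + kD), where A is the reward magnitude, D its delay, and k a discount-rate parameter. The sketch below shows how the difference in discounted values of two targets (ΔDV) can be turned into a choice probability. The logistic choice rule, the parameter names `k` and `beta`, and the reward magnitudes are illustrative assumptions, not values or methods taken from the study.

```python
import math


def hyperbolic_value(magnitude: float, delay: float, k: float) -> float:
    """Temporally discounted value under a hyperbolic discount function."""
    return magnitude / (1.0 + k * delay)


def p_choose_left(dv_left: float, dv_right: float, beta: float) -> float:
    """Logistic choice probability for the left target (assumed choice rule)."""
    return 1.0 / (1.0 + math.exp(-beta * (dv_left - dv_right)))


# Hypothetical trial: small reward (2 units) after 2 s on the left,
# large reward (4 units) after 6 s on the right; k and beta are made-up values.
k, beta = 0.25, 1.5
dv_left = hyperbolic_value(2.0, 2.0, k)   # 2 / 1.5
dv_right = hyperbolic_value(4.0, 6.0, k)  # 4 / 2.5
print(f"dDV = {dv_left - dv_right:.3f}, "
      f"P(left) = {p_choose_left(dv_left, dv_right, beta):.3f}")
```

Because the choice probability depends only on ΔDV in this sketch, any two delay combinations that yield the same ΔDV predict the same choice probability.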

Figures

Figure 1
Spatio-temporal sequence of the inter-temporal choice task. In this example, the delays for the rewards resulting from the small-reward (green) and large-reward (red) targets were 2 and 6 s, respectively, and the animal chose the large-reward target.
Figure 2
Probability of choosing the small-reward target, P(TS), during the inter-temporal choice task plotted for various combinations of delays for small (TS) and large (TL) reward targets. These results were obtained in separate behavioral experiments conducted prior to the neurophysiological experiments. Dots correspond to the data, whereas lines are the predictions from the best-fitting hyperbolic discount function.
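One way to obtain a best-fitting hyperbolic discount function from choice proportions like these is to maximize the binomial log-likelihood of the observed P(TS) values over the discount rate k and the inverse temperature of a logistic choice rule. The sketch below does this by brute-force grid search. The logistic choice rule, the reward magnitudes (1 and 2 arbitrary units), the grid ranges, and the synthetic data are all assumptions; the study's own fitting procedure may differ.

```python
import math


def hyperbolic_value(magnitude, delay, k):
    return magnitude / (1.0 + k * delay)


def p_small(d_small, d_large, k, beta, small=1.0, large=2.0):
    """P(TS) under a hyperbolic model with an assumed logistic choice rule."""
    dv_diff = hyperbolic_value(small, d_small, k) - hyperbolic_value(large, d_large, k)
    return 1.0 / (1.0 + math.exp(-beta * dv_diff))


def log_likelihood(k, beta, data):
    """Binomial log-likelihood; rows are (delay_small, delay_large, n_trials, n_chose_small)."""
    ll = 0.0
    for d_s, d_l, n, n_s in data:
        p = min(max(p_small(d_s, d_l, k, beta), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        ll += n_s * math.log(p) + (n - n_s) * math.log(1.0 - p)
    return ll


def fit_grid(data, ks, betas):
    """Return the (k, beta) pair on the grid with the highest log-likelihood."""
    return max(((k, b) for k in ks for b in betas),
               key=lambda kb: log_likelihood(kb[0], kb[1], data))


# Synthetic example: expected choice counts generated from k = 0.3, beta = 2.
delay_pairs = [(0, 2), (0, 4), (0, 6), (2, 4), (2, 6), (4, 6)]
data = [(ds, dl, 100, 100 * p_small(ds, dl, 0.3, 2.0)) for ds, dl in delay_pairs]
ks = [i * 0.05 for i in range(1, 21)]      # 0.05 to 1.0
betas = [j * 0.5 for j in range(1, 11)]    # 0.5 to 5.0
k_hat, beta_hat = fit_grid(data, ks, betas)
print(f"k = {k_hat:.2f}, beta = {beta_hat:.1f}")
```

A grid search keeps the sketch dependency-free; a gradient-based optimizer would be the usual choice when the grid would need to be fine.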
Figure 3
Probability of choosing the small-reward target, P(TS), for an example daily session during neurophysiological experiments. Solid and dotted lines correspond to the predictions from the best-fitting hyperbolic and exponential discount functions, respectively. TS (TL) refers to the small-reward (large-reward) target.
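The two candidate discount functions compared here differ in a way that is easy to check numerically. Under exponential discounting, V = A·exp(−kD), the ratio of two options' values does not change when a common delay is added to both; under hyperbolic discounting, V = A / (1 + kD), it can, which produces the preference reversals that motivate hyperbolic models. The sketch below uses made-up magnitudes, delays, and k values; it is not fitted to the data in this figure.

```python
import math


def hyperbolic(delay: float, k: float) -> float:
    """Fraction of reward value retained after `delay` s (hyperbolic)."""
    return 1.0 / (1.0 + k * delay)


def exponential(delay: float, k: float) -> float:
    """Fraction of reward value retained after `delay` s (exponential)."""
    return math.exp(-k * delay)


def prefers_large(discount, small, d_small, large, d_large, k):
    """True if the discounted large reward exceeds the discounted small reward."""
    return large * discount(d_large, k) > small * discount(d_small, k)


# Hypothetical options: 1 unit after d s versus 2 units after d + 4 s.
for front_delay in (0.0, 10.0):
    hyp = prefers_large(hyperbolic, 1.0, front_delay, 2.0, front_delay + 4.0, k=0.5)
    exp_ = prefers_large(exponential, 1.0, front_delay, 2.0, front_delay + 4.0, k=0.2)
    print(f"common delay {front_delay:>4} s: "
          f"hyperbolic prefers large={hyp}, exponential prefers large={exp_}")
```

With these values the hyperbolic chooser switches from the small to the large reward once both are pushed 10 s into the future, whereas the exponential chooser keeps the same preference.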
Figure 4
Activity of two example neurons in the DLPFC (a and b) during the inter-temporal choice (left) and control (right) tasks. The top two rows show the average spike density functions for trials in which the animal chose the rightward and leftward targets, respectively. Activity was averaged separately according to the difference in the temporally discounted values for the two targets, and the magnitudes and delays of the rewards for the two targets are indicated by the legend on the right. For example, S(0):L(0) indicates that the leftward target delivered a small reward and that the delays for both rewards were 0 s, whereas L(4):S(2) indicates that the leftward target delivered a large reward and that the delays for the small and large rewards were 2 and 4 s, respectively. The third row shows the average spike rates during the cue period plotted according to the difference in the temporally discounted values (DV) and fictitious discounted values (FDV) for the inter-temporal choice and control tasks, respectively. Finally, the bottom row shows the average spike rates during the cue period plotted as a function of the DV or FDV of the target chosen by the animal in the inter-temporal choice or control task. Filled and empty symbols indicate the results for trials in which the animal chose the rightward and leftward targets, respectively.
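Spike density functions like those in the top rows are typically obtained by convolving each trial's spike train with a smoothing kernel and averaging across trials of the same condition. The sketch below assumes a Gaussian kernel with a 50-ms standard deviation; both the kernel shape and its width are assumptions, since the caption does not specify them, and the spike times are hypothetical.

```python
import math


def spike_density(spike_times, t_grid, sigma=0.05):
    """Spike density (spikes/s) on t_grid: a unit-area Gaussian kernel
    (standard deviation sigma, in s) centred on each spike time."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [sum(norm * math.exp(-0.5 * ((t - s) / sigma) ** 2) for s in spike_times)
            for t in t_grid]


def average_sdf(trials, t_grid, sigma=0.05):
    """Average the single-trial spike density functions of one condition,
    e.g. all trials in which the animal chose the rightward target."""
    per_trial = [spike_density(spikes, t_grid, sigma) for spikes in trials]
    return [sum(values) / len(per_trial) for values in zip(*per_trial)]


# Hypothetical spike times (s, relative to cue onset) from three trials.
trials = [[0.12, 0.15, 0.31], [0.10, 0.40], [0.22]]
t_grid = [i * 0.01 for i in range(-20, 61)]   # -0.2 s to 0.6 s in 10-ms steps
sdf = average_sdf(trials, t_grid)
peak = max(sdf)
print(f"peak rate {peak:.1f} spikes/s at t = {t_grid[sdf.index(peak)]:.2f} s")
```

Because each kernel integrates to one, the area under a single-trial spike density function equals that trial's spike count.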
Figure 5
Population summary of neural activity in the DLPFC during the inter-temporal choice task. a. Fraction of neurons that significantly modulated their activity during the cue period according to the animal’s choice (Cho), the difference in the temporally discounted values (ΔDV = DV_L − DV_R), and the temporally discounted value of the chosen target (DV_cho). b. Fraction of neurons in the DLPFC that showed significant modulations in their activity related to the same variables, estimated from a sliding-window regression analysis. c. Coefficient of partial determination (CPD) for the same variables, estimated by the sliding-window regression analysis. Shaded areas correspond to the mean ± SEM. Gray background in b and c corresponds to the cue period.
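The coefficient of partial determination (CPD) for a regressor is usually defined as CPD = [SSE(reduced) − SSE(full)] / SSE(reduced), where SSE(full) is the residual sum of squares of the full regression model and SSE(reduced) that of the model with the regressor of interest removed. The NumPy sketch below computes it for one column of a design matrix and repeats the calculation across time windows, as in a sliding-window analysis. The regressor set and the simulated spike counts are hypothetical.

```python
import numpy as np


def sse(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def cpd(X, y, j):
    """Coefficient of partial determination for column j of X."""
    full = sse(X, y)
    reduced = sse(np.delete(X, j, axis=1), y)
    return (reduced - full) / reduced


def sliding_cpd(X, counts, j):
    """CPD of regressor j in each time window; counts has shape (n_trials, n_windows)."""
    return [cpd(X, counts[:, w], j) for w in range(counts.shape[1])]


# Hypothetical session: intercept, choice (+1/-1), and ΔDV as regressors.
rng = np.random.default_rng(0)
n_trials = 300
choice = rng.choice([-1.0, 1.0], size=n_trials)
delta_dv = rng.normal(size=n_trials)
X = np.column_stack([np.ones(n_trials), choice, delta_dv])
# Simulated activity in 3 windows; only the last two depend on ΔDV.
counts = np.column_stack([
    5 + rng.normal(size=n_trials),
    5 + 1.0 * delta_dv + rng.normal(size=n_trials),
    5 + 2.0 * delta_dv + rng.normal(size=n_trials),
])
print([round(c, 3) for c in sliding_cpd(X, counts, j=2)])
```

Unlike a raw regression coefficient, the CPD is unitless and bounded between 0 and 1, which makes it convenient for averaging across neurons with different firing rates.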
Figure 6
Sequence of visual stimuli (a) and performance (b) during the matching pennies task. The line plots in (b) show the average regression coefficients determined by a logistic regression model related to the animal’s previous choices (left) or the choices of the computer opponent (right). Small dots indicate the results from individual animals, whereas the large black disks indicate that the overall average values were significantly different from zero (t-test, p<0.05). The gray histograms in (b) show the proportion of the daily sessions in which the corresponding regression coefficient was significantly positive (upper half) or negative (lower half).
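A logistic regression of this kind models the log odds of choosing one target on the current trial as a linear function of the animal's and the computer opponent's choices on preceding trials. The NumPy sketch below builds such a lagged design matrix and fits it by Newton-Raphson iterations. The ±1 choice coding, the number of lags, and the simulated animal (which tends to pick the target the opponent chose on the previous trial) are assumptions for illustration; the caption does not specify how the model was parameterized.

```python
import numpy as np


def lagged_design(my_choices, opp_choices, n_lags):
    """Design matrix with an intercept, the animal's choices at lags 1..n_lags,
    and the opponent's choices at lags 1..n_lags (choices coded +1 right, -1 left)."""
    a = np.asarray(my_choices, dtype=float)
    o = np.asarray(opp_choices, dtype=float)
    rows = [[1.0]
            + [a[t - lag] for lag in range(1, n_lags + 1)]
            + [o[t - lag] for lag in range(1, n_lags + 1)]
            for t in range(n_lags, len(a))]
    y = (a[n_lags:] > 0).astype(float)  # 1 if the animal chose the rightward target
    return np.array(rows), y


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson iterations."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (y - p)                               # gradient of the log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])        # negative Hessian
        w += np.linalg.solve(info, grad)
    return w


# Simulated session: the animal repeats the opponent's previous choice with probability 0.8.
rng = np.random.default_rng(1)
n_trials = 5000
opp = rng.choice([-1.0, 1.0], size=n_trials)
me = np.empty(n_trials)
me[0] = 1.0
for t in range(1, n_trials):
    me[t] = opp[t - 1] if rng.random() < 0.8 else -opp[t - 1]
X, y = lagged_design(me, opp, n_lags=2)
w = fit_logistic(X, y)
print("own lags:", np.round(w[1:3], 2), "opponent lags:", np.round(w[3:5], 2))
```

In this simulation only the opponent's lag-1 coefficient should be clearly positive (near ln 4 ≈ 1.39); the remaining history coefficients should be close to zero.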
Figure 7
Activity of an example neuron in the DLPFC showing significant modulation in its activity according to the difference in the value functions during the matching pennies task (left). The activity of the same neuron was not significantly related to the sum of the value functions (right). The empty and filled circles indicate the average activity in each decile of trials, sorted according to the difference in the value functions (left) or the sum of the value functions (right), for trials in which the animal chose the leftward and rightward targets, respectively. Error bars, SEM. Histograms show the distribution of trials along the same variables for trials in which the animal chose the leftward (gray) or rightward (black) target.
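Value functions of this kind can be estimated from the choice and reward history with a reinforcement learning model. The sketch below uses a generic delta-rule update, V(chosen) ← V(chosen) + α[r − V(chosen)], with an assumed learning rate α, and then sorts trials into deciles of the value difference or value sum, as in this figure. The update rule, α, the simulated session, and the hypothetical neuron are assumptions; the model actually fitted to the animals' behavior may differ.

```python
import random


def value_functions(choices, rewards, alpha=0.2):
    """Value functions (left, right) before each trial; choices are 0 (left) or 1 (right),
    rewards are 1 or 0, and only the chosen target's value is updated (delta rule)."""
    v = [0.0, 0.0]
    history = []
    for c, r in zip(choices, rewards):
        history.append((v[0], v[1]))
        v[c] += alpha * (r - v[c])
    return history


def decile_means(x, y):
    """Mean of y within each decile of trials sorted by x (needs at least 10 trials)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    n = len(order)
    means = []
    for d in range(10):
        idx = order[d * n // 10:(d + 1) * n // 10]
        means.append(sum(y[i] for i in idx) / len(idx))
    return means


# Simulated session: random choices and rewards, and a hypothetical neuron
# whose firing rate (spikes/s) increases with the value difference.
random.seed(0)
choices = [random.randrange(2) for _ in range(500)]
rewards = [random.randrange(2) for _ in range(500)]
values = value_functions(choices, rewards)
diff = [vr - vl for vl, vr in values]   # right minus left
total = [vl + vr for vl, vr in values]
rates = [10.0 + 5.0 * d + random.gauss(0.0, 1.0) for d in diff]
print("by difference:", [round(m, 1) for m in decile_means(diff, rates)])
print("by sum:       ", [round(m, 1) for m in decile_means(total, rates)])
```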
Figure 8
Activity of an example DLPFC neuron (the same neuron shown in Fig. 7) during the matching pennies task. Each pair of small panels displays the spike density functions estimated relative to the time of target onset (left panels) or feedback onset (right panels), separately according to the animal’s choice (top), the choice of the computer opponent (middle), or the animal’s choice outcome (bottom) in the current (Trial Lag = 0) or previous (Trial Lag = 1 to 3) trials. Gray (black) lines correspond to the activity associated with rightward (leftward) choices in the top two panels or rewarded (unrewarded) trials in the bottom panels. Circles show the regression coefficients from a multiple linear regression model, which was fitted separately for a series of 0.5-s windows. Large circles indicate the coefficients significantly different from zero (t-test, p<0.05). Gray background indicates the delay (left panels) or feedback (right panels) periods.
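The analysis described here regresses spike counts in consecutive 0.5-s windows on variables from the current and previous trials. The NumPy sketch below shows the two ingredients: counting spikes in windows aligned to an event, and computing ordinary least-squares coefficients with their t-statistics. Treating |t| > 1.96 as p < 0.05 is a large-sample normal approximation assumed here for brevity; an exact t-distribution p-value would be the more careful choice. The regressors and data are hypothetical.

```python
import numpy as np


def window_counts(spikes_per_trial, align_times, start, stop, width=0.5):
    """Spike counts in consecutive, non-overlapping windows of `width` s
    from `start` to `stop` s relative to each trial's alignment event."""
    edges = np.arange(start, stop - width + 1e-9, width)
    counts = np.zeros((len(spikes_per_trial), len(edges)))
    for i, (spikes, t0) in enumerate(zip(spikes_per_trial, align_times)):
        rel = np.asarray(spikes, dtype=float) - t0
        for j, lo in enumerate(edges):
            counts[i, j] = np.count_nonzero((rel >= lo) & (rel < lo + width))
    return edges, counts


def ols_t(X, y):
    """OLS coefficients and their t-statistics (X must include an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se


# Hypothetical regression for one window: intercept, current choice,
# previous choice, and previous reward.
rng = np.random.default_rng(2)
n = 400
all_choices = rng.choice([-1.0, 1.0], size=n + 1)
choice, prev_choice = all_choices[1:], all_choices[:-1]   # current and previous trial
prev_reward = rng.choice([0.0, 1.0], size=n)
X = np.column_stack([np.ones(n), choice, prev_choice, prev_reward])
y = 4.0 + 1.5 * prev_reward + rng.normal(size=n)   # simulated activity depends on previous reward only
beta, t = ols_t(X, y)
print("coefficients:", np.round(beta, 2))
print("|t| > 1.96:", np.abs(t) > 1.96)
```

Repeating `ols_t` for each column of the `window_counts` output gives one set of coefficients per 0.5-s window.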
Figure 9
Time course of DLPFC activity related to the animal’s choice (top), the choice of the computer opponent (middle), and the animal’s choice outcome (reward, bottom). Each symbol indicates the fraction of neurons that displayed significant modulations in their activity according to the corresponding variable in the same regression analysis used in Figure 8. Large circles indicate that the percentage of neurons was significantly higher (binomial test, p<0.05) than the significance level used in the regression analysis (0.05). Gray background indicates the delay (left panels) or feedback (right panels) periods.
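The binomial test described here asks whether the number of neurons reaching significance exceeds what would be expected if each neuron crossed the regression's 0.05 threshold only by chance. A minimal exact one-sided version can be written with the Python standard library; the population size and neuron counts below are hypothetical.

```python
from math import comb


def binomial_sf(k: int, n: int, p: float) -> float:
    """One-sided exact tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def exceeds_chance(n_significant: int, n_neurons: int,
                   alpha_regression: float = 0.05, alpha_test: float = 0.05) -> bool:
    """True if significantly more than alpha_regression * n_neurons neurons were significant."""
    return binomial_sf(n_significant, n_neurons, alpha_regression) < alpha_test


# Hypothetical population of 100 recorded neurons.
for n_sig in (5, 9, 10, 20):
    print(n_sig, round(binomial_sf(n_sig, 100, 0.05), 4), exceeds_chance(n_sig, 100))
```

Where SciPy is available, `scipy.stats.binomtest(k, n, p, alternative="greater")` performs the same one-sided test.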
