J Neurosci. 2007 Aug 1;27(31):8366-77.
doi: 10.1523/JNEUROSCI.2369-07.2007.

Temporal filtering of reward signals in the dorsal anterior cingulate cortex during a mixed-strategy game

Hyojung Seo et al. J Neurosci.

Abstract

The process of decision making in humans and other animals is adaptive and can be tuned through experience so as to optimize the outcomes of their choices in a dynamic environment. Previous studies have demonstrated that the anterior cingulate cortex plays an important role in updating the animal's behavioral strategies when the action-outcome contingencies change. Moreover, neurons in the anterior cingulate cortex often encode the signals related to expected or actual reward. We investigated whether reward-related activity in the anterior cingulate cortex is affected by the animal's previous reward history. This was tested in rhesus monkeys trained to make binary choices in a computer-simulated competitive zero-sum game. The animal's choice behavior was relatively close to the optimal strategy but also revealed small systematic biases that are consistent with the use of a reinforcement learning algorithm. In addition, the activity of neurons in the dorsal anterior cingulate cortex that was related to the reward received by the animal in a given trial was often modulated by the rewards in the previous trials. Some of these neurons encoded the rate of rewards in previous trials, whereas others displayed activity modulations more closely related to the reward prediction errors. In contrast, signals related to the animal's choices were represented only weakly in this cortical area. These results suggest that neurons in the dorsal anterior cingulate cortex might be involved in the subjective evaluation of choice outcomes based on the animal's reward history.
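
The reinforcement learning account mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the model's equations, so the code below assumes a common form: one value function per target, a softmax choice rule with inverse temperature `beta`, and a delta-rule update with learning rate `alpha` driven by the reward prediction error. The function names, the random opponent, and the rule that a matching choice is rewarded are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def softmax_p_right(q_left, q_right, beta):
    """Probability of choosing the right target under a softmax rule.

    beta is the inverse temperature; the two branches avoid overflow
    in math.exp when beta is very large.
    """
    x = beta * (q_right - q_left)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def simulate(n_trials, alpha, beta, seed=0):
    """Simulate a learner playing against a hypothetical random opponent."""
    rng = random.Random(seed)
    q = {"left": 0.0, "right": 0.0}  # value functions for the two targets
    history = []
    for _ in range(n_trials):
        p_right = softmax_p_right(q["left"], q["right"], beta)
        choice = "right" if rng.random() < p_right else "left"
        opponent = rng.choice(["left", "right"])
        # Assumption: the learner is rewarded when the two choices match.
        reward = 1.0 if choice == opponent else 0.0
        rpe = reward - q[choice]   # reward prediction error
        q[choice] += alpha * rpe   # update only the chosen target's value
        history.append((choice, reward, rpe))
    return q, history


if __name__ == "__main__":
    values, trials = simulate(n_trials=500, alpha=0.2, beta=3.0)
    print(values)
```

In this form, a larger `alpha` makes the value functions track recent rewards more closely, and a larger `beta` makes choices depend more deterministically on the difference between the two value functions.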

Figures

Figure 1.
Task and recording sites. A, Oculomotor free choice task. Gray and white rectangles at the bottom correspond to a series of 0.5 s windows used to analyze the neural data that are aligned to the target onset and feedback onset, respectively. FP, Fore period; FB, feedback period. B, An MR image (coronal; anteroposterior, 31 mm; monkey D) showing the recording sites in the ACCd.
Figure 2.
Learning rate and inverse temperature of reinforcement learning model applied to the choice behavior in the matching pennies task. The results from the three sessions in which the inverse temperature was extremely large (β > 1000) are not shown.
Figure 3.
Relationship between the difference in spike rates during the feedback period of rewarded and unrewarded trials (abscissa) and its estimate obtained from a regression model that controlled for the variability in saccade latency (ordinate). Black symbols correspond to the neurons with significant effects of reward in a Student's t test (p < 0.05). Gray symbols correspond to the neurons (n = 12) for which the activity was significantly related to the saccade latency (p < 0.05), but not to the reward in the regression model.
Figure 4.
Activity of an example neuron in the ACCd during the matching pennies task. Each pair of small panels displays the spike density functions estimated relative to the time of target onset (left panels) or feedback onset (right panels). They were estimated separately according to the animal's choice (top), the computer choice (middle), or reward (bottom) in the current trial (Trial Lag = 0) or according to the corresponding variables in three previous trials (Trial Lag = 1, 2, or 3). Purple (black) lines correspond to the activity associated with rightward (leftward) choices (top and middle) or rewarded (unrewarded) trials (bottom). Circles show the regression coefficients from a multiple linear regression model, which was performed separately for each time bin. Filled circles indicate the coefficients significantly different from zero (Student's t test, p < 0.05). The dotted vertical lines in the left panels correspond to the onset of the fore period, and the gray background corresponds to the delay (left panels) or feedback (right panels) period.
Figure 5.
Activity of another example neuron in the ACCd. This is the same format as in Figure 4.
Figure 6.
Time course of activity related to the animal's choice (top), the choice of the computer opponent (middle), and reward (bottom) in the population of ACCd neurons. Black symbols (left axis) indicate the percentage of neurons that displayed significant modulations in their activity according to each variable (Student's t test; p < 0.05). Gray symbols (right axis) indicate the average magnitude of the regression coefficients related to each variable. These values were estimated separately for different time bins, using a series of multiple linear regression models. Large black symbols indicate that the percentage of neurons was significantly higher than the significance level used in the regression analysis (binomial test, p < 0.05). The dotted vertical lines in the left panels correspond to the onset of the fore period, and the gray background corresponds to the delay (left panels) or feedback (right panels) period.
Figure 7.
Heterogeneity in the time course of reward-related signals in the ACCd. Normalized regression coefficients were averaged for each of four clusters (centroids) identified with the k-means cluster analysis. Shaded region corresponds to the area bounded by the mean ± SEM. The dotted vertical lines in the left panels correspond to the onset of the fore period, and the gray background corresponds to the delay (left panels) or feedback (right panels) period.
Figure 8.
Neural signals related to value functions and reward prediction errors in ACCd. A, Fraction of neurons that significantly modulated their activity according to the sum of the value functions (left), the difference between them (middle), and the reward prediction error (right) during the successive 0.5 s windows used in the regression analysis. Large symbols indicate that the percentage of neurons was significantly higher than the significance level used in the regression analysis (binomial test, p < 0.05). B, The regression coefficients associated with the sum of the value functions (left), the difference between them (middle), and the reward prediction error (right) for the same two example neurons shown in Figures 4 (top) and 5 (bottom). Filled symbols indicate that the regression coefficients were significantly different from zero (Student's t test, p < 0.05). The dotted vertical lines in the left panels correspond to the onset of the fore period, and the gray background corresponds to the delay (left panels) or feedback (right panels) period.
Figure 9.
Correlation between the regression coefficients associated with the variables in the reinforcement learning model and those associated with the animal's rewards in the current and previous trials. Gray symbols correspond to the correlation coefficient for the sum of the value functions (top), the difference between the value functions (middle), and the reward prediction error (RPE; bottom) estimated for the activity during the feedback period. Black symbols show the results for the activity during the second 0.5 s window after feedback onset. Large symbols indicate that the correlation was statistically significant (Student's t test; p < 0.05). The dotted vertical lines in the left panels correspond to the onset of the fore period, and the gray background corresponds to the delay (left panels) or feedback (right panels) period.
Figure 10.
Raster plots and spike density functions for the same neuron shown in Figure 4 sorted by the reward in the current trial (left, unrewarded; right, rewarded) and those in the two previous trials. A three-letter code shown on the left of the raster plots indicates the trials in which the animal was rewarded. For example, RRU indicates that the animal was rewarded in both of the previous two trials, but not in the current trial. Colors of the spike density functions in the bottom panels correspond to those of small bars associated with the raster plots for different reward sequences, except that gray lines in the left (right) column correspond to the average spike density functions for rewarded (unrewarded) trials. The dotted vertical lines in the left panels correspond to the onset of the fore period, and the gray background corresponds to the delay (left panels) or feedback (right panels) period.
Figure 11.
Raster plots and spike density functions for the same neuron shown in Figure 5. This is the same format as in Figure 10.
Figure 12.
Fraction of neurons showing the significant main effect (left), two-way interactions (middle), and three-way interaction (right) in a three-way ANOVA that included the rewards received by the animal in the current trial, R(t), and the two previous trials, R(t − 1) and R(t − 2). The dotted vertical lines in the left panels correspond to the onset of the fore period, and the gray background corresponds to the delay (left panels) or feedback (right panels) period.
Figure 13.
Interaction between the reward in the previous trial and that in the current trial. The effect of reward in the previous trial on neural activity was quantified by using a t value separately for rewarded (abscissa) and unrewarded (ordinate) trials. Then the results were plotted separately, depending on whether a given neuron decreased (left, negative reward effect) or increased (right, positive reward effect) its activity during the feedback period of rewarded trials as compared with the activity in unrewarded trials. Filled circles indicate the neurons that displayed significant interactions between the reward in the previous trial and that in the current trial, and the gray background indicates that the magnitude of the t value in unrewarded trials is larger than that in rewarded trials.
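
The figure legends describe multiple linear regressions that relate neural activity to the rewards in the current and previous trials. A minimal sketch of that kind of analysis, assuming ordinary least squares on a trial-by-trial spike rate with lagged reward regressors, might look like the following. The function names and the synthetic data are hypothetical, and the choice-related regressors mentioned in the legends are omitted, so this is not the authors' regression model.

```python
def lagged_design(rewards, max_lag):
    """Build rows [1, R(t), R(t-1), ..., R(t-max_lag)] for trials t >= max_lag."""
    return [
        [1.0] + [float(rewards[t - k]) for k in range(max_lag + 1)]
        for t in range(max_lag, len(rewards))
    ]


def solve_least_squares(X, y):
    """Ordinary least squares via the normal equations and Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


if __name__ == "__main__":
    # Synthetic reward sequence containing every pattern of three consecutive outcomes.
    rewards = [0, 0, 0, 1, 0, 1, 1, 1] * 5
    X = lagged_design(rewards, max_lag=2)
    # Hypothetical neuron: baseline 5 spikes/s, +2 for current reward, +1 for previous reward.
    rates = [5.0 + 2.0 * row[1] + 1.0 * row[2] for row in X]
    print(solve_least_squares(X, rates))
```

Each fitted coefficient estimates how much the rate changes with the reward at one trial lag, so the set of coefficients across lags describes how strongly past rewards are weighted in the neuron's activity.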
