Temporally extended dopamine responses to perceptually demanding reward-predictive stimuli

Kensaku Nomoto¹, Wolfram Schultz, Takeo Watanabe, Masamichi Sakagami

PMID: 20702700
PMCID: PMC3297489
DOI: 10.1523/JNEUROSCI.4828-09.2010

Kensaku Nomoto et al. J Neurosci. 2010 Aug 11;30(32):10692-702.

Affiliation

¹ Brain Science Institute, Tamagawa University, Machida, Tokyo, 194-8610, Japan.

Abstract

Midbrain dopamine neurons respond to reward-predictive stimuli. In the natural environment, reward-predictive stimuli are often perceptually complicated, so discriminating one stimulus from another requires elaborate sensory processing. Because previous studies have used simpler types of reward-predictive stimuli, it remains unclear whether and, if so, how dopamine neurons obtain reward information from perceptually complicated stimuli. To investigate this, we recorded the activity of dopamine neurons in monkeys performing a discrimination between two coherent motion directions in random-dot motion stimuli. These coherent directions were paired with different magnitudes of reward. We found that dopamine neurons showed reward-predictive responses to random-dot motion stimuli. Moreover, dopamine neurons showed temporally extended activity correlated with changes in reward prediction (i.e., reward prediction error) from coarse to fine scales between the initial motion detection phase and the subsequent motion discrimination phase. Notably, dopamine reward-predictive responses became differential in a later phase than previously reported. This response pattern was consistent with the time course of processing required to estimate expected reward value, which parallels the processing required for motion direction discrimination. The results demonstrate that dopamine neurons are able to reflect the reward value of perceptually complicated stimuli, and suggest that dopamine neurons use the moment-to-moment reward prediction associated with environmental stimuli to compute a reward prediction error.

Figures

**Figure 1.**
Behavioral task and behavioral performance. A, Time course of the motion direction discrimination task. B, Schema of asymmetric reward schedule. Reward-direction contingencies were reversed from block to block. The size of a white teardrop mark indicates reward magnitude associated with the particular motion direction (large mark, 0.38 ml; small mark, 0.16 ml). The amount of reward upon correct response was determined by the direction of the motion stimulus. Coherence levels for monkey K are shown at the upper row for display purposes. C, Effects of motion coherence and the reward schedule on monkeys' choices. Monkeys tended to respond in the direction associated with a large reward. Rightward large-reward blocks (black) and rightward small-reward blocks (gray) are shown separately. Circles and error bars represent the means and SDs of the choice rate across all sessions, respectively. Choice data were fitted to a logistic function. The vertical axis indicates the proportion of rightward saccades. The horizontal axis indicates the motion coherence (a positive value indicates a rightward motion; a negative value indicates a leftward motion). D, Saccadic reaction time. The data were sorted by motion coherence levels and directions, which are associated with large reward (black bars) and small reward (gray bars). Only correct trials were analyzed. Error bars represent SDs across all sessions.
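
The logistic fit described for panel C can be illustrated with a minimal sketch. The parameterization below (a `bias` term and a `slope` term, with coherence expressed as a signed fraction) and every numeric value are assumptions chosen for illustration; they are not the fitted values from the study.

```python
import math

def p_rightward(coherence, bias, slope):
    """Logistic psychometric function: probability of a rightward saccade.

    coherence: signed motion coherence (positive = rightward motion).
    bias, slope: hypothetical parameters, not taken from the paper.
    """
    return 1.0 / (1.0 + math.exp(-(bias + slope * coherence)))

# Hypothetical values: a positive bias mimics a rightward large-reward block,
# a negative bias mimics a rightward small-reward block.
SLOPE = 10.0
for coh in (-0.5, -0.1, 0.0, 0.1, 0.5):
    large = p_rightward(coh, bias=0.8, slope=SLOPE)
    small = p_rightward(coh, bias=-0.8, slope=SLOPE)
    print(f"coherence {coh:+.1f}: large-block {large:.2f}, small-block {small:.2f}")
```

In this sketch the bias term shifts the curve horizontally, which is one simple way to express the tendency to choose the direction associated with the large reward.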

**Figure 2.**
Recording sites. A, A representative digitized image of the recording sites with Nissl staining. Tilted lines indicate electrode tracks. Ticks represent microlesions. Filled circles indicate the recording sites of putative dopamine neurons. Scale bar, 500 μm. Inset, Magnified image around the microlesion (a rectangular area in the Nissl staining image) from an adjacent slice with anti-TH immunohistochemistry. Black neurons correspond to TH-immunoreactive cells. B, Reconstruction of the recording sites in the right hemisphere of monkey K. Numbers indicate the anteroposterior level. Tilted lines indicate electrode tracks. Filled circles indicate the recording sites of putative dopamine neurons. We delineated the areas rich with dopamine neurons according to the anti-TH staining slices. SNr, Substantia nigra pars reticulata; NIII, oculomotor nerve outlets; RNm, magnocellular red nucleus.

**Figure 3.**
Dopamine responses at the single neuron and population levels aligned with the time of RDM or saccade onset. A, Typical RDM-evoked responses of a single dopamine neuron (monkey L). Spike raster plots and histograms are aligned with the time of RDM onset (binwidth = 20 ms). Spike raster plots are sorted by saccadic reaction times for display purposes. Gray cross marks in spike raster plots indicate the time of saccade onset. The vertical axis shows the firing rate of dopamine activity. Each column corresponds to the trial conditions in which a certain amount of reward, either large (0.38 ml) or small (0.16 ml), was delivered upon a correct response. Each row shows each coherence level. Only correct trials were shown except at the zero coherence level. At the zero coherence level, we showed all trials and sorted neural activity based on the monkeys' choices. B, Population RDM-evoked responses of monkeys L and K (N = 41 and 35, respectively). The data are aligned with the time of RDM onset. The vertical axis shows the firing rate of dopamine activity. Line colors indicate reward conditions (black, large-reward condition; gray, small-reward condition). Dashed vertical and horizontal lines indicate the means and SDs of the time of saccadic onset, respectively. Each row shows each coherence level. C, Population saccade-aligned responses of monkeys L and K (N = 41 and 35, respectively). The data are aligned with the time of saccade onset. The vertical axis shows the firing rate of dopamine activity. Line colors indicate reward conditions (black, large-reward condition; gray, small-reward condition). Dashed vertical and horizontal lines indicate the means and SDs of the time of RDM onset, respectively. Each row shows coherence level.

**Figure 4.**
Prediction error coding in reward value. Comparison between dopamine responses and reward prediction errors. Reward prediction errors are calculated as an average amount of juice reward (see Results). The line is fit to the data points corresponding to late-period RDM-evoked responses in the large-reward condition (black circles) and the summation of FP-evoked and early-period RDM-evoked responses (black square) by weighted type II regression (Press et al., 1992). Late-period RDM-evoked responses in the small-reward condition and zero coherence condition are represented by white circles and diamond, respectively. FP-evoked (gray square in the left) and early-period RDM-evoked responses (gray square in the right) are shown for comparison with the summation of both responses (black square). These points are horizontally jittered only for display purposes. Horizontal and vertical error bars represent SDs. The vertical axis shows dopamine responses. The horizontal axis shows reward prediction errors as juice quantity.

**Figure 5.**
Population dopamine responses aligned with the time of feedback tone onset. A, Population spike density functions are aligned with the time of feedback tone onset. As poor discrimination for less coherent motion resulted in low reward probability, the reward prediction error would be large at low coherence levels. Indeed, dopamine responses to feedback tones were the largest at zero coherence and decreased with increasing motion coherence. In addition, there was a significant suppression in dopamine activity in response to the error feedback tones, indicating negative reward prediction error coding. The vertical axis shows the firing rate of dopamine activity. Black and gray solid lines indicate dopamine activity in large- and small-reward correct trials, respectively. Red solid lines correspond to dopamine activity in small-reward error trials, resulting in no reward delivery. Because there were not enough trials for this condition, dopamine activity of small-reward correct trials was missing at the zero coherence level. Each row shows each coherence level. Note that the actual reward was delivered 200 ms after the feedback tone signaling a correct response. B, Comparison between feedback-tone-evoked responses and reward prediction errors. Reward prediction errors are calculated as an average amount of juice reward (see Results). The line is fit to the data points corresponding to correct trials (black and gray symbols) by weighted type II regression (Press et al., 1992). Dopamine responses in error trials are represented by red symbols. The large-reward, small-reward, and zero coherence condition are represented by circles, squares, and diamond, respectively. Horizontal and vertical error bars represent SDs. The vertical axis shows dopamine responses. The horizontal axis shows reward prediction errors as juice quantity.
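
The reasoning in panel A (lower reward probability at low coherence implies a larger prediction error on correct trials) can be sketched as follows. This is a simplified illustration, not the calculation used in the paper: it assumes the expected reward is the probability of a correct response times the reward magnitude, and the `p_correct` values are hypothetical. Only the reward magnitudes (0.38 ml and 0.16 ml) come from the task description.

```python
LARGE_ML = 0.38  # large reward magnitude, from the task description
SMALL_ML = 0.16  # small reward magnitude, from the task description

def feedback_rpe(reward_ml, p_correct, correct):
    """Reward prediction error at feedback: received minus expected reward.

    Assumes expected reward = p_correct * reward_ml (a simplification).
    """
    expected = p_correct * reward_ml
    received = reward_ml if correct else 0.0
    return received - expected

# Hypothetical probabilities of a correct response, from chance to near-perfect.
for p in (0.5, 0.75, 0.99):
    pos = feedback_rpe(LARGE_ML, p, correct=True)
    neg = feedback_rpe(SMALL_ML, p, correct=False)
    print(f"p_correct={p:.2f}: correct large-reward {pos:+.3f} ml, error {neg:+.3f} ml")
```

Under these assumptions the prediction error on correct trials shrinks as discrimination improves, and error trials always yield a negative prediction error, matching the direction of the effects described in the caption.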

**Figure 6.**
Temporal prediction error coding. A, Population spike density functions sorted by the duration of the pre-RDM interval (N = 41 and 35 for monkeys L and K, respectively). The data are aligned with RDM onset. The early-period activity decreases with the duration of the pre-RDM interval. The vertical axis shows the firing rate of dopamine activity. Line colors indicate the duration of the pre-RDM interval (see inset). B, Coding of prediction errors against increasing expectation of the appearance of the RDM stimulus. The early-period activity was negatively correlated with the duration of the pre-RDM interval (white squares; Y = 16.1 − 4.59X for monkey L, Y = 13.3 − 2.83X for monkey K; where Y denotes the firing rate, and X denotes the duration of the pre-RDM interval), whereas the late-period activity showed far smaller modulation (black circles; Y = 4.88 + 0.35X for monkey L, Y = 6.52 − 0.08X for monkey K). Symbols represent the grand averages of dopamine activity of the trials in which the pre-RDM interval was within a 200 ms window centered at the time indicated by the horizontal axis. Note that all motion directions and coherence levels were collapsed. Error bars indicate SDs. C, Schematic illustration of decomposition of the early-period activity. The early-period activity can be decomposed into baseline activity and RDM-evoked response. Baseline activity was measured during the 200 ms before RDM onset. RDM-evoked response was calculated by subtracting baseline activity from the early-period activity. D, Reductions in baseline activity and in RDM-evoked response contributed to time-dependent decrease in the early-period activity. We decomposed the early-period activity into two factors: baseline activity and RDM-evoked response. 
Both baseline activity (white squares; Y = 6.16 − 1.46X for monkey L, Y = 6.16 − 0.74X for monkey K) and RDM-evoked response (black circles; Y = 9.91 − 3.13X for monkey L, Y = 6.64 − 2.09X for monkey K) were negatively correlated with the duration of the pre-RDM interval. Symbols represent the grand averages of dopamine activity of the trials in which the pre-RDM interval was within a 200 ms window centered at the time indicated by the horizontal axis. Note that all motion directions and coherence levels were collapsed. Error bars indicate SDs.
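
Because the RDM-evoked response is defined as early-period activity minus baseline activity, the baseline and RDM-evoked regression lines should approximately add up to the early-period line. The sketch below checks this using the coefficients reported in panels B and D; the dictionary layout and the `predict` helper are illustrative names, and X is taken in whatever unit the caption's fits use.

```python
# (intercept, slope) pairs copied from the caption.
# Y is the firing rate, X is the duration of the pre-RDM interval.
FITS = {
    "L": {"early": (16.1, -4.59), "baseline": (6.16, -1.46), "evoked": (9.91, -3.13)},
    "K": {"early": (13.3, -2.83), "baseline": (6.16, -0.74), "evoked": (6.64, -2.09)},
}

def predict(fit, x):
    """Evaluate a fitted line Y = intercept + slope * X."""
    intercept, slope = fit
    return intercept + slope * x

for monkey, fits in FITS.items():
    slope_sum = fits["baseline"][1] + fits["evoked"][1]
    intercept_sum = fits["baseline"][0] + fits["evoked"][0]
    print(f"monkey {monkey}: slope sum {slope_sum:.2f} vs early {fits['early'][1]:.2f}; "
          f"intercept sum {intercept_sum:.2f} vs early {fits['early'][0]:.2f}")
```

The component slopes sum to the early-period slope for both monkeys. The intercepts need not match exactly, since the reported coefficients are rounded and the lines may have been fitted separately.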

**Figure 7.**
Neuronal profile and recording depth. Relation between recording depth and neuronal profile: neuronal sensitivity for temporal predictability (A) and for reward value (B, C). The vertical axis shows neuronal sensitivity, which was indexed by the slope of the corresponding regression line (see Results). The horizontal axis shows recording depth, which was measured from a reference depth (i.e., the tip of the guide tube). This depth corresponded to the depth just beneath the dura. Each point represents an individual neuron. Black symbols represent the neurons in which the slopes were significantly different from zero.

References

1. Bayer HM, Glimcher PW. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron. 2005;47:129–141.
2. Belova MA, Paton JJ, Salzman CD. Moment-to-moment tracking of state value in the amygdala. J Neurosci. 2008;28:10023–10030.
3. Britten KH, Shadlen MN, Newsome WT, Movshon JA. The analysis of visual motion: a comparison of neuronal and psychophysical performance. J Neurosci. 1992;12:4745–4765.
4. Celebrini S, Newsome WT. Neuronal and psychophysical sensitivity to motion signals in extrastriate area MST of the macaque monkey. J Neurosci. 1994;14:4109–4124.
5. Ditterich J, Mazurek ME, Shadlen MN. Microstimulation of visual cortex affects the speed of perceptual decisions. Nat Neurosci. 2003;6:891–898.
