A pallidus-habenula-dopamine pathway signals inferred stimulus values

Ethan S Bromberg-Martin¹, Masayuki Matsumoto, Simon Hong, Okihide Hikosaka

Affiliation

¹ Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bldg. 49, Rm. 2A50, Bethesda, Maryland 20892-4435, USA. bromberge@mail.nih.gov

PMID: 20538770
PMCID: PMC2934919
DOI: 10.1152/jn.00158.2010

Ethan S Bromberg-Martin et al. J Neurophysiol. 2010 Aug;104(2):1068-76. doi: 10.1152/jn.00158.2010. Epub 2010 Jun 10.

Abstract

The reward value of a stimulus can be learned through two distinct mechanisms: reinforcement learning through repeated stimulus-reward pairings and abstract inference based on knowledge of the task at hand. The reinforcement mechanism is often identified with midbrain dopamine neurons. Here we show that a neural pathway controlling the dopamine system does not rely exclusively on either stimulus-reward pairings or abstract inference but instead uses a combination of the two. We trained monkeys to perform a reward-biased saccade task in which the reward values of two saccade targets were related in a systematic manner. Animals used each trial's reward outcome to learn the values of both targets: the target that had been presented and whose reward outcome had been experienced (experienced value) and the target that had not been presented but whose value could be inferred from the reward statistics of the task (inferred value). We then recorded from three populations of reward-coding neurons: substantia nigra dopamine neurons; a major input to dopamine neurons, the lateral habenula; and neurons that project to the lateral habenula, located in the globus pallidus. All three populations encoded both experienced values and inferred values. In some animals, neurons encoded experienced values more strongly than inferred values, and the animals showed behavioral evidence of learning faster from experience than from inference. Our data indicate that the pallidus-habenula-dopamine pathway signals reward values estimated through both experience and inference.
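The two learning routes described above can be made concrete with a toy learner. This is a minimal sketch under stated assumptions, not the authors' model or analysis. It assumes a delta-rule update with a hypothetical learning rate `alpha_exp` for the target whose outcome was experienced. It also assumes a second update, with a hypothetical rate `alpha_inf`, that moves the absent target's value toward the complement of the observed outcome. That complement is what the task's reward statistics imply when exactly one target is rewarded per block. Choosing `alpha_inf < alpha_exp` mimics an animal that learns faster from experience than from inference.

```python
from dataclasses import dataclass, field


@dataclass
class CombinedLearner:
    """Hypothetical learner mixing experience-based and inference-based updates.

    Values are treated as reward probabilities in [0, 1] for the two targets.
    The default values describe the end of a block in which left was rewarded.
    """
    alpha_exp: float = 1.0  # assumed learning rate for the presented (experienced) target
    alpha_inf: float = 0.5  # assumed learning rate for the absent (inferred) target
    values: dict = field(default_factory=lambda: {"left": 1.0, "right": 0.0})

    def update(self, target: str, reward: float) -> None:
        other = "right" if target == "left" else "left"
        # Experience: move the presented target's value toward the observed outcome.
        self.values[target] += self.alpha_exp * (reward - self.values[target])
        # Inference (assumption): only one target is rewarded per block, so the
        # absent target's value is pulled toward the complement of the outcome.
        inferred = 1.0 - reward
        self.values[other] += self.alpha_inf * (inferred - self.values[other])


learner = CombinedLearner(alpha_exp=1.0, alpha_inf=0.5)
learner.update("right", 1.0)  # trial 1 of a new block: right target, reward
print(learner.values)         # {'left': 0.5, 'right': 1.0}
```

Setting `alpha_inf = 0` reproduces the experience-only learner of Fig. 1C (*left*), which leaves both targets valued at 1.0 and so shows no preference on trial 2. Setting `alpha_inf = 1` reproduces the full-inference learner (*right*), which drops the left target's value to 0.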

Figures

**Fig. 1.**
Reward-biased saccade task. A: task diagram. The monkey fixated a central spot for 1.2 s. The spot disappeared and simultaneously a visual target appeared on the left or right side of the screen. The monkey was required to saccade to the target. In 1 block of 24 trials, left saccades were rewarded and right saccades were unrewarded (block 1); in the next block, the reward values were reversed without notice to the animal (block 2). B: example sequence of events after a block change. In the 1st trial of the new block, the monkey receives an unexpected reward outcome (trial 1: right target, reward). The 2nd trial of the block could present the same target, whose new reward value had just been experienced (trial 2: same target, experienced value), or it could present the other target, which had been absent on the previous trial and whose new reward value had to be inferred based on the reversal rule of the task (trial 2: other target, inferred value). C: 2 ways to learn stimulus values from the pairing right target → reward. *Left*: if the animal learned through experience alone, the right target value would be increased but the left target value would remain unchanged. In trial 2, the animal would show no preference between the targets. *Right*: if the animal learned through inference, the animal would additionally infer that the block had changed to block 2, and hence the left target value had decreased. The animal's preference would switch from the left target to the right target.
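The block structure in Fig. 1, A and B, and the split of trial 2 into experienced-value and inferred-value trials can be sketched as a small simulation. The 24-trial blocks and the alternation of the rewarded side come from the caption. Several details are assumptions for illustration: the uniform random choice of target side on each trial, left being rewarded in the first block, and the hypothetical function names `simulate_task` and `classify_trial2`.

```python
import random

BLOCK_LEN = 24  # trials per block (Fig. 1A)


def simulate_task(n_blocks: int, seed: int = 0) -> list[dict]:
    """Generate trials for a hypothetical version of the reward-biased saccade task.

    Assumes the target side is drawn uniformly at random on every trial;
    the real task's sequencing scheme may differ.
    """
    rng = random.Random(seed)
    trials = []
    for block in range(n_blocks):
        # Reward values reverse at every block change (assumed: left rewarded first).
        rewarded_side = "left" if block % 2 == 0 else "right"
        for t in range(BLOCK_LEN):
            side = rng.choice(["left", "right"])
            trials.append({"block": block, "trial": t + 1, "side": side,
                           "reward": side == rewarded_side})
    return trials


def classify_trial2(trials: list[dict]) -> list[str]:
    """Label trial 2 of each block after the first as an experienced- or inferred-value trial."""
    labels = []
    for i, tr in enumerate(trials):
        if tr["trial"] == 2 and tr["block"] > 0:
            first = trials[i - 1]  # trial 1 of the same block
            # Same target as trial 1: its new value was experienced.
            # Other target: its new value can only be inferred from the reversal rule.
            labels.append("experienced" if tr["side"] == first["side"] else "inferred")
    return labels


trials = simulate_task(n_blocks=10, seed=1)
print(classify_trial2(trials))  # one label per block change (9 labels)
```

The first block is skipped in `classify_trial2` because its trial 1 follows no reversal, so there is no new value to experience or infer.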

**Fig. 2.**
Combination of experienced and inferred stimulus values in neural activity and behavior in *monkeys E* and *L*. The rows represent (A) lateral habenula neurons, (B) dopamine neurons, and (C) behavioral reaction times. *First 3 columns*: data for the 1st trial of the block (trial 1), for the 2nd trial of the block when the target was different from the 1st trial (trial 2, Other Target), and for the 2nd trial of the block when the target was the same as on the 1st trial (trial 2, Same Target). Data are shown separately for the target that was rewarded in the previous block and unrewarded in the current block (old R, new U, blue) and for the target that was unrewarded in the previous block and rewarded in the current block (old U, new R, red). Neural firing rates were smoothed with a Gaussian kernel (σ = 15 ms) and averaged over neurons. Shaded areas and error bars are ±SE. Gray bars along the time axis indicate the response window for calculation of reversal indexes. Note that each red or blue curve in A and B only includes data from neurons that had at least 1 trial in which the appropriate current-trial and past-trial targets were presented (n = 42–63 for each curve). *Right 3 columns*: reversal index on the 2nd trial of the block, calculated using all data (1st column), using data from *monkey L* (2nd column), and using data from *monkey E* (3rd column). Reversal indexes were calculated separately for other-target trials, when the value of the target had to be inferred (white bars, Inf), and for same-target trials, when the value of the target had already been experienced on the 1st trial of the block (gray bars, Exp). Numbers at the bottom of each bar indicate the number of neural recording sessions for that bar. Symbols indicate statistical significance measured using a shuffling procedure (*P < 0.05; ⁺P ≤ 0.06; ns P > 0.06). Error bars are ±SE.
Neural and behavioral measures of stimulus values reversed on both trial types but reversed less fully on inferred value trials.
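The smoothing step in Fig. 2 (Gaussian kernel, σ = 15 ms, then averaging over neurons with ±SE) might be implemented roughly as follows. Only σ comes from the caption. The 1-ms bin width, the truncation of the kernel at ±4σ, and the hypothetical helper names `smoothed_rate` and `population_mean_se` are assumptions, and this is not the authors' analysis code.

```python
import numpy as np


def smoothed_rate(spike_times_ms, t_start_ms, t_stop_ms, sigma_ms=15.0):
    """Firing rate (spikes/s) from spike times, smoothed with a Gaussian kernel.

    Assumes 1-ms bins and a kernel truncated at +/-4 sigma.
    """
    edges = np.arange(t_start_ms, t_stop_ms + 1)  # 1-ms bin edges
    counts, _ = np.histogram(spike_times_ms, bins=edges)
    half = int(np.ceil(4 * sigma_ms))
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma_ms) ** 2)
    kernel /= kernel.sum()  # unit sum, so spike count is preserved away from the edges
    # Full convolution, then keep the centered part so the output aligns with the bins.
    smoothed = np.convolve(counts, kernel, mode="full")[half:half + len(counts)]
    return smoothed * 1000.0  # spikes per ms -> spikes per s


def population_mean_se(rates):
    """Mean and standard error across neurons (rows) in each time bin."""
    rates = np.asarray(rates, dtype=float)
    mean = rates.mean(axis=0)
    se = rates.std(axis=0, ddof=1) / np.sqrt(rates.shape[0])
    return mean, se
```

Slicing the full convolution, rather than using `mode="same"`, keeps the output the same length as the binned data even when the analysis window is shorter than the kernel.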

**Fig. 3.**
Neural responses to outcome delivery in *monkeys E* and *L*. The rows represent (A) lateral habenula neurons and (B) dopamine neurons. Same format as the *left 3 columns* of Fig. 2. Data are plotted from the same neurons and trials as in Fig. 2, A and B, but aligned on outcome delivery. Gray bars along the time axis indicate the time window for measuring the outcome response. On the 1st trial of each block, when an unexpected outcome was delivered, lateral habenula and dopamine neurons had a strong outcome response (*left column*). On inferred-value trials, lateral habenula neurons tended to show a small residual outcome response (*middle column*).

**Fig. 4.**
Experienced and inferred stimulus values in neural activity and behavior in *monkeys N* and *D*. Same format as Fig. 2. The rows represent (A) GPi^LHb-negative neurons, (B) lateral habenula multiunit activity, and (C) behavioral reaction times. Note that each red or blue curve in A and B only includes data from neurons that had at least 1 trial in which the appropriate current-trial and past-trial targets were presented (n = 24–37 for each curve). In *monkey D*, neural and behavioral measures of stimulus values reversed similarly on both experienced-value and inferred-value trials (*right column*).

Grants and funding

Intramural NIH HHS/United States
