Nat Commun. 2018 May 14;9(1):1891. doi: 10.1038/s41467-018-04397-0.

Belief state representation in the dopamine system


Benedicte M Babayan et al. Nat Commun.

Abstract

Learning to predict future outcomes is critical for driving appropriate behaviors. Reinforcement learning (RL) models have successfully accounted for such learning, relying on reward prediction errors (RPEs) signaled by midbrain dopamine neurons. It has been proposed that when sensory data provide only ambiguous information about which state an animal is in, it can predict reward based on a set of probabilities assigned to hypothetical states (called the belief state). Here we examine how dopamine RPEs and subsequent learning are regulated under state uncertainty. Mice are first trained in a task with two potential states defined by different reward amounts. During testing, intermediate-sized rewards are given in rare trials. Dopamine activity is a non-monotonic function of reward size, consistent with RL models operating on belief states. Furthermore, the magnitude of dopamine responses quantitatively predicts changes in behavior. These results establish the critical role of state inference in RL.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Task design to test the modulation of dopaminergic RPEs by state inference. a Mice are trained on two perceptually similar states distinguished only by their rewards: small (s1) or big (s2). The different trial types, each starting with the onset of a unique odor (conditioned stimulus, CS) predicting the delivery of sucrose (unconditioned stimulus, US), are presented in randomly alternating blocks of five identical trials. A tone indicates block start. Only one odor and one sound cue are used for all blocks, making the two states perceptually similar prior to reward delivery. To test the influence of state inference on dopaminergic neuron signaling, we then introduce rare blocks with intermediate-sized rewards. Whether or not the training blocks serve as reference states for computing the value of the novel intermediate rewards (with reinforcement learning (RL) operating on belief states, versus standard RL) predicts contrasting RPE patterns (b vs d). b RPE across varying rewards computed using standard RL. Because the same odor precedes both reward sizes, a standard RL model with a single state would produce RPEs that increase linearly with reward magnitude. c Belief state b across varying rewards, defined as the probability of being in s1 given the received reward. d RPE across varying rewards computed using the value of the belief state b. A non-monotonic pattern across increasing rewards is predicted when the prediction error is computed on the belief state b. e (Top) Population activity of VTA dopaminergic neurons is recorded in behaving mice using fiber photometry. (Bottom) Fiber location above the recorded cells in the VTA, which co-express the calcium reporter GCaMP6f and the fluorescent protein tdTomato (scale bar: 200 μm).
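The contrast between panels b and d can be illustrated with a minimal sketch. It assumes Gaussian reward likelihoods centered on the two trained reward sizes and an equal prior over states; the likelihood width and prior here are illustrative assumptions, not the paper's fitted parameters.

```python
import math

R_SMALL, R_BIG = 1.0, 10.0   # trained reward sizes (uL), from the task
SIGMA = 2.0                  # assumed s.d. of the reward likelihood
PRIOR_S1 = 0.5               # assumed equal prior over the two states

def gaussian(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def belief_s1(reward):
    """P(s1 | reward): probability the current block is the small-reward state."""
    l1 = gaussian(reward, R_SMALL, SIGMA) * PRIOR_S1
    l2 = gaussian(reward, R_BIG, SIGMA) * (1 - PRIOR_S1)
    return l1 / (l1 + l2)

def rpe_standard(reward):
    """Standard RL: a single state, so expected reward is fixed and the
    RPE grows linearly with reward size (the pattern in panel b)."""
    expected = PRIOR_S1 * R_SMALL + (1 - PRIOR_S1) * R_BIG
    return reward - expected

def rpe_belief(reward):
    """Belief-state RL: value is the belief-weighted mean of the two
    trained rewards. Intermediate rewards closer to R_BIG are judged
    against a high expectation and yield smaller (even negative) RPEs,
    producing the non-monotonic pattern in panel d."""
    b = belief_s1(reward)
    value = b * R_SMALL + (1 - b) * R_BIG
    return reward - value

for r in [1, 3, 5, 7, 9, 10]:
    print(f"r={r:>2}  standard RPE={rpe_standard(r):+.2f}  belief RPE={rpe_belief(r):+.2f}")
```

Under these assumptions the standard-RL RPE rises monotonically with reward size, while the belief-state RPE first rises for rewards near the small trained volume, then flips sign past the midpoint between the two trained volumes.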
Fig. 2
Behavior and dopamine neuron activity on training blocks s1 and s2. a Licking across the five trials within a block. The anticipatory licking quantification period, during the odor-to-reward delay, is indicated by the horizontal black line. b Anticipatory licking at block transition increases when transitioning from the small to the big block. c Anticipatory licking across trials within blocks. Anticipatory licking on trial 1 is similar across all block types, then stabilizes at either low or high rates for the following four trials. d Dopamine neuron activity across the five trials within a block. The horizontal black line indicates the quantification period for odor (CS) and reward (US) responses. e Dopamine neurons' odor response is stable across block transitions. f Dopamine neurons' odor response across trials. Dopamine activity adapts to the current block value within one trial. g Dopamine neurons' reward response shows an effect of the current reward and the previous block on trial 1. h Dopamine neurons' reward response across trials. Dopamine activity reaches stable levels from trial 2 onward. Data represent mean ± s.e.m. *p < 0.05 for t-test comparing average value to 0. n = 11 mice.
Fig. 3
Dopaminergic and behavioral signatures of belief states. a–c Dopamine neuron activity on trial 1. Dopamine neurons show a monotonically increasing response to increasing rewards (a, individual example), quantified as the mean response after reward presentation (0–1 s, indicated by a solid black line in a) in the individual example (b) and across mice (c). d Change in anticipatory licking from trial 1 to trial 2. Mice increase their anticipatory licking after trial 1 in proportion to the increasing rewards. e–g Dopamine neuron activity on trial 2. Dopamine neurons show a non-monotonic response pattern to increasing rewards (e, f, individual example), quantified across all mice (g). h Change in anticipatory licking from trial 2 to trial 3. Whereas mice do not further adapt their licking for the known trained volumes (1 and 10 μL) after trial 2, they increase anticipatory licking for small intermediate rewards and decrease it for larger intermediate rewards, in a pattern that follows our prediction of belief state influence on RPE. n = 11, data represent mean ± s.e.m.
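The link between dopamine responses and trial-to-trial licking changes follows the standard RPE-driven update rule: the value estimate (which anticipatory licking is assumed to track) moves in proportion to the prediction error. A minimal sketch, with an illustrative learning rate not taken from the paper:

```python
ALPHA = 0.5  # assumed learning rate, for illustration only

def update_value(value, reward, alpha=ALPHA):
    """One TD(0)-style update at reward time: V <- V + alpha * (r - V).
    Returns the updated value and the RPE that drove the change."""
    rpe = reward - value
    return value + alpha * rpe, rpe

# A reward above the current estimate yields a positive RPE, predicting
# an increase in anticipatory licking on the next trial...
v_up, rpe_up = update_value(value=3.0, reward=6.0)
# ...while a reward below the estimate yields a negative RPE and
# predicts a decrease, matching the sign pattern across reward sizes.
v_down, rpe_down = update_value(value=9.0, reward=6.0)
```

Because the change in value is alpha times the RPE, a non-monotonic RPE profile across reward sizes predicts an equally non-monotonic profile of licking changes, which is the behavioral signature tested in panel h.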
Fig. 4
RL with belief states explains dopamine reward responses and behavior better than standard RL. Individual DA responses to rewards were fit using either a standard RL model or an RL model computing values on belief states. a Fits to dopamine responses on trial 1. Both RL models fit the dopamine response, since on trial 1 there is no evidence on which to infer a state. b Fits to dopamine responses on trial 2. Only computing RPEs using belief states reproduced the non-monotonic change in dopamine response across increasing rewards. c Model predictions of behavior. The value functions from either model's fits were positively correlated with the mice's anticipatory licking, but the RL model with belief states provided a better fit (signed rank test: p = 0.032), suggesting that the mice's anticipatory licking tracks the value of the belief state. d Individual examples of the extracted value function from either model and anticipatory licking across increasing rewards on trial 2. n = 11, data represent mean ± s.e.m.
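The qualitative difference between the two extracted value functions in panel d can be sketched as follows. The trained reward sizes come from the task; the likelihood width, learning rate, and pre-test value estimate are illustrative assumptions, not the paper's fitted parameters.

```python
import math

R1, R2 = 1.0, 10.0   # trained reward sizes (uL), from the task
SIGMA = 2.0          # assumed s.d. of the reward likelihood

def value_belief(r):
    """Belief-weighted value after observing reward r. With Gaussian
    likelihoods around the two trained rewards, the posterior P(s1 | r)
    is a logistic function of r, so the value function is sigmoid-like:
    flat near the trained volumes and steep in between."""
    b = 1.0 / (1.0 + math.exp((r - (R1 + R2) / 2) * (R2 - R1) / SIGMA ** 2))
    return b * R1 + (1 - b) * R2

def value_standard(r, alpha=0.5):
    """Standard single-state RL: after one trial with reward r, the
    value is pulled linearly toward r from the pre-test estimate."""
    v0 = (R1 + R2) / 2  # assumed pre-test estimate (mean of trained rewards)
    return v0 + alpha * (r - v0)
```

The standard-RL value is linear in reward size, whereas the belief-state value stays near the trained values for rewards close to 1 or 10 μL and transitions steeply around the midpoint; only the latter shape can track sigmoid-like anticipatory licking profiles.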

