Selective Effects of the Loss of NMDA or mGluR5 Receptors in the Reward System on Adaptive Decision-Making

Przemysław Eligiusz Cieślak et al.

eNeuro. 2018 Oct 5;5(4):ENEURO.0331-18.2018.
doi: 10.1523/ENEURO.0331-18.2018. eCollection 2018 Jul-Aug.
Abstract

Selecting the most advantageous actions in a changing environment is a central feature of adaptive behavior. Midbrain dopamine (DA) neurons, along with the major targets of their projections, including dopaminoceptive neurons in the frontal cortex and basal ganglia, play a key role in this process. Here, we investigate the consequences of selective genetic disruption of NMDA receptors and metabotropic glutamate receptor 5 (mGluR5) in the DA system on adaptive choice behavior in mice. We tested the effects of the mutations on performance in probabilistic reinforcement learning and probability-discounting tasks. In the probabilistic choice task, both the loss of NMDA receptors in dopaminergic neurons and the loss of mGluR5 receptors in D1 receptor-expressing dopaminoceptive neurons reduced the probability of selecting the more rewarded alternative and lowered the likelihood of returning to the previously rewarded alternative (win-stay). When the observed behavior was fitted to reinforcement learning models, we found that these two mutations were associated with a reduced effect of the expected outcome on choice (i.e., more random choices). None of the mutations affected probability discounting, which indicates that all animals retained a normal ability to assess probability. However, in both behavioral tasks, animals with targeted loss of NMDA receptors in dopaminergic neurons or of mGluR5 receptors in D1 neurons were significantly slower to perform choices. In conclusion, these results show that glutamate receptor-dependent signaling in the DA system is essential for the speed and accuracy of choices but is probably not critical for the correct estimation of probable outcomes.

Keywords: decision-making; dopamine; glutamate receptors; mouse behavior; reinforcement learning.
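To make the modeling approach concrete, the following is a minimal sketch of a delta-rule (Q-learning) agent with a softmax choice rule in a two-port 80%/20% task, assuming a standard parameterization; the parameter names (alpha, beta) and values are illustrative only and do not reproduce the paper's fitted "model 3". Lowering the softmax weight beta corresponds to a reduced effect of the expected outcome on choice, i.e., more random choices.

import numpy as np

rng = np.random.default_rng(0)

def simulate(alpha, beta, p_reward=(0.8, 0.2), n_trials=60):
    """Simulate one block of a two-port probabilistic choice task."""
    q = np.zeros(2)                    # expected value of each port
    choices, rewards = [], []
    for _ in range(n_trials):
        # softmax: beta scales how strongly expected outcome drives choice
        p_port1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))
        c = int(rng.random() < p_port1)
        r = int(rng.random() < p_reward[c])
        q[c] += alpha * (r - q[c])     # delta-rule value update
        choices.append(c)
        rewards.append(r)
    return np.array(choices), np.array(rewards)

# A lower beta yields more random choices, the pattern the abstract
# describes for the mutant mice (illustrative values only).
for beta in (5.0, 1.0):
    c, r = simulate(alpha=0.3, beta=beta)
    print(f"beta={beta}: 80% port chosen on {np.mean(c == 0):.2f} of trials")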


Figures

Graphical abstract

Figure 1.
The probabilistic reinforcement learning task. A, Schematic representation of the probabilistic reinforcement learning task. The animal could make a nose-poke in one of two ports. Following a nose-poke, water could be delivered with a probability that depended on the chosen port. The nose-poke ports were randomly assigned 80% or 20% reward probabilities. During each session, the reward probabilities were reversed after 60 trials. B, An example of the choice behavior of a mouse over 600 trials (sessions 6–10). The black line shows the probability of choosing the left side (data smoothed with a 21-point moving average). The cyan bars indicate the side with the higher probability of reward delivery. The red dashed line indicates session boundaries. C–H, Probability of selecting the alternative with the higher reward probability by the NR1DATCreERT2 (mutant, n = 6; control, n = 8; C, F), mGluR5KD-D1 (mutant, n = 8; control, n = 9; D, G), and NR1D1CreERT2 (mutant, n = 6; control, n = 9; E, H) strains. C–E, Session-by-session analysis; data were collapsed across trials. F–H, Trial-by-trial analysis; data were collapsed across sessions. Data are represented as the mean ± SEM.
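As a side note on the smoothing mentioned in panel B, a 21-point moving average of a binary left/right choice sequence can be computed as below; the choice data here are randomly generated placeholders, not the paper's data.

import numpy as np

# Hypothetical binary choice sequence (1 = left port), 600 trials.
choices_left = np.random.default_rng(1).integers(0, 2, size=600)
window = 21
# Probability of choosing left, smoothed with a 21-point moving average.
p_left_smoothed = np.convolve(choices_left, np.ones(window) / window, mode="same")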
Figure 2.
Computational modeling results. A–C, Density plots of posterior group parameter distributions with the best model (model 3) for the NR1DATCreERT2 (A), mGluR5KD-D1 (B), and NR1D1CreERT2 (C) strains. Credible differences are marked with stars, and vertical bars below the plots show 95% HDI ranges.
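For reference, a 95% highest-density interval (HDI) like those marked below the density plots can be computed from posterior samples as sketched here; this generic routine is an assumption and does not reproduce the paper's hierarchical Bayesian fitting procedure.

import numpy as np

def hdi(samples, cred_mass=0.95):
    """Return the narrowest interval containing cred_mass of the samples."""
    s = np.sort(np.asarray(samples))
    n_keep = int(np.ceil(cred_mass * len(s)))
    widths = s[n_keep - 1:] - s[:len(s) - n_keep + 1]
    lo = int(np.argmin(widths))
    return s[lo], s[lo + n_keep - 1]

# Toy posterior samples (beta distribution), for illustration only.
posterior = np.random.default_rng(2).beta(8, 3, size=10_000)
print(hdi(posterior))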
Figure 3.
Effects of previous outcomes on choice. A–C, Probabilities of repeating the same choice when the previous response was rewarded (win-stay) or switching to an alternative choice when the preceding response yielded no reward (lose-shift) in the NR1DATCreERT2 (mutant, n = 6; control, n = 8; A), mGluR5KD-D1 (mutant, n = 8; control, n = 9; B), and NR1D1CreERT2 (mutant, n = 6; control, n = 9; C) strains. The probability of win-stay was calculated as the number of times the animal chose the same side as the side chosen during the previously rewarded trial divided by the total number of rewarded trials, while the lose-shift probability was calculated as the number of times the animal changed its choice when the preceding response yielded no reward divided by the total number of unrewarded trials. D–F, Simulation performance of the best model (model 3) with respect to mimicking win-stay/lose-shift choice behavior. Data are represented as the mean ± SEM. **p < 0.01 (t test).
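The win-stay and lose-shift probabilities defined in the legend above can be computed directly from per-trial choice and reward sequences, as in this sketch (the arrays in the example are hypothetical placeholders):

import numpy as np

def win_stay_lose_shift(choices, rewards):
    """Compute win-stay and lose-shift probabilities as defined above."""
    choices, rewards = np.asarray(choices), np.asarray(rewards)
    stayed = choices[1:] == choices[:-1]       # same side as previous trial
    won = rewards[:-1] == 1                    # previous trial rewarded
    lost = rewards[:-1] == 0                   # previous trial unrewarded
    win_stay = stayed[won].mean() if won.any() else np.nan
    lose_shift = (~stayed)[lost].mean() if lost.any() else np.nan
    return win_stay, lose_shift

# Example with a short hypothetical session.
print(win_stay_lose_shift(choices=[0, 0, 1, 1, 0], rewards=[1, 0, 1, 1, 0]))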
Figure 4.
Reaction times in the probabilistic reinforcement learning task. A–I, Graphs show the reaction times observed in the NR1DATCreERT2 (mutant, n = 6; control, n = 8; A–C), mGluR5KD-D1 (mutant, n = 8; control, n = 9; D–F), and NR1D1CreERT2 (mutant, n = 6; control, n = 9; G–I) strains. A, D, and G show the time elapsed from the trial onset to the choice port entry. B, E, and H show the time from the new trial onset to the choice port entry following previously unrewarded (lose) or rewarded (win) trials. C, F, and I summarize the time from the reward delivery to the reward port entry. Values represent the mean choice latency (all sessions combined) ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001 (Bonferroni-corrected t test or t test).
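As an illustration of the Bonferroni-corrected t tests referenced in the legend, the sketch below compares hypothetical mutant and control reaction-time samples and multiplies each p value by the number of comparisons; the data, group sizes, and comparison labels are placeholders, not the paper's.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
comparisons = {
    "trial onset to choice port": (rng.normal(2.0, 0.5, 8), rng.normal(2.6, 0.5, 6)),
    "reward delivery to reward port": (rng.normal(1.0, 0.3, 8), rng.normal(1.1, 0.3, 6)),
}
n_tests = len(comparisons)
for name, (control, mutant) in comparisons.items():
    t, p = ttest_ind(control, mutant)
    p_corrected = min(p * n_tests, 1.0)   # Bonferroni correction
    print(f"{name}: uncorrected p = {p:.3f}, corrected p = {p_corrected:.3f}")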
Figure 5.
The probability-discounting task. A, Schematic representation of the probability-discounting task. One nose-poke port was associated with the delivery of small certain rewards, while the other nose-poke port was associated with the delivery of large uncertain rewards. Each session consisted of 20 forced trials during which only one port was active, followed by 40 free choice trials during which both ports were active. B–D, The graphs show the frequency of choosing the larger reward as a function of its probability in the NR1DATCreERT2 (mutant, n = 6; control, n = 7; B), mGluR5KD-D1 (mutant, n = 8; control, n = 9; C), and NR1D1CreERT2 (mutant, n = 5; control, n = 9; D) strains. Data are represented as the mean ± SEM.
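To make the structure of the choice explicit, the expected value of the large uncertain option can be compared with the small certain option at each probability level; the reward sizes and probability levels below are hypothetical, since the actual magnitudes are not stated in this legend.

# Expected-value comparison in a probability-discounting task.
small_certain = 1.0        # size of the small certain reward (hypothetical units)
large_uncertain = 4.0      # size of the large uncertain reward (hypothetical)
for p_large in (1.0, 0.5, 0.25, 0.125):   # example probability levels
    ev_large = p_large * large_uncertain
    better = "large/uncertain" if ev_large > small_certain else "small/certain"
    print(f"p = {p_large:>5}: EV(large) = {ev_large:.2f} -> {better} port maximizes reward")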
Figure 6.
Reaction times in the probability-discounting task. A–C, Time elapsed from the trial onset to the choice port entry during the forced choice (left) and free choice (right) trials in the NR1DATCreERT2 (mutant, n = 6; control, n = 7; A), mGluR5KD-D1 (mutant, n = 8; control, n = 9; B), and NR1D1CreERT2 (mutant, n = 5; control, n = 9; C) strains. Bars represent the mean choice latency ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001 (Bonferroni-corrected t test).


