Front Behav Neurosci. 2010 Nov 4;4:170.
doi: 10.3389/fnbeh.2010.00170. eCollection 2010.

Tonic dopamine modulates exploitation of reward learning

Jeff A Beeler et al. Front Behav Neurosci.

Abstract

The impact of dopamine on adaptive behavior in a naturalistic environment is largely unexamined. Experimental work suggests that phasic dopamine is central to reinforcement learning, whereas tonic dopamine may modulate performance without altering learning per se; however, this idea has not been developed formally or integrated with computational models of dopamine function. We quantitatively evaluate the role of tonic dopamine in these functions by studying the behavior of hyperdopaminergic dopamine transporter (DAT) knockdown mice in an instrumental task in a semi-naturalistic homecage environment. In this "closed economy" paradigm, subjects earn all of their food by pressing either of two levers, but the relative cost of food on each lever shifts frequently. Compared to wild-type mice, hyperdopaminergic mice allocate more lever presses to the high-cost lever, thus working harder to earn a given amount of food and maintain their body weight. However, both groups react similarly quickly to shifts in lever cost, suggesting that the hyperdopaminergic mice are not slower at detecting changes, as would be expected with a learning deficit. We fit the lever choice data with reinforcement learning models to assess the distinction these models formalize between acquisition and expression of learning. In these analyses, hyperdopaminergic mice displayed normal learning from recent reward history but a diminished capacity to exploit this learning, reflected in a reduced coupling between choice and reward history. These data suggest that dopamine modulates the degree to which prior learning biases action selection and consequently alters the expression of learned, motivated behavior.

Keywords: DAT knock-down; behavioral flexibility; dopamine; environmental adaptation; explore-exploit; reinforcement learning.
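The abstract separates acquisition (learning from recent reward history) from expression (how strongly choice follows what was learned). A minimal sketch of that distinction, assuming a generic delta-rule learner with a two-option softmax rather than the specific models fit in the study: the learning rate alpha governs acquisition, and the inverse temperature beta governs the coupling between choice and learned values. The task parameters are hypothetical stand-ins: the cheap lever yields a reward of 1.0 per press, the expensive lever 0.2, and the cheap lever swaps every 200 presses. The function names are likewise illustrative.

```python
import math
import random


def softmax_choice_prob(q_a: float, q_b: float, beta: float) -> float:
    """Probability of choosing lever A under a two-option softmax.

    beta (inverse temperature) sets how tightly choice follows learned values:
    beta = 0 gives random choice; large beta gives near-deterministic exploitation.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))


def simulate(alpha: float, beta: float, n_trials: int = 2000,
             switch_every: int = 200, seed: int = 0) -> float:
    """Return the fraction of presses made on the currently cheap lever.

    Hypothetical task: the cheap lever pays 1.0 per press and the expensive
    lever 0.2 (reward per press standing in for cost); the identity of the
    cheap lever swaps every `switch_every` presses.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]   # learned values of lever 0 and lever 1
    cheap = 0        # index of the currently cheap lever
    on_cheap = 0
    for t in range(n_trials):
        if t > 0 and t % switch_every == 0:
            cheap = 1 - cheap  # contingency switch
        p0 = softmax_choice_prob(q[0], q[1], beta)
        choice = 0 if rng.random() < p0 else 1
        reward = 1.0 if choice == cheap else 0.2
        # Delta-rule update of the chosen lever only (acquisition, scaled by alpha).
        q[choice] += alpha * (reward - q[choice])
        if choice == cheap:
            on_cheap += 1
    return on_cheap / n_trials


if __name__ == "__main__":
    # Same learning rate, different coupling between choice and learned values.
    print(simulate(alpha=0.1, beta=10.0))
    print(simulate(alpha=0.1, beta=1.0))
```

With alpha held fixed, lowering beta leaves the value updates unchanged but shifts more presses onto the currently expensive lever, which is qualitatively the pattern the abstract describes for hyperdopaminergic mice. In this toy model the difference comes entirely from the choice rule, not from slower learning.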


Figures

Figure 1
Lever pressing, consumption and body weight across experimental days. Average number of lever presses (LP) per gram of body weight on the (A) expensive lever (genotype, p < 0.01) and (B) inexpensive lever (genotype, NS). (C) Ratio of lever presses on the low-cost lever to total lever presses (genotype, p = 0.121). (D) Average number of lever presses per pellet earned (genotype, p = 0.059). (E) Average number of pellets earned per day per gram of body weight (genotype, p = 0.025). (F) Daily body weight across the experiment (genotype, NS). Error bars = S.E.M., N = 10.
Figure 2
Mean allocation of effort and runlength on the high- and low-cost levers following the switch in reward contingency (dashed line). Mean lever presses per minute 10 min before and after the reward contingency switch for (A) wild-type and (B) DATkd mice (genotype × lever × time, p < 0.0001). Mean runlength on each lever for (C) wild-type and (D) DATkd mice (genotype × lever × time, p > 0.001). Mean rate of reinforcement for (E) wild-type and (F) DATkd mice on the low → high cost lever (solid line, gold shading) and the high → low cost lever (dotted line, gray shading), averaged across all episodes of contingency switches between levers (vertical dashed lines). Shading = S.E.M., N = 10.
Figure 3
Inter-response times and post-reinforcement pauses across the experiment. (A) Histogram of inter-response times (IRTs) in 1 s bins, normalized as a percentage of total IRTs, for WT (blue bars) and DATkd (red bars) mice (genotype × bins, p = 0.0065). (B) Scatterplot of individual subject IRT histograms. (C) Histogram of post-reinforcement pauses (PRPs) in 1 s bins, normalized to total PRPs, for WT (blue trace) and DATkd (red trace) mice (genotype × bins, NS). (D) Scatterplot of individual subject PRP histograms. N = 10.
Figure 4
Effort and earned rewards when the prices of the high- and low-cost levers do not switch. Average lever presses on the (A) low-cost and (B) high-cost lever and average pellets earned on the (C) low-cost and (D) high-cost lever as the price of the high-cost lever increases across days. No significant genotype differences across panels. Error bars = S.E.M., N = 6.
Figure 5
Model of the reward function and persistence on the high- and low-cost levers, averaged across reward procurement. (A) Reward history modeled as 100 discrete parameters representing the 100 preceding actions (rewarded or not); solid lines represent group averages superimposed on a scatterplot of individual subjects (wild-type, blue; DATkd, red). (B) Reward as a continuous function composed of two exponentials (4 parameters). Although the function incorporates the effect of reward infinitely back in time, only the first 100 actions back are shown. Light traces show curves plotted at ± standard error of the parameters within groups. (C,D) The two exponentials of the model plotted separately. Solid lines represent the model using group means of the parameters, and light traces represent individual subjects. See Table 2 for statistics. N = 10.
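Figure 5 contrasts two ways of weighting past outcomes: 100 free weights, one per preceding action, versus a continuous function built from two exponentials with 4 parameters. A minimal sketch of the continuous approach, assuming the kernel takes the form w(k) = a1·exp(−k/tau1) + a2·exp(−k/tau2) for an outcome k actions back; the exact parameterization used in the study is not given here, and the function names and example parameter values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical parameter tuple: (a1, tau1, a2, tau2)
KernelParams = Tuple[float, float, float, float]


def two_exp_kernel(k: int, params: KernelParams) -> float:
    """Weight on the outcome k actions back (k = 1 is the most recent action).

    Assumed form: a fast and a slow exponential summed, 4 free parameters.
    """
    a1, tau1, a2, tau2 = params
    return a1 * math.exp(-k / tau1) + a2 * math.exp(-k / tau2)


def weighted_reward_history(outcomes: Sequence[int], params: KernelParams,
                            depth: int = 100) -> float:
    """Kernel-weighted sum of past outcomes (1 = rewarded, 0 = unrewarded).

    `outcomes` is ordered oldest -> newest; only the last `depth` actions
    contribute, mirroring the 100-action window shown in the figure.
    """
    recent = list(outcomes)[-depth:]
    total = 0.0
    # Walk backward from the newest action so that k counts actions back in time.
    for k, rewarded in enumerate(reversed(recent), start=1):
        total += rewarded * two_exp_kernel(k, params)
    return total


def discrete_weights(params: KernelParams, depth: int = 100) -> List[float]:
    """Tabulate the continuous kernel as `depth` discrete weights, for comparison
    with a model that fits one free weight per action back."""
    return [two_exp_kernel(k, params) for k in range(1, depth + 1)]


if __name__ == "__main__":
    example = (0.5, 2.0, 0.1, 30.0)  # hypothetical: fast decay plus slow decay
    recent_win = weighted_reward_history([0, 0, 0, 1], example)
    old_win = weighted_reward_history([1, 0, 0, 0], example)
    print(recent_win > old_win)  # prints True: a recent reward carries more weight
```

The design trade-off is parameter count: the discrete model needs 100 weights per subject, while the two-exponential kernel summarizes a fast and a slow component with 4. In a choice model, the weighted histories for each lever could feed a logistic or softmax rule, so that a larger kernel means recent rewards pull choice more strongly.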
