Reversal learning and dopamine: a Bayesian perspective

Vincent D Costa et al.

J Neurosci. 2015 Feb 11;35(6):2407-2416. doi: 10.1523/JNEUROSCI.1989-14.2015.

Abstract

Reversal learning has been studied as the process of learning to inhibit previously rewarded actions. Deficits in reversal learning have been seen after manipulations of dopamine and lesions of the orbitofrontal cortex. However, reversal learning is often studied in animals that have limited experience with reversals; as such, the animals are still learning, while the data are being collected, that reversals occur at all. We have examined a task regime in which monkeys have extensive experience with reversals and stable behavioral performance on a probabilistic two-arm bandit reversal learning task. We developed a Bayesian analysis approach to examine the effects of manipulations of dopamine on reversal performance in this regime. We find that the analysis can clarify the strategy of the animal. Specifically, at reversal, the monkeys switch quickly from choosing one stimulus to choosing the other, as opposed to transitioning gradually, as might be expected if they were using a naive reinforcement learning (RL) update of value. Furthermore, we found that administration of haloperidol affects the way the animals integrate prior knowledge into their choice behavior. Animals had a stronger prior on where reversals would occur under haloperidol than under levodopa (l-DOPA) or placebo. This strong prior was appropriate, because the animals had extensive experience with reversals occurring in the middle of the block. Overall, we find that Bayesian dissection of the behavior clarifies the strategy of the animals and reveals an effect of haloperidol on the integration of prior information with evidence in favor of a choice reversal.
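To make the approach concrete: a change-point model in this spirit scores every candidate reversal trial by the likelihood of the observed choices, multiplies by a prior over where reversals occur, and normalizes. The sketch below is a minimal illustration, not the authors' code; the function name reversal_posterior and the accuracy parameters p_pre and p_post are assumptions introduced here.

```python
import numpy as np

def reversal_posterior(choices, p_pre=0.8, p_post=0.2, prior=None):
    """Posterior over the trial r at which the animal switched its choices.

    choices[t] = 1 if the initially high-probability stimulus was chosen on
    trial t. Before the switch trial r the animal is assumed to choose it
    with probability p_pre; from trial r onward, with probability p_post.
    """
    choices = np.asarray(choices)
    T = len(choices)
    log_lik = np.zeros(T)
    for r in range(T):
        pre, post = choices[:r], choices[r:]
        log_lik[r] = (np.log(np.where(pre == 1, p_pre, 1 - p_pre)).sum()
                      + np.log(np.where(post == 1, p_post, 1 - p_post)).sum())
    prior = np.ones(T) / T if prior is None else np.asarray(prior, dtype=float)
    posterior = np.exp(log_lik - log_lik.max()) * prior
    return posterior / posterior.sum()
```

With prior=None this is a likelihood-only (flat-prior) estimate; passing a prior peaked near the middle of the block implements the kind of prior knowledge the haloperidol result concerns.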

Keywords: Bayesian; dopamine; haloperidol; l-DOPA; reinforcement learning; reversal learning.

Figures

Figure 1.
Trial structure of a single block and the sequence of events in a single trial of the two-arm bandit reversal learning task. Each block contained 80 trials. The stimulus–reward mapping was reversed on a randomly chosen trial between trials 30 and 50. Trials before the reversal are referred to as acquisition, and trials after the reversal are referred to as reversal. The reward schedule was constant within a block (i.e., 80/20, 70/30, or 60/40%) but usually changed across blocks. ITI, Intertrial interval.
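A hypothetical simulation of this block structure, for readers who want to generate comparable data; simulate_block and its defaults are illustrative names, not taken from the paper's task code.

```python
import numpy as np

def simulate_block(n_trials=80, schedule=(0.8, 0.2), rev_range=(30, 50), rng=None):
    """One block: a reversal trial drawn uniformly from rev_range, inclusive."""
    rng = np.random.default_rng() if rng is None else rng
    reversal = int(rng.integers(rev_range[0], rev_range[1] + 1))
    p_high, p_low = schedule
    # Reward probability of the initially better stimulus on every trial:
    # high during acquisition, low after the reversal.
    p_a = np.where(np.arange(n_trials) < reversal, p_high, p_low)
    return reversal, p_a

reversal, p_a = simulate_block()
rewards_a = np.random.default_rng().random(len(p_a)) < p_a  # sampled outcomes
```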
Figure 2.
Bayesian estimates of reversal points by reinforcement schedule and drug condition. Error bars and shading indicate 1 SEM, and the gray windows indicate the trial range in which a reversal was programmed to occur. A, The mean posterior probability by schedule that the animal reversed its choice behavior on each trial (M = 2; see Materials and Methods). B, Same as A, split out by drug condition. C, Difference in the estimated reversal trial between the behavioral choice (BC; M = 2) and ideal observer (IO; M = 1) models, broken out by schedule and drug condition. D, Same as C except for absolute value of difference. E, Behavioral choice posterior distributions averaged after aligning the posterior from individual blocks to the ideal observer switch trial from each block.
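The comparison in C and D reduces to comparing point estimates from the two models' posteriors. A minimal sketch, assuming each model yields a posterior over the reversal trial (e.g., from a change-point function like the one sketched under the Abstract):

```python
import numpy as np

def map_reversal(posterior):
    """Maximum a posteriori estimate of the reversal trial."""
    return int(np.argmax(posterior))

# Given bc_posterior and io_posterior for one block (each a length-80 array):
# diff = map_reversal(bc_posterior) - map_reversal(io_posterior)   # as in C
# abs_diff = abs(diff)                                             # as in D
```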
Figure 3.
Causal evidence at the time the monkeys reversed their behavior. ***p < 0.001. Error bars and shading indicate 1 SEM, and the gray windows indicate the trial range in which a reversal was programmed to occur. A, B, The posterior of the causal model (M = 3), calculated with a flat prior, aligned to the estimated trial on which the monkeys switched their choice behavior, and averaged by reward schedule (A) or drug condition (B). C, Log likelihoods for models with different priors. Sess, Session; Sched, schedule. D, Mean and SD of prior distributions fit to individual sessions for each schedule and drug condition. E, F, Average prior distributions for each reward schedule (E) and drug condition (F).
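The prior comparison in C can be illustrated by scoring each candidate prior with the log marginal likelihood of the choices, integrating out the reversal trial. This is a sketch under assumed Bernoulli choice likelihoods; the Gaussian prior (mean 40, SD 6) is an illustrative stand-in, not the fitted values reported here.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def switch_log_lik(choices, p_pre=0.8, p_post=0.2):
    """log p(choices | switch at trial r) for every candidate trial r."""
    choices = np.asarray(choices)
    T = len(choices)
    ll = np.empty(T)
    for r in range(T):
        pre, post = choices[:r], choices[r:]
        ll[r] = (np.log(np.where(pre == 1, p_pre, 1 - p_pre)).sum()
                 + np.log(np.where(post == 1, p_post, 1 - p_post)).sum())
    return ll

def log_marginal(choices, prior):
    """log p(choices) = log sum_r p(choices | r) p(r)."""
    return logsumexp(switch_log_lik(choices) + np.log(prior))

T = 80
flat = np.ones(T) / T
gauss = norm.pdf(np.arange(T), loc=40, scale=6)  # peaked mid-block (illustrative)
gauss /= gauss.sum()
# A positive log_marginal(choices, gauss) - log_marginal(choices, flat)
# favors the informative prior, analogous to the comparison in C.
```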
Figure 4.
The fraction of times the initial high-probability cue was chosen in the acquisition and reversal phases, broken out by drug and reward schedule. Curves were smoothed with a six-trial moving average window. Because the number of trials before and after the reversal varied across blocks, trial number was normalized to be between 0 and 1 within each phase and then averaged across blocks to generate the plots. A, Choices aligned to reversal points estimated by the ideal observer model (M = 1). B, Choices aligned to reversal points based on reversals in the monkeys' behavior (M = 2).
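The normalization described above can be sketched as resampling each phase's choice trace onto a common [0, 1] grid before averaging; normalize_phase and n_grid are illustrative names.

```python
import numpy as np

def normalize_phase(trace, n_grid=50):
    """Resample one phase's choice trace onto a common [0, 1] grid."""
    x = np.linspace(0, 1, len(trace))   # normalized trial number within phase
    grid = np.linspace(0, 1, n_grid)
    return np.interp(grid, x, trace)

# Averaging across blocks whose phases differ in length:
# mean_trace = np.mean([normalize_phase(t) for t in block_traces], axis=0)
```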
Figure 5.
Effects of drug and reward schedule on the inverse temperature, estimated with the acquisition and reversal phases defined by the ideal observer and behavioral choice models. Error bars indicate 1 SEM. A, Inverse temperature broken out by schedule when acquisition and reversal are defined by the ideal observer (left) and behavioral choice (right) models. B, Inverse temperature broken out by drug condition when acquisition and reversal are defined by the ideal observer (left) and behavioral choice (right) models.
Figure 6.
Effects of drug and reward schedule on the learning rate parameters for positive and negative feedback, estimated with the acquisition and reversal phases defined by the ideal observer and behavioral choice models. Error bars indicate 1 SEM. A, Learning rates for each form of feedback broken out by reward schedule and learning phase, averaged across drug conditions. Pos, Positive; Neg, negative. B, Learning rates by drug condition, averaged across learning phase and reward schedule.
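Figures 5 and 6 refer to parameters of a standard RL parameterization: softmax choice with an inverse temperature, and separate learning rates for positive and negative feedback. The sketch below shows one common form of that model; it is a generic illustration, not the authors' fitting code, and gating the learning rate on the sign of the prediction error is one convention (the paper may instead gate on reward versus no reward).

```python
import numpy as np

def rl_step(values, choice, reward, alpha_pos=0.4, alpha_neg=0.1):
    """Update the chosen option's value with a feedback-dependent learning rate."""
    delta = reward - values[choice]                  # prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg    # asymmetric learning rates
    values[choice] += alpha * delta
    return values

def choice_prob(values, beta=3.0):
    """Softmax over values; beta is the inverse temperature (Figure 5)."""
    z = beta * np.asarray(values, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

values = [0.5, 0.5]
p = choice_prob(values)                       # choice probabilities on a trial
values = rl_step(values, choice=0, reward=1)  # learn from the outcome
```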
