Neuroscience. 2017 Mar 14;345:27-37.
doi: 10.1016/j.neuroscience.2016.03.034. Epub 2016 Mar 17.

Orbitofrontal cortex reflects changes in response-outcome contingencies during probabilistic reversal learning

L R Amodeo et al. Neuroscience.

Abstract

In a continuously changing environment, in which behavioral outcomes are rarely certain, animals must be able to learn to integrate feedback from their choices over time and adapt to changing reward contingencies to maintain flexible behavior. The orbitofrontal region of prefrontal cortex (OFC) has been widely implicated as playing a role in the ability to flexibly control behavior. We used a probabilistic reversal learning task to measure rats' behavioral flexibility and its neural basis in the activity of single neurons in OFC. In this task, one lever, designated as 'correct', was rewarded at a high probability (80%) and a second, spatially distinct lever, designated as 'incorrect', was rewarded at a low probability (20%). Once rats reached a learning criterion for reliably selecting the correct lever, reward contingencies of the two levers were switched, and daily sessions were conducted until rats reliably selected the new correct lever. All rats performed the initial Acquisition and subsequent Reversal successfully, with more sessions needed to learn the Reversal. OFC neurons were recorded during five behavioral sessions spanning Acquisition and Reversal learning. The dominant pattern of neural responding in OFC, identified by principal component analysis of the population of neurons recorded, was modulated by reward outcome across behavioral sessions. Generally, activity was higher following rewarded choices than following unrewarded choices. However, there was a correlation between reduced responses to reward following incorrect choices and the establishment of the preference for the correct lever. These results show how signaling by individual OFC neurons may participate in the flexible adaptation of behavior under changing reward contingencies.

Keywords: cognitive flexibility; electrophysiology; orbitofrontal cortex; probabilistic reversal learning; reward.


Figures

Fig. 1.
(A) Probabilistic reversal learning task. On each day, rats performed 30 trials in which two levers were presented. On one side was the ‘correct’ lever, which yielded a reward of one sugar pellet on 80% of the trials in which it was pressed. The lever on the other side was designated as ‘incorrect’ and was rewarded on 20% of presses. The side contingencies were maintained until rats reached an Acquisition criterion of 80% correct choices for three sessions. Once this criterion was reached, the ‘correct’ and ‘incorrect’ lever locations were reversed, and learning was measured until the same behavioral criterion was met. (B) Number of sessions to meet learning criterion. In the Acquisition phase, rats reached the learning criterion in 8.63 ± 0.75 sessions. Following Reversal, the average number of sessions needed to reach criterion was significantly greater, at 11.38 ± 1.00 sessions (t(14) = 2.20, p < 0.05). (C) Individual learning curves. Each line represents one subject’s performance over the days of testing in the Acquisition and Reversal phases separately.
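The task schedule in panel A can be made concrete with a small simulation. This is a minimal sketch, not the authors' code: the simulated "rat" is a hypothetical agent that picks the correct lever with a fixed probability per session, and the criterion of "80% correct choices for three sessions" is assumed here to mean three consecutive sessions. At Reversal only the side of the correct lever changes, so in this abstraction the same functions apply to both phases.

```python
import random
from typing import List, Optional, Sequence, Tuple


def simulate_session(p_choose_correct: float, n_trials: int = 30,
                     p_reward_correct: float = 0.8, p_reward_incorrect: float = 0.2,
                     rng: Optional[random.Random] = None) -> List[Tuple[bool, bool]]:
    """Simulate one session; each trial is recorded as (chose_correct, rewarded)."""
    if rng is None:
        rng = random.Random()
    trials = []
    for _ in range(n_trials):
        # Hypothetical agent: picks the correct lever with a fixed probability.
        chose_correct = rng.random() < p_choose_correct
        # Probabilistic reward: 80% for the correct lever, 20% for the incorrect one.
        p_reward = p_reward_correct if chose_correct else p_reward_incorrect
        trials.append((chose_correct, rng.random() < p_reward))
    return trials


def met_criterion(accuracies: Sequence[float], threshold: float = 0.8,
                  n_sessions: int = 3) -> bool:
    """Assumption: criterion = the last `n_sessions` sessions all at or above threshold."""
    if len(accuracies) < n_sessions:
        return False
    return all(acc >= threshold for acc in accuracies[-n_sessions:])


def sessions_to_criterion(p_by_session: Sequence[float], seed: int = 0) -> Optional[int]:
    """Run sessions in order until the criterion is met; None if it never is."""
    rng = random.Random(seed)
    accuracies: List[float] = []
    for p in p_by_session:
        trials = simulate_session(p, rng=rng)
        accuracies.append(sum(c for c, _ in trials) / len(trials))
        if met_criterion(accuracies):
            return len(accuracies)
    return None
```

For example, an agent that always chooses the correct lever reaches this criterion after exactly three sessions, while one that never does returns `None`.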
Fig. 2.
Pattern of choices across Acquisition and Reversal learning. The proportion of win-stay and lose-shift choices was calculated for correct (left) and incorrect (right) trials separately for each session: Acquisition (Acq/A), Pre-Reversal (Pre/P), Initial Reversal (Rev/R), Mid-Reversal (Mid/M) and Final Reversal (Final/F). The vertical line divides behavior during the initial Acquisition phase from the Reversal phase. Thin gray lines show the behavior of each subject individually, and the average across all subjects is plotted in black. Inset. The number of rewards earned in each session. The horizontal line marks the number of rewards expected when rats choose the correct lever on 80% of trials.
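The win-stay and lose-shift proportions, and the reward count marked by the inset's horizontal line, can be computed as below. This is a sketch under stated assumptions: win-stay is taken as the fraction of rewarded presses on a given lever followed by a press on the same lever, lose-shift as the fraction of unrewarded presses followed by a press on the other lever, and the last trial of a session (which has no successor) is skipped. The expected reward count assumes 30 trials with the 80%/20% reward probabilities from Fig. 1: 30 × (0.8 × 0.8 + 0.2 × 0.2) = 20.4 rewards.

```python
from typing import Optional, Sequence, Tuple


def win_stay_lose_shift(choices: Sequence[str], rewards: Sequence[bool],
                        lever: str) -> Tuple[Optional[float], Optional[float]]:
    """Return (win_stay, lose_shift) for trials on which `lever` was pressed.

    A proportion is None when its denominator is zero (no wins or no losses).
    """
    wins = stays = losses = shifts = 0
    for t in range(len(choices) - 1):  # last trial has no "next choice"
        if choices[t] != lever:
            continue
        stayed = choices[t + 1] == choices[t]
        if rewards[t]:
            wins += 1
            stays += stayed
        else:
            losses += 1
            shifts += not stayed
    win_stay = stays / wins if wins else None
    lose_shift = shifts / losses if losses else None
    return win_stay, lose_shift


def expected_rewards(n_trials: int = 30, p_correct_choice: float = 0.8,
                     p_reward_correct: float = 0.8, p_reward_incorrect: float = 0.2) -> float:
    """Expected rewards per session for a given proportion of correct choices."""
    return n_trials * (p_correct_choice * p_reward_correct
                       + (1 - p_correct_choice) * p_reward_incorrect)
```

Here 'C' and 'I' in a choice sequence such as `['C', 'C', 'I', 'C']` would be hypothetical labels for the correct and incorrect levers; computing the two proportions per lever mirrors the caption's separate panels for correct and incorrect trials.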
Fig. 3.
Location of marking lesions from recording electrodes. Recording sites were located between AP +2.76 and +4.68 mm relative to bregma and spanned the ventral and lateral divisions of the orbitofrontal cortex. The brain atlas of Paxinos and Watson (2007) was used to identify recording sites.
Fig. 4.
Principal component analysis to identify types of OFC neural responses. For each of the 432 neurons recorded during five sessions of behavioral testing, responses were aligned to the time of lever press and the average normalized firing rate was calculated for rewarded and unrewarded trials separately. PCA was then used to identify the patterns of neural responses that accounted for the most variability in the responses on rewarded trials across the population of OFC neurons. Each of the first five principal components (PCs) is plotted in blue in a separate panel. Least squares regression was used to identify which PC each neuron’s pattern of activity most closely matched, and the neuron was then classified as that Type (1–5; the number of neurons of each type is indicated in each panel). For Types 1–5, the average normalized firing rate on rewarded trials is plotted in red, unrewarded in black. Shading around the averages indicates ± 1 SEM. A repeated measures ANOVA showed that responses of Type 1 neurons differentiated rewarded and unrewarded trials from 4 to 10 s after lever press.
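The two-step classification described in this caption (PCA on the population, then least-squares matching of each neuron to a PC) might look roughly like the NumPy sketch below. The caption does not specify the preprocessing or the fit score, so the following choices are assumptions: each time bin is mean-centered across neurons before an SVD-based PCA, each neuron is regressed on one PC at a time with a free slope and intercept (so a sign-flipped match still fits), and the PC with the highest R² defines the neuron's Type.

```python
import numpy as np


def pca_components(responses: np.ndarray, n_components: int = 5) -> np.ndarray:
    """responses: (n_neurons, n_bins) rewarded-trial PSTHs.

    Returns an (n_components, n_bins) array of orthonormal principal components.
    Assumption: each time bin is mean-centered across neurons.
    """
    centered = responses - responses.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components]


def assign_types(responses: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Return a 1-based Type per neuron: the PC giving the best least-squares fit.

    Assumption: fit response ~ a * pc + b for each PC and score by R^2.
    """
    n_bins = responses.shape[1]
    types = []
    for r in responses:
        ss_tot = np.sum((r - r.mean()) ** 2)
        best_k, best_r2 = 0, -np.inf
        for k, pc in enumerate(components):
            design = np.column_stack([pc, np.ones(n_bins)])
            coef, *_ = np.linalg.lstsq(design, r, rcond=None)
            resid = r - design @ coef
            r2 = 1.0 - np.sum(resid ** 2) / ss_tot if ss_tot > 0 else 0.0
            if r2 > best_r2:
                best_k, best_r2 = k, r2
        types.append(best_k + 1)
    return np.array(types)
```

A neuron whose response is an exact scaled and shifted copy of, say, the third PC is classified as Type 3 by this procedure; real neurons would instead be assigned to whichever PC explains the most of their variance.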
Fig. 5.
(A) Time-course of Type 1 OFC neural responses. For each behavioral session, the average normalized firing rate of Type 1 neurons was calculated in 0.5-s time bins aligned to the time of lever press for rewarded and unrewarded correct and incorrect choices. Neural responses for rewarded choices are plotted in red, with shading indicating ± 1 SEM, for correct (left) and incorrect (right) lever presses. Responses on unrewarded correct and incorrect trials are plotted in black (shading ± 1 SEM). For the Initial Reversal session, the sides of the correct and incorrect levers were reversed from their locations in the prior Pre-Reversal session. The numbers of Type 1 neurons recorded during each session are indicated. The gray square (4–11 s) marks the epoch during which Type 1 neurons discriminated rewarded from unrewarded trials. Asterisks indicate differences between activity on rewarded and unrewarded trials during individual sessions (post hoc test with Bonferroni correction, p < 0.05). (B) Response of Type 1 neurons during the epoch 4–11 s after lever presses. The average normalized firing rate during the interval from 4 to 11 s after lever press was calculated for each neuron separately for rewarded and unrewarded correct (left) and incorrect (right) choices. Responses to rewarded trials are shown in red and unrewarded in gray. For correct choices, there were no differences between responses to rewarded trials across sessions, or between responses to unrewarded trials across sessions. For incorrect choices, there were no differences between responses to unrewarded trials across sessions, but responses to rewarded trials were reduced on the Pre-Reversal and Final Reversal days compared to the Initial Reversal session (post hoc test with Bonferroni correction, p < 0.05).
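The binning and epoch analysis in panels A and B (0.5-s bins aligned to lever press, then the mean over the 4 to 11 s window) can be sketched as follows. The analysis window of -2 to 12 s around the press and the z-score normalization are hypothetical choices, since the caption does not state the window or how firing rates were normalized. The epoch mask keeps the bins whose centers fall in [4, 11), i.e. the fourteen 0.5-s bins spanning 4 to 11 s.

```python
import numpy as np
from typing import Sequence, Tuple


def peri_event_rates(spike_times: Sequence[float], event_times: Sequence[float],
                     window: Tuple[float, float] = (-2.0, 12.0),
                     bin_width: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
    """Firing rate (Hz) per trial in fixed bins aligned to each lever press.

    The window is a hypothetical choice. Returns (rates, bin_edges),
    with rates shaped (n_events, n_bins).
    """
    edges = np.arange(window[0], window[1] + bin_width / 2, bin_width)
    spikes = np.asarray(spike_times, dtype=float)
    counts = np.zeros((len(event_times), len(edges) - 1))
    for i, ev in enumerate(event_times):
        counts[i], _ = np.histogram(spikes - ev, bins=edges)
    return counts / bin_width, edges


def zscore(psth: np.ndarray) -> np.ndarray:
    """Hypothetical normalization: z-score a trial-averaged PSTH across bins."""
    sd = psth.std()
    return (psth - psth.mean()) / sd if sd > 0 else np.zeros_like(psth)


def epoch_mean(values: np.ndarray, edges: np.ndarray,
               start: float = 4.0, stop: float = 11.0) -> np.ndarray:
    """Mean over bins whose centers lie in [start, stop), along the last axis."""
    centers = (edges[:-1] + edges[1:]) / 2
    mask = (centers >= start) & (centers < stop)
    return values[..., mask].mean(axis=-1)
```

Splitting the rows of `rates` by trial outcome (rewarded versus unrewarded, correct versus incorrect) before averaging would give the four curves per session shown in panel A, and applying `epoch_mean` per neuron gives the values compared in panel B.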
Fig. 6.
Time-course of OFC responses on rewarded-correct and unrewarded-incorrect trials sorted by behavior on the subsequent trial. For each behavioral session, the average normalized firing rate of Type 1 neurons is shown for rewarded-correct (left) and unrewarded-incorrect trials (right), with shading indicating ± 1 SEM. Line pattern indicates the action the rat took on the subsequent trial: stay (solid) or switch (dashed). On the Pre-Reversal day, there were no unrewarded incorrect trials followed by another incorrect lever press. With the exception of Acquisition, there were no differences between ‘stay’ and ‘switch’ trials. On Acquisition, neurons showed significantly higher levels of activity in the epoch 4–11 s (shaded box) after lever press on ‘stay’ trials compared with ‘switch’ trials.
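Sorting trials by the rat's next action, as in this figure, amounts to labeling each trial by whether the following press was on the same lever. The sketch below assumes a per-trial list of lever labels and a 2-D array of per-trial PSTHs; the names and lever labels are hypothetical. An empty group yields `None` rather than an average, which matches the Pre-Reversal case where no unrewarded incorrect trial was followed by another incorrect press.

```python
import numpy as np
from typing import Dict, List, Optional, Sequence


def split_by_next_action(choices: Sequence[str], rewards: Sequence[bool],
                         correct_lever: str, want_correct: bool,
                         want_rewarded: bool) -> Dict[str, List[int]]:
    """Indices of trials with the requested choice type and outcome, split by next action."""
    groups: Dict[str, List[int]] = {"stay": [], "switch": []}
    for t in range(len(choices) - 1):  # the final trial has no subsequent action
        is_correct = choices[t] == correct_lever
        if is_correct != want_correct or bool(rewards[t]) != want_rewarded:
            continue
        key = "stay" if choices[t + 1] == choices[t] else "switch"
        groups[key].append(t)
    return groups


def mean_by_group(psths: np.ndarray,
                  groups: Dict[str, List[int]]) -> Dict[str, Optional[np.ndarray]]:
    """Average the per-trial PSTHs (n_trials, n_bins) within each group; None if empty."""
    return {k: (psths[idx].mean(axis=0) if idx else None) for k, idx in groups.items()}
```

For the left column of the figure one would call `split_by_next_action(..., want_correct=True, want_rewarded=True)`, and for the right column `want_correct=False, want_rewarded=False`.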

