Orbitofrontal cortex as a cognitive map of task space

Robert C Wilson et al. Neuron. 2014 Jan 22;81(2):267-279. doi: 10.1016/j.neuron.2013.11.005.

Abstract

Orbitofrontal cortex (OFC) has long been known to play an important role in decision making. However, the exact nature of that role has remained elusive. Here, we propose a unifying theory of OFC function. We hypothesize that OFC provides an abstraction of currently available information in the form of a labeling of the current task state, which is used for reinforcement learning (RL) elsewhere in the brain. This function is especially critical when task states include unobservable information, for instance, from working memory. We use this framework to explain classic findings in reversal learning, delayed alternation, extinction, and devaluation as well as more recent findings showing the effect of OFC lesions on the firing of dopaminergic neurons in ventral tegmental area (VTA) in rodents performing an RL task. In addition, we generate a number of testable experimental predictions that can distinguish our theory from other accounts of OFC function.
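The abstract's core hypothesis — that OFC supplies the state labels on which downstream reinforcement learning operates — can be illustrated with a minimal tabular TD(0) sketch. This is hypothetical illustration code, not the authors' model: the trial sequence, state names, and parameters are all assumptions; only the TD update rule is standard.

```python
# Illustrative sketch (not the paper's code): the same TD(0) update run
# over two state labelings of the same experience. The paper's claim is
# that OFC provides the labeling; the learning rule itself is generic.

def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One temporal-difference update: V(s) <- V(s) + alpha * delta."""
    delta = reward + gamma * V.get(next_state, 0.0) - V.get(state, 0.0)
    V[state] = V.get(state, 0.0) + alpha * delta
    return delta

# Two hypothetical labelings of an alternating trial sequence.
# "Full" states carry the recent outcome (hidden information);
# the stimulus-bound labeling collapses everything onto one state.
V_full, V_bound = {}, {}
trials = [("lever+rewarded", 1.0), ("lever+unrewarded", 0.0)] * 50
for i in range(len(trials) - 1):
    (s, r), (s_next, _) = trials[i], trials[i + 1]
    td0_update(V_full, s, r, s_next)          # history-dependent states
    td0_update(V_bound, "lever", r, "lever")  # single stimulus-bound state

# With history-dependent states the learned values separate by context;
# the stimulus-bound learner can only represent their average.
```

The point of the sketch is that the learning algorithm is unchanged between the two runs; only the state representation differs, which is the sense in which OFC lesions are hypothesized to alter RL without removing it.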


Figures

Figure 1
Reversal learning. (A) Experimental results showing the mean errors to criterion in initial discrimination learning and final reversal for control (grey) and OFC-lesioned (orange) animals. (B) Model simulations of the same task. (C) State representation of the task used to model control animals, in which the state depends on both the action and outcome on the last trial. (D) Stimulus-bound state representation modeling OFC-lesioned animals.
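The contrast in panels C and D can be made concrete with a small, hypothetical simulation (assumed parameters and reward scheme; not the authors' model): one learner's state includes the previous action and outcome, as in panel C, while a stimulus-bound learner, as in panel D, collapses all trials onto a single state.

```python
import random

random.seed(0)

def run_reversal(stimulus_bound, n_trials=200, alpha=0.2, eps=0.1):
    """Hypothetical epsilon-greedy learner on a two-option reversal task.
    The rewarded option flips halfway through; errors are counted per phase."""
    Q = {}
    correct = "A"
    last = ("start", "none")      # (last action, last outcome)
    errors = [0, 0]               # [acquisition errors, reversal errors]
    for t in range(n_trials):
        if t == n_trials // 2:
            correct = "B"         # reversal
        # Panel C vs panel D: history-dependent state vs single state.
        state = "ctx" if stimulus_bound else last
        qa, qb = Q.get((state, "A"), 0.0), Q.get((state, "B"), 0.0)
        if random.random() < eps:
            act = random.choice(["A", "B"])
        else:
            act = "A" if qa >= qb else "B"
        r = 1.0 if act == correct else 0.0
        Q[(state, act)] = Q.get((state, act), 0.0) + alpha * (r - Q.get((state, act), 0.0))
        errors[t >= n_trials // 2] += int(r == 0.0)
        last = (act, "win" if r else "lose")
    return errors

print("history-dependent states:", run_reversal(False))
print("stimulus-bound state:    ", run_reversal(True))
```

With the history-dependent labeling, the states reached after losses accumulate their own values, so the agent can express win-stay/lose-shift behavior after the reversal; the stimulus-bound agent must instead unlearn a single value. Exact error counts depend on the assumed parameters.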
Figure 2
Delayed alternation. (A) Experimental results showing the fraction of trials on which monkeys chose the correct option for control (grey) and OFC-lesioned (orange) animals. (B) Model simulations on the same task. (C) State representation used to model control animals, in which the state depends on the last action. (D) Stimulus-bound state representation modeling the OFC-lesioned animals.
Figure 3
Extinction. (A) Experimental results. Lever-press rates were normalized to the maximum response rate in conditioning. (B) Model results. (C) State representation used to model the control group in which the state depends on the last outcome. (D) State representation used to model the OFC lesion group, with only a single state. (E) Model predictions for extinction (ext) and spontaneous recovery (re). (F) Model predictions for reacquisition. Init: initial learning; reacq: reacquisition.
Figure 4
Devaluation. (A) Animals are first trained to associate a light with food. Then the food is devalued by pairing it with an indigestion-inducing poison, LiCl. In a control condition, the food and LiCl are unpaired during devaluation. Finally, the extent of devaluation is indexed by measuring responding to the light. (B) Experimental results from Pickens et al. (2003) showing relative responding to the food cup when the light is turned on for sham and OFC-lesioned animals in the paired and unpaired conditions. (C) State representation of the devaluation task. (D) Model results showing the relative value of the light for the sham and OFC-lesioned models.
Figure 5
Task design and state representations for Takahashi et al.'s (2011) odor-guided choice task. (A) Time course of rewards for the different blocks. Times associated with positive prediction errors caused by unexpected rewards are labeled in green. (B) State representation used to model sham-lesioned controls. (C) State representation used to model OFC-lesioned animals.
Figure 6
Firing of dopaminergic VTA neurons at the time of unexpected reward early (first two trials, red) and late (last five trials, blue) in a block. Unlike in Takahashi et al. (2011), where neural responses were averaged over the different types of unexpected reward delivery, here we divided the data into the four different cases, indicated by the green annotations in Figure 5A: the short reward after the long to short transition between blocks 1 and 2 (long → short), the arrival of the first (long → big1) and second (long → big2) drops of reward after the long to big transition between blocks 2 and 3, and the second drop of the small to big transition between blocks 3 and 4 (small → big2). (A) Experimental data for sham-lesioned controls (n = 30 neurons). (B) Experimental data for the OFC-lesioned group (n = 50 neurons). (C) Model predictions for the sham-lesioned animals. (D) Model predictions for OFC-lesioned animals. (E) Model predictions for the small to big transition (small → big2) taking into account the variable third drop of juice.
Figure 7
Schematic of neural RL with hypothesized mapping of functions to brain areas. The environment provides rewards and sensory stimuli to the brain. Rewards, represented in areas such as the lateral habenula (LH) and the pedunculopontine nucleus (PPTN), are used to compute prediction error signals in ventral tegmental area (VTA) and substantia nigra pars compacta (SNc). Sensory stimuli are used to define the animal's state within the current task. The state representation might involve both a stimulus-bound (externally observable) component, which we propose is encoded both in OFC and in sensory areas, and a hidden (unobservable) component which we hypothesize is uniquely encoded in OFC. State representations are then used as scaffolding for both model-free and model-based RL. Model-free learning of state and action values occurs in ventral striatum (VS) and dorsolateral striatum (DLS), respectively, while model-based learning occurs in dorsomedial striatum (DMS) as well as VS.
