J Neurosci. 2017 Jul 19;37(29):6995-7007.

doi: 10.1523/JNEUROSCI.3311-16.2017. Epub 2017 Jun 20.

Prefrontal Neurons Encode a Solution to the Credit-Assignment Problem

Wael F Asaad^{1,2,3,4}, Peter M Lauro^{2,3}, János A Perge², Emad N Eskandar^{5,6}

Affiliations

¹ Department of Neurosurgery and wael_asaad@brown.edu.
² Department of Neuroscience, Brown University, Providence, Rhode Island 02912.
³ Brown University Alpert Medical School, Providence, Rhode Island 02903.
⁴ Norman Prince Neurosciences Institute, Rhode Island Hospital, Providence, Rhode Island 02903.
⁵ Department of Neurosurgery, Massachusetts General Hospital, Boston, Massachusetts 02114.
⁶ Department of Neurosurgery, Harvard Medical School, Boston, Massachusetts 02115.

PMID: 28634307
PMCID: PMC5518425
DOI: 10.1523/JNEUROSCI.3311-16.2017

Wael F Asaad et al. J Neurosci. 2017.

Abstract

To adapt successfully to our environments, we must use the outcomes of our choices to guide future behavior. Critically, we must be able to correctly assign credit for any particular outcome to the causal features which preceded it. In some cases, the causal features may be immediately evident, whereas in others they may be separated in time or intermingled with irrelevant environmental stimuli, creating a potentially nontrivial credit-assignment problem. We examined the neuronal representation of information relevant for credit assignment in the dorsolateral prefrontal cortex (dlPFC) of two male rhesus macaques performing a task that elicited key aspects of this problem. We found that neurons conveyed the information necessary for credit assignment. Specifically, neuronal activity reflected both the relevant cues and outcomes at the time of feedback and did so in a manner that was stable over time, in contrast to prior reports of representational instability in the dlPFC. Furthermore, these representations were most stable early in learning, when credit assignment was most needed. When the same features were not needed for credit assignment, these neuronal representations were much weaker or absent. These results demonstrate that the activity of dlPFC neurons conforms to the basic requirements of a system that performs credit assignment, and that spiking activity can serve as a stable mechanism that links causes and effects.

SIGNIFICANCE STATEMENT: Credit assignment is the process by which we infer the causes of our successes and failures. We found that neuronal activity in the dorsolateral prefrontal cortex conveyed the necessary information for performing credit assignment.
Importantly, while there are various potential mechanisms to retain a "trace" of the causal events over time, we observed that spiking activity was sufficiently stable to act as the link between causes and effects, in contrast to prior reports that suggested spiking representations were unstable over time. In addition, we observed that this stability varied as a function of learning, such that the neural code was more reliable over time during early learning, when it was most needed.

Keywords: credit assignment; learning; monkey; population coding; prefrontal cortex; single neuron.

Figures

**Figure 1.**
Behavioral task, behavioral performance, and recording locations. A, Behavioral task. Animals first acquired, then held fixation for 1 s. This was followed by the appearance of a cue array consisting of four cue objects, randomly arranged, presented for 0.5 s. A 1 s “delay” interval followed. Then, the fixation spot disappeared and animals made their choice by saccading to the former location of one of the cues. Central fixation was maintained throughout the trial until the saccadic response. If the cue designated “correct” had appeared at the chosen location, generic positive feedback (a green circle) was presented for 0.5 s, followed by automated juice reward. If the chosen location had contained an “incorrect” cue, a red “X” was presented without subsequent reward. Once animals learned which cue marked the rewarded location and performed 40–50 further correct trials, a different cue was designated correct and they were required to relearn the correct cue using trial and error. The “spatial” task used the same sequence of cues and motor responses; however, a particular spatial location determined the correct response regardless of the cue that earlier appeared there. In that case, no credit assignment to the cues was necessary for learning. B, Behavioral performance from a typical session for each animal. Blocks, separated by vertical white lines, consisted of one feature (cue picture or spatial location) designated as correct. The white numbers identify the condition in that block (blocks 1–4 for each of the four cue pictures; blocks 5–8 for each of the spatial locations that could be designated correct). Blocks were interleaved in a pseudorandom fashion such that the cue-learning and spatial-learning tasks occurred in pairs and no individual block would be repeated within an eight-block cycle. 
Behavioral data were smoothed in a 10-trial boxcar average (green, correct choice; red, incorrect choice; pink, broken fixation; blue, early response; gray, no fixation to initiate trial). C, The behavioral strategy used by the animals. Here, rather than overall performance, the probabilities of repeating the immediately preceding choice of cue object (red) or spatial location (blue)—whatever those choices happened to be—are plotted as a function of trial number within a block (±SEs). Only the cue-learning blocks, in which the animals needed to learn which cue object marked the correct spatial location, are included. Note that these probabilities begin relatively higher due to the presence of preceding cue-learning or spatial-learning blocks. Animals relied relatively more on a spatial strategy (reselecting a particular spatial location) early during learning, then switched to a cue-based strategy (reselecting the location indicated by a particular cue) as learning progressed. D, Locations of neuronal recordings. The density of recording sites from both animals is projected onto a reference macaque brain and shown in the three standard planes (see Materials and Methods). Warmer colors indicate relatively more recordings performed at those locations. The visualized slices were selected to pass through the highest-density region.
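
The smoothing in panel B and the repeat probabilities in panel C can be illustrated with a minimal sketch. It assumes a trailing (causal) boxcar window, which the caption does not specify, and the function names and the toy trial sequence are hypothetical.

```python
def boxcar_average(values, width=10):
    """Trailing moving average over `width` trials.

    Assumption: the window looks backward only; the first few trials are
    averaged over however many trials are available so far.
    """
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= width:
            running -= values[i - width]  # drop the trial that left the window
        smoothed.append(running / min(i + 1, width))
    return smoothed


def repeated_previous(choices):
    """1 where a choice equals the immediately preceding choice, else 0."""
    return [int(cur == prev) for prev, cur in zip(choices, choices[1:])]


# Hypothetical block: 1 = correct choice, 0 = any other trial outcome.
correct = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]
print(boxcar_average(correct, width=10))

# Hypothetical sequence of chosen cue objects within a block.
print(repeated_previous(["a", "a", "b", "b", "c"]))
```

Averaging the output of `repeated_previous` across blocks at each trial position would give a repeat probability per trial number, in the spirit of panel C.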

**Figure 2.**
Cue-related and outcome-related information across neurons. ***A–F***, The population-averaged information about cues (A and B) and outcomes (E and F) for each subject (M1: A, C, and E; M2: B, D, and F). In C and D, and in the top portion of A, B, E, and F, the information (in bits) is shown at each time point, averaged across all recorded neurons from each subject. The shaded region around each line indicates the mean ± SE. Note that information about the correct cue can be present throughout the trial because this is stable over an entire block. The bottom portions of A, B, E, and F show the number of neurons conveying significant information at each time point. C and D show the time course of information about the cues within just those neurons with significant cue-related information during the feedback period. Outcome-related information before the feedback reflects the prior trial's outcome, as previously described (Asaad and Eskandar, 2011). Note that the amount of information conveyed about cues or outcomes over the entire population is within the same order of magnitude (top portions of A, B, E, and F), but this information is distributed over many more neurons in the case of outcome representation (C, D, and bottom portions of A, B, E, and F).

**Figure 3.**
Timing of peak cue-related information. The time of maximal information about the cues is plotted for every neuron (blue) or only neurons carrying statistically significant information (red). For each neuron, the time of peak cue-related information was determined and added to this histogram to depict the number of neurons that conveyed their maximal cue-selective information at each time point. The shaded areas under the red line were integrated to obtain the numbers of neurons whose cue-related information was maximal in the cue or feedback epochs (see text). Note the local peak in the number of neurons whose maximal information about the cues was conveyed during the feedback period.

**Figure 4.**
Examples of individual neurons with both cue and outcome selectivity. ***A–C***, Three individual neurons. The top portion shows each neuron's activity sorted according to the cue picture that was designated correct, while the bottom portion shows those neurons' activities sorted according to the outcome. The top portion of each row shows the activity of these neurons in spikes per second, whereas the bottom portions show the information content in bits (assessed across the 4 cue or outcome exemplars; see Materials and Methods). The shading in the bottom portions reflects the significance of the information metric based upon a bootstrap reshuffling of the assignment of trials to conditions. Note that information about the correct cue picture could be present before the appearance of the cue array because a single cue was designated correct for an entire block. Note also that significant information about outcome can be present before the feedback epoch because the prior trial's outcome was reflected in the outcome categories used here (Asaad and Eskandar, 2011; Donahue and Lee, 2015).
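
An information measure in bits with a reshuffling test might look like the sketch below. It assumes a plug-in mutual-information estimate between discretized responses (for example, spike counts) and condition labels; the estimator the authors actually used is described in their Materials and Methods, which is not reproduced here. The function names and data are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information_bits(responses, conditions):
    """Plug-in estimate of I(response; condition) in bits."""
    n = len(responses)
    joint = Counter(zip(responses, conditions))
    n_r = Counter(responses)
    n_c = Counter(conditions)
    info = 0.0
    for (r, c), count in joint.items():
        # p(r, c) * log2( p(r, c) / (p(r) * p(c)) ), written with raw counts
        info += (count / n) * math.log2(count * n / (n_r[r] * n_c[c]))
    return info


def shuffle_p_value(responses, conditions, n_shuffles=1000, seed=0):
    """Fraction of label shuffles whose information is at least the observed value."""
    rng = random.Random(seed)
    observed = mutual_information_bits(responses, conditions)
    labels = list(conditions)
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(labels)  # break the trial-to-condition assignment
        if mutual_information_bits(responses, labels) >= observed:
            exceed += 1
    return (exceed + 1) / (n_shuffles + 1)


# Hypothetical neuron whose spike count is fully determined by 4 equiprobable cues.
cues = [0, 1, 2, 3] * 10
counts = list(cues)
print(mutual_information_bits(counts, cues))
print(shuffle_p_value(counts, cues, n_shuffles=200))
```

With four equally likely conditions, the ceiling of this measure is log2(4) = 2 bits, reached only when the response identifies the condition on every trial.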

**Figure 5.**
Cue and spatial information as a function of learning. ***A–D***, The information conveyed by neurons during the cue epoch (blue) or feedback epoch (red) about the selected cue (A, B) or spatial location (C, D) is plotted as a function of correct trial number relative to learning criterion (±SEs). The same metric calculated for data in which the assignments of trials to cue objects (A, B) or spatial locations (C, D) were shuffled are shown in yellow and purple, respectively. Data are shown separately for subjects M1 (A, C) and M2 (B, D). Note that the information measured here is an order of magnitude higher than observed in Figure 2, in large part due to differences in entropy in this calculation which considered fewer trials for each data point. Note the somewhat delayed peak in cue information relative to spatial information, which may reflect animals' initial reliance on a spatial strategy before switching to a cue-based strategy.

**Figure 6.**
Cross-temporal decoding of cue-related neuronal activity. A, B, Population activity vectors from simultaneously recorded neurons were used to classify individual trials according to the correct cue (A, M1; B, M2). The accuracy of classification is depicted in the color scale of the central plot. The classifier was trained using a particular time bin (x-axis) and then tested against the same or different time bins (y-axis) from separate trials. Classification used a linear decoder that relied upon simply the minimum Euclidean distance between trained and tested vectors. The decoding accuracy when the same time bin was used for training and testing (across separate trials) resides in the main diagonal and is shown in the upper left (black line with gray areas representing SEs and SDs). A shuffled bootstrap procedure in which trials were randomly reassigned to cues was used to verify chance-level decoding (∼25% correct) in that circumstance (black line with red areas for SEs and SDs). The ROC results comparing actual versus shuffled decoding are shown at the bottom left, and the fraction of recording sessions with significant decoding according to the ROC shuffled bootstrap is shown at the bottom right. The far upper-left shows the ROC results along the main diagonal, with the shading corresponding to the fraction of significant sessions as in the bottom right. Cross-temporal decoding accuracy is depicted at the upper right, which is computed by taking the mean over each diagonal. The SDs and SEs are shown in light and dark gray, respectively (SEs may be imperceptible due to their small values). Note that while there is a peak in decoding accuracy when using nearby time bins (near the center of this plot), decoding accuracy does not return to chance even at large offsets between the training and decoding bins, necessitating some degree of stability in the neuronal representation across time.
Exclusion of neurons with potentially unstable baseline activity (see Materials and Methods) did not significantly alter this result.
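
The train-at-one-bin, test-at-another procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the classifier is taken to be a nearest-class-mean rule (one reading of "minimum Euclidean distance between trained and tested vectors"), each trial is a list of per-bin population vectors, and all names and data are hypothetical.

```python
import math


def centroids(trials, labels, t):
    """Mean population vector per label at time bin t (trials[i][t] is a rate vector)."""
    sums, counts = {}, {}
    for trial, label in zip(trials, labels):
        vec = trial[t]
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def classify(vec, cents):
    """Label of the centroid closest to vec in Euclidean distance."""
    return min(cents, key=lambda lab: math.dist(vec, cents[lab]))


def cross_temporal_accuracy(train, train_labels, test, test_labels):
    """acc[t_train][t_test]: fraction of test trials classified correctly."""
    n_bins = len(train[0])
    acc = [[0.0] * n_bins for _ in range(n_bins)]
    for t_train in range(n_bins):
        cents = centroids(train, train_labels, t_train)
        for t_test in range(n_bins):
            hits = sum(
                classify(trial[t_test], cents) == lab
                for trial, lab in zip(test, test_labels)
            )
            acc[t_train][t_test] = hits / len(test)
    return acc


def mean_by_offset(acc):
    """Average accuracy along each diagonal, keyed by (test bin - train bin)."""
    n = len(acc)
    out = {}
    for offset in range(-(n - 1), n):
        vals = [acc[i][i + offset] for i in range(n) if 0 <= i + offset < n]
        out[offset] = sum(vals) / len(vals)
    return out


# Hypothetical two-neuron population with a code that is identical in all 3 bins.
stable = {"A": [[5, 1]] * 3, "B": [[1, 5]] * 3}
train = [stable["A"], stable["B"], stable["A"], stable["B"]]
test = [stable["B"], stable["A"]]
acc = cross_temporal_accuracy(train, ["A", "B", "A", "B"], test, ["B", "A"])
print(mean_by_offset(acc))
```

A representation that changes over time would decode well only near offset 0, so accuracy that stays above chance at large offsets is what indicates a stable code.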

**Figure 7.**
Decoding accuracy as a function of neuronal ensemble size. Decoding accuracies are plotted against the size of the corresponding neuronal ensembles for each session. The dots and lines represent the means across all training–decoding offsets ± SDs (SEs are too small to be visible). The lines depict the least-squares linear fit to each subject's data (M1: r = 0.518, p = 0.002; M2: r = 0.673, p = 0.0008). Note the y intercept for both animals is appropriately close to chance level (horizontal line, 0.25).
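
A least-squares line and its correlation coefficient can be computed as in the sketch below. The ensemble sizes and accuracies are made-up values chosen so that the intercept equals the 0.25 chance level; they are not data from the study, and the function name is hypothetical.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares slope, intercept, and Pearson r for paired samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical sessions: ensemble size versus mean decoding accuracy.
ensemble_size = [0, 10, 20, 30]
accuracy = [0.25, 0.30, 0.35, 0.40]
print(linear_fit(ensemble_size, accuracy))
```

An intercept near chance is the expected sanity check: an ensemble of zero neurons should decode no better than guessing among four cues.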

**Figure 8.**
Cross-temporal decoding of cue-related activity in the spatial task. Conventions and methods are the same as in Figure 6. Here, the population decoding was applied to assess the amount of information conveyed by simultaneously recorded neurons about the cue during the spatial task, where the identity of the cue was irrelevant to learning. A, Data for M1. B, Data for M2.

**Figure 9.**
Relationships between behavioral variables and feedback epoch activity. A linear model was fit to the feedback epoch spike rates of individual neurons to assess the influence of the animals' current and upcoming choices on neuronal activity (spike rate in the 500 ms feedback epoch). A, Results for M1. B, Results for M2. Predictor variables consisted of the Task (cue learning vs spatial learning), the outcome (RPE; see Materials and Methods), the identity of the chosen Cue or spatial Location, and whether the animal would repeat that choice of cue or location on the next trial (Will Repeat Cue and Will Repeat Location). The “Total” column on the right shows the number of neurons whose activity was found to be significantly (p < 0.01) dependent upon the factor listed at the left, either singularly or in interaction with another factor; numbers here may not sum to the simple totals taken from the left because neurons were counted only once even if they depended on a particular factor in more than one way (such as in 2 different interactions). Repeating this analysis while excluding neurons with potentially unstable baseline activity (see Materials and Methods) did not significantly alter any of these results (all values within ±4%).

**Figure 10.**
Cross-temporal fidelity of cue representations across the cue and feedback epochs during learning. A, B, The similarity of neuronal representations of the cue across time for subjects M1 (A) and M2 (B) was assessed by taking the cosine between population vectors derived from the cue and feedback epochs of correct trials and plotting this according to trial number relative to learning criterion (first of 4 consecutive correct trials). Shown is the mean vector similarity for each trial (blue) ± SE (left axis). A third-order polynomial fit is overlaid to depict the trend. The shuffled (control) vector similarity values are plotted in red. Concurrent behavioral performance is plotted as a bar graph in the background (right axis). Data are smoothed using a three-trial sliding average.
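
The vector-similarity measure named in this caption, the cosine between two population vectors, can be sketched as below. The firing-rate vectors are hypothetical, and how the authors constructed the vectors from each epoch is not specified here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length population vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


# Hypothetical rates (spikes/s) of three neurons in the cue and feedback epochs.
cue_epoch = [12.0, 3.0, 7.0]
feedback_epoch = [10.0, 4.0, 8.0]
print(cosine_similarity(cue_epoch, feedback_epoch))
```

Because the cosine ignores vector length, it compares the pattern of activity across neurons rather than the overall firing rate, so a value near 1 means the two epochs share a similar population pattern.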

References

1. Akaishi R, Kolling N, Brown JW, Rushworth M (2016) Neural mechanisms of credit assignment in a multicue environment. J Neurosci 36:1096–1112. doi: 10.1523/JNEUROSCI.3159-15.2016
2. Asaad WF, Eskandar EN (2008) A flexible software tool for temporally precise behavioral control in Matlab. J Neurosci Methods 174:245–258. doi: 10.1016/j.jneumeth.2008.07.014
3. Asaad WF, Eskandar EN (2011) Encoding of both positive and negative reward prediction errors by neurons of the primate lateral prefrontal cortex and caudate nucleus. J Neurosci 31:17772–17787. doi: 10.1523/JNEUROSCI.3793-11.2011
4. Asaad WF, Rainer G, Miller EK (1998) Neural activity in the primate prefrontal cortex during associative learning. Neuron 21:1399–1407. doi: 10.1016/S0896-6273(00)80658-3
5. Asaad WF, Santhanam N, McClellan S, Freedman DJ (2013) High-performance execution of psychophysical tasks with complex visual stimuli in MATLAB. J Neurophysiol 109:249–260. doi: 10.1152/jn.00527.2012
