Vicarious reinforcement learning signals when instructing others

Matthew A J Apps¹, Elise Lesage², Narender Ramnani³

Affiliations

¹ Nuffield Department of Clinical Neuroscience, University of Oxford, Oxford OX1 9DU, United Kingdom; Department of Experimental Psychology, University of Oxford, Oxford OX1 2JD, United Kingdom; Department of Psychology, Royal Holloway, University of London, Surrey TW20 0EX, United Kingdom. Email: matthew.apps@ndcn.ox.ac.uk.
² Department of Psychology, Royal Holloway, University of London, Surrey TW20 0EX, United Kingdom; Neuroimaging Research Branch, Intramural Research Program, National Institute on Drug Abuse, National Institutes of Health, Baltimore, Maryland 21224.
³ Department of Psychology, Royal Holloway, University of London, Surrey TW20 0EX, United Kingdom.

PMID: 25698730
PMCID: PMC4331622
DOI: 10.1523/JNEUROSCI.3669-14.2015

J Neurosci. 2015 Feb 18;35(7):2904-13. doi: 10.1523/JNEUROSCI.3669-14.2015.

Abstract

Reinforcement learning (RL) theory posits that learning is driven by discrepancies between the predicted and actual outcomes of actions (prediction errors [PEs]). In social environments, learning is often guided by similar RL mechanisms. For example, teachers monitor the actions of students and provide feedback to them. This feedback evokes PEs in students that guide their learning. We report the first study that investigates the neural mechanisms that underpin RL signals in the brain of a teacher. Neurons in the anterior cingulate cortex (ACC) signal PEs when learning from the outcomes of one's own actions but also signal information when outcomes are received by others. Does a teacher's ACC signal PEs when monitoring a student's learning? Using fMRI, we studied brain activity in human subjects (teachers) as they taught a confederate (student) action-outcome associations by providing positive or negative feedback. We examined activity time-locked to the student's responses, when teachers infer the student's predictions and know the actual outcomes. We fitted an RL-based computational model to the student's behavior to characterize their learning, and examined whether a teacher's ACC signals when a student's predictions are wrong. In line with our hypothesis, activity in the teacher's ACC covaried with the PE values in the model. Additionally, activity in the teacher's insula and ventromedial prefrontal cortex covaried with the predicted value according to the student. Our findings highlight that the ACC signals PEs vicariously for others' erroneous predictions when monitoring and instructing their learning. These results suggest that RL mechanisms, processed vicariously, may underpin and facilitate teaching behaviors.
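The abstract defines a PE as the difference between the actual and predicted outcome of an action. As a minimal sketch of that idea, the delta rule of the Rescorla-Wagner (R-W) model named in the figure legends can be written as below. This is illustrative only, not the authors' fitting code: it assumes a single action-outcome association, feedback coded 1 (correct) or 0 (incorrect), and a hypothetical starting value of 0.5; the function name and defaults are invented for the example.

```python
from typing import List, Tuple


def rescorla_wagner(outcomes: List[float], alpha: float,
                    v0: float = 0.5) -> Tuple[List[float], List[float]]:
    """Return the predicted value before each trial and the signed PE on each trial.

    outcomes: feedback per trial, assumed coded 1 (correct) or 0 (incorrect).
    alpha:    learning rate in [0, 1].
    v0:       initial predicted value (hypothetical default of 0.5).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    v = v0
    values, pes = [], []
    for r in outcomes:
        values.append(v)       # prediction held before the feedback arrives
        delta = r - v          # prediction error: actual minus predicted outcome
        pes.append(delta)
        v = v + alpha * delta  # delta-rule update of the predicted value
    return values, pes


values, pes = rescorla_wagner([0, 1, 1, 0], alpha=0.5)
print(values)  # [0.5, 0.25, 0.625, 0.8125]
print(pes)     # [-0.5, 0.75, 0.375, -0.8125]
```

A large negative PE marks a confident prediction that turned out wrong; with alpha = 1.0 each prediction simply equals the previous outcome, while smaller values average over more of the feedback history.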

Keywords: cingulate; fMRI; prediction error; reinforcement learning; social; teaching.

Figures

**Figure 1.**
A, Trial structure. Participants performed trials as a teacher, guiding the associative learning of a student. Each trial began with a green instruction cue (1 of 10 for which the teacher had learnt the associations during training), followed by the association cue informing the teacher of the correct response for the stimulus. This cue was displayed in the corner of the teacher's screen. The corresponding corner of the student's screen outside the scanner was covered, so that the cue was shown only to the teacher inside the scanner. The teacher then saw the student's response and was required to indicate to the student whether it was correct or incorrect. Teachers indicated their response on a keypad at the time of a screen on which a pound coin (correct) or a crossed-out pound coin (incorrect) was presented; they had to select the corresponding stimulus to deliver to the student. This stimulus was also presented in the corner of the screen, ensuring that the student could not see the teacher's decision at that time. The chosen feedback was delivered to the student at the time of the outcome stimulus. B, Example model data. Example output from the Rescorla-Wagner (R-W) model. In this example, the learning rate was set to 1 for clarity.
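Setting the learning rate to 1, as in panel B, makes the model's behavior easy to read off. Assuming the standard delta-rule form, with predicted value $V_t$, outcome $r_t$ and learning rate $\alpha$ (notation introduced here for illustration, not taken from the paper), the prediction on the next trial collapses to the most recent outcome:

```latex
\delta_t = r_t - V_t, \qquad V_{t+1} = V_t + \alpha\,\delta_t
\qquad\Longrightarrow\qquad
\alpha = 1:\; V_{t+1} = V_t + (r_t - V_t) = r_t
```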

**Figure 2.**
Student PEs in the brain of a teacher. A, ACC activity, time-locked to the student's response, that covaried with the PE parameter from the R-W model, displayed on the mean anatomical image. B, Parameter estimates in the peak ACC voxel. Activity in this region correlated only with the PE parameter and not with the student's prediction or the actual value of the outcome. Activity in this region also did not significantly covary with the unsigned PE parameter or with a parameter that simply coded the student's erroneous responses. Error bars indicate SEM. C, Peristimulus time histogram (PSTH) of activity in the brain of the teacher, time-locked to the student's action. Activity is plotted separately for trials on which the student's prediction was erroneously positive (light green triangles) or erroneously negative (dark green circles). PE values were taken from the R-W computational model. Error bars indicate SEM.
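The legend contrasts the signed PE with two alternative trial-wise regressors: the unsigned PE and a simple code for the student's errors. The sketch below shows one way such regressors could be derived, assuming per-trial signed PEs from a delta-rule model and a boolean record of whether each student response was correct. The function and key names are hypothetical, and the paper's own coding may differ.

```python
from typing import Dict, List


def build_regressors(pes: List[float],
                     student_correct: List[bool]) -> Dict[str, List[float]]:
    """Build three candidate trial-wise regressors (hypothetical helper).

    pes:             signed prediction errors, one per trial.
    student_correct: whether the student's response on each trial was correct.
    """
    if len(pes) != len(student_correct):
        raise ValueError("pes and student_correct must have the same length")
    return {
        # Keeps direction: separates erroneously positive from negative predictions.
        "signed_pe": list(pes),
        # Magnitude of surprise only; the direction is discarded.
        "unsigned_pe": [abs(d) for d in pes],
        # Binary code: 1.0 on trials where the student responded incorrectly.
        "error": [0.0 if ok else 1.0 for ok in student_correct],
    }


regs = build_regressors([-0.5, 0.75], [False, True])
print(regs["unsigned_pe"])  # [0.5, 0.75]
```

Of the three regressors, only the signed PE distinguishes a confident wrong prediction from an unexpectedly correct response.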

**Figure 3.**
Simulating the student's prediction. Activity in the ventromedial prefrontal cortex (VmPFC; A) and the right short insular gyrus (B) covarying with the predicted value according to the student, taken from the R-W model. Parameter estimates from the peak voxel in the VmPFC (C) and the insula (D) for the PE, the student's predicted value, and the actual value of the outcome known by the teacher. Parameter estimates for the predicted value parameter reflect the unique variance explained by that regressor after orthogonalization with respect to the actual outcome parameter. Parameter estimates for the PE and actual outcome parameters come from regressors that were not orthogonalized. Error bars indicate SEM. Peristimulus time histograms (PSTHs) from the VmPFC (E) and the insula (F), time-locked to the student's prediction. Activity in these regions is broken down into trials with low (<0.5; light red triangles) versus high (>0.5; dark red circles) predicted value according to the model. Error bars indicate SEM.
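Two operations in this legend can be made concrete: orthogonalizing the predicted value regressor with respect to the actual outcome regressor, and splitting trials at a predicted value of 0.5. The sketch below assumes one common approach, mean-centring both regressors and removing the projection of one onto the other (a single Gram-Schmidt step); the legend does not state the exact procedure or software used, and the function names are hypothetical.

```python
from typing import List, Optional


def orthogonalize(target: List[float], reference: List[float]) -> List[float]:
    """Return the part of `target` not explained by `reference` (hypothetical helper).

    Both regressors are mean-centred, then the projection of the target onto the
    reference is subtracted, so the result has zero covariance with the reference.
    """
    n = len(target)
    if n == 0 or n != len(reference):
        raise ValueError("regressors must be non-empty and the same length")
    mt = sum(target) / n
    mr = sum(reference) / n
    t = [x - mt for x in target]
    r = [x - mr for x in reference]
    rr = sum(x * x for x in r)
    if rr == 0.0:
        return t  # a constant reference explains nothing beyond the mean
    beta = sum(a * b for a, b in zip(t, r)) / rr
    return [a - beta * b for a, b in zip(t, r)]


def split_by_value(values: List[float], threshold: float = 0.5) -> List[Optional[str]]:
    """Label trials 'low' (< threshold) or 'high' (> threshold); None if exactly equal."""
    labels: List[Optional[str]] = []
    for v in values:
        if v < threshold:
            labels.append("low")
        elif v > threshold:
            labels.append("high")
        else:
            labels.append(None)  # exactly at the threshold: in neither bin
    return labels


unique_value = orthogonalize([0.5, 0.25, 0.625, 0.8125], [0, 1, 1, 0])
print(split_by_value([0.25, 0.5, 0.8125]))  # ['low', None, 'high']
```

After orthogonalization, any variance that the predicted value shared with the actual outcome is attributed to the outcome regressor. This is why the legend describes the predicted value estimates as reflecting unique variance only.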
