J Neurosci. 2015 Feb 18;35(7):2904-13.
doi: 10.1523/JNEUROSCI.3669-14.2015.

Vicarious reinforcement learning signals when instructing others


Matthew A J Apps et al. J Neurosci. 2015.

Abstract

Reinforcement learning (RL) theory posits that learning is driven by discrepancies between the predicted and actual outcomes of actions (prediction errors [PEs]). In social environments, learning is often guided by similar RL mechanisms. For example, teachers monitor the actions of students and provide feedback to them. This feedback evokes PEs in students that guide their learning. We report the first study that investigates the neural mechanisms that underpin RL signals in the brain of a teacher. Neurons in the anterior cingulate cortex (ACC) signal PEs when learning from the outcomes of one's own actions but also signal information when outcomes are received by others. Does a teacher's ACC signal PEs when monitoring a student's learning? Using fMRI, we studied brain activity in human subjects (teachers) as they taught a confederate (student) action-outcome associations by providing positive or negative feedback. We examined activity time-locked to the students' responses, when teachers infer student predictions and know actual outcomes. We fitted an RL-based computational model to the behavior of the student to characterize their learning, and examined whether a teacher's ACC signals when a student's predictions are wrong. In line with our hypothesis, activity in the teacher's ACC covaried with the PE values in the model. Additionally, activity in the teacher's insula and ventromedial prefrontal cortex covaried with the predicted value according to the student. Our findings highlight that the ACC signals PEs vicariously for others' erroneous predictions, when monitoring and instructing their learning. These results suggest that RL mechanisms, processed vicariously, may underpin and facilitate teaching behaviors.

Keywords: cingulate; fMRI; prediction error; reinforcement learning; social; teaching.


Figures

Figure 1.
A, Trial structure. Participants performed trials as a teacher, guiding the associative learning of a student. Each trial began with a green instruction cue (1 of 10 for which the teacher had learnt the associations during training), followed by the association cue informing the teacher of the correct response for the stimulus. This was displayed in the corner of the teacher's screen. The corresponding corner of the student's screen outside the scanner was covered, such that this cue was shown only to the teacher inside the scanner. Following this, the teacher saw the student's response. They were required to indicate to the student whether this response was correct or incorrect. The teachers indicated their response on a keypad when a screen showing a pound coin (correct) or a crossed-out pound coin (incorrect) was presented. Participants had to select the corresponding stimulus to deliver to the student. This stimulus was also presented in the corner of the screen, ensuring that the student could not see the teacher's decision at that time. The chosen feedback was delivered to the student at the time of the outcome stimulus. B, Example model data. Plot of example output from the Rescorla-Wagner (R-W) model. In this example, the learning rate was set to 1 for clarity.
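The R-W update referred to in panel B can be illustrated with a minimal sketch. This is a generic, single-cue Rescorla-Wagner learner and not the authors' fitted model: the function name `rescorla_wagner`, the starting value of 0, the coding of outcomes as 1 (positive feedback) and 0 (negative feedback), and the example outcome sequence are all assumptions made for illustration. Only the learning rate of 1 is taken from the caption.

```python
from typing import List, Tuple


def rescorla_wagner(
    outcomes: List[float],
    learning_rate: float = 1.0,   # 1.0 mirrors the value used in the caption's example
    initial_value: float = 0.0,   # assumed starting prediction (not stated in the source)
) -> List[Tuple[float, float]]:
    """Return (predicted value, prediction error) for each trial.

    The prediction error is the outcome minus the current prediction,
    and the prediction is then moved toward the outcome by a fraction
    of that error given by the learning rate.
    """
    value = initial_value
    trace = []
    for outcome in outcomes:
        prediction_error = outcome - value
        trace.append((value, prediction_error))
        value = value + learning_rate * prediction_error
    return trace


if __name__ == "__main__":
    # Hypothetical feedback sequence: 1 = positive, 0 = negative.
    example_outcomes = [1.0, 1.0, 0.0, 1.0]
    for trial, (value, pe) in enumerate(rescorla_wagner(example_outcomes), start=1):
        print(f"trial {trial}: predicted value = {value:.2f}, PE = {pe:+.2f}")
```

With a learning rate of 1 the prediction jumps straight to the most recent outcome, so every PE is -1, 0, or +1, which is presumably why that setting makes an example plot easy to read. Smaller learning rates give graded predictions and graded PEs.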
Figure 2.
Student PEs in the brain of a teacher. A, ACC activity, time-locked to the student's response, that covaried with the PE parameter from the R-W model, shown on the mean anatomical image. B, Parameter estimates in the peak ACC voxel. Activity in this region correlated only with the PE parameter and not with the student's prediction or the actual value of the outcome. Activity in this region also did not significantly covary with the unsigned PE parameter or a parameter that simply coded for student erroneous responses. Error bars indicate SEM. C, Peristimulus time histogram (PSTH) of activity time-locked to the student's action in the brain of the teacher. Activity plotted for when the student's prediction was erroneously positive (light green triangles) or erroneously negative (dark green circles). The values of the PE were taken from the R-W computational model. Error bars indicate SEM.
Figure 3.
Simulating the student prediction. Activity shown in the ventromedial prefrontal cortex (A) and the right short insula gyrus (B) covarying with the predicted value according to the student, taken from the R-W model. Plots of the parameter estimates from the peak voxel in the VmPFC (C) and the insula (D) for the PE, the student predicted value, and the actual value of the outcome known by the teacher. Parameter estimates for the predicted value parameter are for the unique variance explained by the regressor once orthogonalized with respect to the actual outcome parameter. Parameter estimates for the PE parameter and the actual outcome parameter are from regressors that have not been orthogonalized. Error bars indicate SEM. Peristimulus time histogram (PSTH) plots from the VmPFC (E) and the insula (F) time-locked to the student's prediction. Activity in these regions is broken down into low (<0.5) predicted value (light red triangles) versus high (>0.5) predicted value (dark red circles) according to the model. Error bars indicate SEM.
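The orthogonalization mentioned in this caption can be sketched as follows. This is one common way to orthogonalize one regressor with respect to another (mean-center both, then remove the least-squares projection), and it is an assumption that the authors' analysis software did exactly this. The function name `orthogonalize` and the trial values are hypothetical.

```python
from typing import List


def orthogonalize(target: List[float], reference: List[float]) -> List[float]:
    """Return the part of `target` that `reference` cannot linearly explain.

    Both regressors are mean-centered first (an assumption), then the
    least-squares projection of `target` onto `reference` is subtracted.
    """
    if len(target) != len(reference) or not target:
        raise ValueError("regressors must be non-empty and of equal length")

    n = len(target)
    target_mean = sum(target) / n
    reference_mean = sum(reference) / n
    target_c = [t - target_mean for t in target]
    reference_c = [r - reference_mean for r in reference]

    reference_ss = sum(r * r for r in reference_c)
    if reference_ss == 0.0:
        # A constant reference explains nothing beyond the mean.
        return target_c

    beta = sum(t * r for t, r in zip(target_c, reference_c)) / reference_ss
    return [t - beta * r for t, r in zip(target_c, reference_c)]


if __name__ == "__main__":
    # Hypothetical per-trial values, for illustration only.
    predicted_value = [0.2, 0.6, 0.8, 0.4]
    actual_outcome = [0.0, 1.0, 1.0, 0.0]
    unique_part = orthogonalize(predicted_value, actual_outcome)
    print([round(v, 3) for v in unique_part])
```

After this step, any variance shared by the two regressors is credited to the actual outcome, so a parameter estimate for the orthogonalized predicted value reflects only its unique contribution.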
