Review

Dopamine reward prediction error coding

Wolfram Schultz. Dialogues Clin Neurosci. 2016 Mar;18(1):23-32. doi: 10.31887/DCNS.2016.18.1/wschultz.

Abstract

Reward prediction errors consist of the differences between received and predicted rewards. They are crucial for basic forms of learning about rewards and make us strive for more rewards, an evolutionarily beneficial trait. Most dopamine neurons in the midbrain of humans, monkeys, and rodents signal a reward prediction error; they are activated by more reward than predicted (positive prediction error), remain at baseline activity for fully predicted rewards, and show depressed activity with less reward than predicted (negative prediction error). The dopamine signal increases nonlinearly with reward value and codes formal economic utility. Drugs of addiction generate, hijack, and amplify the dopamine reward signal and induce exaggerated, uncontrolled dopamine effects on neuronal plasticity. The striatum, amygdala, and frontal cortex also show reward prediction error coding, but only in subpopulations of neurons. Thus, the important concept of reward prediction errors is implemented in neuronal hardware.
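As a rough illustration (not from the article), the core idea of learning from reward prediction errors can be sketched with a Rescorla-Wagner-style update, in which the prediction moves toward the received reward in proportion to the error; the learning rate `alpha` is a hypothetical parameter chosen for the sketch:

```python
def update_prediction(V, reward, alpha=0.1):
    """Return the reward prediction error and the updated prediction.

    Minimal Rescorla-Wagner-style sketch: the error is the difference
    between received and predicted reward, and the prediction moves
    toward the reward in proportion to that error.
    """
    delta = reward - V          # positive, zero, or negative prediction error
    V_new = V + alpha * delta   # prediction approaches the received reward
    return delta, V_new

V = 0.0
for _ in range(100):
    delta, V = update_prediction(V, reward=1.0)
# With repeated pairings the reward becomes fully predicted:
# delta shrinks toward 0 and V approaches the reward value of 1.0.
```

Once the prediction matches the reward, the error is zero and learning stops, mirroring the absent dopamine response to fully predicted rewards described above.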


Keywords: dopamine; neurophysiology; neuron; prediction; reward; striatum; substantia nigra; ventral tegmental area.


Figures

Figure 1. Scheme of learning by prediction error. Red: a prediction error exists when the reward differs from its prediction. Blue: no error exists when the outcome matches the prediction, and the behavior remains unchanged.
Figure 2. Reward prediction error responses at the time of reward (right) and of reward-predicting visual stimuli (left in bottom two graphs). The dopamine neuron is activated by the unpredicted reward, eliciting a positive reward prediction error (blue, + error, top); shows no response to the fully predicted reward, eliciting no prediction error (0 error, middle); and is depressed by the omission of predicted reward, eliciting a negative prediction error (- error, bottom). Reproduced from ref 8: Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275:1593-1599. Copyright © American Association for the Advancement of Science 1997.
Figure 3. (A) Stepwise transfer of the dopamine response from the reward to the first reward-predicting stimulus. CS2: earliest reward-predicting stimulus; CS1: subsequent reward-predicting stimulus. From ref 12 (Schultz). (B) Reward prediction error responses in a sequential stimulus-reward task (gray bars) closely parallel the prediction errors of a formal temporal difference (TD) reinforcement model (black lines). Averaged population responses of 26 dopamine neurons follow the reward probabilities at each sequence step (numbers indicate reward probabilities in %) and match the time course of TD prediction errors. Reproduced from ref 16: Enomoto K, Matsumoto N, Nakai S, et al. Dopamine neurons learn to encode the long-term value of multiple future rewards. Proc Natl Acad Sci U S A. 2011;108:15462-15467. Copyright © National Academy of Sciences 2011.
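The response transfer in Figure 3A can be sketched with a minimal temporal-difference (TD) model over the sequence CS2 → CS1 → reward. The learning rate, discount factor, and the assumption that trial onset is itself unpredicted are illustrative choices for this sketch, not the parameters of the model in ref 16:

```python
# Minimal TD(0) sketch (illustrative parameters, not those of ref 16).
V = {"CS2": 0.0, "CS1": 0.0}   # learned reward predictions for each stimulus
alpha, gamma = 0.2, 1.0        # learning rate, temporal discount

def run_trial():
    """One trial of CS2 -> CS1 -> reward; returns the TD error at each event."""
    errors = {}
    # Trial onset is unpredicted: the baseline value is 0 and never learns,
    # so the error at CS2 onset equals the learned value of CS2.
    errors["CS2"] = gamma * V["CS2"] - 0.0
    # CS2 -> CS1 transition (no reward yet).
    d = gamma * V["CS1"] - V["CS2"]
    V["CS2"] += alpha * d
    errors["CS1"] = d
    # CS1 -> reward (terminal event, reward = 1).
    d = 1.0 - V["CS1"]
    V["CS1"] += alpha * d
    errors["reward"] = d
    return errors

first = run_trial()            # untrained: the error occurs at the reward
for _ in range(500):
    last = run_trial()         # trained: the error has moved to CS2 onset
```

Early in training the error appears at the reward; after training the fully predicted reward elicits no error and the positive error appears at the earliest predictor, matching the stepwise transfer shown in the figure.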
Figure 4. Schematic of the two phasic dopamine response components. The initial component (blue) detects the event before its value has been identified. It increases with sensory impact (physical salience), generalization to rewarded stimuli, reward context, and novelty (novelty/surprise salience). The second component (red) codes reward value (as a reward utility prediction error). The two components become more distinct with more demanding stimuli. Adapted from a data graph in ref 17: Fiorillo CD, Song MR, Yun SR. Multiphasic temporal dynamics in responses of midbrain dopamine neurons to appetitive and aversive stimuli. J Neurosci. 2013;33:4710-4725. Copyright © Society for Neuroscience 2013.
Figure 5. (A) Top: testing risky rewards: an animal chooses between a safe reward whose amount is adjusted by the experimenter (left) and a fixed, binary, equiprobable gamble (right). The height of each bar indicates juice volume; two bars indicate the occurrence of each indicated amount with P = 0.5 (risky reward). Bottom: two psychophysical assessments of risk attitude in a monkey. CE indicates the amount of safe reward at choice indifference against the gamble (certainty equivalent); EV is the expected value of the gamble. CE > EV suggests risk seeking (the subjective gamble value CE exceeds the objective gamble value EV, left); CE < EV indicates risk avoidance (right). (B) Positive utility prediction error responses to unpredicted juice rewards. Red: utility function derived from binary, equiprobable gambles. Black: corresponding, nonlinear increase of the population response (n = 14 dopamine neurons) in the same animal. A and B reproduced from ref 25: Stauffer WR, Lak A, Schultz W. Dopamine reward prediction error responses reflect marginal utility. Curr Biol. 2014;24:2491-2500. Copyright © Cell Press 2014.
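The certainty-equivalent logic of panel A can be illustrated with a small computation. The square-root utility function and the juice amounts below are assumptions made for the sketch, not the utility fitted in ref 25:

```python
import math

def certainty_equivalent(a, b, u, u_inv):
    """CE of a binary, equiprobable gamble {a, b} under utility u.

    The CE is the safe amount whose utility equals the gamble's
    expected utility, recovered via the inverse of u.
    """
    expected_utility = 0.5 * u(a) + 0.5 * u(b)
    return u_inv(expected_utility)

a, b = 0.1, 1.2                      # juice volumes in ml (illustrative)
ev = (a + b) / 2                     # objective expected value of the gamble
ce_concave = certainty_equivalent(a, b, math.sqrt, lambda x: x ** 2)
ce_convex = certainty_equivalent(a, b, lambda x: x ** 2, math.sqrt)
# A concave utility yields ce_concave < ev (risk avoidance);
# a convex utility yields ce_convex > ev (risk seeking).
```

Comparing the measured CE with the gamble's EV thus classifies the animal's risk attitude, as in the two psychophysical assessments of panel A.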

References

    1. Pavlov PI. Conditioned Reflexes. London, UK: Oxford University Press; 1927.
    2. Thorndike EL. Animal Intelligence: Experimental Studies. New York, NY: MacMillan; 1911.
    3. Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF, eds. Classical Conditioning II: Current Research and Theory. New York, NY: Appleton-Century-Crofts; 1972:64-99.
    4. Olds J, Milner P. Positive reinforcement produced by electrical stimulation of septal area and other regions of rat brain. J Comp Physiol Psychol. 1954;47:419-427. - PubMed
    5. Corbett D, Wise RA. Intracranial self-stimulation in relation to the ascending dopaminergic systems of the midbrain: a moveable microelectrode study. Brain Res. 1980;185:1-15. - PubMed
