Nat Neurosci. 2025 Jun;28(6):1280-1292.
doi: 10.1038/s41593-025-01915-4. Epub 2025 Mar 18.

Prospective contingency explains behavior and dopamine signals during associative learning

Lechen Qian et al. Nat Neurosci. 2025 Jun.

Abstract

Associative learning depends on contingency, the degree to which a stimulus predicts an outcome. Despite its importance, the neural mechanisms linking contingency to behavior remain elusive. In the present study, we examined dopamine activity in the ventral striatum, a signal implicated in associative learning, in a Pavlovian contingency degradation task in mice. We show that both anticipatory licking and dopamine responses to a conditioned stimulus decreased when additional rewards were delivered uncued, but remained unchanged if additional rewards were cued. These results conflict with contingency-based accounts using a traditional definition of contingency or a new causal learning model (ANCCR), but can be explained by temporal difference (TD) learning models equipped with an appropriate intertrial interval state representation. Recurrent neural networks trained within a TD framework develop state representations akin to our best 'handcrafted' model. Our findings suggest that the TD error can be a measure that describes both contingency and dopaminergic activity.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Extended Data Fig. 1 | Population Average Behavior per session.
For all panels: Deg group, n = 11; Conditioning, n = 6; Cued Reward, n = 12. Error bars are SEM. (a, b, c, d) Bar graphs comparing the average number of licks to Odor A during the first 3 s post-stimulus (a) and during the ITI (b), latency to lick (c), and fraction correct (d) in the final sessions of Phase 1 and Phase 2 for the Deg, Cond, and CuedRew groups. Asterisks denote statistical significance: ns, P > 0.05; **, P < 0.01, paired two-sided Student’s t-test. (e) Session-wise variation in anticipatory licking for Odor A trials, broken down into early, middle, and late blocks, for all groups. (f, g, h) Line graphs showing the average number of licks to Odor A (colored) during the ITI (g), latency to lick after Odor A, and fraction correct in Odor A trials for each session in the Conditioning, Degradation, and Cued Reward phases. (i) Anticipatory licking rate in Odor A trials (colored) and in Odor B trials (grey) across multiple phases: Conditioning (Phase I), Degradation (Phase II), Recovery (Phase III), Extinction (Phase IV), and post-Extinction Recovery (Phase V). (j) Anticipatory licking to Odor C develops quickly compared to Odor A, potentially reflecting generalization. (k, l) PSTH showing the average licking response of mice in the Deg group (k) and the CuedRew group (l) to the various events. The response is time-locked to the odor presentation (time 0). The shaded area indicates the standard error of the mean (SEM).
Extended Data Fig. 2 | Dopamine responses are highly correlated across recording sites.
(a) Averaged dopamine axonal responses to Odor A during rewarded trials for both the Deg group and the CuedRew group, depicted for Phase I session 5 and Phase II session 10 across all recorded sites. (b) Correlation matrix for averaged dopamine responses to Odor A during rewarded trials, comparing across sites from the Deg group during sessions 5 and 10. Cosine similarity was calculated by averaging z-scored responses across trials within animals, then across animals, and then computing the cosine similarity between each pair of recording sites. Some olfactory tubercle recording sites had no discernible signal and were thus excluded from this analysis. Sample size (n) is reported per site. (c) Population average dopamine responses to Odor A in rewarded trials across sessions 1 to 10 for both the Deg and CuedRew groups, detailing the changes in response through Phase I and Phase II.
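The averaging order described in panel (b) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' analysis code: the array layout (animals × sites × trials × time), the function name `site_cosine_similarity`, and the use of NaN for missing sites are all assumptions made for the example.

```python
import numpy as np

def site_cosine_similarity(z):
    """z: array (animals, sites, trials, time) of z-scored responses; NaN marks missing data."""
    per_animal = np.nanmean(z, axis=2)          # average across trials within each animal
    per_site = np.nanmean(per_animal, axis=0)   # then across animals -> (sites, time)
    norms = np.linalg.norm(per_site, axis=1, keepdims=True)
    unit = per_site / norms                     # unit-length response vector per site
    return unit @ unit.T                        # pairwise cosine similarity between sites

# Synthetic example: 4 animals, 3 sites, 20 trials, 60 time points (all hypothetical).
rng = np.random.default_rng(0)
t = np.linspace(0, 3, 60)
template = np.exp(-((t - 0.5) ** 2) / 0.05)     # shared cue-evoked transient
z = template + 0.3 * rng.standard_normal((4, 3, 20, t.size))
sim = site_cosine_similarity(z)
```

Because every synthetic site shares the same underlying transient, the off-diagonal entries of `sim` come out close to 1, which is the pattern the figure title describes.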
Extended Data Fig. 3 | Population Average Dopamine Response per session.
(a) Predicted reward response in Odor A trials for trials in which the first lick after reward delivery was recorded within 200 ms (green) or between 400 and 800 ms (red). There is a biphasic response pattern in the slow-lick trials, suggesting there may be sensory cues associated with reward delivery that act as conditioned stimuli. (b) Three consecutive trials from the same animal in the same session, showing the effect of lick time. The dotted line indicates the first recorded lick after reward delivery. The lick timing has an effect on the height and shape of the response. (c) Mean peak dopamine axonal signal (z-scored) of cue response (orange) and reward response (cyan) in Odor A rewarded trials by session for the Deg group (n = 8) across multiple phases: Conditioning (Phase I), Degradation (Phase II), Recovery (Phase III), Extinction (Phase IV), and post-Extinction Recovery (Phase V). Except in Extinction, only trials in which the first lick was recorded under 250 ms were included in this analysis. (d) As in panel (a), for unpredicted rewards delivered in the Degradation condition. (e) Example of the response to unpredicted reward in three trials from the same session and animal. The dotted line indicates the first recorded lick. (f) Reward responses by session and by group. In the Deg group (n = 8), the unpredicted reward elicited greater responses than the reward delivered after Odor A on all sessions (mixed-effects model, P < 0.001, within-animal comparison). In the CuedRew group (n = 5), the reward delivered after Odor C elicited a greater response than the reward delivered after Odor A on the first session of the Cued Reward condition (two-sided mixed-effects model, P < 0.05, within-animal comparison). (g, h, i) Mean peak dopamine axonal signal (z-scored) across sessions for four distinct conditions, represented for various events. (j, k) Response to Odor C in rewarded (j) and omission (k) trials, population average per session. In all panels, error bars are SEM.
Extended Data Fig. 4 | Effect of discount factor on model estimates.
Top: Influence of the discount factor (γ) on the predicted Odor A response relative to Conditioning (a) or relative to unpredicted reward (b), where reward size = 1, for the four models presented in Figure 3. The bottom-left scale shows the discount factor per step (0.2 s); the other axes use the per-second discount. Tested range: 0.5–0.975 discount per 0.2 s in 0.025 steps. The dotted line indicates the discount factor used in the main text. Bottom: Effect of the transition probability in the Belief-State model. The Belief-State model assumes a fixed rate of transition (p) from the Wait state to the Pre state with each timestep. Varying p around the value fitted to the experimental parameters has minimal effect on the prediction (note the logarithmic scale). If p is assumed to be extremely high or low, then the transition from the Wait state to the Pre state happens either almost instantaneously or not at all, resulting in a single state dominating the ITI and the model behaving like the Cue-Context model.
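Two pieces of arithmetic sit behind this caption: converting a per-step discount (0.2 s steps) into a per-second discount, and the geometric decay of the Wait-state belief under a fixed per-step transition probability p. The sketch below works through both. The function names and the value `p = 0.05` are hypothetical and chosen only for illustration; the fitted value of p is not given in this section.

```python
import numpy as np

STEP_S = 0.2  # timestep length in seconds, as stated in the caption

def per_second_discount(gamma_step, step_s=STEP_S):
    """Convert a per-step discount factor into its per-second equivalent."""
    return gamma_step ** (1.0 / step_s)   # 5 steps per second -> gamma_step ** 5

def wait_belief(p, n_steps):
    """Probability of still being in Wait after 0..n_steps steps without an observation."""
    return (1.0 - p) ** np.arange(n_steps + 1)

# Tested range from the caption: 0.5 to 0.975 per 0.2 s step, in increments of 0.025.
gammas_step = np.linspace(0.5, 0.975, 20)
gammas_sec = per_second_discount(gammas_step)

p = 0.05                      # hypothetical per-step Wait -> Pre transition probability
belief = wait_belief(p, 50)   # geometric decay of the Wait-state belief
mean_dwell_s = STEP_S / p     # mean of a geometric dwell time, in seconds
```

With p close to 1 the Wait belief collapses within a step or two, and with p close to 0 it barely moves, so in both extremes a single state dominates the ITI, as the caption notes.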
Extended Data Fig. 5 | Microstimuli simulation.
(a) Complete serial compound state spaces assume non-overlapping microstates. (b) The microstimuli state space representation assumes that each stimulus produces a sequence of microstimuli that diminish in height (diminishing relative contribution) and grow in width (growing temporal imprecision). (c) High-σ, low-η microstimuli simulations have low temporal precision: each state overlaps substantially with the previous one. (d) With low temporal precision, the microstimuli model behaves like the Cue-Context model, reproducing the pattern of results qualitatively but not quantitatively. (e) However, at this level there is no expected decrease in the predicted reward response, as the timing is insufficiently accurate. (f) In the opposite situation, with low σ and high η, microstimuli state representations have higher precision in time. (g) However, in this case the model suffers from the same issues as the CSC-with-ITI-states model, predicting little difference between the Cued Reward and Degradation conditions. (h) Microstimuli help explain why the decrease in predicted reward response is not as predicted by CSC models, in which the reward is perfectly predictable (and thus the reward response following 75% reward probability cues should be 25% of the magnitude of an unpredicted reward). (i) From Figure 2, mean peak dopamine axonal signal for the last session in Phase 1 (Conditioning) and Phase 2 (Degradation and Cued Reward) for both Deg (n = 8) and CuedRew (n = 5) groups. Error bars represent SEM. ***, P < 0.001 in two-sided mixed-effects model with Tukey HSD post hoc. (j) Predicted reward response following Odor A in session 1 (green) versus session 5 (red). (k) Maximum axonal calcium response for predicted reward response following Odor A in session 1 versus session 5 (n = 13). Normalized by subject to session 1. Error bars represent SEM.
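To make the effect of σ in panels (c) and (f) concrete, here is a minimal sketch of one common microstimuli formulation, in which Gaussian basis functions tile the height of an exponentially decaying memory trace. It is not the authors' simulation code. The parameter names `n_micro` and `decay`, their values, and the overlap measure are assumptions; in particular, the sketch does not assume any mapping between `decay` and the caption's η.

```python
import numpy as np

def microstimuli(n_steps, n_micro=10, sigma=0.08, decay=0.99):
    """Return (n_steps, n_micro) microstimulus activations following one stimulus."""
    trace = decay ** np.arange(n_steps)               # exponentially decaying memory trace
    centers = np.arange(1, n_micro + 1) / n_micro     # basis centers in trace height
    diff = trace[:, None] - centers[None, :]
    basis = np.exp(-diff ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi)
    return basis * trace[:, None]                     # later microstimuli: lower, wider in time

def adjacent_overlap(x):
    """Mean cosine similarity between time courses of neighboring microstimuli."""
    unit = x / np.linalg.norm(x, axis=0, keepdims=True)
    return float(np.mean(np.sum(unit[:, :-1] * unit[:, 1:], axis=0)))

sharp = adjacent_overlap(microstimuli(400, sigma=0.02))   # narrow basis: little overlap
broad = adjacent_overlap(microstimuli(400, sigma=0.30))   # wide basis: heavy overlap
```

In this sketch a small σ yields nearly disjoint, CSC-like states, while a large σ makes neighboring microstimuli almost indistinguishable, which corresponds to the low temporal precision described in panel (c).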
Extended Data Fig. 6 | Comparison of reward and omission responses between experimental data, Belief-State model and value-RNN predictions.
(a) Plots averaged from one representative simulation of Odor A rewarded trials (n = 4,000 simulated trials) for four distinct conditions using the Belief-State model. Graphs show the corresponding value function of Odor A rewarded trials, with the Pre state, ISI state and Wait state annotated. (b) Z-scored DA axonal signals to reward omission and predicted reward following Odor A, quantified from the red shaded area. Line graphs (right) show the mean z-scored response over multiple sessions for each condition. Statistical analysis was performed on data from the first and last session of these conditions. Error bars are SEM. ns, P > 0.05; **, P < 0.01, paired t-test. (c) The predictions of the Belief-State model for reward omission and predicted reward (mean, error bars: SD). (d) The experimental data for reward omission and predicted reward (mean, error bars: SEM). ns, P > 0.05; **, P < 0.01; ***, P < 0.001, Welch’s t-test. (e) The predictions of the Value-RNN models for reward omission and predicted reward (mean, error bars: SD). (f) The experimental data and the TD error predictions of the Belief-State model and the Value-RNN model for the uncued reward response in the Degradation condition. While the Belief-State model captured the downward trend in response magnitude, none of the three statistical tests showed significant changes.
Extended Data Fig. 7 | Methodology for visualizing state space from hidden unit activity.
Illustration of the method for visualizing the common state space of RNN models. RNN hidden unit activity was first projected into principal component space, and canonical correlation analysis was then used to align the projections across conditions.
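The two-step procedure described here (PCA, then CCA alignment) can be sketched with NumPy alone. This is an illustrative reconstruction under stated assumptions rather than the authors' pipeline: the function names, the number of components, and the synthetic "hidden activity" are hypothetical, and CCA is computed with the standard QR-plus-SVD approach.

```python
import numpy as np

def pca_project(h, n_components):
    """Project activity h (time, units) onto its top principal components."""
    centered = h - h.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def cca_align(x, y):
    """Return canonical variates of x and y plus their canonical correlations."""
    qx, _ = np.linalg.qr(x - x.mean(axis=0))   # orthonormal basis for x's column space
    qy, _ = np.linalg.qr(y - y.mean(axis=0))
    u, corr, vt = np.linalg.svd(qx.T @ qy)     # singular values = canonical correlations
    return qx @ u, qy @ vt.T, corr

# Synthetic stand-in for two RNNs that embed the same 3-D latent trajectory differently.
rng = np.random.default_rng(1)
latent = rng.standard_normal((500, 3))
h_a = latent @ rng.standard_normal((3, 50))    # hypothetical hidden activity, condition A
h_b = latent @ rng.standard_normal((3, 50))    # hypothetical hidden activity, condition B
xa, xb, corr = cca_align(pca_project(h_a, 3), pca_project(h_b, 3))
```

Here both networks share one latent trajectory by construction, so the canonical correlations are essentially 1 and the aligned trajectories `xa` and `xb` coincide. With real hidden activity the correlations would indicate how much of the state space is shared across conditions.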
Extended Data Fig. 8 | Outcome-specific contingency degradation explained by Belief-State model and Value-RNN model.
(a) Experimental design of Garr et al.: two cues predicted either a liquid or a food reward. During degradation, every 20 s the liquid reward was delivered with 50% probability. The ITI length was drawn from an exponential distribution with a mean of 4 minutes. (b) Belief-State model design. The Belief-State model was extended to include a second series of ISI substates to reflect the two types of rewarded trials. The model was then independently trained on the liquid reward and the food reward. (c) The value-RNN model design: as in (b), but replacing the Belief-State model with the value-RNN, using a vector-valued RPE as feedback, with each channel reflecting one of the reward types. (d-f) Summary of predicted RPE responses from the Belief-State model and the Value-RNN (vRNN). The RPE was calculated as the absolute difference between the liquid RPE and the food RPE. Other readout functions (e.g. weighted sum) produce similar results. Both model predictions match the experimental results, with the degraded (D) cue (d) and degraded reward (e) having a reduced dopamine response versus non-degraded (ND). Furthermore, the average RPE during the ISI (3 seconds after cue onset) and ITI (3 seconds before ITI) captures the measured experimental trend. Error bars are SEM.
Figure 1
Dynamic changes in lick response to olfactory cues across different phases of the Pavlovian contingency learning task. (a) Experimental design. Three groups of mice were subjected to four unique conditions of contingency learning. All animals underwent Phases 1 and 2. The Deg group additionally underwent Phases 3–5. (b) Trial timing. (c) Trial parameters per condition. In Conditioning, Degradation and Cued Reward, Odor A predicts a 75% chance of reward (9 μL water) delivery and Odor B indicates no reward. In Degradation, blank trials were replaced with uncued rewards (75% reward probability). In Cued Reward, these additional rewards were cued by Odor C. In Extinction, no rewards were delivered. (d) PSTH of the average licking response of mice in the three groups to the onset of Odor A and Odor B from the last session of Phase 1 (session 5) and Phase 2 (session 10). Shaded area is the standard error of the mean (SEM). Note the decreased licking during the ISI and the increased licking during the ITI in the Deg group (green, Cond group, n = 6; orange, Deg group, n = 11; purple, CuedRew group, n = 12 mice). (e) Average lick rate in the 3 s post-cue (Odor A or B) by session. Error bars represent SEM. (f) Average lick rate in the 3 s post Odor A in the final session of each condition. Asterisks denote statistical significance: ns, P > 0.05; **, P < 0.01, indicating a significant change in licking behavior to Odor A in the Deg group across sessions, using a two-sided mixed-effects model with Tukey’s HSD post hoc tests (Cond vs CuedRew, P = 0.77; Cond vs Deg, P = 0.0011; CuedRew vs Deg, P = 0.008).
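The reward probabilities in panel (c) allow a quick check of what a trial-based contingency measure would say about each condition. The sketch below uses ΔP = P(reward | Odor A) − P(reward | no Odor A) as one traditional definition. The trial-type proportions in `TRIAL_MIX` are hypothetical, since this section does not state them; only the reward probabilities are taken from the caption.

```python
# Hypothetical fractions of Odor A, Odor B, and "third type" trials (not from the source).
TRIAL_MIX = {"A": 0.4, "B": 0.2, "third": 0.4}

# Reward probability per trial type, taken from panel (c).
REWARD_PROB = {
    "Conditioning": {"A": 0.75, "B": 0.0, "third": 0.0},   # third type = blank trial
    "Degradation":  {"A": 0.75, "B": 0.0, "third": 0.75},  # third type = uncued reward
    "Cued Reward":  {"A": 0.75, "B": 0.0, "third": 0.75},  # third type = Odor C trial
}

def delta_p(reward_prob, mix=TRIAL_MIX):
    """Trial-based contingency: P(reward | Odor A) - P(reward | no Odor A)."""
    others = [k for k in mix if k != "A"]
    p_no_a = sum(mix[k] * reward_prob[k] for k in others) / sum(mix[k] for k in others)
    return reward_prob["A"] - p_no_a

contingency = {cond: delta_p(rp) for cond, rp in REWARD_PROB.items()}
```

Whatever trial mix is assumed, Degradation and Cued Reward receive the same ΔP under this definition, because the extra rewards occur equally often in the absence of Odor A. A measure of this kind therefore cannot separate the two conditions, whereas the licking data in panel (f) do.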
Figure 2
Dopamine axonal activity recordings show different responses to rewarding cues in Degradation and Cued Reward conditions. (a) Configuration of multifiber photometry recordings. (b) Coronal section from one DAT::cre x Ai148 mouse showing multiple VS fiber tracts. Only lNAc data are presented in the main results. lNAc, lateral nucleus accumbens; mNAc, medial NAc; alOT, anterior lateral olfactory tubercle; plOT, posterior lateral OT; amOT, anterior medial OT; pmOT, posterior medial OT. Overlaid points show the aligned placement for all animals (n = 13). (c) Heatmap from two mice (mouse 1, left two panels; mouse 2, right panel) illustrating the z-scored dopamine axonal signals in Odor A rewarded trials (rows), aligned to the onset of Odor A for three conditions. (d) Population average z-scored dopamine axonal signals in response to Odor A and water delivery. Shaded areas represent SEM. (e) Mean peak dopamine axonal signal of the Odor A response by session for the Deg group (orange, n = 8) and the CuedRew group (purple, n = 5; two-sided mixed-effects model). (f) Mean peak dopamine axonal signal for the last session in Phase 1 (Conditioning) and Phase 2 (Degradation and Cued Reward) for both Deg (n = 8) and CuedRew (n = 5) groups. In panels (e) and (f), error bars represent SEM. ns, P > 0.05; *, P < 0.05; ***, P < 0.001 in two-sided mixed-effects model with Tukey HSD post hoc.
Figure 3
TD learning models can explain dopamine responses in contingency degradation with appropriate ITI representation. (a) Temporal Difference Zero, TD(0), model. The state representation determines value. The difference in value between the current and gamma-discounted future state plus the reward determines the reward prediction error, or dopamine. This error drives updates in the weights. (b) Belief-State model: after the ISI, the animal is in the Wait state, transitioning to the pre-transition (‘Pre’) state with fixed probability p. The animal only leaves the Pre state following the observation of odor or reward. (c) State representations: from the left, Complete Serial Compound (CSC) with no ITI representation, CSC with ITI states, the Cue-Context model and the Belief-State model. (d) Value in Odor A trials of each state representation using TD(0) for Conditioning and Degradation conditions. (e) TD error is the difference in value plus the reward. (f) Mean normalized TD error of the Odor A response from 25 simulated experiments. Error bars are SD.
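The TD(0) rule in panel (a) can be written out as a short simulation. The sketch below uses the simplest CSC-style representation, one state per timestep between cue and reward with no ITI states, so it illustrates the update rule only and not any of the four models compared in the figure. The chain length, `gamma`, `alpha`, and trial count are hypothetical choices.

```python
import numpy as np

def td0_chain(n_steps=5, reward_prob=0.75, gamma=0.9, alpha=0.1,
              n_trials=2000, seed=0):
    """Tabular TD(0) on a cue -> ... -> reward chain with one state per timestep."""
    rng = np.random.default_rng(seed)
    w = np.zeros(n_steps + 1)      # last entry is the terminal state; its value stays 0
    delta = np.zeros(n_steps)
    for _ in range(n_trials):
        rewarded = rng.random() < reward_prob
        for s in range(n_steps):
            r = 1.0 if (rewarded and s == n_steps - 1) else 0.0
            delta[s] = r + gamma * w[s + 1] - w[s]   # TD error: reward plus change in value
            w[s] += alpha * delta[s]                 # weight update (one-hot state features)
    return w[:n_steps], delta

# Deterministic reward makes the fixed point easy to check: V(s) = gamma ** (steps to reward).
values, last_delta = td0_chain(reward_prob=1.0)
```

With `reward_prob=1.0` the learned values converge to powers of `gamma` and the TD errors on the final trial shrink toward zero. Setting `reward_prob=0.75` instead leaves a positive TD error on rewarded trials and a negative one on omissions.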
Figure 4
Belief-State model, but not Cue-Context model, explains variance in behavior and dopamine responses. (a) The Cue-Context model and the Belief-State model differ in their representation of the ITI. (b) Odor B predicts no reward and a delay of at least 10 s before the start of the next trial. An ideal agent waits this out, only licking late in the ITI. (c) Odor B induces a reduction in licking, particularly in the Degradation condition, which matches the pattern of value in the Belief-State model better than that in the Cue-Context model. (d) Quantified licks (top) from experimental data in the early (3.5–5 s) and late (7–8 s) post-cue periods. Error bars are SEM; *, P < 0.05, two-sided paired t-test (Conditioning: P = 0.457, n = 30; Degradation: P = 0.0413, n = 11; Cued Reward: P = 0.92, n = 13). Value from the Cue-Context and Belief-State models for the same time periods; error bars are SD. (e) If licking is taken as a readout of value, then ITI licking should be inversely correlated with dopamine. (f) Per-animal linear regression of the Odor A dopamine response (z-scored axonal calcium) on lick rate in the 2 s before cue delivery in the last two sessions of each condition. (g) Summarized slope coefficients from experimental data (left) and models (right). Boxplot shows median and IQR; whiskers are 1.5× IQR; one-sample two-sided t-test (Conditioning: P = 0.27, n = 13; Degradation: P = 0.057, n = 8; Cued Reward: P = 0.070, n = 5).
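The analysis in panels (f) and (g), a separate regression per animal followed by a one-sample test on the slopes, can be sketched as below. The data are synthetic and the true slope, noise level, and number of trials are hypothetical; the sketch computes only the t statistic and degrees of freedom, not a p-value.

```python
import numpy as np

def per_animal_slopes(lick_rate, dopamine):
    """Fit dopamine = slope * lick_rate + intercept separately for each animal."""
    return np.array([np.polyfit(x, y, 1)[0] for x, y in zip(lick_rate, dopamine)])

def one_sample_t(values, popmean=0.0):
    """t statistic and degrees of freedom for a one-sample t-test against popmean."""
    values = np.asarray(values, dtype=float)
    n = values.size
    t = (values.mean() - popmean) / (values.std(ddof=1) / np.sqrt(n))
    return t, n - 1

# Synthetic data: 8 animals, 40 trials each, with a negative lick-dopamine relationship.
rng = np.random.default_rng(2)
true_slope = -0.5   # hypothetical value, chosen only for the example
licks = [rng.uniform(0, 4, size=40) for _ in range(8)]
dopamine = [true_slope * x + 2.0 + 0.2 * rng.standard_normal(x.size) for x in licks]
slopes = per_animal_slopes(licks, dopamine)
t_stat, dof = one_sample_t(slopes)
```

If SciPy is available, `scipy.stats.ttest_1samp(slopes, 0.0)` returns the same statistic together with a two-sided p-value.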
Figure 5
Belief-State model’s predictions recapitulate additional experimental data. For all experimental summary data, n = 13 (Conditioning), n = 8 (Degradation), n = 5 (Cued Reward) and n = 7 (Extinction). Error bars are SEM. ns, P > 0.05; *, P < 0.05; **, P < 0.01; ***, P < 0.001. For all model summaries, n = 25 (all conditions) and error bars are SD. (a) Plots averaged from one representative simulation of Odor A rewarded trials (n = 4,000 simulated trials) for four distinct conditions using the Belief-State model. Graphs show the corresponding value function (left) and TD error (right) of the cue response for Odor A rewarded trials. (b) Signals from dopamine axons (mean) across multiple sessions of each condition (left). Mean peak dopamine axonal calcium signal (z-scored) for the first to last session in Phase 2 for four contingency conditions (right). Two-sided mixed-effects model. P = 0.137 for Cued Reward, P < 0.001 for all other comparisons. The Belief-State model captures the modulation of the Odor A dopamine response in all conditions. (c) Degradation, Cued Reward and Extinction conditions differ in how their ITI and ISI values change compared to the Conditioning phase. (d) Mean peak TD error by the Belief-State model and dopamine axonal signal (z-scored) to Odor A for four distinct conditions. The model’s prediction captured the pattern in the dopamine data well. All pairwise differences are significant at P < 0.001 using a two-sided mixed-effects model with Tukey’s HSD post hoc test. (e) Averaged traces from a representative simulation of Odor B trials (n = 4,000 simulated trials) across four distinct conditions using the Belief-State model. Graphs show the value function and TD errors of the cue response for Odor B trials. (f) Z-scored dopamine axonal signals to Odor B, quantified from the red shaded area to capture the later response only. Bar graph (left) shows the mean z-scored Odor B AUC from 0.25 s to 1 s from the last session of each condition. Two-sided mixed-effects model with Tukey HSD post hoc (Cond vs CuedRew: P = 0.007; Cond vs Ext: P = 0.43; Deg vs CuedRew: P = 0.035; CuedRew vs Ext: P = 0.0051; all others P < 0.001). Line graph (right) shows the mean z-scored AUC over multiple sessions for each condition. Two-sided mixed-effects model, first and last sessions of these conditions (Degradation: P < 0.001; CuedRew: P = 0.62; Extinction: P = 0.74).
Figure 6
Value-RNNs recapitulate experimental results using state spaces akin to the hand-crafted Belief-State model. For all panels, experimental data: Conditioning (n = 13), Degradation (n = 8), Cued Reward (n = 5), Extinction (n = 7), error bars are SEM; models (n = 25 simulations), error bars are SD. (a) The Value-RNN replaces the hand-crafted state space representation with an RNN that is trained only on the observations of cues and rewards. The TD error is used to train the network. (b) RNNs were initially trained on simulated Conditioning experiments, before being retrained on either Degradation or Cued Reward conditions. (c) The asymptotic predictions of the RNN models (mean, error bars: SD, n = 25 simulations, 50-unit RNNs) closely match the experimental results (see Figures 2f and 5f for statistics). *, P < 0.05; **, P < 0.01; ***, P < 0.001. (d) Example value, TD error, and corresponding average experimental data from a single RNN simulation. Notably, the decreased Odor A response is explained by increased value in the pre-cue period. (e) Hidden neuron activity projected into 3D space using CCA from the same RNNs used in (d). The Odor A ISI representation is similar in each of the three conditions, and similar to the Odor C representation. The Odor B representation is significantly changed in the Degradation condition. (f) Correspondence between the RNN state space and the Belief-State model. A linear decoder was trained to predict beliefs using RNN hidden unit activity. With increasing hidden layer size (n = 25 for each layer size), the RNN becomes increasingly belief-like. The improved performance of the decoder for the Degradation condition is explained by better decoding of the Wait state, which in turn is explained by the altered ITI representation. (g) Same RNNs as in (d) and (e). Hidden unit activity projected into the same state space as in (e), for the ITI period only, reveals that the ITI representation is significantly different in the Degradation case.
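The linear decoding analysis in panel (f) can be illustrated with an ordinary least-squares decoder. Everything below is a sketch under stated assumptions: the "hidden activity" is synthetic (a noisy nonlinear mixture of random belief vectors), the train/test split is arbitrary, and R² is used as a stand-in score because the section does not say which performance measure the authors used.

```python
import numpy as np

def fit_linear_decoder(hidden, beliefs):
    """Least-squares map (with bias term) from hidden activity to belief vectors."""
    design = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, beliefs, rcond=None)
    return weights

def decode(hidden, weights):
    """Apply a fitted decoder to hidden activity."""
    design = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
    return design @ weights

def r_squared(target, predicted):
    """Fraction of total variance in target explained by predicted."""
    ss_res = np.sum((target - predicted) ** 2)
    ss_tot = np.sum((target - target.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in: beliefs over 4 states, embedded in 20 hypothetical hidden units.
rng = np.random.default_rng(3)
beliefs = rng.dirichlet(np.ones(4), size=600)
mixing = 0.5 * rng.standard_normal((4, 20))
hidden = np.tanh(beliefs @ mixing) + 0.05 * rng.standard_normal((600, 20))
w = fit_linear_decoder(hidden[:400], beliefs[:400])       # fit on the first 400 samples
score = r_squared(beliefs[400:], decode(hidden[400:], w))  # evaluate on held-out samples
```

A held-out score near 1 means the beliefs are linearly readable from the hidden units, which is the sense in which the caption calls larger RNNs increasingly belief-like.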
Figure 7
ANCCR does not explain the experimental results. (a) Simplified representation of the ANCCR model. Notably, the first step is to estimate retrospective contingency using eligibility traces. (b) ANCCR simulations of the same virtual experiments (n = 25) used in Figure 3, with the parameters of Garr et al., 2024, while varying the prospective-retrospective weighting parameter (w). Error bars are SD. In all cases the predicted Odor A response is similar in the Degradation and Cued Reward conditions. (c) No parameter combination explains the experimental result. Search over 21,000 parameter combinations across six parameters (T ratio = 0.2–2, α = 0.01–0.3, k = 0.01–1 or 1/(mean inter-reward interval), w = 0–1, threshold = 0.1–0.7, αR = 0.1–0.3). The experimental result is plotted as a star. Previously used parameters (Garr et al., 2024 as 1; Jeong et al., 2022 as 2 and 3) are indicated. Dots are colored by the prospective-retrospective weighting parameter (w), which has a strong effect on the magnitude of the Phase 2 response relative to Phase 1. (d) Because the contingency is calculated as the first step, and the contingencies are similar in the Degradation and Cued Reward conditions, there is little difference in the retrospective contingency representation between the two conditions, which explains why ANCCR predicts similar responses regardless of parameter choice.

Dataset Reference:

    1. Qian L et al. Code and Data for Qian et al., 2025. Figshare, doi: 10.6084/m9.figshare.28216202 (2025).
