Curr Biol. 2022 Feb 7;32(3):586-599.e7.
doi: 10.1016/j.cub.2021.12.006. Epub 2021 Dec 21.

Serotonin neurons modulate learning rate through uncertainty

Cooper D Grossman et al. Curr Biol. 2022.

Abstract

Regulating how fast to learn is critical for flexible behavior. Learning about the consequences of actions should be slow in stable environments, but accelerate when that environment changes. Recognizing stability and detecting change are difficult in environments with noisy relationships between actions and outcomes. Under these conditions, theories propose that uncertainty can be used to modulate learning rates ("meta-learning"). We show that mice behaving in a dynamic foraging task exhibit choice behavior that varied as a function of two forms of uncertainty estimated from a meta-learning model. The activity of dorsal raphe serotonin neurons tracked both types of uncertainty in the foraging task as well as in a dynamic Pavlovian task. Reversible inhibition of serotonin neurons in the foraging task reproduced changes in learning predicted by a simulated lesion of meta-learning in the model. We thus provide a quantitative link between serotonin neuron activity, learning, and decision making.

Keywords: decision making; dorsal raphe; learning; serotonin; uncertainty.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1. Mice forage dynamically for rewards
(A) Dynamic foraging task in which mice chose freely between a leftward and rightward lick, followed by a reward with a probability that varied over time. (B) Example mouse behavior from a single session in the task. Black (rewarded) and gray (unrewarded) ticks correspond to left (below) and right (above) choices. Black curve: mouse choices (smoothed over 5 trials, boxcar filter). Blue curve: Rewards (smoothed over 5 trials, boxcar filter). Blue dots indicate left/right reward probabilities, and dashed lines indicate a change in reward probability (P(R)) for at least one spout. (C) Logistic regression coefficients for choice as a function of outcome history. Error bars: 95% CI. See also Figure S1.
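The 5-trial boxcar smoothing mentioned in (B) can be illustrated with a short sketch. This is not the authors' code: the function name `boxcar_smooth`, the example choice sequence, and the edge handling (averaging only over trials that fall inside the window) are assumptions made for illustration.

```python
import numpy as np

def boxcar_smooth(x, width=5):
    """Moving average with a flat (boxcar) window.

    Edge handling is an assumption: near the ends of the session the
    average is taken over the trials that actually fall in the window.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.ones(width)
    # Number of real trials under the window at each position.
    counts = np.convolve(np.ones_like(x), kernel, mode="same")
    return np.convolve(x, kernel, mode="same") / counts

# Hypothetical session: 1 = right choice, 0 = left choice.
choices = np.array([1, 1, 1, 0, 1, 0, 0, 0, 1, 0])
p_right = boxcar_smooth(choices, width=5)
```

The same function could be applied to a 0/1 reward vector to obtain a smoothed reward curve like the blue one described in (B).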
Figure 2. Mice learn at variable rates as a function of outcome history
(A) Schematic of the meta-learning model algorithm. Relative value (Qr − Ql) is used to make choices through a softmax decision function. The predicted value of a choice (Qc) is compared with reward (R) to generate a reward prediction error (δ). Expected uncertainty (ε) is a recent, weighted history of |δ|. ε is compared with |δ| on a given trial to generate unexpected uncertainty (v). On no-reward trials, v is then integrated to determine how rapidly to learn from δ, thereby updating Qc. (B) Estimated choice probability of actual behavior (black, same as Figure 1B) and choice probability estimated with the meta-learning model (green) smoothed over 5 trials (boxcar filter). (C) Spout licks following no reward as a function of |δ| from the static learning model (left, regression coefficient = 0.45, p < 10^−20) or v from the meta-learning model (right, regression coefficient = 0.56, p < 10^−20). (D) Left: Actual mouse behavior at transitions in which reward probabilities changed simultaneously (n = 384 high-low to low-high transitions, n = 347 medium-low to low-high transitions). Lines are mean choice probability relative to the spout that initially had the higher probability. Shading is Bernoulli SEM. Middle: Simulated behavior at transitions using static learning model parameters fit to actual behavior. Right: Simulated behavior at transitions using meta-learning model parameters fit to actual behavior. (E) Time constants from exponential curves fit to simulated choice probabilities (like those shown in B) for each mouse (n = 48, green circles) compared with the actual mouse behavior (black circle). Left: Static learning model (probability that mouse data come from simulated data distribution, p < 10^−4). Right: Meta-learning model (p = 0.51). (F) Left: Actual mouse behavior using transitions from (D) in which the animal exclusively chose the previously high or previously medium spout for 10 trials prior to the transition. Transitions were sorted into low (n = 98) and high (n = 288) reward history experienced during those 10 trials. Middle: Simulated behavior from the static learning model. Right: Simulated behavior from the meta-learning model. (G) Time constants from exponential fits to actual (black circles) and simulated (green circles) behavior for the static (p < 10^−13) and meta-learning (p = 0.38) models. See also Figures S2 and S3.
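The algorithm outlined in (A) can be sketched in code. The caption gives only the flow of variables, not the equations, so the specific update rules below are assumptions chosen to match that flow and are not the authors' fitted model. The function name and the parameters `alpha0`, `eta`, and `beta` are hypothetical; `psi` stands in for the expected uncertainty update rate (ψ) mentioned in Figure 4H.

```python
import numpy as np

def simulate_meta_learning(p_reward, n_trials=200, alpha0=0.3, psi=0.1,
                           eta=0.5, beta=5.0, seed=0):
    """Illustrative sketch only; update rules and parameters are assumptions."""
    rng = np.random.default_rng(seed)
    q = np.zeros(2)      # action values: index 0 = left, 1 = right
    eps = 0.0            # expected uncertainty: recent weighted history of |delta|
    alpha = alpha0       # adaptive learning rate used on no-reward trials
    history = []
    for _ in range(n_trials):
        # Softmax on relative value Qr - Ql (for two actions, a logistic).
        p_right = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))
        c = int(rng.random() < p_right)          # chosen action
        r = float(rng.random() < p_reward[c])    # reward outcome (0 or 1)
        delta = r - q[c]                         # reward prediction error
        v = abs(delta) - eps                     # unexpected uncertainty
        if r == 0.0:
            # Assumed form: integrate v into the learning rate, kept in [0, 1].
            alpha = float(np.clip(alpha + eta * v * (1.0 - alpha), 0.0, 1.0))
            rate = alpha
        else:
            rate = alpha0                        # assumed fixed rate after reward
        q[c] += rate * delta                     # update Qc
        eps += psi * (abs(delta) - eps)          # update expected uncertainty
        history.append((c, r, delta, eps, v, rate))
    return np.array(history)

# Hypothetical reward probabilities for (left, right).
trace = simulate_meta_learning((0.1, 0.8), n_trials=100)
```

Each row of `trace` holds the choice, reward, δ, ε, v, and the learning rate applied on that trial, which makes it easy to plot how the rate changes after runs of unexpected no-reward outcomes.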
Figure 3. Serotonin neuron firing rates respond to observable variables
(A) Schematic of electrophysiological recording of identified serotonin neurons. (B) Example “tagging” of a serotonin neuron, using channelrhodopsin-2 stimulation. (C) Left: Choice and outcome probabilities for an example session, as in Figure 1B. Right: Action potential raster plots for an example neuron from that session aligned to the go cue (conditioned stimulus [CS]). Each row is a single trial aligned to the go cue. (D) Mean firing rates during go cue and inter-trial interval for individual neurons (48 of 66 with significant increases and 14 of 66 with significant decreases, paired t tests). (E) Mean firing rates during the outcome period (1 s after second lick) for individual neurons (13 of 66 with significantly higher responses to rewards and 30 of 66 with significantly higher responses to no rewards, two-sample t tests). (F) Heatmap of Z-scored firing rates for all serotonin neurons, aligned to go cue, for each of the choice-outcome contingencies. (G) Rate of significant coefficients from linear regressions of firing rates (500 ms bins) on observable variables at each time point (100 ms steps) before, during, and after the trial. (H) The Z-scored inter-trial interval firing rates from the example neuron in (C) plotted as functions of model variables. There was a significant negative correlation with ε (blue asterisk), but not with other variables.
Figure 4. Serotonin neuron firing rates correlate with expected uncertainty on slow and fast timescales
(A) Action potential raster plots for an example neuron with a significant correlation with expected uncertainty during the go cue aligned to cue onset (left) and outcome (second lick, right) and ordered by increasing ε. (B) Activity of the example neuron in (A) averaged within terciles (increasing values of ε represented by darker hues) of ε and aligned to the go cue (CS, gray rectangle) and outcome. (C) The t-statistics across all neurons from a linear regression, modeling firing rates during the inter-trial interval as a function of ε(t). Blue bars indicate neurons with significant regression coefficients. (D) Population Z-scored firing rates plotted as a function of ε(t). Inset shows population split by positive and negative correlations. Main plot combines these neurons by “sign-flipping” positively correlated firing rates (also used in E and F). Pie chart shows ratio of significant neurons (blue). (E) Within-trial dynamics of expected uncertainty (ε(t), ε(t + 1), top row) aligned to go cue (CS, left column) and outcome (right column) across all significant neurons. Scale bar, 0.5 Z score. Gray curve: Response time (RT) distribution (cut off at 1 s). (F) The Z-scored firing rates of serotonin neurons split by ε(t) tercile. Scale bar, 0.5 Z score. (G) Example dynamics of ε(t) estimated from behavior and neuronal firing rates. (H) Log-log plot of the expected uncertainty update rate (ψ) from the firing rate model for each neuron and from the behavioral model derived from simultaneous choice behavior. See also Figure S4.
Figure 5. Serotonin neuron firing rates correlate with unexpected uncertainty on fast timescales
(A) Action potential raster plots for an example neuron with a significant correlation with unexpected uncertainty during the outcome aligned to cue onset (left) and outcome (second lick, right) and ordered by increasing v. (B) Activity of the example neuron in (A) averaged within terciles (increasing values of v represented by lighter hues) of v and aligned to the go cue (gray rectangle, left, v(t − 1)) and outcome (dashed line, right, v(t)). (C) Within-trial dynamics of unexpected uncertainty (v(t − 1), v(t)) aligned to go cue (CS, left column) and outcome (right column) for all significant neurons (pooled by “sign-flipping” negatively correlated firing rates, also used in D). Scale bar, 0.5 Z score. Gray curve: RT distribution (cut off at 1 s). (D) Population Z-scored firing rates plotted as a function of v(t). (E) The t-statistics from linear regressions of outcome firing rates on v(t) and CS firing rates on ε(t) for all identified serotonin neurons.
Figure 6. Serotonin neuron firing rates correlate with expected and unexpected uncertainty in a dynamic Pavlovian task
(A) Schematic of Pavlovian task in which the probability of reward (P(R)) varied over trials. (B) Example behavior showing anticipatory licking, in the delay before outcome, as P(R) varied. Black ticks: Rewarded trials. Gray ticks: Unrewarded trials. (C) Linear regression coefficients of licking rate on reward history. (D) Two example neurons showing negative correlations between inter-trial interval firing rates and expected uncertainty (−ε is plotted) when the monotonic trends are regressed out. Scale bars, 1 Z score, 50 trials. (E) Example serotonin neuron showing a negative correlation between CS firing rates and expected uncertainty (ε(t)). Top: Firing rates averaged within terciles (represented by hue) of ε and aligned to the CS (left, ε(t)) and outcome (right, ε(t + 1)). Bottom: Action potential raster plots aligned to cue onset (left) and outcome (second lick, right) and ordered by increasing ε. (F) The t-statistics from linear regression, modeling inter-trial interval firing rate as a function of ε(t) as in Figure 3F. (G) Population “tuning curves,” as in Figure 3G. (H) Stable firing rates within inter-trial intervals, as in Figure 3I. Scale bar, 0.5 Z score. (I) Within-trial, Z-scored firing rates as a function of uncertainty as in Figures 4E and 5C. Scale bar, 0.5 Z score. See also Figure S5.
Figure 7. Serotonin neuron inhibition disrupts meta-learning
(A) Schematic of experiment to reversibly inactivate serotonin neurons and representative expression of hM4Di-mCherry in dorsal raphe serotonin neurons. (B) Schematic of simulated lesion in which models were fit to mouse behavior from vehicle sessions and then meta-learning variables (i.e., ε and v) were set to zero. (C) Simulated behavior with meta-learning intact, fit to vehicle behavior (left) and simulated lesion (right). (D) Mouse behavior with vehicle injections (control experiment) and drug (agonist 21). Lines are mean choice probability and shading is Bernoulli SEM. (E) Exponential time constants for transitions from simulated behavior and simulated lesions. (F) Time constants from mice (with 95% CI). (G) Simulated behavior from mice expressing mCherry in serotonin neurons with vehicle (left) and agonist 21 (right) injections. (H) Mouse behavior with vehicle injections and drug (agonist 21). (I) Simulation time constants from fluorophore-control mice. (J) Time constants from fluorophore-control mice (with 95% CI). See also Figure S6.
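The exponential time constants reported in (E), (F), (I), and (J) summarize how quickly choice probability changes after a transition. The caption does not state the fitting procedure, so the sketch below substitutes a simple log-linear least-squares fit that assumes the asymptote is known. The function name `exp_time_constant` and the example curve are hypothetical.

```python
import numpy as np

def exp_time_constant(y, y_inf):
    """Estimate tau in y(t) = y_inf + (y0 - y_inf) * exp(-t / tau).

    Assumes the asymptote y_inf is known, so that log|y - y_inf| is
    linear in t with slope -1 / tau. Noisy data would usually call
    for a nonlinear fit instead.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    resid = np.abs(y - y_inf)
    keep = resid > 1e-6      # avoid log(0) once the curve has converged
    slope, _ = np.polyfit(t[keep], np.log(resid[keep]), 1)
    return -1.0 / slope

# Hypothetical noiseless transition: choice probability decays from 0.9 to 0.2.
trials = np.arange(30)
p_choice = 0.2 + 0.7 * np.exp(-trials / 6.0)
tau = exp_time_constant(p_choice, y_inf=0.2)
```

A larger `tau` corresponds to slower switching after a transition, which is the quantity compared between conditions in these panels.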
