Curr Biol. 2021 Sep 27;31(18):4111-4119.e4.
doi: 10.1016/j.cub.2021.06.069. Epub 2021 Jul 23.

Reinforcement learning links spontaneous cortical dopamine impulses to reward

Conrad Foo et al. Curr Biol. 2021.

Abstract

In their pioneering study on dopamine release, Romo and Schultz speculated "...that the amount of dopamine released by unmodulated spontaneous impulse activity exerts a tonic, permissive influence on neuronal processes more actively engaged in preparation of self-initiated movements...." [1] Motivated by the suggestion of "spontaneous impulses," as well as by the "ramp up" of dopaminergic neuronal activity that occurs when rodents navigate to a reward [2-5], we asked two questions. First, are there spontaneous impulses of dopamine that are released in cortex? Using cell-based optical sensors of extrasynaptic dopamine, [DA]ex [6], we found that spontaneous dopamine impulses in cortex of naive mice occur at a rate of ∼0.01 per second. Next, can mice be trained to change the amplitude and/or timing of dopamine events triggered by internal brain dynamics, much as they can change the amplitude and timing of dopamine impulses based on an external cue? [7-9] Using a reinforcement learning paradigm based solely on rewards that were gated by feedback from real-time measurements of [DA]ex, we found that mice can volitionally modulate their spontaneous [DA]ex. In particular, by only the second session of daily, hour-long training, mice increased the rate of impulses of [DA]ex, increased the amplitude of the impulses, and increased their tonic level of [DA]ex for a reward. Critically, mice learned to reliably elicit [DA]ex impulses prior to receiving a reward. These effects reversed when the reward was removed. We posit that spontaneous dopamine impulses may serve as a salient cognitive event in behavioral planning.

Keywords: biophysical modeling; brain machine interface; classical conditioning; feedback; foraging; neuromodulation; stochastic dynamics; two-photon microscopy.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1. Characterization of spontaneous dopamine impulses ([DA]ex) in naive mice in the absence of reward in an apparatus without a lick port or overt stimulation
(A) Open-loop measurement of cortical [DA]ex using D2-CNiFERs with in vivo two-photon microscopy. Increases in [DA]ex are observed as an increase in YFP fluorescence with a concomitant decrease in CFP fluorescence and vice versa. The CNiFER signal is reported as the fractional change in FRET. Inset: fluorescence image of the D2-CNiFER implant in cortex. (B) Time series shows changes in D2-CNiFER FRET, reflecting spontaneous transients in [DA]ex, i.e., dopamine impulses, along with measurement of the rate of locomotion. (C) The magnitude of the spectral coherence, averaged over all animals (xx mice), between the locomotion rate and [DA]ex. Data without lick port in black and with dry port in maroon are shown; dashed line is 0.95 confidence level. (D) Distribution of spontaneous dopamine impulse rate for all naive animals. Histograms without lick port (gray) and with port (maroon) are statistically indistinguishable (KS test; p = 0.99). (E) Two-dimensional histogram of spontaneous impulses in [DA]ex across all animals without lick port. Top row is the cumulative of all widths; the average full width half maximum relative to baseline was 15.1 ± 1.3 s (blue line). Right column is the cumulative of amplitudes; 0.1 corresponds to [DA]ex ~20 nM; the average was 0.056 ± 0.002 (blue line). See also Figure S1.
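As a rough illustration of the "fractional change in FRET" mentioned in (A), the sketch below computes a ratiometric signal from paired YFP and CFP traces. It assumes, without confirmation from the caption, that the FRET ratio is taken as YFP/CFP and that the baseline is the mean ratio over an initial window; the function name and the `n_baseline` parameter are hypothetical.

```python
def fractional_fret_change(yfp, cfp, n_baseline):
    """Return (R - R0) / R0 for each sample, with R = YFP / CFP.

    Assumption: R0 is the mean ratio over the first n_baseline samples.
    The caption does not state how the baseline was chosen.
    """
    ratio = [y / c for y, c in zip(yfp, cfp)]
    r0 = sum(ratio[:n_baseline]) / n_baseline
    return [(r - r0) / r0 for r in ratio]


# Hypothetical traces: YFP rises while CFP falls, so the signal increases.
signal = fractional_fret_change([100, 100, 110], [100, 100, 90], n_baseline=2)
```

Under this convention a rise in YFP with a concomitant fall in CFP gives a positive value, matching the direction described in the caption for an increase in [DA]ex.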
Figure 2. Closed-loop feedback reinforcement training to volitionally link spontaneous impulses in [DA]ex to reward
(A) The setup for the open-loop experiment (Figure 1A) is augmented. We use the measured D2-CNiFER response as a proxy for [DA]ex and drive the delivery of a liquid reward based on the [DA]ex signal (red) relative to an adaptive staircase threshold (green) updated every 0.25 s. If [DA]ex exceeds the threshold, a drop (0.1 mL) of sucrose water (10% w/v) is released via the lick port and the threshold is incremented by 0.005 signal units. The value of the threshold also decays exponentially, in discrete steps of 0.005 and with a time constant of 225 s, from the time of the last increment. (B) Example data show the open-loop naive response on day 1 and an increase of tonic [DA]ex with closed-loop reinforcement on day 3; the adaptive staircase threshold follows the reward to within the decay time constant. A 130-s segment of data highlighted by the beige band is expanded. (C) Rolling average of [DA]ex impulses for all mice. The averaging window was 235 s. On day 2, tonic [DA]ex did not significantly increase relative to that of naive mice (p = 0.69). By day 3, tonic [DA]ex increased significantly (p = 0.01). The increase extinguished when reward was withheld on day 4 (p = 0.73) and reinstated when reward was restored on day 5 or 7 (p = 0.02) compared to feedback withheld on day 6 or 8.
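The adaptive staircase in (A) can be sketched in a few lines. The update interval, step size, and time constant below are taken from the caption; everything else is an assumption made for illustration. In particular, the caption does not say how the discrete exponential decay is computed, so this sketch takes the threshold to be the value set at the last increment multiplied by exp(-elapsed/τ) and rounded to the nearest multiple of the step. The starting threshold and the function name are hypothetical.

```python
import math

DT = 0.25     # s, threshold update interval (from the caption)
STEP = 0.005  # signal units, increment and decay step (from the caption)
TAU = 225.0   # s, decay time constant (from the caption)


def run_staircase(signal, start_threshold=0.05):
    """Return (thresholds, reward_indices) for a sampled [DA]ex proxy.

    Assumption: between increments the threshold equals
    peak * exp(-elapsed / TAU), rounded to a multiple of STEP.
    """
    peak = start_threshold  # threshold value right after the last increment
    elapsed = 0.0           # time since the last increment, in seconds
    thresholds, rewards = [], []
    for i, x in enumerate(signal):
        threshold = round(peak * math.exp(-elapsed / TAU) / STEP) * STEP
        thresholds.append(threshold)
        if x > threshold:
            rewards.append(i)         # a reward would be delivered here
            peak = threshold + STEP   # raise the bar by one step
            elapsed = 0.0
        else:
            elapsed += DT
    return thresholds, rewards


# Hypothetical input: a sustained signal above threshold earns a reward on
# every update while the threshold climbs one step at a time.
thresholds, rewards = run_staircase([0.1, 0.1, 0.1])
```

With no threshold crossings, this version relaxes from 0.05 to 0.02 after one time constant (225 s), which is the behavior that lets the threshold "follow the reward to within the decay time constant" in (B).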
Figure 3. Population-averaged changes in [DA]ex over the course of our reinforcement paradigm
(A) Tonic [DA]ex shows a significant increase relative to naive, baseline animals after feedback training (day 3; p = 0.006); the increase is Δ[DA]ex = 0.088. Day 2 and day 4 showed no significant change in tonic [DA]ex, p = 0.37 and 0.66, respectively. Randomly rewarded mice also did not show a significant change in tonic [DA]ex compared to naive animals (p = 0.61). (B) The amplitude of [DA]ex impulses significantly increased over the course of the trial with feedback training (day 3; p = 0.01). With feedback OFF, there was a significant decrease in amplitude over the course of the trial (day 4; p = 0.004). The data for day 1 and day 2 and the case of random reward showed no significant change in [DA]ex impulse amplitude over time; p = 0.43, p = 0.77, and p = 0.67, respectively. (C) The rate of [DA]ex impulses was significantly higher relative to baseline animals with feedback training (day 3; p = 0.001). Day 2 and day 4 showed no significant change in [DA]ex impulse frequency; p = 0.65 and 0.37, respectively. The rate for animals with random reward was significantly less than that for naive animals (p = 10⁻⁸). Alternating bars along time axis indicate intervals for binned data in (A)–(C). (D) Distribution of amplitudes and widths of [DA]ex impulses for trained mice with feedback ON (day 3; top), a subset of rewarded impulses when feedback is ON (day 3; center), and when feedback reinforcement is OFF (day 4; bottom). The amplitudes with feedback training ON were significantly larger compared to baseline animals (day 3; p = 10⁻⁷) and compared to feedback OFF (day 4; p = 10⁻²⁹). The widths were significantly greater with feedback ON compared to those for baseline animals (p = 10⁻²²) and when feedback was OFF (p = 10⁻¹⁵). Rewarded impulses had an amplitude of 0.078 ± 0.002 and an average width of 44.2 ± 1.6 s and were significantly greater than the impulses when feedback was OFF (day 4; p = 10⁻⁸ and p = 10⁻²⁹, respectively).
(E) Standard boxplot shows the timing of the onset of [DA]ex impulses relative to the onset of licking; the solid bars are the median times. The mean and standard error of the lag times of [DA]ex impulses relative to licking are +11.0 ± 3.5 s on day 1, +0.6 ± 0.9 s on day 2 allowing for a single outlier, −5.1 ± 1.4 s on day 3 allowing for a single outlier, with [DA]ex now leading, and +22.8 ± 11.9 s on day 4. Impulses also lagged licking for randomly rewarded mice (+10.9 ± 7.9 s). A two-sample t test with respect to baseline animals yields p = 0.04 (day 2), p = 0.005 (day 3), p = 0.9 (day 4), and p = 0.3 (random); including the single outliers drops the evidence against a null hypothesis to p = 0.6 (day 2) and the still highly significant value p = 0.02 (day 3). The inset shows a scatterplot of the number of rewarded impulses versus lag time for individual mice on day 2 and day 3, together with a fit of the model (Equations 6 and 7 in STAR Methods) to the data. (F) Standard boxplot shows the variance explained by an optimal linear filter that predicts licking from the measured [DA]ex. “+” indicates an outlier that was not included in the mean. Solid bars indicate median. The values of R² for days 2 and 3 are significantly different from zero, with p = 0.030 and p = 0.037. The data for random reward, although very small at 0.019, are also significant (p = 0.027; n = 12) as a result of the large sample. Note 6-fold increase on 2nd day of feedback ON (day 3). See also Figures S2 and S3.
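The sign convention in (E) is easy to misread, so a small sketch may help: a lag is the impulse onset time minus the lick onset time, so a negative lag means [DA]ex leads licking. The function name, the one-to-one pairing of onsets, and the numbers in the example are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def lag_summary(impulse_onsets, lick_onsets):
    """Return (mean lag, standard error) in seconds.

    Lag = impulse onset - lick onset; negative values mean that the
    [DA]ex impulse precedes licking.
    Assumption: the two lists are already paired one-to-one.
    """
    lags = [d - l for d, l in zip(impulse_onsets, lick_onsets)]
    sem = stdev(lags) / sqrt(len(lags))
    return mean(lags), sem


# Hypothetical onsets (s): every impulse starts before its paired lick bout.
m, sem = lag_summary([10.0, 20.0, 30.0], [12.0, 24.0, 36.0])
```

Here the mean lag is negative, the same sign as the day 3 value in the caption, where the impulses lead licking.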

References

    1. Romo R, and Schultz W (1990). Dopamine neurons of the monkey midbrain: contingencies of responses to active touch during self-initiated arm movements. J. Neurophysiol. 63, 592–606.
    2. Phillips PEM, Stuber GD, Heien MLAV, Wightman RM, and Carelli RM (2003). Subsecond dopamine release promotes cocaine seeking. Nature 422, 614–618.
    3. Collins AL, Greenfield VY, Bye JK, Linker KE, Wang AS, and Wassum KM (2016). Dynamic mesolimbic dopamine signaling during action sequence learning and expectation violation. Sci. Rep. 6, 20231.
    4. Howe MW, Tierney PL, Sandberg SG, Phillips PEM, and Graybiel AM (2013). Prolonged dopamine signalling in striatum signals proximity and value of distant rewards. Nature 500, 575–579.
    5. Hamid AA, Pettibone JR, Mabrouk OS, Hetrick VL, Schmidt R, Vander Weele CM, Kennedy RT, Aragona BJ, and Berke JD (2016). Mesolimbic dopamine signals the value of work. Nat. Neurosci. 19, 117–126.