Curr Biol. 2021 Sep 27;31(18):4111-4119.e4.

doi: 10.1016/j.cub.2021.06.069. Epub 2021 Jul 23.

Reinforcement learning links spontaneous cortical dopamine impulses to reward

Conrad Foo¹, Adrian Lozada¹, Johnatan Aljadeff², Yulong Li³, Jing W Wang², Paul A Slesinger⁴, David Kleinfeld⁵

Affiliations

¹ Department of Physics, University of California at San Diego, La Jolla, CA 92093, USA.
² Section of Neurobiology, University of California at San Diego, La Jolla, CA 92093, USA.
³ Peking University, School of Life Sciences, Peking University, Beijing 100871, P.R. China.
⁴ Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA. Electronic address: paul.slesinger@mssm.edu.
⁵ Department of Physics, University of California at San Diego, La Jolla, CA 92093, USA; Section of Neurobiology, University of California at San Diego, La Jolla, CA 92093, USA. Electronic address: dk@physics.ucsd.edu.

PMID: 34302743
PMCID: PMC8605927
DOI: 10.1016/j.cub.2021.06.069

Conrad Foo et al. Curr Biol. 2021.

Abstract

In their pioneering study on dopamine release, Romo and Schultz speculated "...that the amount of dopamine released by unmodulated spontaneous impulse activity exerts a tonic, permissive influence on neuronal processes more actively engaged in preparation of self-initiated movements...."¹ Motivated by the suggestion of "spontaneous impulses," as well as by the "ramp up" of dopaminergic neuronal activity that occurs when rodents navigate to a reward,²⁻⁵ we asked two questions. First, are there spontaneous impulses of dopamine that are released in cortex? Using cell-based optical sensors of extrasynaptic dopamine, [DA]_ex,⁶ we found that spontaneous dopamine impulses in cortex of naive mice occur at a rate of ∼0.01 per second. Next, can mice be trained to change the amplitude and/or timing of dopamine events triggered by internal brain dynamics, much as they can change the amplitude and timing of dopamine impulses based on an external cue?⁷⁻⁹ Using a reinforcement learning paradigm based solely on rewards that were gated by feedback from real-time measurements of [DA]_ex, we found that mice can volitionally modulate their spontaneous [DA]_ex. In particular, by only the second session of daily, hour-long training, mice increased the rate of impulses of [DA]_ex, increased the amplitude of the impulses, and increased their tonic level of [DA]_ex for a reward. Critically, mice learned to reliably elicit [DA]_ex impulses prior to receiving a reward. These effects reversed when the reward was removed. We posit that spontaneous dopamine impulses may serve as a salient cognitive event in behavioral planning.

Keywords: biophysical modeling; brain machine interface; classical conditioning; feedback; foraging; neuromodulation; stochastic dynamics; two-photon microscopy.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

**Figure 1. Characterization of spontaneous dopamine impulses ([DA]_ex) in naive mice in the absence of reward in an apparatus without a lick port or overt stimulation**
(A) Open-loop measurement of cortical [DA]_ex using D2-CNiFERs with *in vivo* two-photon microscopy. Increases in [DA]_ex are observed as an increase in YFP fluorescence with a concomitant decrease in CFP fluorescence and vice versa. The CNiFER signal is reported as the fractional change in FRET. Inset: fluorescence image of the D2-CNiFER implant in cortex. (B) Time series shows changes in D2-CNiFER FRET, reflecting spontaneous transients in [DA]_ex, i.e., dopamine impulses, along with measurement of the rate of locomotion. (C) The magnitude of the spectral coherence, averaged over all animals (xx mice), between the locomotion rate and [DA]_ex. Data without lick port in black and with dry port in maroon are shown; dashed line is 0.95 confidence level. (D) Distribution of the spontaneous dopamine impulse rate for all naive animals. Histograms without lick port (gray) and with port (maroon) are statistically indistinguishable (KS test; p = 0.99). (E) Two-dimensional histogram of spontaneous impulses in [DA]_ex across all animals without lick port. Top row is the cumulative of all widths; the average full width at half maximum relative to baseline was 15.1 ± 1.3 s (blue line). Right column is the cumulative of amplitudes; 0.1 corresponds to [DA]_ex ~20 nM; the average was 0.056 ± 0.002 (blue line). See also Figure S1.
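The amplitude and full width at half maximum quoted in panel (E) can be illustrated with a minimal sketch. The function name `impulse_amplitude_and_fwhm`, the linear interpolation at the half-maximum crossings, and the synthetic triangular trace are all assumptions made for illustration; they are not taken from the paper's analysis code.

```python
def impulse_amplitude_and_fwhm(trace, dt, baseline=0.0):
    """Return (peak amplitude above baseline, full width at half maximum in s).

    trace    : fractional FRET change sampled at a fixed interval (hypothetical input)
    dt       : sampling interval in seconds
    baseline : level the width is measured relative to
    """
    peak_index = max(range(len(trace)), key=lambda i: trace[i])
    amplitude = trace[peak_index] - baseline
    half = baseline + amplitude / 2.0

    def crossing(start, step):
        # Walk away from the peak while the trace stays at or above half maximum.
        i = start
        while 0 <= i + step < len(trace) and trace[i + step] >= half:
            i += step
        j = i + step
        if not 0 <= j < len(trace):
            return float(i)  # the trace never drops below half maximum on this side
        # Linearly interpolate the crossing between samples i and j.
        frac = (trace[i] - half) / (trace[i] - trace[j])
        return i + step * frac

    left = crossing(peak_index, -1)
    right = crossing(peak_index, +1)
    return amplitude, (right - left) * dt


# Synthetic triangular impulse sampled every 5 s (made-up numbers, not data).
example_trace = [0.0, 0.02, 0.04, 0.06, 0.04, 0.02, 0.0]
amplitude, fwhm = impulse_amplitude_and_fwhm(example_trace, dt=5.0)
print(f"amplitude = {amplitude:.3f}, FWHM = {fwhm:.1f} s")
```

Measuring the width relative to a baseline, as the caption states, means the half-maximum level depends on how the baseline is chosen; here it is simply passed in as a parameter.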

**Figure 2. Closed-loop feedback reinforcement training to volitionally link spontaneous impulses in [DA]_ex to reward**
(A) The setup for the open-loop experiment (Figure 1A) is augmented. We use the measured D2-CNiFER response as a proxy for [DA]_ex and drive the delivery of a liquid reward based on the [DA]_ex signal (red) relative to an adaptive staircase threshold (green) updated every 0.25 s. If [DA]_ex exceeds the threshold, a drop (0.1 mL) of sucrose water (10% w/v) is released via the lick port and the threshold is incremented by 0.005 signal units. Following the last increment, the threshold also decays exponentially, in discrete steps of 0.005, with a time constant of 225 s. (B) Example data show the open-loop naive response on day 1 and an increase of tonic [DA]_ex with closed-loop reinforcement on day 3; the adaptive staircase threshold follows the reward to within the decay time constant. A 130-s segment of data highlighted by the beige band is expanded. (C) Rolling average of [DA]_ex impulses for all mice. The averaging window was 235 s. On day 2, tonic [DA]_ex did not significantly increase relative to that of naive mice (p = 0.69). By day 3, tonic [DA]_ex increased significantly (p = 0.01). The increase extinguished when reward was withheld on day 4 (p = 0.73) and reinstated when reward was restored on day 5 or 7 (p = 0.02) compared to feedback withheld on day 6 or 8.
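The adaptive staircase in panel (A) can be sketched as a short simulation. Only the 0.25 s update interval, the 0.005 step, and the 225 s time constant come from the caption. The function name `run_staircase`, the starting threshold (`start_steps`), and the choice to quantize the decay by rounding to the nearest step are assumptions for illustration, since the caption does not specify them.

```python
import math


def run_staircase(signal, dt=0.25, step=0.005, tau=225.0, start_steps=10):
    """Simulate a reward threshold that steps up on reward and decays otherwise.

    signal      : sequence of [DA]_ex proxy values, one per update (hypothetical input)
    dt          : update interval in seconds (0.25 s in the caption)
    step        : threshold increment in signal units (0.005 in the caption)
    tau         : decay time constant in seconds (225 s in the caption)
    start_steps : initial threshold in units of `step` (assumed, not from the source)
    Returns (thresholds, reward_indices).
    """
    base_steps = start_steps  # threshold, in step units, at the last increment
    elapsed = 0.0             # time since the last increment
    thresholds, rewards = [], []
    for i, value in enumerate(signal):
        # Exponential decay since the last increment, quantized to whole steps
        # (rounding to the nearest step is an assumption).
        steps = round(base_steps * math.exp(-elapsed / tau))
        threshold = steps * step
        thresholds.append(threshold)
        if value > threshold:
            rewards.append(i)       # a reward drop would be delivered here
            base_steps = steps + 1  # raise the threshold by one step
            elapsed = 0.0
        else:
            elapsed += dt
    return thresholds, rewards


# A flat signal never earns a reward, so the threshold only decays.
thresholds, rewards = run_staircase([0.0] * 1000)
print(thresholds[0], thresholds[900], rewards)
```

Under these assumptions a sustained impulse earns a reward on every update until the threshold climbs above the signal, which is one way the threshold could "follow the reward to within the decay time constant" as described in panel (B).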

**Figure 3. Population-averaged changes in [DA]_ex over the course of our reinforcement paradigm**
(A) Tonic [DA]_ex shows a significant increase relative to naive, baseline animals after feedback training (day 3; p = 0.006); the increase is Δ[DA]_ex = 0.088. Day 2 and day 4 showed no significant change in tonic [DA]_ex, p = 0.37 and 0.66, respectively. Randomly rewarded mice also did not show a significant change in tonic [DA]_ex compared to naive animals (p = 0.61). (B) The amplitude of [DA]_ex impulses significantly increased over the course of the trial with feedback training (day 3; p = 0.01). With feedback OFF, there was a significant decrease in amplitude over the course of the trial (day 4; p = 0.004). The data for day 1 and day 2 and the case of random reward showed no significant change in [DA]_ex impulse amplitude over time; p = 0.43, p = 0.77, and p = 0.67, respectively. (C) The rate of [DA]_ex impulses was significantly higher relative to baseline animals with feedback training (day 3; p = 0.001). Day 2 and day 4 showed no significant change in [DA]_ex impulse frequency; p = 0.65 and 0.37, respectively. The rate for animals with random reward was significantly less than that for naive animals (p = 10⁻⁸). Alternating bars along time axis indicate intervals for binned data in (A)–(C). (D) Distribution of amplitudes and widths of [DA]_ex impulses for trained mice with feedback ON (day 3; top), a subset of rewarded impulses when feedback is ON (day 3; center), and when feedback reinforcement is OFF (day 4; bottom). The amplitudes with feedback training ON were significantly larger compared to baseline animals (day 3; p = 10⁻⁷) and compared to feedback OFF (day 4; p = 10⁻²⁹). The widths were significantly greater with feedback ON compared to those for baseline animals (p = 10⁻²²) and when feedback was OFF (p = 10⁻¹⁵). Rewarded impulses had an amplitude of 0.078 ± 0.002 and an average width of 44.2 ± 1.6 s and were significantly greater than the impulses when feedback was OFF (day 4; p = 10⁻⁸ and p = 10⁻²⁹, respectively).
(E) Standard boxplot shows the timing of the onset of [DA]_ex impulses relative to the onset of licking; the solid bars are the median times. The mean and standard error of the lag times of [DA]_ex impulses relative to licking are +11.0 ± 3.5 s on day 1, +0.6 ± 0.9 s on day 2 allowing for a single outlier, −5.1 ± 1.4 s on day 3 allowing for a single outlier, with [DA]_ex now leading, and +22.8 ± 11.9 s on day 4. Impulses also lagged licking for randomly rewarded mice (+10.9 ± 7.9 s). A two-sample t test with respect to baseline animals yields p = 0.04 (day 2), p = 0.005 (day 3), p = 0.9 (day 4), and p = 0.3 (random); including the single outliers drops the evidence against a null hypothesis to p = 0.6 (day 2) and the still highly significant value p = 0.02 (day 3). The inset shows a scatterplot of the number of rewarded impulses versus lag time for individual mice on day 2 and day 3, together with a fit of the model (Equations 6 and 7 in STAR Methods) to the data. (F) Standard boxplot shows the variance explained by an optimal linear filter that predicts licking from the measured [DA]_ex. “+” indicates an outlier that was not included in the mean. Solid bars indicate median. The values of R² for days 2 and 3 are significantly different from zero, with p = 0.030 and p = 0.037. The R² for random reward, although very small at 0.019, is also significant (p = 0.027; n = 12) as a result of the large sample. Note the 6-fold increase on the second day of feedback ON (day 3). See also Figures S2 and S3.
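The sign convention for the lag times in panel (E) can be made concrete with a small sketch: a positive lag means the [DA]_ex impulse follows the onset of licking, and a negative lag means it leads. Pairing each impulse with the nearest lick onset is an assumption made here for illustration, as are the function names and the onset times, which are invented and not data from the study.

```python
import math
import statistics


def lag_times(impulse_onsets, lick_onsets):
    """Lag (s) of each impulse onset relative to its nearest lick onset.

    Negative values mean the dopamine impulse leads licking.
    Nearest-onset pairing is an assumed matching rule.
    """
    lags = []
    for t_impulse in impulse_onsets:
        nearest_lick = min(lick_onsets, key=lambda t_lick: abs(t_impulse - t_lick))
        lags.append(t_impulse - nearest_lick)
    return lags


def mean_and_sem(values):
    """Sample mean and standard error of the mean."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Made-up onset times in seconds, chosen so that impulses lead licking.
impulses = [95.0, 296.0, 494.0]
licks = [100.0, 300.0, 500.0]
lags = lag_times(impulses, licks)
mean, sem = mean_and_sem(lags)
print(f"lags = {lags}, mean = {mean:+.1f} s, SEM = {sem:.2f} s")
```

A per-animal mean computed this way would then feed the two-sample t test against baseline animals that the caption reports.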

Cited by

Closed-loop experiments and brain machine interfaces with multiphoton microscopy.
Hira R. Neurophotonics. 2024 Jul;11(3):033405. doi: 10.1117/1.NPh.11.3.033405. Epub 2024 Feb 19. PMID: 38375331. Review.
Genetically encoded sensors illuminate in vivo detection for neurotransmission: Development, application, and optimization strategies.
Zhong X, Gu H, Lim J, Zhang P, Wang G, Zhang K, Li X. IBRO Neurosci Rep. 2025 Mar 13;18:476-490. doi: 10.1016/j.ibneur.2025.03.003. eCollection 2025 Jun. PMID: 40177704.
Three Water Restriction Schedules Used in Rodent Behavioral Tasks Transiently Impair Growth and Differentially Evoke a Stress Hormone Response without Causing Dehydration.
Vasilev D, Havel D, Liebscher S, Slesiona-Kuenzel S, Logothetis NK, Schenke-Layland K, Totah NK. eNeuro. 2021 Dec 14;8(6):ENEURO.0424-21.2021. doi: 10.1523/ENEURO.0424-21.2021. Print 2021 Nov-Dec. PMID: 34815297.
Probing Neuropeptide Volume Transmission In Vivo by Simultaneous Near-Infrared Light-Triggered Release and Optical Sensing.
Xiong H, Lacin E, Ouyang H, Naik A, Xu X, Xie C, Youn J, Wilson BA, Kumar K, Kern T, Aisenberg E, Kircher D, Li X, Zasadzinski JA, Mateo C, Kleinfeld D, Hrabetova S, Slesinger PA, Qin Z. Angew Chem Int Ed Engl. 2022 Aug 22;61(34):e202206122. doi: 10.1002/anie.202206122. Epub 2022 Jul 8. PMID: 35723610.

References

1. Romo R, and Schultz W (1990). Dopamine neurons of the monkey midbrain: contingencies of responses to active touch during self-initiated arm movements. J. Neurophysiol. 63, 592–606.
2. Phillips PEM, Stuber GD, Heien MLAV, Wightman RM, and Carelli RM (2003). Subsecond dopamine release promotes cocaine seeking. Nature 422, 614–618.
3. Collins AL, Greenfield VY, Bye JK, Linker KE, Wang AS, and Wassum KM (2016). Dynamic mesolimbic dopamine signaling during action sequence learning and expectation violation. Sci. Rep. 6, 20231.
4. Howe MW, Tierney PL, Sandberg SG, Phillips PEM, and Graybiel AM (2013). Prolonged dopamine signalling in striatum signals proximity and value of distant rewards. Nature 500, 575–579.
5. Hamid AA, Pettibone JR, Mabrouk OS, Hetrick VL, Schmidt R, Vander Weele CM, Kennedy RT, Aragona BJ, and Berke JD (2016). Mesolimbic dopamine signals the value of work. Nat. Neurosci. 19, 117–126.
