Nature. 2025 Jun;642(8068):682-690.
doi: 10.1038/s41586-025-08929-9. Epub 2025 Jun 4.

Multi-timescale reinforcement learning in the brain


Paul Masset et al. Nature. 2025 Jun.

Abstract

To thrive in complex environments, animals and artificial agents must learn to act adaptively to maximize fitness and rewards. Such adaptive behaviour can be learned through reinforcement learning [1], a class of algorithms that has been successful at training artificial agents [2-5] and at characterizing the firing of dopaminergic neurons in the midbrain [6-8]. In classical reinforcement learning, agents discount future rewards exponentially according to a single timescale, known as the discount factor. Here we explore the presence of multiple timescales in biological reinforcement learning. We first show that reinforcement agents learning at a multitude of timescales possess distinct computational benefits. Next, we report that dopaminergic neurons in mice performing two behavioural tasks encode reward prediction error with a diversity of discount time constants. Our model explains the heterogeneity of temporal discounting in both cue-evoked transient responses and slower timescale fluctuations known as dopamine ramps. Crucially, the measured discount factor of individual neurons is correlated across the two tasks, suggesting that it is a cell-specific property. Together, our results provide a new paradigm for understanding functional heterogeneity in dopaminergic neurons and a mechanistic basis for the empirical observation that humans and animals use non-exponential discounts in many situations [9-12], and open new avenues for the design of more-efficient reinforcement learning algorithms.
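The abstract contrasts classical reinforcement learning, which discounts future rewards with a single exponential discount factor, with agents that maintain value estimates at many timescales in parallel, each with its own reward prediction error. The sketch below is a minimal illustration of that general idea, not the authors' implementation: it runs tabular TD(0) over several discount channels at once on a toy chain task. The discount factors, learning rate, and task are hypothetical choices made for illustration.

import numpy as np

# Illustrative multi-timescale TD(0): one value table and one reward prediction
# error per discount factor, loosely analogous to dopaminergic neurons with
# cell-specific discount time constants (assumed setup, not from the paper).
n_states = 5
gammas = np.array([0.5, 0.8, 0.95, 0.99])   # hypothetical discount factors, one per channel
alpha = 0.1                                  # learning rate (arbitrary choice)
V = np.zeros((len(gammas), n_states))        # one value estimate per (channel, state)

def td_update(s, r, s_next, done):
    """Apply a TD(0) update for every discount channel in parallel."""
    bootstrap = 0.0 if done else 1.0
    target = r + bootstrap * gammas * V[:, s_next]
    rpe = target - V[:, s]                   # per-channel reward prediction error
    V[:, s] += alpha * rpe
    return rpe

# Toy episode: a chain 0 -> 1 -> 2 -> 3 -> 4 with a single reward at the end.
for _ in range(200):
    for s in range(n_states - 1):
        s_next = s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        td_update(s, r, s_next, done=(s_next == n_states - 1))

print(np.round(V, 3))  # short-gamma channels value the distant reward less than long-gamma channels

Because each channel discounts the same future reward differently, reading across the population of channels conveys more about the timing of upcoming rewards than any single exponential discount can, which is one way to interpret the "distinct computational benefits" the abstract attributes to multi-timescale agents.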

Conflict of interest statement

Competing interests: The authors declare no competing interests.

References

    1. Sutton, R. S. & Barto, A. G. Reinforcement Learning 2nd edn (MIT Press, 2018).
    2. Tesauro, G. Temporal difference learning and TD-Gammon. Commun. ACM 38, 58–68 (1995).
    3. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
    4. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
    5. Wurman, P. R. et al. Outracing champion Gran Turismo drivers with deep reinforcement learning. Nature 602, 223–228 (2022).