The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces
- PMID: 28018206
- PMCID: PMC5156839
- DOI: 10.3389/fnsyn.2016.00037
Abstract
The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in machine learning, the mechanistic constraints encountered by biological machinery might differ from those of artificial systems. Two major problems encountered by RL are how to relate a stimulus to a reinforcing signal that is delayed in time (temporal credit assignment), and how to stop learning once the target behaviors are attained (stopping rule). To address the first problem, synaptic eligibility traces were introduced, bridging the temporal gap between a stimulus and its reward. Although these were originally mere theoretical constructs, recent experiments have provided evidence of their existence. These experiments also reveal that the presence of specific neuromodulators converts the traces into changes in synaptic efficacy. A mechanistic implementation of the stopping rule usually assumes the inhibition of the reward nucleus; however, recent experimental results have shown that learning terminates at the appropriate network state even in setups where the reward nucleus cannot be inhibited. In an effort to describe a learning rule that solves the temporal credit assignment problem and implements a biologically plausible stopping rule, we proposed a model based on two separate synaptic eligibility traces, one for long-term potentiation (LTP) and one for long-term depression (LTD), each obeying different dynamics and having different effective magnitudes. The model has been shown to successfully generate stable learning in recurrent networks. Although the model assumes the presence of a single neuromodulator, evidence indicates that different neuromodulators express the different traces.
What could be the role of different neuromodulators for expressing the LTP and LTD traces? Here we expand on our previous model to include several neuromodulators, illustrate through various examples how these contribute to learning reward timing within a wide set of training paradigms, and propose further roles that multiple neuromodulators can play in encoding additional information about the rewarding signal.
Keywords: LTD; LTP; eligibility-trace; neuromodulator; reinforcement-learning; reward; synaptic plasticity; timing.
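The competition between the two traces described in the abstract can be illustrated with a minimal sketch. This is not the authors' published model: the first-order trace dynamics, the parameter values, and the names `simulate_traces`, `weight_update`, `mod_p`, and `mod_d` are assumptions chosen only to show the idea. The LTD trace is given a larger saturation level but a faster decay than the LTP trace, and each trace is read out by its own neuromodulator gain at the time of reward.

```python
def simulate_traces(activity_duration, reward_time, dt=0.001,
                    tau_p=2.0, tau_d=1.0,      # decay time constants in seconds (assumed)
                    max_p=1.0, max_d=1.5,      # saturation levels (assumed)
                    rate_p=5.0, rate_d=5.0):   # activation rates in 1/s (assumed)
    """Euler-integrate an LTP and an LTD eligibility trace up to reward_time.

    Hebbian activity is modeled as a step that is on from t = 0 until
    activity_duration and off afterwards.
    """
    trace_p = 0.0
    trace_d = 0.0
    steps = int(round(reward_time / dt))
    for i in range(steps):
        t = i * dt
        hebbian = 1.0 if t < activity_duration else 0.0
        # Each trace decays on its own and is driven toward its own
        # saturation level while Hebbian activity is present.
        trace_p += dt * (-trace_p / tau_p + rate_p * hebbian * (max_p - trace_p))
        trace_d += dt * (-trace_d / tau_d + rate_d * hebbian * (max_d - trace_d))
    return trace_p, trace_d


def weight_update(trace_p, trace_d, mod_p=1.0, mod_d=1.0, learning_rate=0.1):
    """Weight change at reward time: each trace is scaled by its own
    (hypothetical) neuromodulator gain, and the difference sets the sign."""
    return learning_rate * (mod_p * trace_p - mod_d * trace_d)


if __name__ == "__main__":
    for duration in (0.5, 1.5, 3.0):
        tp, td = simulate_traces(activity_duration=duration, reward_time=3.0)
        print(duration, round(tp, 3), round(td, 3), round(weight_update(tp, td), 4))
```

Under these assumed parameters, activity that ends long before the reward leaves the slowly decaying LTP trace dominant (net potentiation), while activity that persists up to the reward leaves the larger LTD trace dominant (net depression). Learning therefore settles where the two traces balance at reward time, without requiring inhibition of the reward signal. Setting `mod_p` and `mod_d` independently is one simple way to picture separate neuromodulators gating each trace.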
Similar articles
- Distinct Eligibility Traces for LTP and LTD in Cortical Synapses. Neuron. 2015 Nov 4;88(3):528-38. doi: 10.1016/j.neuron.2015.09.037. Epub 2015 Oct 22. PMID: 26593091 Free PMC article.
- Eligibility traces as a synaptic substrate for learning. Curr Opin Neurobiol. 2025 Apr;91:102978. doi: 10.1016/j.conb.2025.102978. Epub 2025 Feb 17. PMID: 39965463 Review.
- Eligibility Traces and Plasticity on Behavioral Time Scales: Experimental Support of NeoHebbian Three-Factor Learning Rules. Front Neural Circuits. 2018 Jul 31;12:53. doi: 10.3389/fncir.2018.00053. eCollection 2018. PMID: 30108488 Free PMC article. Review.
- Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural Comput. 2007 Jun;19(6):1468-502. doi: 10.1162/neco.2007.19.6.1468. PMID: 17444757
- One-shot learning and behavioral eligibility traces in sequential decision making. Elife. 2019 Nov 11;8:e47463. doi: 10.7554/eLife.47463. PMID: 31709980 Free PMC article.
Cited by
- Astrocyte D1/D5 Dopamine Receptors Govern Non-Hebbian Long-Term Potentiation at Sensory Synapses onto Lamina I Spinoparabrachial Neurons. J Neurosci. 2024 Aug 7;44(32):e0170242024. doi: 10.1523/JNEUROSCI.0170-24.2024. PMID: 38955487 Free PMC article.
- Norepinephrine potentiates and serotonin depresses visual cortical responses by transforming eligibility traces. Nat Commun. 2022 Jun 9;13(1):3202. doi: 10.1038/s41467-022-30827-1. PMID: 35680879 Free PMC article.
- Dopamine and serotonin interplay for valence-based spatial learning. Cell Rep. 2022 Apr 12;39(2):110645. doi: 10.1016/j.celrep.2022.110645. PMID: 35417691 Free PMC article.
- Learning to express reward prediction error-like dopaminergic activity requires plastic representations of time. Nat Commun. 2024 Jul 12;15(1):5856. doi: 10.1038/s41467-024-50205-3. PMID: 38997276 Free PMC article.
- Learning precise spatiotemporal sequences via biophysically realistic learning rules in a modular, spiking network. Elife. 2021 Mar 18;10:e63751. doi: 10.7554/eLife.63751. PMID: 33734085 Free PMC article.