Nat Commun. 2020 Jul 17;11(1):3625. doi: 10.1038/s41467-020-17236-y

A solution to the learning dilemma for recurrent networks of spiking neurons



Guillaume Bellec et al. Nat Commun. 2020.

Abstract

Recurrently connected networks of spiking neurons underlie the astounding information processing capabilities of the brain. Yet in spite of extensive research, how they can learn through synaptic plasticity to carry out complex network computations remains unclear. We argue that two pieces of this puzzle were provided by experimental data from neuroscience. A mathematical result tells us how these pieces need to be combined to enable biologically plausible online network learning through gradient descent, in particular deep reinforcement learning. This learning method, called e-prop, approaches the performance of backpropagation through time (BPTT), the best-known method for training recurrent neural networks in machine learning. In addition, it suggests a method for powerful on-chip learning in energy-efficient spike-based hardware for artificial intelligence.
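The following lines restate, as a minimal sketch, the factorization that e-prop is built on (Eq. (1) referenced in the figure captions below, in the same notation): the loss gradient for the synapse from neuron i to neuron j is accumulated online as the product of a per-neuron learning signal L_j^t and a locally computed eligibility trace e_{ji}^t.

```latex
% Sketch of the e-prop gradient factorization (notation as in the figure captions):
% L_j^t is an online learning signal for neuron j, and e_{ji}^t is an eligibility
% trace computed forward in time at the synapse from neuron i to neuron j.
\frac{dE}{dW_{ji}} \;=\; \sum_t L_j^t \, e_{ji}^t
```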


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1
Fig. 1. Schemes for BPTT and e-prop.
a RSNN with network inputs x, neuron spikes z, hidden neuron states h, and output targets y*, for each time step t of the RSNN computation. Output neurons y provide a low-pass filter of a weighted sum of network spikes z. b BPTT computes gradients in the unrolled version of the network, which contains a separate copy of the RSNN neurons for each time step t. A synaptic connection from neuron i to neuron j of the RSNN is replaced by an array of feedforward connections, one for each time step t, going from the copy of neuron i in the layer for time step t to the copy of neuron j in the layer for time step t + 1. All synapses in this array share the same weight: the weight of the corresponding synaptic connection in the RSNN. c Loss gradients of BPTT are propagated backwards in time and retrograde across synapses in an offline manner, long after the forward computation has passed a layer. d Online learning dynamics of e-prop. Feedforward computation of eligibility traces is indicated in blue. These are combined with online learning signals according to Eq. (1).
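As a rough illustration of the online dynamics sketched in panel d, the following Python fragment runs a toy LIF network and accumulates an Eq. (1)-style update from eligibility traces and a random-feedback learning signal. All constants, the surrogate derivative, and the placeholder output error are assumptions made for this sketch; it is not the authors' implementation.

```python
import numpy as np

n_in, n_rec, T = 3, 5, 100
rng = np.random.default_rng(0)
W_in = 0.1 * rng.standard_normal((n_rec, n_in))    # input weights
W_rec = 0.1 * rng.standard_normal((n_rec, n_rec))  # recurrent weights
B = rng.standard_normal(n_rec)          # fixed random broadcast weights (random e-prop)
v = np.zeros(n_rec)                     # membrane potentials
z = np.zeros(n_rec)                     # spikes of the previous step
z_trace = np.zeros(n_rec)               # low-pass filtered presynaptic spikes
dW = np.zeros_like(W_rec)               # accumulated weight update
alpha, thr, eta = 0.9, 1.0, 1e-3

for t in range(T):
    x = (rng.random(n_in) < 0.05).astype(float)                 # toy input spikes
    z_prev = z
    z_trace = alpha * z_trace + z_prev                          # filtered presynaptic activity
    v = alpha * v + W_in @ x + W_rec @ z_prev - z_prev * thr    # leaky integration with reset
    z = (v > thr).astype(float)                                 # spikes at this step
    psi = 0.3 * np.maximum(0.0, 1.0 - np.abs((v - thr) / thr))  # surrogate spike derivative
    e = psi[:, None] * z_trace[None, :]                         # eligibility traces (LIF case)
    err_t = rng.standard_normal()                               # placeholder output error at time t
    L = B * err_t                                               # online learning signal per neuron
    dW += L[:, None] * e                                        # Eq. (1)-style accumulation

W_rec -= eta * dW                                               # apply the accumulated update
```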
Fig. 2
Fig. 2. Comparison of BPTT and e-prop for learning phoneme recognition.
a Network architecture for e-prop, illustrated for an LSNN consisting of LIF and ALIF neurons. b Input and target output for the two versions of TIMIT. c Performance of BPTT and symmetric e-prop for LSNNs consisting of 800 neurons for framewise targets and 2400 neurons for sequence targets (random and adaptive e-prop produced similar results, see Supplementary Fig. 2). To obtain the Global learning signal baselines, the neuron-specific feedback is replaced with a global one.
Fig. 3
Fig. 3. Solving a task with difficult temporal credit assignment.
a Setup of the corresponding rodent experiments of ref. and ref.; see Supplementary Movie 1. b Input spikes, spiking activity of 10 out of 50 sample LIF neurons and 10 out of 50 sample ALIF neurons, membrane potentials (more precisely: v_j^t - A_j^t) for two sample neurons j, three samples of slow components of eligibility traces, sample learning signals for 10 neurons, and the softmax network output. c Learning curves for BPTT and two e-prop versions applied to LSNNs, and BPTT applied to an RSNN without adapting neurons (red curve). The orange curve shows the learning performance of e-prop for a sparsely connected LSNN consisting of excitatory and inhibitory neurons (Dale's law obeyed). The shaded areas are the 95% confidence intervals of the mean accuracy computed over 20 runs. d Correlation between the randomly drawn broadcast weights B_{jk} for k = left/right for learning signals in random e-prop and the resulting sensitivity to left and right input components after learning. f_{j,left} (f_{j,right}) is the resulting average firing rate of neuron j during presentation of left (right) cues after learning.
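For readers unfamiliar with the ALIF neurons referred to in this caption, the following minimal sketch shows the adaptive-threshold dynamics that give rise to the slow eligibility-trace components mentioned above: the threshold A_j^t increases with every spike and decays slowly back to a baseline. The constants and the helper function name are illustrative assumptions, not values from the paper.

```python
alpha, rho = 0.95, 0.99    # membrane decay and slow adaptation decay (assumed values)
beta, b0 = 1.7, 1.0        # adaptation strength and baseline threshold (assumed values)

def alif_step(v, a, z_prev, input_current):
    """One time step of a single ALIF neuron; returns the new (v, a, z)."""
    v = alpha * v + input_current - z_prev * b0   # leaky integration with spike reset
    a = rho * a + z_prev                          # slow adaptation variable a_j^t
    A = b0 + beta * a                             # adaptive threshold A_j^t
    z = float(v > A)                              # spike whenever v crosses A_j^t
    return v, a, z

v, a, z = 0.0, 0.0, 0.0
spikes = []
for t in range(200):
    v, a, z = alif_step(v, a, z, input_current=0.3)
    spikes.append(z)
```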
Fig. 4
Fig. 4. Application of e-prop to the Atari game Pong.
a Here, the player (green paddle) has to outplay the opponent (light brown). A reward is obtained when the opponent fails to bounce back the ball (a small white square). To achieve this, the agent has to learn to hit the ball also with the edges of its paddle, which causes a less predictable trajectory. b The agent is realized by an LSNN. The pixels of the current video frame of the game are provided as input. During processing of the stream of video frames by the LSNN, actions are generated by the stochastic policy in an online manner. At the same time, future rewards are predicted. The current prediction error is fed back both to the LSNN and to the spiking CNN that preprocesses the frames. c Sample trial of the LSNN after learning with reward-based e-prop. From top to bottom: probabilities of stochastic actions, prediction of future rewards, learning dynamics of a random synapse (arbitrary units), spiking activity of 10 out of 240 sample LIF neurons and 10 out of 160 sample ALIF neurons, and membrane potentials (more precisely: v_j^t - A_j^t) for the two sample neurons j at the bottom of the spike raster above. d Learning progress of the LSNN trained with reward-based e-prop, reported as the sum of collected rewards during an episode. The learning curve is averaged over five different runs and the shaded area represents the standard deviation. More information about the comparison between our results and A3C is given in Supplementary Note 5.
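The caption describes a prediction of future rewards whose error is fed back as a learning signal. A hedged sketch of how such a reward-prediction error could scale the eligibility traces in reward-based e-prop is given below; the TD(0)-style error and the function name are assumptions of this illustration, not the authors' code.

```python
def reward_based_update(eligibility, reward, value_t, value_next, gamma=0.99, eta=1e-4):
    """Scale per-synapse eligibility traces by a reward-prediction error."""
    delta = reward + gamma * value_next - value_t   # prediction error fed back to the network
    return eta * delta * eligibility                # weight change contributed at this time step
```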
Fig. 5
Fig. 5. Application of e-prop to learning to win the Atari game Fishing Derby.
a Here the player has to compete against an opponent and try to catch more fish from the sea. b Once a fish has bitten, the agent has to prevent the fish from being touched by the shark. c Sample trial of the trained network. From top to bottom: probabilities of stochastic actions, prediction of future rewards, learning dynamics of a random synapse (arbitrary units), spiking activity of 20 out of 180 sample LIF neurons and 20 out of 120 sample ALIF neurons. d Learning curves of an LSNN trained with reward-based e-prop as in Fig. 4d.
Fig. 6
Fig. 6. Computational graph and gradient propagations.
a Assumed mathematical dependencies between hidden neuron states h_j^t, neuron outputs z^t, network inputs x^t, and the loss function E through the mathematical functions E(·), M(·), f(·) are represented by colored arrows. b-d The flow of computation for the two components e^t and L^t that merge into the loss gradients of Eq. (3) can be represented in similar graphs. b Following Eq. (14), the computation of the eligibility traces e_{ji}^t flows forward in time. c In contrast, the ideal learning signal L_j^t = dE/dz_j^t requires propagating gradients backward in time. d Hence, while e_{ji}^t is computed exactly, L_j^t is approximated in e-prop applications to yield an online learning algorithm.
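A compact sketch of the forward recursion indicated in panel b, under the assumption that the local partial derivatives d_h_h (hidden state w.r.t. previous hidden state), d_h_W (hidden state w.r.t. the weight), and d_z_h (spike output w.r.t. hidden state) are available at every step; the function and argument names are illustrative only.

```python
def eligibility_step(eps_prev, d_h_h, d_h_W, d_z_h):
    """Advance the eligibility vector one time step, forward in time (panel b)."""
    eps_t = d_h_h * eps_prev + d_h_W   # sensitivity of the hidden state to the weight
    e_t = d_z_h * eps_t                # eligibility trace e_{ji}^t for this time step
    return eps_t, e_t
```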

References

    LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
    Allen Institute: Cell Types Database. © 2018 Allen Institute for Brain Science. Allen Cell Types Database, cell feature search. Available from: celltypes.brain-map.org/data (2018).
    Bellec, G., Salaj, D., Subramoney, A., Legenstein, R. & Maass, W. Long short-term memory and learning-to-learn in networks of spiking neurons. 32nd Conference on Neural Information Processing Systems (2018).
    Huh, D. & Sejnowski, T. J. Gradient descent for spiking neural networks. 32nd Conference on Neural Information Processing Systems (2018).
    Lillicrap, T. P. & Santoro, A. Backpropagation through time and the brain. Curr. Opin. Neurobiol. 55, 82–89 (2019).
