Elife. 2017 Apr 4;6:e20944.

doi: 10.7554/eLife.20944.

Rules and mechanisms for efficient two-stage learning in neural circuits

Tiberiu Teşileanu¹,², Bence Ölveczky³, Vijay Balasubramanian¹,²,⁴

Affiliations

¹ Initiative for the Theoretical Sciences, CUNY Graduate Center, New York, United States.
² David Rittenhouse Laboratories, University of Pennsylvania, Philadelphia, United States.
³ Department of Organismic and Evolutionary Biology and Center for Brain Science, Harvard University, Cambridge, United States.
⁴ Theoretische Natuurkunde, Vrije Universiteit Brussel & International Solvay Institutes, Brussels, Belgium.

PMID: 28374674
PMCID: PMC5380437
DOI: 10.7554/eLife.20944

Tiberiu Teşileanu et al. Elife. 2017.

Abstract

Trial-and-error learning requires evaluating variable actions and reinforcing successful variants. In songbirds, vocal exploration is induced by LMAN, the output of a basal ganglia-related circuit that also contributes a corrective bias to the vocal output. This bias is gradually consolidated in RA, a motor cortex analogue downstream of LMAN. We develop a new model of such two-stage learning. Using stochastic gradient descent, we derive how the activity in 'tutor' circuits (e.g., LMAN) should match plasticity mechanisms in 'student' circuits (e.g., RA) to achieve efficient learning. We further describe a reinforcement learning framework through which the tutor can build its teaching signal. We show that mismatches between the tutor signal and the plasticity mechanism can impair learning. Applied to birdsong, our results predict the temporal structure of the corrective bias from LMAN given a plasticity rule in RA. Our framework can be applied predictively to other paired brain areas showing two-stage learning.

Keywords: birdsong; learning theory; motor control; neuroscience; reinforcement learning; zebra finch.

Conflict of interest statement

The authors declare that no competing interests exist.

Figures

**Figure 1. Relation between the song system in zebra finches and our model.**
(A) Diagram of the major brain regions involved in birdsong. (B) Conceptual model inspired by the birdsong system. The line from output to tutor is dashed because the reinforcement signal can reach the tutor either directly or, as in songbirds, indirectly. (C) Plasticity rule measured in bird RA (measurement done in slice). When an HVC burst leads an LMAN burst by about $100\ \mathrm{ms}$, the HVC–RA synapse is strengthened, while coincident firing leads to suppression. Figure adapted from Mehaffey and Doupe (2015). (D) Plasticity rule in our model that mimics the Mehaffey and Doupe (2015) rule. **DOI:** http://dx.doi.org/10.7554/eLife.20944.002

**Figure 2. Schematic representation of our rate-based model.**
(A) Conductor neurons fire precisely-timed bursts, similar to HVC neurons in songbirds. Conductor and tutor activities, $c(t)$ and $g(t)$, provide excitation to student neurons, which integrate these inputs and respond linearly, with activity $s(t)$. Student neurons also receive a constant inhibitory input, $x_{inh}$. The output neurons linearly combine the activities from groups of student neurons using weights $M_{aj}$. The linearity assumptions were made for mathematical convenience but are not essential for our qualitative results (see Appendix). (B) The conductor–student synaptic weights $W_{ij}$ are updated based on a plasticity rule that depends on two parameters, $α$ and $β$, and two timescales, $τ_{1}$ and $τ_{2}$ (see Equation (1) and Materials and methods). The tutor signal enters this rule as a deviation from a constant threshold $θ$. The figure shows how synaptic weights change ($ΔW$) for a student neuron that receives a tutor burst and a conductor burst separated by a short lag. Two different choices of plasticity parameters are illustrated in the case when the threshold $θ = 0$. (C) The amount of mismatch between the system’s output and the target output is quantified using a loss (error) function. The figure sketches the loss landscape obtained by varying the synaptic weights $W_{ij}$ and calculating the loss function in each case (only two of the weight axes are shown). The blue dot shows the lowest value of the loss function, corresponding to the best match between the motor output and the target, while the orange dot shows the starting point. The dashed line shows how learning would proceed in a gradient descent approach, where the weights change in the direction of steepest descent in the loss landscape. **DOI:** http://dx.doi.org/10.7554/eLife.20944.003
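The dependence of the weight change on the lag between a conductor burst and a tutor burst, as described in panel B, can be illustrated with a small numerical sketch. Equation (1) is not reproduced in this record, so the form used here is an assumption: the conductor activity is smoothed with two causal, unit-area exponential kernels (timescales $τ_{1}$ and $τ_{2}$), combined as $α$ times the first minus $β$ times the second, and multiplied by the tutor's deviation from $θ$. The function names, burst width, and time grid are hypothetical choices made only for illustration.

```python
import numpy as np

def exp_kernel(tau, dt, length):
    """Causal exponential kernel with unit area (assumed kernel shape)."""
    t = np.arange(0.0, length, dt)
    return np.exp(-t / tau) / tau

def weight_change(lag_ms, alpha, beta, tau1=80.0, tau2=40.0, theta=0.0,
                  dt=1.0, t_max=1000.0, burst_ms=10.0):
    """Total weight change for one conductor burst and one tutor burst.

    lag_ms > 0 means the tutor burst follows the conductor burst.
    The rule itself is an ASSUMED form, not a transcription of Equation (1).
    """
    t = np.arange(0.0, t_max, dt)
    t_cond = 300.0  # arbitrary conductor burst onset (hypothetical)
    conductor = ((t >= t_cond) & (t < t_cond + burst_ms)).astype(float)
    tutor = ((t >= t_cond + lag_ms) &
             (t < t_cond + lag_ms + burst_ms)).astype(float)

    # Smooth the conductor signal with the two causal kernels.
    smooth1 = np.convolve(conductor, exp_kernel(tau1, dt, t_max))[:len(t)] * dt
    smooth2 = np.convolve(conductor, exp_kernel(tau2, dt, t_max))[:len(t)] * dt
    c_tilde = alpha * smooth1 - beta * smooth2

    # Integrate the product of smoothed conductor and tutor deviation.
    return np.sum(c_tilde * (tutor - theta)) * dt

if __name__ == "__main__":
    for lag in (0.0, 50.0, 100.0, 200.0):
        print(lag, weight_change(lag, alpha=24, beta=23))
```

Under these assumptions, with $α = 24$ and $β = 23$ (the values quoted in the Figure 4B caption) the sketch gives depression for coincident bursts and potentiation when the tutor burst lags by about $100\ \mathrm{ms}$, qualitatively like the rule described in Figure 1C.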

**Figure 3. Learning with matched or mismatched tutors in rate-based simulations.**
(A) Error trace showing how the average motor error evolved with the number of repetitions of the motor program for a rate-based ($α = 0$) plasticity rule paired with a matching tutor. (See online Video 1.) (B) The error trace and final motor output shown for a timing-based learning rule matched by a tutor with a long integration timescale. (See online Video 2.) In both A and B the inset shows the final motor output for one of the two output channels (thick orange line) compared to the target output for that channel (dotted black line). The output on the first rendition and at two other stages of learning indicated by orange arrows on the error trace are also shown as thin orange lines. (C) Effects of mismatch between student and tutor on reproduction accuracy. The heatmap shows the final reproduction error of the motor output after 1000 learning cycles in a rate-based simulation where a student with parameters $α$, $β$, $τ_{1}$, and $τ_{2}$ was paired with a tutor with memory timescale $τ_{tutor}$. On the $y$ axis, $τ_{1}$ and $τ_{2}$ were kept fixed at $80\ \mathrm{ms}$ and $40\ \mathrm{ms}$, respectively, while $α$ and $β$ were varied (subject to the constraint $α - β = 1$; see text). Different choices of $α$ and $β$ lead to different optimal timescales $τ_{tutor}^{*}$ according to Equation (4). The diagonal elements correspond to matched tutor and student, $τ_{tutor} = τ_{tutor}^{*}$. Note that the color scale is logarithmic. (D) Error evolution curves as a function of the mismatch between student and tutor. Each plot shows how the error in the motor program changed during 1000 learning cycles for the same conditions as those shown in the heatmap. The region shaded in light pink shows simulations where the mismatch between student and tutor led to a deteriorating instead of improving performance during learning. **DOI:** http://dx.doi.org/10.7554/eLife.20944.004
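For orientation, an error trace of the kind described in panels A and B can be produced by plain gradient descent on a linear conductor, student, and output chain, which is the loss-landscape picture of Figure 2C. The sketch below is not the two-stage tutor and student simulation from the paper: it updates the conductor–student weights directly from the loss gradient, and the network sizes, targets, learning rate, and the function name `train_linear_chain` are all hypothetical.

```python
import numpy as np

def train_linear_chain(n_steps=1000, learning_rate=1.0, seed=0):
    """Gradient descent on W for output = M @ W @ conductor (toy example)."""
    rng = np.random.default_rng(seed)
    n_time, n_student, n_out = 50, 20, 2

    # Conductor: exactly one neuron active per time bin (assumed pattern).
    conductor = np.eye(n_time)

    # Student-output weights: each output channel sums its own group of students.
    M = np.zeros((n_out, n_student))
    M[0, :10] = 1.0
    M[1, 10:] = 1.0

    # Arbitrary smooth targets for the two output channels.
    phase = np.linspace(0.0, 2.0 * np.pi, n_time)
    target = np.vstack([np.sin(phase), np.cos(phase)])

    # Conductor-student weights, randomly initialised.
    W = rng.normal(0.0, 0.1, size=(n_student, n_time))

    errors = []
    for _ in range(n_steps):
        residual = M @ W @ conductor - target
        errors.append(0.5 * np.sum(residual ** 2) / n_time)
        # Gradient of the mean squared-error loss with respect to W.
        grad = M.T @ residual @ conductor.T / n_time
        W -= learning_rate * grad
    return np.array(errors)

if __name__ == "__main__":
    trace = train_linear_chain()
    print(trace[0], trace[10], trace[-1])
```

Because the toy model is linear and the targets are reachable, the loss shrinks geometrically; the mismatch effects shown in panels C and D arise only once the update is driven by a tutor signal passing through a plasticity rule, which this sketch leaves out.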

**Figure 4. Effects of adding a constraint on the tutor firing rate to the simulations.**
(A) Learning was slowed down by the firing rate constraint, but the accuracy of the final rendition stayed the same (inset, shown here for one of two simulated output channels). Here $α = 0$, $β = -1$, and $τ_{tutor} = τ_{tutor}^{*} = 40\ \mathrm{ms}$. (See online Video 3.) (B) Sequential learning occurred when the firing rate constraint was imposed on a matched tutor with a long memory scale. The plots show the evolution of the motor output for one of the two channels that were used in the simulation. Here $α = 24$, $β = 23$, and $τ_{tutor} = τ_{tutor}^{*} = 1000\ \mathrm{ms}$. (See online Video 4.) **DOI:** http://dx.doi.org/10.7554/eLife.20944.007
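Equation (4), which gives the matched tutor timescale, is not included in this record. One simple form that reproduces both parameter sets quoted in this caption is $τ_{tutor}^{*} = α τ_{1} - β τ_{2}$, assuming the same $τ_{1} = 80\ \mathrm{ms}$ and $τ_{2} = 40\ \mathrm{ms}$ as in Figure 3. The check below treats that form as an inference from the quoted numbers, not as a transcription of the paper's equation; the function name is hypothetical.

```python
def optimal_tutor_timescale(alpha, beta, tau1=80.0, tau2=40.0):
    """ASSUMED form of the matched tutor timescale, in milliseconds.

    Inferred from the values quoted in the Figure 4 caption; the actual
    Equation (4) is not reproduced in this record.
    """
    return alpha * tau1 - beta * tau2

if __name__ == "__main__":
    # Panel A parameters: alpha = 0, beta = -1
    print(optimal_tutor_timescale(0, -1))
    # Panel B parameters: alpha = 24, beta = 23
    print(optimal_tutor_timescale(24, 23))
```

With these inputs the function returns 40 ms and 1000 ms, the two values of $τ_{tutor}^{*}$ given in panels A and B.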

**Figure 5. Results from simulations in spiking neural networks.**
(A) Spike patterns recorded from zebra finch RA during song production, for a juvenile (top) and an adult (bottom). Each color corresponds to a single neuron, and the song-aligned spikes for six renditions of the song are shown. Adapted from Ölveczky et al. (2011). (B) Spike patterns from model student neurons in our simulations, for the untrained (top) and trained (bottom) models. The training used $α = 1$, $β = 0$, and $τ_{tutor} = 80\ \mathrm{ms}$, and ran for 600 iterations of the song. Each model neuron corresponds to a different output channel of the simulation. In this case, the targets for each channel were chosen to roughly approximate the time course observed in the neural recordings. (C) Progression of reproduction error in the spiking simulation as a function of the number of repetitions for the same conditions as in panel B. The inset shows the accuracy of reproduction in the trained model for one of the output channels. (See online Video 5.) (D) Effects of mismatch between student and tutor on reproduction accuracy in the spiking model. The heatmap shows the final reproduction error of the motor output after 1000 learning cycles in a spiking simulation where a student with parameters $α$, $β$, $τ_{1}$, and $τ_{2}$ was paired with a tutor with memory timescale $τ_{tutor}$. On the $y$ axis, $τ_{1}$ and $τ_{2}$ were kept fixed at $80\ \mathrm{ms}$ and $40\ \mathrm{ms}$, respectively, while $α$ and $β$ were varied (subject to the constraint $α - β = 1$; see section "Learning in a rate-based model"). Different choices of $α$ and $β$ lead to different optimal timescales $τ_{tutor}^{*}$ according to Equation (4). The diagonal elements correspond to matched tutor and student, $τ_{tutor} = τ_{tutor}^{*}$. Note that the color scale is logarithmic. **DOI:** http://dx.doi.org/10.7554/eLife.20944.010

**Figure 6. Credit assignment and reinforcement learning.**
(A) Effects of credit mis-assignment on learning in a rate-based simulation. Here, the system learned output sequences for two independent channels. The student–output weights $M_{aj}$ were chosen so that the tutor wrongly assigned a fraction of student neurons to an output channel different from the one it actually mapped to. The graph shows how the accuracy of the motor output after 1000 learning steps depended on the fraction of mis-assigned credit. (B) Learning curve and trained motor output (inset) for one of the channels showing two-stage reinforcement-based learning for the memory-less tutor ($τ_{tutor} = 0$). The accuracy of the trained model is as good as in the case where the tutor was assumed to have a perfect model of the student–output relation. However, the speed of learning is reduced. (See online Video 6.) (C) Learning curve and trained motor output (inset) for one of the output channels showing two-stage reinforcement-based learning when the tutor circuit needs to integrate information about the motor error on a certain timescale. Again, learning was slow, but the accuracy of the trained state was unchanged. (See online Video 7.) (D) Evolution of the average number of HVC inputs per RA neuron with learning in a reinforcement example. Synapses were considered pruned if they admitted a current smaller than 1 nA after a pre-synaptic spike in our simulations. **DOI:** http://dx.doi.org/10.7554/eLife.20944.012
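The pruning criterion in panel D amounts to a threshold count. A minimal sketch, assuming the per-synapse currents are available as a matrix with one row per RA (student) neuron and one column per HVC (conductor) neuron; the array layout and the function name are hypothetical, and only the 1 nA threshold comes from the caption.

```python
import numpy as np

def mean_inputs_per_neuron(currents_nA, threshold_nA=1.0):
    """Average number of unpruned inputs per postsynaptic neuron.

    currents_nA[i, j] is the current (in nA) evoked in neuron i by a spike
    in presynaptic neuron j. A synapse counts as pruned when its current is
    smaller than the threshold.
    """
    surviving = np.asarray(currents_nA) >= threshold_nA
    return surviving.sum(axis=1).mean()

if __name__ == "__main__":
    example = np.array([[0.5, 1.0, 2.0],
                        [0.0, 0.2, 3.0]])
    print(mean_inputs_per_neuron(example))
```

Tracking this quantity after each learning cycle would give a curve of the kind the caption describes.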

**Appendix 1—figure 1. Robustness of learning.**
(A) Error trace showing how average motor error evolves with repetitions of the motor program for rate-based plasticity paired with a matching tutor, when the student–output mapping has a push-pull architecture. The inset shows the final motor output (thick red line) compared to the target output (dotted black line). The output on the first rendition and at two other stages of learning are also shown. (B) The error trace and final motor output shown for timing-based plasticity matched by a tutor with a long integration timescale. (C) Effects of mismatch between student and tutor on reproduction accuracy when using a push-pull architecture for the student–output mapping. The heatmap shows the final reproduction error of the motor output after 1000 learning cycles when a student with plasticity parameters $α$ and $β$ is paired with a tutor with memory timescale $τ_{tutor}$. Here $τ_{1} = 80\ \mathrm{ms}$ and $τ_{2} = 40\ \mathrm{ms}$. (D) Error evolution curves as a function of the mismatch between student and tutor. Each plot shows how the error in the motor program changes during 1000 learning cycles for the same conditions as those shown in the heatmap. The region shaded in light pink shows simulations where the mismatch between student and tutor leads to a deteriorating instead of improving performance during learning. (E) Convergence in the rate-based model with a linear-nonlinear controller that uses a sigmoidal nonlinearity. (F) Convergence in the spiking model when inhibition is constant instead of activity-dependent ($V_{inh} = \mathrm{constant}$). **DOI:** http://dx.doi.org/10.7554/eLife.20944.016

**Appendix 1—figure 2. Effect of changing conductor smoothing kernels in the plasticity rule.**
(A) Matrix showing learning accuracy when using different timescales for the student plasticity rule. Each entry in the heatmap shows the average rendition error after 1000 learning steps when pairing a tutor with timescale $τ_{tutor}$ with a non-matched student. Here the kernels are exponential, with timescales $τ_{1} = 20\ \mathrm{ms}$, $τ_{2} = 10\ \mathrm{ms}$. (B) Evolution of motor error with learning using kernels $\sim e^{-t/τ}$ and $\sim t\,e^{-t/τ}$, instead of the two exponentials used in the main text. The tutor signal is as before, Equation (3). The inset shows the final output for the trained model, for one of the two output channels. Learning is as effective and fast as before. **DOI:** http://dx.doi.org/10.7554/eLife.20944.017

**Appendix 1—figure 3. Learning with arbitrary conductor activity.**
(A) Typical activity of conductor neurons. 20 of the 100 neurons included in the simulation are shown. The activity pattern is chosen so that about 10% of the neurons are active at any given time. The pattern is chosen randomly but is fixed during learning. Each conductor burst lasts $30\ \mathrm{ms}$. (B) Convergence curve and final rendition of the motor program (in inset). Learning included two output channels but the final output is shown for only one of them. **DOI:** http://dx.doi.org/10.7554/eLife.20944.018
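A random conductor pattern with the properties listed in panel A (100 neurons, bursts of $30\ \mathrm{ms}$, roughly 10% of neurons active on average, fixed across learning) could be generated along the following lines. The caption does not say how the pattern was drawn, so the procedure here is only one possible construction: the song duration of 600 ms, the 1 ms time bins, the fixed number of bursts per neuron, and the function name are all assumptions.

```python
import numpy as np

def make_conductor_pattern(n_neurons=100, duration_ms=600, burst_ms=30,
                           active_fraction=0.1, seed=0):
    """Binary activity matrix of shape (n_neurons, duration_ms), 1 ms bins.

    Each neuron gets the same number of bursts at random onset times, chosen
    so that about `active_fraction` of neurons are active on average.
    Reusing the same seed reproduces the pattern, which keeps it fixed
    during learning.
    """
    rng = np.random.default_rng(seed)
    n_bursts = max(1, round(active_fraction * duration_ms / burst_ms))
    activity = np.zeros((n_neurons, duration_ms))
    for i in range(n_neurons):
        starts = rng.integers(0, duration_ms - burst_ms + 1, size=n_bursts)
        for start in starts:
            # Overlapping bursts simply merge, slightly lowering the fraction.
            activity[i, start:start + burst_ms] = 1.0
    return activity

if __name__ == "__main__":
    pattern = make_conductor_pattern()
    print(pattern.shape, pattern.mean())
```

The mean of the returned matrix is the time-averaged fraction of active neurons; in this construction it sits at or slightly below the requested value because overlapping bursts within a neuron merge.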

**Appendix 1—figure 4. Violin plots showing how the spiking statistics from our simulation compared to the statistics obtained from neural recordings.**
Each violin shows a kernel-density estimate of the distribution that a particular summary statistic had in either several runs of a simulation, or in several recordings from behaving birds. The circle and the box within each violin show the median and the interquartile range. **DOI:** http://dx.doi.org/10.7554/eLife.20944.019
