Interface Focus. 2018 Dec 6;8(6):20180033. doi: 10.1098/rsfs.2018.0033. Epub 2018 Oct 19.

From statistical inference to a differential learning rule for stochastic neural networks

Luca Saglietti et al.

Abstract

Stochastic neural networks are a prototypical computational device able to build a probabilistic representation of an ensemble of external stimuli. Building on the relationship between inference and learning, we derive a synaptic plasticity rule that relies only on delayed activity correlations, and that shows a number of remarkable features. Our delayed-correlations matching (DCM) rule satisfies some basic requirements for biological feasibility: finite and noisy afferent signals, Dale's principle and asymmetry of synaptic connections, locality of the weight update computations. Nevertheless, the DCM rule is capable of storing a large, extensive number of patterns as attractors in a stochastic recurrent neural network, under general scenarios without requiring any modification: it can deal with correlated patterns, a broad range of architectures (with or without hidden neuronal states), one-shot learning with the palimpsest property, all the while avoiding the proliferation of spurious attractors. When hidden units are present, our learning rule can be employed to construct Boltzmann machine-like generative models, exploiting the addition of hidden neurons in feature extraction and classification tasks.
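The stochastic neural network referred to here can be pictured as a set of binary (±1) neurons updated asynchronously with Glauber-like dynamics. The following is a minimal sketch (not the authors' code), assuming a coupling matrix J, an inverse temperature beta and an optional external field h_ext; all names are illustrative.

```python
import numpy as np

def glauber_step(s, J, beta, h_ext=0.0, rng=None):
    """One asynchronous sweep: each neuron is set to +1 with a sigmoidal probability."""
    rng = rng or np.random.default_rng()
    N = len(s)
    h = np.broadcast_to(h_ext, (N,))
    for i in rng.permutation(N):
        field = J[i] @ s + h[i]                       # local field on neuron i
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * field))
        s[i] = 1 if rng.random() < p_up else -1
    return s
```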

Keywords: associative memory; attractor networks; learning.


Conflict of interest statement

We declare we have no competing interests.

Figures

Figure 1.
DCM learning protocol scheme, showing the learning process for one pattern presentation. The blue curve shows the stepwise dynamics of the external field [formula] as a function of the time t of the network dynamics. The first time period ([formula] time steps, shaded) is used to initialize the network state in the proximity of the pattern. The protocol then proceeds in windows of 2T steps, each divided into two phases. In the middle of each window, the field intensity drops by [formula]. The time-delayed correlations are recorded separately for the two phases. The parameters are updated at the end of each window, at the points marked by the [formula] symbols, according to equation (2.6). (Online version in colour.)
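Following the protocol in the caption, one learning window can be sketched as two phases of T steps, at a higher and then a lower external field intensity, with time-delayed correlations accumulated separately in each phase and the couplings updated with their difference. This hedged sketch reuses glauber_step from the sketch after the abstract; the learning rate eta, the field schedule and any prefactors are illustrative stand-ins for the actual update of equation (2.6).

```python
import numpy as np

def dcm_window(s, J, beta, pattern, lam_high, lam_low, T, eta, rng=None):
    """One 2T-step DCM window: high-field phase, low-field phase, then weight update."""
    rng = rng or np.random.default_rng()
    corr = {}
    for phase, lam in (("high", lam_high), ("low", lam_low)):
        C = np.zeros_like(J)
        for _ in range(T):
            s_prev = s.copy()
            s = glauber_step(s, J, beta, h_ext=lam * pattern, rng=rng)
            C += np.outer(s, s_prev)                  # delayed correlation s_i(t+1) s_j(t)
        corr[phase] = C / T
    J = J + eta * (corr["high"] - corr["low"])        # match delayed correlations across phases
    np.fill_diagonal(J, 0.0)
    return s, J
```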
Figure 2.
Maximum storage load as a function of the width of the basin of attraction for a network of N = 400 visible neurons. The red and blue curves show the results for the Hopfield model and the DCM rule, respectively. The noise level [formula] operationally measures the width of the basins of attraction (it is the fraction of corrupted bits that the network is able to correct; see the text for a more precise definition). Each curve is an average over 10 samples (error bars are smaller than the point size). The inverse temperature parameter is set to [formula] in order to fall within the retrieval phase of the Hopfield model. The critical capacity at zero temperature is lower than the Gardner bound, [formula], because of the stochastic component of the dynamics. (Online version in colour.)
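The basin-width measurement described in the caption can be emulated by corrupting a fraction of a stored pattern's bits, relaxing the network with the stochastic dynamics, and checking whether the overlap with the pattern recovers. A sketch, again reusing glauber_step from the earlier sketch; the number of sweeps and the overlap threshold are illustrative choices, not the paper's.

```python
import numpy as np

def retrieves(J, pattern, noise, beta, n_sweeps=50, threshold=0.9, rng=None):
    """Corrupt a fraction `noise` of the bits, relax, and check the final overlap."""
    rng = rng or np.random.default_rng()
    s = pattern.copy()
    flip = rng.random(len(s)) < noise
    s[flip] = -s[flip]                                # corrupt a fraction of the bits
    for _ in range(n_sweeps):
        s = glauber_step(s, J, beta, rng=rng)
    overlap = np.mean(s * pattern)
    return overlap > threshold
```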
Figure 3.
Maximum storage load as a function of the field intensity for a network of N = 400 neurons. The correlations were recorded in windows of T = 20 time steps and the field intensity step was [formula]. The noise level in the retrieval phase is set to [formula] and the temperature to [formula]. The curve was obtained by averaging over 100 samples. The inset shows a comparison between the recurrent and external components of the inputs, for the same data points as in the main panel. The mean recurrent input was computed as the 2-norm of the mean values. This shows that the DCM rule is effective even for relatively small stimuli. (Online version in colour.)
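One way to reproduce the inset's comparison, under the assumption that the external input component is simply the field intensity times the pattern, is to take the 2-norm of the time-averaged recurrent input along a retrieval trajectory and set it against the norm of the external drive. The function and variable names are illustrative.

```python
import numpy as np

def input_norms(J, states, lam, pattern):
    """2-norm of the time-averaged recurrent input versus that of the external field."""
    mean_recurrent = np.mean([J @ s for s in states], axis=0)   # average over the trajectory
    return np.linalg.norm(mean_recurrent), np.linalg.norm(lam * pattern)
```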
Figure 4.
Required learning cycles as a function of the storage load, for unconstrained and constrained synapses, for networks of size N = 200 (dashed curves) and N = 400 (full curves). The results for the case of unconstrained synapses (blue curves) and for synapses satisfying Dale's principle (red curves) are compared. Here the chosen inhibitory scheme is the soft 'winner takes all' mechanism. The noise level in the retrieval phase was set to [formula], while the sparsity was fixed at [formula] in order to avoid finite-size effects with the relatively small networks. The curves are interrupted at the value of α where the algorithm starts failing. (Online version in colour.)
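Dale's principle requires each presynaptic neuron to be either excitatory or inhibitory, i.e. all of its outgoing synapses share one sign. A minimal sketch of imposing this constraint after each weight update follows; the soft 'winner takes all' inhibitory scheme mentioned in the caption is not reproduced here, and the column convention (J[i, j] is the weight from neuron j to neuron i) is an assumption.

```python
import numpy as np

def enforce_dale(J, is_excitatory):
    """Clip each presynaptic neuron's outgoing weights to a fixed sign."""
    J = J.copy()
    for j, exc in enumerate(is_excitatory):
        col = J[:, j]                                 # outgoing weights of presynaptic neuron j
        J[:, j] = np.maximum(col, 0.0) if exc else np.minimum(col, 0.0)
    return J
```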
Figure 5.
Maximum storage load as a function of the bias in the distribution of the patterns, for networks of size N = 200 (dashed curves) and N = 400 (full curves). The correlation is introduced trivially: each pattern is built by extracting spins from a biased distribution [formula]. The blue curves show the scaling properties of the capacity of the DCM rule as a function of the bias. The drop in performance at small biases is due to finite-size effects, and the performance improves with N. The red and green curves show the results for the naive Hebb rule and the generalized Hebb rule adapted to the biased case, respectively (see the electronic supplementary material, section IV D). For larger N, the capacity in all unbalanced cases is expected to drop to 0. All the curves were obtained by averaging over 10 samples (error bars are smaller than the point size). (Online version in colour.)
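A sketch of the biased pattern ensemble described in the caption, assuming each spin is drawn independently as +1 with probability (1 + b)/2, where b is the bias; the exact parametrization used in the paper may differ.

```python
import numpy as np

def biased_patterns(P, N, b, rng=None):
    """P patterns of N spins, each spin +1 with probability (1 + b)/2."""
    rng = rng or np.random.default_rng()
    return np.where(rng.random((P, N)) < (1.0 + b) / 2.0, 1, -1)
```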
Figure 6.
Maximum storage load as a function of the length of the dictionary of features. We study the critical capacity of the generalized Hebb rule (red curve) and the DCM rule (blue curve) when the patterns are generated as combinations of features, chosen from a dictionary of varying length L. In the inset, the mean Pearson correlation in a dataset of 200 patterns is shown as a function of the dictionary length. In the numerical experiments, every feature had a fixed sparsity of f = 0.1 and each pattern was obtained as a superposition of F = 6 features (see the electronic supplementary material, section IV D). The curves were obtained by averaging over 10 samples (error bars are smaller than the point size). (Online version in colour.)
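A sketch of the feature-superposition ensemble: a dictionary of L sparse binary features, with each pattern formed by superposing F of them. Treating the superposition as a logical OR and mapping active/inactive units to ±1 are assumptions; the precise construction is given in the electronic supplementary material, section IV D.

```python
import numpy as np

def feature_patterns(P, N, L, F=6, f=0.1, rng=None):
    """P patterns built by superposing F features drawn from a dictionary of L sparse features."""
    rng = rng or np.random.default_rng()
    dictionary = rng.random((L, N)) < f                   # L sparse binary features
    patterns = np.empty((P, N), dtype=int)
    for mu in range(P):
        chosen = rng.choice(L, size=F, replace=False)     # pick F features for this pattern
        active = dictionary[chosen].any(axis=0)           # superpose them (logical OR)
        patterns[mu] = np.where(active, 1, -1)
    return patterns
```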
Figure 7.
Number of spurious attractors for a network of N = 400 neurons. This figure shows the number of distinct spurious attractors found during 10 000 independent random walks of 200 time steps each, after a small number of patterns were learned by the network (see the electronic supplementary material, section IV B). The red curve represents the Hebb rule (the first peak is due to finite-size effects). The blue curve shows the behaviour of the DCM rule. The curves were obtained by averaging over 10 samples (error bars are standard errors). (Online version in colour.)
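The counting procedure can be approximated as follows: launch many independent walks from random states, relax each one with a simple zero-temperature single-spin update, and count the distinct end states that do not coincide with any stored pattern (up to a global sign flip). This is an illustrative simplification; the dynamics and the matching criterion actually used are specified in the electronic supplementary material, section IV B.

```python
import numpy as np

def count_spurious(J, patterns, n_walks=10000, n_steps=200, rng=None):
    """Count distinct end states of random walks that match no stored pattern (up to sign)."""
    rng = rng or np.random.default_rng()
    N = J.shape[0]
    stored = {tuple(p) for p in patterns} | {tuple(-p) for p in patterns}
    found = set()
    for _ in range(n_walks):
        s = rng.choice([-1, 1], size=N)
        for _ in range(n_steps):
            i = rng.integers(N)
            s[i] = 1 if J[i] @ s >= 0 else -1         # zero-temperature single-spin update
        state = tuple(s)
        if state not in stored:
            found.add(state)
    return len(found)
```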
Figure 8.
Scaling properties of the palimpsest capacity. In this figure, we show the results obtained when testing the DCM learning rule in the context of one-shot learning. The full curves show the results for N = 100, 200, 400 and 800 neurons, illustrating the scaling properties of the palimpsest capacity. The dashed grey curves are extrapolated as the mean of the last three measurements. All the points are obtained by averaging over 10 samples. (Online version in colour.)
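A sketch of the one-shot/palimpsest test: random patterns are presented once each, in sequence, and after every presentation we count how many of the most recently shown patterns can still be retrieved. The sketch below reuses dcm_window and retrieves from the earlier sketches, and the single-window-per-pattern presentation schedule is an illustrative simplification of the full protocol of figure 1.

```python
import numpy as np

def palimpsest_curve(N, n_patterns, beta, lam_high, lam_low, T, eta,
                     noise=0.1, rng=None):
    """Present patterns once each; after each one, count how far back retrieval still works."""
    rng = rng or np.random.default_rng()
    J = np.zeros((N, N))
    patterns = rng.choice([-1, 1], size=(n_patterns, N))
    history = []
    for mu in range(n_patterns):
        s = patterns[mu].copy()
        s, J = dcm_window(s, J, beta, patterns[mu], lam_high, lam_low, T, eta, rng=rng)
        k = 0                                         # number of recent patterns still retrievable
        while k <= mu and retrieves(J, patterns[mu - k], noise, beta, rng=rng):
            k += 1
        history.append(k)
    return history
```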

