Interface Focus. 2018 Dec 6;8(6):20180033. doi: 10.1098/rsfs.2018.0033. Epub 2018 Oct 19.

From statistical inference to a differential learning rule for stochastic neural networks

Luca Saglietti et al.

Abstract

Stochastic neural networks are a prototypical computational device able to build a probabilistic representation of an ensemble of external stimuli. Building on the relationship between inference and learning, we derive a synaptic plasticity rule that relies only on delayed activity correlations, and that shows a number of remarkable features. Our delayed-correlations matching (DCM) rule satisfies some basic requirements for biological feasibility: finite and noisy afferent signals, Dale's principle and asymmetry of synaptic connections, locality of the weight update computations. Nevertheless, the DCM rule is capable of storing a large, extensive number of patterns as attractors in a stochastic recurrent neural network, under general scenarios without requiring any modification: it can deal with correlated patterns, a broad range of architectures (with or without hidden neuronal states), one-shot learning with the palimpsest property, all the while avoiding the proliferation of spurious attractors. When hidden units are present, our learning rule can be employed to construct Boltzmann machine-like generative models, exploiting the addition of hidden neurons in feature extraction and classification tasks.
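The stochastic neural network referred to here can be pictured as a set of binary (±1) neurons updated asynchronously with Glauber-like dynamics. The following is a minimal sketch (not the authors' code), assuming a coupling matrix J, an inverse temperature beta and an optional external field h_ext; all names are illustrative.

```python
import numpy as np

def glauber_step(s, J, beta, h_ext=0.0, rng=None):
    """One asynchronous sweep: each neuron is set to +1 with a sigmoidal probability."""
    rng = rng or np.random.default_rng()
    N = len(s)
    h = np.broadcast_to(h_ext, (N,))
    for i in rng.permutation(N):
        field = J[i] @ s + h[i]                       # local field on neuron i
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * field))
        s[i] = 1 if rng.random() < p_up else -1
    return s
```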

Keywords: associative memory; attractor networks; learning.


Conflict of interest statement

We declare we have no competing interests.

Figures

Figure 1.
DCM learning protocol scheme, showing the learning process for one pattern presentation. The blue curve shows the stepwise dynamics of the external field [formula] as a function of the time t of the network dynamics. The first time period ([formula] time steps, shaded) is used to initialize the network state in the proximity of the pattern. The protocol then proceeds in windows of 2T steps, each divided into two phases. In the middle of each window, the field intensity drops by [formula]. The time-delayed correlations are recorded separately for the two phases. The parameters are updated at the end of each window, at the points marked by the [formula] symbols, according to equation (2.6). (Online version in colour.)
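Following the protocol in the caption, one learning window can be sketched as two phases of T steps, at a higher and then a lower external field intensity, with time-delayed correlations accumulated separately in each phase and the couplings updated with their difference. This hedged sketch reuses glauber_step from the sketch after the abstract; the learning rate eta, the field schedule and any prefactors are illustrative stand-ins for the actual update of equation (2.6).

```python
import numpy as np

def dcm_window(s, J, beta, pattern, lam_high, lam_low, T, eta, rng=None):
    """One 2T-step DCM window: high-field phase, low-field phase, then weight update."""
    rng = rng or np.random.default_rng()
    corr = {}
    for phase, lam in (("high", lam_high), ("low", lam_low)):
        C = np.zeros_like(J)
        for _ in range(T):
            s_prev = s.copy()
            s = glauber_step(s, J, beta, h_ext=lam * pattern, rng=rng)
            C += np.outer(s, s_prev)                  # delayed correlation s_i(t+1) s_j(t)
        corr[phase] = C / T
    J = J + eta * (corr["high"] - corr["low"])        # match delayed correlations across phases
    np.fill_diagonal(J, 0.0)
    return s, J
```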
Figure 2.
Maximum storage load as a function of the width of the basin of attraction for a network of N = 400 visible neurons. The red and blue curves show the results for the Hopfield model and the DCM rule, respectively. The noise level [formula] operationally measures the width of the basins of attraction (it is the fraction of corrupted bits that the network is able to correct; see the text for a more precise definition). Each curve is an average over 10 samples (error bars are smaller than the point size). The inverse temperature parameter is set to [formula] in order to fall within the retrieval phase of the Hopfield model. The critical capacity at zero temperature is lower than the Gardner bound, [formula], because of the stochastic component of the dynamics. (Online version in colour.)
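The basin-width measurement described in the caption can be emulated by corrupting a fraction of a stored pattern's bits, relaxing the network with the stochastic dynamics, and checking whether the overlap with the pattern recovers. A sketch, again reusing glauber_step from the earlier sketch; the number of sweeps and the overlap threshold are illustrative choices, not the paper's.

```python
import numpy as np

def retrieves(J, pattern, noise, beta, n_sweeps=50, threshold=0.9, rng=None):
    """Corrupt a fraction `noise` of the bits, relax, and check the final overlap."""
    rng = rng or np.random.default_rng()
    s = pattern.copy()
    flip = rng.random(len(s)) < noise
    s[flip] = -s[flip]                                # corrupt a fraction of the bits
    for _ in range(n_sweeps):
        s = glauber_step(s, J, beta, rng=rng)
    overlap = np.mean(s * pattern)
    return overlap > threshold
```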
Figure 3.
Maximum storage load as a function of the field intensity for a network of N = 400 neurons. The correlations were recorded in windows of T = 20 time steps and the field intensity step was [formula]. The noise level in the retrieval phase is set to [formula] and the temperature to [formula]. The curve was obtained by averaging over 100 samples. The inset shows a comparison between the recurrent and external components of the inputs, for the same data points as in the main panel. The mean recurrent input was computed as the 2-norm of the mean values. This shows that the DCM rule is effective even for relatively small stimuli. (Online version in colour.)
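One way to reproduce the inset's comparison, under the assumption that the external input component is simply the field intensity times the pattern, is to take the 2-norm of the time-averaged recurrent input along a retrieval trajectory and set it against the norm of the external drive. The function and variable names are illustrative.

```python
import numpy as np

def input_norms(J, states, lam, pattern):
    """2-norm of the time-averaged recurrent input versus that of the external field."""
    mean_recurrent = np.mean([J @ s for s in states], axis=0)   # average over the trajectory
    return np.linalg.norm(mean_recurrent), np.linalg.norm(lam * pattern)
```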
Figure 4.
Required learning cycles as a function of the storage load, for unconstrained and constrained synapses, for networks of size N = 200 (dashed curves) and N = 400 (full curves). The results for the case of unconstrained synapses (blue curves) and for synapses satisfying Dale's principle (red curves) are compared. Here the chosen inhibitory scheme is the soft 'winner takes all' mechanism. The noise level in the retrieval phase was set to [formula], while the sparsity was fixed at [formula] in order to avoid finite-size effects with the relatively small networks. The curves are interrupted at the value of α where the algorithm starts failing. (Online version in colour.)
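Dale's principle requires each presynaptic neuron to be either excitatory or inhibitory, i.e. all of its outgoing synapses share one sign. A minimal sketch of imposing this constraint after each weight update follows; the soft 'winner takes all' inhibitory scheme mentioned in the caption is not reproduced here, and the column convention (J[i, j] is the weight from neuron j to neuron i) is an assumption.

```python
import numpy as np

def enforce_dale(J, is_excitatory):
    """Clip each presynaptic neuron's outgoing weights to a fixed sign."""
    J = J.copy()
    for j, exc in enumerate(is_excitatory):
        col = J[:, j]                                 # outgoing weights of presynaptic neuron j
        J[:, j] = np.maximum(col, 0.0) if exc else np.minimum(col, 0.0)
    return J
```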
Figure 5.
Maximum storage load as a function of the bias in the distribution of the patterns, for networks of size N = 200 (dashed curves) and N = 400 (full curves). The correlation is introduced trivially: each pattern is built by extracting spins from a biased distribution [formula]. The blue curves show the scaling properties of the capacity of the DCM rule as a function of the bias. The drop in performance at small biases is due to finite-size effects, and the performance improves with N. The red and green curves show the results for the naive Hebb rule and the generalized Hebb rule adapted to the biased case, respectively (see the electronic supplementary material, section IV D). For larger N, the capacity in all unbalanced cases is expected to drop to 0. All the curves were obtained by averaging over 10 samples (error bars are smaller than the point size). (Online version in colour.)
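A sketch of the biased pattern ensemble described in the caption, assuming each spin is drawn independently as +1 with probability (1 + b)/2, where b is the bias; the exact parametrization used in the paper may differ.

```python
import numpy as np

def biased_patterns(P, N, b, rng=None):
    """P patterns of N spins, each spin +1 with probability (1 + b)/2."""
    rng = rng or np.random.default_rng()
    return np.where(rng.random((P, N)) < (1.0 + b) / 2.0, 1, -1)
```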
Figure 6.
Maximum storage load as a function of the length of the dictionary of features. We study the critical capacity of the generalized Hebb rule (red curve) and the DCM rule (blue curve) when the patterns are generated as combinations of features, chosen from a dictionary of varying length L. In the inset, the mean Pearson correlation in a dataset of 200 patterns is shown as a function of the dictionary length. In the numerical experiments, every feature had a fixed sparsity of f = 0.1 and each pattern was obtained as a superposition of F = 6 features (see the electronic supplementary material, section IV D). The curves were obtained by averaging over 10 samples (error bars are smaller than the point size). (Online version in colour.)
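A sketch of the feature-superposition ensemble: a dictionary of L sparse binary features, with each pattern formed by superposing F of them. Treating the superposition as a logical OR and mapping active/inactive units to ±1 are assumptions; the precise construction is given in the electronic supplementary material, section IV D.

```python
import numpy as np

def feature_patterns(P, N, L, F=6, f=0.1, rng=None):
    """P patterns built by superposing F features drawn from a dictionary of L sparse features."""
    rng = rng or np.random.default_rng()
    dictionary = rng.random((L, N)) < f                   # L sparse binary features
    patterns = np.empty((P, N), dtype=int)
    for mu in range(P):
        chosen = rng.choice(L, size=F, replace=False)     # pick F features for this pattern
        active = dictionary[chosen].any(axis=0)           # superpose them (logical OR)
        patterns[mu] = np.where(active, 1, -1)
    return patterns
```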
Figure 7.
Number of spurious attractors for a network of N = 400 neurons. This figure shows the number of distinct spurious attractors found during 10 000 independent random walks of 200 time steps each, after a small number of patterns were learned by the network (see the electronic supplementary material, section IV B). The red curve represents the Hebb rule (the first peak is due to finite-size effects). The blue curve shows the behaviour of the DCM rule. The curves were obtained by averaging over 10 samples (error bars are standard errors). (Online version in colour.)
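The counting procedure can be approximated as follows: launch many independent walks from random states, relax each one with a simple zero-temperature single-spin update, and count the distinct end states that do not coincide with any stored pattern (up to a global sign flip). This is an illustrative simplification; the dynamics and the matching criterion actually used are specified in the electronic supplementary material, section IV B.

```python
import numpy as np

def count_spurious(J, patterns, n_walks=10000, n_steps=200, rng=None):
    """Count distinct end states of random walks that match no stored pattern (up to sign)."""
    rng = rng or np.random.default_rng()
    N = J.shape[0]
    stored = {tuple(p) for p in patterns} | {tuple(-p) for p in patterns}
    found = set()
    for _ in range(n_walks):
        s = rng.choice([-1, 1], size=N)
        for _ in range(n_steps):
            i = rng.integers(N)
            s[i] = 1 if J[i] @ s >= 0 else -1         # zero-temperature single-spin update
        state = tuple(s)
        if state not in stored:
            found.add(state)
    return len(found)
```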
Figure 8.
Scaling properties of the palimpsest capacity. In this figure, we show the results obtained when testing the DCM learning rule in the context of one-shot learning. The full curves show the results for N = 100, 200, 400 and 800 neurons, illustrating the scaling properties of the palimpsest capacity. The dashed grey curves are extrapolated as the mean of the last three measurements. All the points are obtained by averaging over 10 samples. (Online version in colour.)
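A sketch of the one-shot/palimpsest test: random patterns are presented once each, in sequence, and after every presentation we count how many of the most recently shown patterns can still be retrieved. The sketch below reuses dcm_window and retrieves from the earlier sketches, and the single-window-per-pattern presentation schedule is an illustrative simplification of the full protocol of figure 1.

```python
import numpy as np

def palimpsest_curve(N, n_patterns, beta, lam_high, lam_low, T, eta,
                     noise=0.1, rng=None):
    """Present patterns once each; after each one, count how far back retrieval still works."""
    rng = rng or np.random.default_rng()
    J = np.zeros((N, N))
    patterns = rng.choice([-1, 1], size=(n_patterns, N))
    history = []
    for mu in range(n_patterns):
        s = patterns[mu].copy()
        s, J = dcm_window(s, J, beta, patterns[mu], lam_high, lam_low, T, eta, rng=rng)
        k = 0                                         # number of recent patterns still retrievable
        while k <= mu and retrieves(J, patterns[mu - k], noise, beta, rng=rng):
            k += 1
        history.append(k)
    return history
```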

