Comparative Study
PLoS Comput Biol. 2015 Apr 1;11(4):e1004141. doi: 10.1371/journal.pcbi.1004141. eCollection 2015 Apr.

The equivalence of information-theoretic and likelihood-based methods for neural dimensionality reduction


Ross S Williamson et al. PLoS Comput Biol. 2015.


Abstract

Stimulus dimensionality-reduction methods in neuroscience seek to identify a low-dimensional space of stimulus features that affect a neuron's probability of spiking. One popular method, known as maximally informative dimensions (MID), uses an information-theoretic quantity known as "single-spike information" to identify this space. Here we examine MID from a model-based perspective. We show that MID is a maximum-likelihood estimator for the parameters of a linear-nonlinear-Poisson (LNP) model, and that the empirical single-spike information corresponds to the normalized log-likelihood under a Poisson model. This equivalence implies that MID does not necessarily find maximally informative stimulus dimensions when spiking is not well described as Poisson. We provide several examples to illustrate this shortcoming, and derive a lower bound on the information lost when spiking is Bernoulli in discrete time bins. To overcome this limitation, we introduce model-based dimensionality reduction methods for neurons with non-Poisson firing statistics, and show that they can be framed equivalently in likelihood-based or information-theoretic terms. Finally, we show how to overcome practical limitations on the number of stimulus dimensions that MID can estimate by constraining the form of the non-parametric nonlinearity in an LNP model. We illustrate these methods with simulations and data from primate visual cortex.
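The central equivalence claimed above can be checked numerically. The sketch below (not the authors' code; the simulated neuron, sample sizes, and bin counts are illustrative choices) fits a histogram ("plug-in") nonlinearity along candidate 1D projections and verifies that the plug-in Poisson log-likelihood differs from n_sp times the empirical single-spike information only by a filter-independent constant, so the two objectives share the same maximizer:

```python
import numpy as np

rng = np.random.default_rng(0)

N, n_bins = 5000, 25
S = rng.standard_normal((N, 2))                             # 2D stimuli
r = rng.poisson(0.05 * np.exp(S @ np.array([0.8, 0.6])))    # simulated Poisson counts
n_sp = r.sum()

def objectives(angle):
    """Plug-in Poisson log-likelihood and empirical single-spike info (nats)
    for the 1D projection at `angle`."""
    x = S @ np.array([np.cos(angle), np.sin(angle)])
    edges = np.linspace(x.min(), x.max() + 1e-9, n_bins + 1)
    n_stim, _ = np.histogram(x, edges)
    n_spk, _ = np.histogram(x, edges, weights=r)
    m = n_spk > 0
    # ML histogram rate per bin is n_spk/(n_stim*dt); substituting it into the
    # Poisson log-likelihood (dropping filter-independent terms) gives:
    ll = np.sum(n_spk[m] * np.log(n_spk[m] / n_stim[m])) - n_sp
    # empirical single-spike information for the same histogram estimate:
    i_ss = np.sum((n_spk[m] / n_sp) * np.log((n_spk[m] / n_sp) / (n_stim[m] / N)))
    return ll, i_ss

angles = np.linspace(0, np.pi, 60, endpoint=False)
vals = np.array([objectives(a) for a in angles])
gap = vals[:, 0] - n_sp * vals[:, 1]   # constant (= -n_sp - n_sp*log(N/n_sp)) across angles
```

Because the gap is constant in the filter, maximizing single-spike information and maximizing the Poisson likelihood select the same projection.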


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. The linear-nonlinear-Poisson (LNP) encoding model formalizes the neural encoding process as a cascade of three stages.
First, the high-dimensional stimulus s projects onto a bank of filters contained in the columns of a matrix K, resulting in a point in a low-dimensional neural feature space Ks. Second, an instantaneous nonlinear function f maps the filtered stimulus to an instantaneous spike rate λ. Third, spikes r are generated according to an inhomogeneous Poisson process.
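The three-stage cascade in the caption can be sketched in a few lines of code. The dimensions, filters, and exponential nonlinearity below are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)

D, T, dt = 20, 5000, 0.01                      # stimulus dim, time bins, bin width (s)
S = rng.standard_normal((T, D))                # stimulus, one row per time bin
K = rng.standard_normal((D, 2)) / np.sqrt(D)   # filter matrix K (two filters as columns)
feat = S @ K                                   # stage 1: linear projection, Ks
lam = np.exp(feat[:, 0] - 0.5 * feat[:, 1] + 2.0)  # stage 2: nonlinearity f -> rate (Hz)
r = rng.poisson(lam * dt)                      # stage 3: inhomogeneous Poisson spiking
```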
Fig 2. Geometric illustration of maximally informative dimensions (MID).
Left: A two-dimensional stimulus space, with points indicating the location of raw stimuli (black) and spike-eliciting stimuli (red). For this simulated example, the probability of spiking depended only on the projection onto a filter k_true, oriented at 45°. Histograms (inset) show the one-dimensional distributions of raw (black) and spike-triggered (red) stimuli projected onto k_true (lower right) and its orthogonal complement (lower left). Right: Estimated single-spike information captured by a 1D subspace, as a function of the axis of projection. The MID estimate k̂_MID (dotted) corresponds to the axis maximizing single-spike information, which converges asymptotically to k_true as the dataset grows.
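The right-hand panel's computation, empirical single-spike information as a function of projection axis, can be sketched as follows (a toy reimplementation with illustrative sample sizes and bin counts, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(2)

def single_spike_info(S, spikes, angle, n_bins=20):
    """Empirical single-spike information (bits) for the 1D projection at `angle`."""
    x = S @ np.array([np.cos(angle), np.sin(angle)])
    edges = np.linspace(x.min(), x.max() + 1e-9, n_bins + 1)
    p_raw, _ = np.histogram(x, edges)
    p_spk, _ = np.histogram(np.repeat(x, spikes), edges)   # spike-triggered stimuli
    p_raw = p_raw / p_raw.sum()
    p_spk = p_spk / max(p_spk.sum(), 1)
    m = (p_spk > 0) & (p_raw > 0)
    return np.sum(p_spk[m] * np.log2(p_spk[m] / p_raw[m]))

# Simulated example: spiking depends only on the projection onto a 45-degree axis
S = rng.standard_normal((4000, 2))
k_true = np.array([1.0, 1.0]) / np.sqrt(2)
spikes = rng.binomial(1, 1 / (1 + np.exp(-2 * (S @ k_true))))
angles = np.linspace(0, np.pi, 90, endpoint=False)
info = np.array([single_spike_info(S, spikes, a) for a in angles])
k_mid_angle = angles[np.argmax(info)]          # lands near pi/4 (the true axis)
```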
Fig 3. Effects of the number of histogram bins on empirical single-spike information and MID performance.
(A) Scatter plot of raw stimuli (black) and spike-triggered stimuli (gray) from a simulated experiment using two-dimensional stimuli to drive a linear-nonlinear-Bernoulli neuron with a sigmoidal nonlinearity. The arrow indicates the direction of the true filter k. (B) Plug-in estimates of p(k·s|spike), the spike-triggered stimulus distribution along the true filter axis, from 1000 stimuli and 200 spikes, using 5 (blue), 20 (green), or 80 (red) histogram bins. Black traces show estimates of the raw distribution p(k·s) along the same axis. (C) True nonlinearity (black) and ML estimates of the nonlinearity (derived from the ratio of the density estimates shown in B). Roughness of the 80-bin estimate (red) arises from undersampling, or (equivalently) overfitting of the nonlinearity. (D) Empirical single-spike information vs. direction, calculated using 5, 20, or 80 histogram bins. Note that the 80-bin model overestimates the true asymptotic single-spike information at the peak by a factor of more than 1.5. (E) Convergence of empirical single-spike information along the true filter axis as a function of sample size. With small amounts of data, all three models overfit, leading to an upward bias in estimated information. For large amounts of data, the 5-bin model underfits and therefore under-estimates information, since it lacks the smoothness to adequately describe the shape of the sigmoidal nonlinearity. (F) Filter error as a function of the number of stimuli, showing that the optimal number of histogram bins depends on the amount of data.
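Panel C's ratio-of-histograms ("plug-in") nonlinearity estimate can be sketched like this. The 1000-stimulus sigmoidal setup mirrors the caption, but the bin width, sigmoid slope, and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

N, dt = 1000, 0.01
x = rng.standard_normal(N)                       # stimulus projected onto the true filter
r = rng.binomial(1, 1 / (1 + np.exp(-2 * x)))    # Bernoulli spikes, sigmoidal nonlinearity

def plugin_nonlinearity(x, r, n_bins):
    """ML histogram estimate of the rate: spikes per bin / (stimuli per bin * dt)."""
    edges = np.linspace(x.min(), x.max() + 1e-9, n_bins + 1)
    n_stim, _ = np.histogram(x, edges)
    n_spk, _ = np.histogram(x[r > 0], edges)
    with np.errstate(invalid="ignore", divide="ignore"):
        f_hat = n_spk / (n_stim * dt)            # NaN where a bin holds no stimuli
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, f_hat

c5, f5 = plugin_nonlinearity(x, r, 5)            # smooth but coarse (underfits)
c80, f80 = plugin_nonlinearity(x, r, 80)         # rough: many undersampled bins (overfit)
```

With 80 bins most bins hold only a handful of stimuli, so the estimate is dominated by sampling noise, which is exactly the roughness visible in the red trace of panel C.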
Fig 4. Illustration of an MID failure mode due to non-Poisson spiking.
(A) Stimuli were drawn uniformly on the unit half-circle, θ ∼ Unif(−π/2, π/2). The simulated neuron had Bernoulli (i.e., binary) spiking, where the probability of a spike increased linearly from 0 to 1 as θ varied from −π/2 to π/2, that is: p(spike|θ) = θ/π + 1/2. Stimuli eliciting “spike” and “no-spike” are indicated by gray and black circles, respectively. For this neuron, the most informative one-dimensional linear projection corresponds to the vertical axis (k̂_Ber), but the MID estimator (k̂_MID) exhibits a 16° clockwise bias. (B) Information from spikes (black), silences (gray), and both (red), as a function of projection angle. The peak of the Bernoulli information (which defines k̂_Ber) lies close to π/2, while the peak of the single-spike information (which defines k̂_MID) exhibits the clockwise bias shown in A. Note that k̂_MID does not converge to the optimal direction even in the limit of infinite data, due to its insensitivity to information from silences. Although this figure is framed in information-theoretic terms, equations (19) and (20) establish the equivalence between I_Ber and ℒ_LNB, so the figure can be viewed from either an information-theoretic or a likelihood-based perspective.
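Panel B can be reproduced in the infinite-data limit by evaluating both information measures on a dense stimulus grid rather than on sampled data. This is a sketch; the grid size and bin count are arbitrary choices:

```python
import numpy as np

def infos(phi, n=20001, n_bins=40):
    """Single-spike and full Bernoulli information (bits) for the half-circle
    neuron, for the projection axis at angle `phi`."""
    theta = np.linspace(-np.pi / 2, np.pi / 2, n)    # uniform stimulus grid
    w = theta / np.pi + 0.5                          # p(spike | theta)
    x = np.cos(theta - phi)                          # projection onto axis at phi
    edges = np.linspace(-1.0, 1.0 + 1e-9, n_bins + 1)
    b = np.digitize(x, edges) - 1
    p_b = np.bincount(b, minlength=n_bins) / n
    p_b_spk = np.bincount(b, weights=w, minlength=n_bins) / n
    p_b_sil = np.maximum(p_b - p_b_spk, 0.0)
    p_spk = w.mean()                                 # 0.5 for this neuron

    def kl(joint, p_r):                              # KL( p(b|r) || p(b) )
        cond = joint / p_r
        m = (cond > 0) & (p_b > 0)
        return np.sum(cond[m] * np.log2(cond[m] / p_b[m]))

    i_ss = kl(p_b_spk, p_spk)                        # spikes only (MID objective)
    i_ber = p_spk * i_ss + (1 - p_spk) * kl(p_b_sil, 1 - p_spk)
    return i_ss, i_ber

phis = np.linspace(0.0, np.pi, 180, endpoint=False)
vals = np.array([infos(p) for p in phis])
phi_mid = phis[np.argmax(vals[:, 0])]   # single-spike (MID) peak: biased off vertical
phi_ber = phis[np.argmax(vals[:, 1])]   # full Bernoulli peak: ~pi/2 (vertical axis)
```

Because this computation is deterministic, the offset between the two peaks is the asymptotic bias described in the caption, not a finite-sample artifact.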
Fig 5. A second example Bernoulli neuron for which k̂_MID fails to identify the most informative one-dimensional subspace.
The stimulus space has two dimensions, denoted s_1 and s_2, and stimuli were drawn iid from a standard Gaussian 𝒩(0,1). (A) The nonlinearity f(s_1, s_2) = p(spike|s_1, s_2) is excitatory in s_1 and suppressive in s_2; brighter intensity indicates higher spike probability. (B) Contour plot of the stimulus-conditional densities given the two possible responses, “spike” (red) or “no-spike” (blue), along with the raw stimulus distribution (black). (C) Information carried by silences (I_0), single spikes (I_ss), and total Bernoulli information (I_Ber = I_0 + I_ss) as a function of subspace orientation. The MID estimate k̂_MID = 90° is the maximum of I_ss, but the total Bernoulli information is in fact 13% higher at k̂_Ber = 0° due to the incorporation of no-spike information. Although both stimulus axes are clearly relevant to the neuron, MID identifies the less informative one. As in the previous figure, equations (19) and (20) establish the equivalence between I_Ber and ℒ_LNB, so the figure can be viewed from either an information-theoretic or a likelihood-based perspective.
Fig 6. Lower bound on the fraction of total information neglected by MID for a Bernoulli neuron, as a function of the marginal spike probability p(spike) = p(r = 1), for the special case of a binary stimulus.
Information loss is quantified as the ratio I_0/(I_0 + I_ss): the information due to no-spike events, I_0, divided by the total information due to spikes and silences, I_0 + I_ss. The dashed gray line shows the lower bound derived in the limit p(spike) → 0. The solid black line shows the actual minimum achieved for binary stimuli s ∈ {0,1} with p(s = 1) = q, computed via a numerical search over the parameter q ∈ [0,1] for each value of p(spike). The lower bound becomes substantially loose for p(spike) > 0, since as p(spike) → 1 the fraction of information due to silences goes to 1.
Fig 7. Two examples illustrating the sub-optimality of MID under discrete (non-Poisson) spiking.
In both cases, stimuli were uniformly distributed within the unit circle and the simulated neuron’s response depended on a 1D projection of the stimulus onto the horizontal axis (θ = 0). Each stimulus evoked 0, 1, or 2 spikes. (A) Deterministic neuron. Left: Scatter plot of stimuli labelled by the number of spikes evoked, and the piecewise-constant nonlinearity governing the response (below). The nonlinearity sets the response count deterministically, dramatically violating Poisson expectations. Middle: information vs. axis of projection. The total information I_count reflects the information from 0-, 1-, and 2-spike responses (treated as distinct symbols), while the single-spike information I_ss ignores silences and treats 2-spike responses as two samples from p(s|spike). Right: Average absolute error in k̂_MID and k̂_count as a function of sample size; the latter achieves 18% lower error due to its sensitivity to the non-Poisson structure of the response. (B) Stochastic neuron with a sigmoidal nonlinearity controlling the stochasticity of responses. The neuron transitions from almost always emitting 1 spike for large negative stimulus projections to emitting either 0 or 2 spikes with equal probability for large positive projections. Here the nonlinearity does not modulate the mean spike rate, so Î_ss is approximately zero for all stimulus projections (middle) and the MID estimator does not converge (right). However, the k̂_count estimate converges, because the LNC model is sensitive to the change in the conditional response distribution. Equation (37) details the relationship between I_count and ℒ_LNC, so the figure can be interpreted from either an information-theoretic or a likelihood-based perspective.
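The panel B construction, a constant mean rate with a stimulus-dependent count distribution, can be sketched as follows. Gaussian stimuli and the sigmoid slope are illustrative substitutions for the figure's unit-circle setup:

```python
import numpy as np

rng = np.random.default_rng(4)

N, n_bins = 5000, 20
S = rng.standard_normal((N, 2))
k_true = np.array([1.0, 0.0])
sig = 1 / (1 + np.exp(-3 * (S @ k_true)))            # P(response is 0 or 2 spikes)
r = np.where(rng.random(N) < sig, 2 * rng.integers(0, 2, N), 1)   # counts in {0,1,2}
# mean count is 1 for every stimulus: sig/2 * 0 + sig/2 * 2 + (1 - sig) * 1 = 1

def count_info(k):
    """Empirical information (bits) between the binned projection onto `k`
    and the count symbol r in {0, 1, 2}."""
    x = S @ np.asarray(k, float)
    edges = np.linspace(x.min(), x.max() + 1e-9, n_bins + 1)
    b = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    joint = np.zeros((n_bins, 3))
    np.add.at(joint, (b, r), 1.0)
    joint /= N
    outer = joint.sum(1, keepdims=True) * joint.sum(0, keepdims=True)
    m = joint > 0
    return np.sum(joint[m] * np.log2(joint[m] / outer[m]))

def single_spike_info(k):
    """Empirical single-spike information (bits), which ignores count structure."""
    x = S @ np.asarray(k, float)
    edges = np.linspace(x.min(), x.max() + 1e-9, n_bins + 1)
    n_stim, _ = np.histogram(x, edges)
    n_spk, _ = np.histogram(x, edges, weights=r)
    m = n_spk > 0
    return np.sum((n_spk[m] / r.sum()) * np.log2((n_spk[m] / r.sum()) / (n_stim[m] / N)))

i_count_true = count_info(k_true)        # large: the count distribution shifts with x
i_count_orth = count_info([0.0, 1.0])    # ~0: the orthogonal axis is irrelevant
i_ss_true = single_spike_info(k_true)    # ~0: mean rate is flat, so MID is blind here
```

Treating 0, 1, and 2 spikes as distinct symbols recovers the relevant axis even though the single-spike objective is flat, which is the LNC model's advantage in the figure.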
Fig 8. Estimation of high-dimensional subspaces using a nonlinearity parametrized with cylindrical basis functions (CBFs).
(A) The eight most informative filters for an example complex cell, estimated with iSTAC (top row) and cbf-LNP (bottom row). For the cbf-LNP model, the nonlinearity was parametrized with three first-order CBFs for the output of each filter (see Methods). (B) Estimated 1D nonlinearity along each filter axis, for the filters shown in (A). Note that the third and fourth iSTAC filters are suppressive, while the third and fourth cbf-LNP filters are excitatory. (C) Cross-validated single-spike information for iSTAC, cbf-LNP, and rbf-LNP, as a function of the number of filters, averaged over a population of 16 neurons (selected from [29] for having ≥ 8 informative filters). The cbf-LNP estimate outperformed iSTAC in all cases, while rbf-LNP yielded a slight further increase for the first four dimensions. (D) Computation time for the numerical optimization of the cbf-LNP likelihood for up to 8 filters. Even for 30 minutes of data and 8 filters, optimization took about 4 hours. (E) Average number of excitatory filters as a function of the total number of filters, for each method. (F) Information gain from excitatory filters for each method, averaged across neurons. Each point represents the average amount of information gained from adding an excitatory filter, as a function of the number of filters.


References

    1. de Ruyter van Steveninck RR, Bialek W (1988) Real-time performance of a movement-sensitive neuron in the blowfly visual system: coding and information transmission in short spike sequences. Proceedings of the Royal Society B 234: 379–414. doi:10.1098/rspb.1988.0055
    2. Aguera y Arcas B, Fairhall AL (2003) What causes a neuron to spike? Neural Computation 15: 1789–1807. doi:10.1162/08997660360675044
    3. Aguera y Arcas B, Fairhall AL, Bialek W (2003) Computation in a single neuron: Hodgkin and Huxley revisited. Neural Computation 15: 1715–1749. doi:10.1162/08997660360675017
    4. Simoncelli EP, Pillow JW, Paninski L, Schwartz O (2004) Characterization of neural responses with stochastic stimuli. In: Gazzaniga M, editor, The Cognitive Neurosciences, III. Cambridge, MA: MIT Press, chapter 23, pp. 327–338.
    5. Bialek W, de Ruyter van Steveninck RR (2005) Features and dimensions: Motion estimation in fly vision. arXiv:q-bio.NC/0505003.
