Comparative Study
PLoS Comput Biol. 2015 Apr 1;11(4):e1004141. doi: 10.1371/journal.pcbi.1004141. eCollection 2015 Apr.

The equivalence of information-theoretic and likelihood-based methods for neural dimensionality reduction


Ross S Williamson et al. PLoS Comput Biol. 2015.


Abstract

Stimulus dimensionality-reduction methods in neuroscience seek to identify a low-dimensional space of stimulus features that affect a neuron's probability of spiking. One popular method, known as maximally informative dimensions (MID), uses an information-theoretic quantity known as "single-spike information" to identify this space. Here we examine MID from a model-based perspective. We show that MID is a maximum-likelihood estimator for the parameters of a linear-nonlinear-Poisson (LNP) model, and that the empirical single-spike information corresponds to the normalized log-likelihood under a Poisson model. This equivalence implies that MID does not necessarily find maximally informative stimulus dimensions when spiking is not well described as Poisson. We provide several examples to illustrate this shortcoming, and derive a lower bound on the information lost when spiking is Bernoulli in discrete time bins. To overcome this limitation, we introduce model-based dimensionality reduction methods for neurons with non-Poisson firing statistics, and show that they can be framed equivalently in likelihood-based or information-theoretic terms. Finally, we show how to overcome practical limitations on the number of stimulus dimensions that MID can estimate by constraining the form of the non-parametric nonlinearity in an LNP model. We illustrate these methods with simulations and data from primate visual cortex.
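The central equivalence claimed above can be checked numerically. The sketch below (not the authors' code; the simulated neuron, sample sizes, and bin counts are illustrative choices) fits a histogram ("plug-in") nonlinearity along candidate 1D projections and verifies that the plug-in Poisson log-likelihood differs from n_sp times the empirical single-spike information only by a filter-independent constant, so the two objectives share the same maximizer:

```python
import numpy as np

rng = np.random.default_rng(0)

N, n_bins = 5000, 25
S = rng.standard_normal((N, 2))                             # 2D stimuli
r = rng.poisson(0.05 * np.exp(S @ np.array([0.8, 0.6])))    # simulated Poisson counts
n_sp = r.sum()

def objectives(angle):
    """Plug-in Poisson log-likelihood and empirical single-spike info (nats)
    for the 1D projection at `angle`."""
    x = S @ np.array([np.cos(angle), np.sin(angle)])
    edges = np.linspace(x.min(), x.max() + 1e-9, n_bins + 1)
    n_stim, _ = np.histogram(x, edges)
    n_spk, _ = np.histogram(x, edges, weights=r)
    m = n_spk > 0
    # ML histogram rate per bin is n_spk/(n_stim*dt); substituting it into the
    # Poisson log-likelihood (dropping filter-independent terms) gives:
    ll = np.sum(n_spk[m] * np.log(n_spk[m] / n_stim[m])) - n_sp
    # empirical single-spike information for the same histogram estimate:
    i_ss = np.sum((n_spk[m] / n_sp) * np.log((n_spk[m] / n_sp) / (n_stim[m] / N)))
    return ll, i_ss

angles = np.linspace(0, np.pi, 60, endpoint=False)
vals = np.array([objectives(a) for a in angles])
gap = vals[:, 0] - n_sp * vals[:, 1]   # constant (= -n_sp - n_sp*log(N/n_sp)) across angles
```

Because the gap is constant in the filter, maximizing single-spike information and maximizing the Poisson likelihood select the same projection.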


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. The linear-nonlinear-Poisson (LNP) encoding model formalizes the neural encoding process as a cascade of three stages.
First, the high-dimensional stimulus s projects onto a bank of filters contained in the columns of a matrix K, resulting in a point in a low-dimensional neural feature space Ks. Second, an instantaneous nonlinear function f maps the filtered stimulus to an instantaneous spike rate λ. Third, spikes r are generated according to an inhomogeneous Poisson process.
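The three-stage cascade in the caption can be sketched in a few lines of code. The dimensions, filters, and exponential nonlinearity below are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)

D, T, dt = 20, 5000, 0.01                      # stimulus dim, time bins, bin width (s)
S = rng.standard_normal((T, D))                # stimulus, one row per time bin
K = rng.standard_normal((D, 2)) / np.sqrt(D)   # filter matrix K (two filters as columns)
feat = S @ K                                   # stage 1: linear projection, Ks
lam = np.exp(feat[:, 0] - 0.5 * feat[:, 1] + 2.0)  # stage 2: nonlinearity f -> rate (Hz)
r = rng.poisson(lam * dt)                      # stage 3: inhomogeneous Poisson spiking
```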
Fig 2. Geometric illustration of maximally informative dimensions (MID).
Left: A two-dimensional stimulus space, with points indicating the location of raw stimuli (black) and spike-eliciting stimuli (red). For this simulated example, the probability of spiking depended only on the projection onto a filter k_true, oriented at 45°. Histograms (inset) show the one-dimensional distributions of raw (black) and spike-triggered (red) stimuli projected onto k_true (lower right) and its orthogonal complement (lower left). Right: Estimated single-spike information captured by a 1D subspace, as a function of the axis of projection. The MID estimate k̂_MID (dotted) corresponds to the axis maximizing single-spike information, which converges asymptotically to k_true as the dataset grows.
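The right-hand panel's computation, empirical single-spike information as a function of projection axis, can be sketched as follows (a toy reimplementation with illustrative sample sizes and bin counts, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(2)

def single_spike_info(S, spikes, angle, n_bins=20):
    """Empirical single-spike information (bits) for the 1D projection at `angle`."""
    x = S @ np.array([np.cos(angle), np.sin(angle)])
    edges = np.linspace(x.min(), x.max() + 1e-9, n_bins + 1)
    p_raw, _ = np.histogram(x, edges)
    p_spk, _ = np.histogram(np.repeat(x, spikes), edges)   # spike-triggered stimuli
    p_raw = p_raw / p_raw.sum()
    p_spk = p_spk / max(p_spk.sum(), 1)
    m = (p_spk > 0) & (p_raw > 0)
    return np.sum(p_spk[m] * np.log2(p_spk[m] / p_raw[m]))

# Simulated example: spiking depends only on the projection onto a 45-degree axis
S = rng.standard_normal((4000, 2))
k_true = np.array([1.0, 1.0]) / np.sqrt(2)
spikes = rng.binomial(1, 1 / (1 + np.exp(-2 * (S @ k_true))))
angles = np.linspace(0, np.pi, 90, endpoint=False)
info = np.array([single_spike_info(S, spikes, a) for a in angles])
k_mid_angle = angles[np.argmax(info)]          # lands near pi/4 (the true axis)
```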
Fig 3. Effects of the number of histogram bins on empirical single-spike information and MID performance.
(A) Scatter plot of raw stimuli (black) and spike-triggered stimuli (gray) from a simulated experiment using two-dimensional stimuli to drive a linear-nonlinear-Bernoulli neuron with a sigmoidal nonlinearity. The arrow indicates the direction of the true filter k. (B) Plug-in estimates of p(k·s|spike), the spike-triggered stimulus distribution along the true filter axis, from 1000 stimuli and 200 spikes, using 5 (blue), 20 (green), or 80 (red) histogram bins. Black traces show estimates of the raw distribution p(k·s) along the same axis. (C) True nonlinearity (black) and ML estimates of the nonlinearity (derived from the ratio of the density estimates shown in B). Roughness of the 80-bin estimate (red) arises from undersampling, or (equivalently) overfitting of the nonlinearity. (D) Empirical single-spike information vs. direction, calculated using 5, 20, or 80 histogram bins. Note that the 80-bin model overestimates the true asymptotic single-spike information at the peak by a factor of more than 1.5. (E) Convergence of empirical single-spike information along the true filter axis as a function of sample size. With small amounts of data, all three models overfit, leading to an upward bias in estimated information. For large amounts of data, the 5-bin model underfits and therefore under-estimates information, since it lacks the smoothness to adequately describe the shape of the sigmoidal nonlinearity. (F) Filter error as a function of the number of stimuli, showing that the optimal number of histogram bins depends on the amount of data.
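Panel C's ratio-of-histograms ("plug-in") nonlinearity estimate can be sketched like this. The 1000-stimulus sigmoidal setup mirrors the caption, but the bin width, sigmoid slope, and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

N, dt = 1000, 0.01
x = rng.standard_normal(N)                       # stimulus projected onto the true filter
r = rng.binomial(1, 1 / (1 + np.exp(-2 * x)))    # Bernoulli spikes, sigmoidal nonlinearity

def plugin_nonlinearity(x, r, n_bins):
    """ML histogram estimate of the rate: spikes per bin / (stimuli per bin * dt)."""
    edges = np.linspace(x.min(), x.max() + 1e-9, n_bins + 1)
    n_stim, _ = np.histogram(x, edges)
    n_spk, _ = np.histogram(x[r > 0], edges)
    with np.errstate(invalid="ignore", divide="ignore"):
        f_hat = n_spk / (n_stim * dt)            # NaN where a bin holds no stimuli
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, f_hat

c5, f5 = plugin_nonlinearity(x, r, 5)            # smooth but coarse (underfits)
c80, f80 = plugin_nonlinearity(x, r, 80)         # rough: many undersampled bins (overfit)
```

With 80 bins most bins hold only a handful of stimuli, so the estimate is dominated by sampling noise, which is exactly the roughness visible in the red trace of panel C.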
Fig 4. Illustration of an MID failure mode due to non-Poisson spiking.
(A) Stimuli were drawn uniformly on the unit half-circle, θ ∼ Unif(−π/2, π/2). The simulated neuron had Bernoulli (i.e., binary) spiking, where the probability of a spike increased linearly from 0 to 1 as θ varied from −π/2 to π/2, that is: p(spike|θ) = θ/π + 1/2. Stimuli eliciting “spike” and “no-spike” are indicated by gray and black circles, respectively. For this neuron, the most informative one-dimensional linear projection corresponds to the vertical axis (k̂_Ber), but the MID estimator (k̂_MID) exhibits a 16° clockwise bias. (B) Information from spikes (black), silences (gray), and both (red), as a function of projection angle. The peak of the Bernoulli information (which defines k̂_Ber) lies close to π/2, while the peak of the single-spike information (which defines k̂_MID) exhibits the clockwise bias shown in A. Note that k̂_MID does not converge to the optimal direction even in the limit of infinite data, due to its insensitivity to information from silences. Although this figure is framed in information-theoretic terms, equations (19) and (20) establish the equivalence between I_Ber and ℒ_LNB, so the figure can be viewed from either an information-theoretic or a likelihood-based perspective.
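Panel B can be reproduced in the infinite-data limit by evaluating both information measures on a dense stimulus grid rather than on sampled data. This is a sketch; the grid size and bin count are arbitrary choices:

```python
import numpy as np

def infos(phi, n=20001, n_bins=40):
    """Single-spike and full Bernoulli information (bits) for the half-circle
    neuron, for the projection axis at angle `phi`."""
    theta = np.linspace(-np.pi / 2, np.pi / 2, n)    # uniform stimulus grid
    w = theta / np.pi + 0.5                          # p(spike | theta)
    x = np.cos(theta - phi)                          # projection onto axis at phi
    edges = np.linspace(-1.0, 1.0 + 1e-9, n_bins + 1)
    b = np.digitize(x, edges) - 1
    p_b = np.bincount(b, minlength=n_bins) / n
    p_b_spk = np.bincount(b, weights=w, minlength=n_bins) / n
    p_b_sil = np.maximum(p_b - p_b_spk, 0.0)
    p_spk = w.mean()                                 # 0.5 for this neuron

    def kl(joint, p_r):                              # KL( p(b|r) || p(b) )
        cond = joint / p_r
        m = (cond > 0) & (p_b > 0)
        return np.sum(cond[m] * np.log2(cond[m] / p_b[m]))

    i_ss = kl(p_b_spk, p_spk)                        # spikes only (MID objective)
    i_ber = p_spk * i_ss + (1 - p_spk) * kl(p_b_sil, 1 - p_spk)
    return i_ss, i_ber

phis = np.linspace(0.0, np.pi, 180, endpoint=False)
vals = np.array([infos(p) for p in phis])
phi_mid = phis[np.argmax(vals[:, 0])]   # single-spike (MID) peak: biased off vertical
phi_ber = phis[np.argmax(vals[:, 1])]   # full Bernoulli peak: ~pi/2 (vertical axis)
```

Because this computation is deterministic, the offset between the two peaks is the asymptotic bias described in the caption, not a finite-sample artifact.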
Fig 5. A second example Bernoulli neuron for which k̂_MID fails to identify the most informative one-dimensional subspace.
The stimulus space has two dimensions, denoted s_1 and s_2, and stimuli were drawn iid from a standard Gaussian 𝒩(0,1). (A) The nonlinearity f(s_1, s_2) = p(spike|s_1, s_2) is excitatory in s_1 and suppressive in s_2; brighter intensity indicates higher spike probability. (B) Contour plot of the stimulus-conditional densities given the two possible responses, “spike” (red) or “no-spike” (blue), along with the raw stimulus distribution (black). (C) Information carried by silences (I_0), single spikes (I_ss), and total Bernoulli information (I_Ber = I_0 + I_ss) as a function of subspace orientation. The MID estimate k̂_MID = 90° is the maximum of I_ss, but the total Bernoulli information is in fact 13% higher at k̂_Ber = 0° due to the incorporation of no-spike information. Although both stimulus axes are clearly relevant to the neuron, MID identifies the less informative one. As in the previous figure, equations (19) and (20) establish the equivalence between I_Ber and ℒ_LNB, so the figure can be viewed from either an information-theoretic or a likelihood-based perspective.
Fig 6. Lower bound on the fraction of total information neglected by MID for a Bernoulli neuron, as a function of the marginal spike probability p(spike) = p(r = 1), for the special case of a binary stimulus.
Information loss is quantified as the ratio I_0/(I_0 + I_ss): the information due to no-spike events, I_0, divided by the total information due to spikes and silences, I_0 + I_ss. The dashed gray line shows the lower bound derived in the limit p(spike) → 0. The solid black line shows the actual minimum achieved for binary stimuli s ∈ {0,1} with p(s = 1) = q, computed via a numerical search over the parameter q ∈ [0,1] for each value of p(spike). The lower bound becomes substantially loose for p(spike) > 0, since as p(spike) → 1 the fraction of information due to silences goes to 1.
Fig 7. Two examples illustrating the sub-optimality of MID under discrete (non-Poisson) spiking.
In both cases, stimuli were uniformly distributed within the unit circle and the simulated neuron’s response depended on a 1D projection of the stimulus onto the horizontal axis (θ = 0). Each stimulus evoked 0, 1, or 2 spikes. (A) Deterministic neuron. Left: Scatter plot of stimuli labelled by the number of spikes evoked, and the piecewise-constant nonlinearity governing the response (below). The nonlinearity sets the response count deterministically, dramatically violating Poisson expectations. Middle: information vs. axis of projection. The total information I_count reflects the information from 0-, 1-, and 2-spike responses (treated as distinct symbols), while the single-spike information I_ss ignores silences and treats 2-spike responses as two samples from p(s|spike). Right: Average absolute error in k̂_MID and k̂_count as a function of sample size; the latter achieves 18% lower error due to its sensitivity to the non-Poisson structure of the response. (B) Stochastic neuron with a sigmoidal nonlinearity controlling the stochasticity of responses. The neuron transitions from almost always emitting 1 spike for large negative stimulus projections to emitting either 0 or 2 spikes with equal probability for large positive projections. Here the nonlinearity does not modulate the mean spike rate, so Î_ss is approximately zero for all stimulus projections (middle) and the MID estimator does not converge (right). However, the k̂_count estimate converges, because the LNC model is sensitive to the change in the conditional response distribution. Equation (37) details the relationship between I_count and ℒ_LNC, so the figure can be interpreted from either an information-theoretic or a likelihood-based perspective.
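The panel B construction, a constant mean rate with a stimulus-dependent count distribution, can be sketched as follows. Gaussian stimuli and the sigmoid slope are illustrative substitutions for the figure's unit-circle setup:

```python
import numpy as np

rng = np.random.default_rng(4)

N, n_bins = 5000, 20
S = rng.standard_normal((N, 2))
k_true = np.array([1.0, 0.0])
sig = 1 / (1 + np.exp(-3 * (S @ k_true)))            # P(response is 0 or 2 spikes)
r = np.where(rng.random(N) < sig, 2 * rng.integers(0, 2, N), 1)   # counts in {0,1,2}
# mean count is 1 for every stimulus: sig/2 * 0 + sig/2 * 2 + (1 - sig) * 1 = 1

def count_info(k):
    """Empirical information (bits) between the binned projection onto `k`
    and the count symbol r in {0, 1, 2}."""
    x = S @ np.asarray(k, float)
    edges = np.linspace(x.min(), x.max() + 1e-9, n_bins + 1)
    b = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    joint = np.zeros((n_bins, 3))
    np.add.at(joint, (b, r), 1.0)
    joint /= N
    outer = joint.sum(1, keepdims=True) * joint.sum(0, keepdims=True)
    m = joint > 0
    return np.sum(joint[m] * np.log2(joint[m] / outer[m]))

def single_spike_info(k):
    """Empirical single-spike information (bits), which ignores count structure."""
    x = S @ np.asarray(k, float)
    edges = np.linspace(x.min(), x.max() + 1e-9, n_bins + 1)
    n_stim, _ = np.histogram(x, edges)
    n_spk, _ = np.histogram(x, edges, weights=r)
    m = n_spk > 0
    return np.sum((n_spk[m] / r.sum()) * np.log2((n_spk[m] / r.sum()) / (n_stim[m] / N)))

i_count_true = count_info(k_true)        # large: the count distribution shifts with x
i_count_orth = count_info([0.0, 1.0])    # ~0: the orthogonal axis is irrelevant
i_ss_true = single_spike_info(k_true)    # ~0: mean rate is flat, so MID is blind here
```

Treating 0, 1, and 2 spikes as distinct symbols recovers the relevant axis even though the single-spike objective is flat, which is the LNC model's advantage in the figure.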
Fig 8. Estimation of high-dimensional subspaces using a nonlinearity parametrized with cylindrical basis functions (CBFs).
(A) The eight most informative filters for an example complex cell, estimated with iSTAC (top row) and cbf-LNP (bottom row). For the cbf-LNP model, the nonlinearity was parametrized with three first-order CBFs for the output of each filter (see Methods). (B) Estimated 1D nonlinearity along each filter axis, for the filters shown in (A). Note that the third and fourth iSTAC filters are suppressive, while the third and fourth cbf-LNP filters are excitatory. (C) Cross-validated single-spike information for iSTAC, cbf-LNP, and rbf-LNP, as a function of the number of filters, averaged over a population of 16 neurons (selected from [29] for having ≥ 8 informative filters). The cbf-LNP estimate outperformed iSTAC in all cases, while rbf-LNP yielded a slight further increase for the first four dimensions. (D) Computation time for the numerical optimization of the cbf-LNP likelihood for up to 8 filters. Even for 30 minutes of data and 8 filters, optimization took about 4 hours. (E) Average number of excitatory filters as a function of the total number of filters, for each method. (F) Information gain from excitatory filters for each method, averaged across neurons. Each point represents the average amount of information gained from adding an excitatory filter, as a function of the number of filters.


References

    1. de Ruyter van Steveninck RR, Bialek W (1988) Real-time performance of a movement-sensitive neuron in the blowfly visual system: coding and information transmission in short spike sequences. Proceedings of the Royal Society B 234: 379–414. doi:10.1098/rspb.1988.0055
    2. Aguera y Arcas B, Fairhall AL (2003) What causes a neuron to spike? Neural Computation 15: 1789–1807. doi:10.1162/08997660360675044
    3. Aguera y Arcas B, Fairhall AL, Bialek W (2003) Computation in a single neuron: Hodgkin and Huxley revisited. Neural Computation 15: 1715–1749. doi:10.1162/08997660360675017
    4. Simoncelli EP, Pillow JW, Paninski L, Schwartz O (2004) Characterization of neural responses with stochastic stimuli. In: Gazzaniga M, editor, The Cognitive Neurosciences, III. Cambridge, MA: MIT Press, chapter 23, pp. 327–338.
    5. Bialek W, de Ruyter van Steveninck RR (2005) Features and dimensions: Motion estimation in fly vision. arXiv:q-bio.NC/0505003.
