PLoS Comput Biol. 2013;9(3):e1002982.

doi: 10.1371/journal.pcbi.1002982. Epub 2013 Mar 28.

Sustained firing of model central auditory neurons yields a discriminative spectro-temporal representation for natural sounds

Michael A Carlin¹, Mounya Elhilali

Affiliation

¹ Department of Electrical and Computer Engineering, The Center for Language and Speech Processing, Johns Hopkins University, Baltimore, Maryland, United States of America.

PMID: 23555217
PMCID: PMC3610626
DOI: 10.1371/journal.pcbi.1002982

Michael A Carlin et al. PLoS Comput Biol. 2013.

Abstract

The processing characteristics of neurons in the central auditory system are directly shaped by and reflect the statistics of natural acoustic environments, but the principles that govern the relationship between natural sound ensembles and observed responses in neurophysiological studies remain unclear. In particular, accumulating evidence suggests the presence of a code based on sustained neural firing rates, where central auditory neurons exhibit strong, persistent responses to their preferred stimuli. Such a strategy can indicate the presence of ongoing sounds, is involved in parsing complex auditory scenes, and may play a role in matching neural dynamics to varying time scales in acoustic signals. In this paper, we describe a computational framework for exploring the influence of a code based on sustained firing rates on the shape of the spectro-temporal receptive field (STRF), a linear kernel that maps a spectro-temporal acoustic stimulus to the instantaneous firing rate of a central auditory neuron. We demonstrate the emergence of richly structured STRFs that capture the structure of natural sounds over a wide range of timescales, and show how the emergent ensembles resemble those commonly reported in physiological studies. Furthermore, we compare ensembles that optimize a sustained firing code with one that optimizes a sparse code, another widely considered coding strategy, and suggest how the resulting population responses are not mutually exclusive. Finally, we demonstrate how the emergent ensembles contour the high-energy spectro-temporal modulations of natural sounds, forming a discriminative representation that captures the full range of modulation statistics that characterize natural sound ensembles. These findings have direct implications for our understanding of how sensory systems encode the informative components of natural stimuli and potentially facilitate multi-sensory integration.
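
As a concrete reading of the STRF definition above, the linear model predicts the instantaneous rate as r(t) = Σ_f Σ_τ STRF(f, τ) S(f, t − τ). The NumPy sketch below assumes a discretized spectrogram with frequency channels as rows and time frames as columns. The function name `strf_response` and the zero-padding before the first frame are illustrative choices, not details taken from the paper.

```python
import numpy as np


def strf_response(strf: np.ndarray, spec: np.ndarray) -> np.ndarray:
    """Linear STRF prediction r(t) = sum_f sum_tau strf[f, tau] * spec[f, t - tau].

    strf: (n_freq, n_lags) kernel, lag 0 in column 0 (hypothetical layout).
    spec: (n_freq, n_time) auditory spectrogram.
    Frames before the start of the spectrogram are treated as silence (zeros).
    """
    n_freq, n_lags = strf.shape
    if spec.shape[0] != n_freq:
        raise ValueError("STRF and spectrogram must share the frequency axis")
    n_time = spec.shape[1]
    rate = np.zeros(n_time)
    for lag in range(min(n_lags, n_time)):
        # Column `lag` of the STRF weights the spectrum `lag` frames in the past.
        rate[lag:] += strf[:, lag] @ spec[:, : n_time - lag]
    return rate


# Usage: an impulse at channel 1, frame 2 reads out the STRF's channel-1 row,
# delayed so that it starts at frame 2.
strf = np.array([[0.0, 0.0], [1.0, 0.5], [0.0, 0.0]])
spec = np.zeros((3, 5))
spec[1, 2] = 1.0
print(strf_response(strf, spec))
```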

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Figure 1. Schematic of the proposed framework.**
Panel (A) shows an example of an auditory spectrogram for the speech utterance “serve on frankfurter buns…” whereas panel (B) illustrates how spectro-temporal patches are mapped to an ensemble of instantaneous neural firing rates.
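
Panel (B) describes mapping spectro-temporal patches to an ensemble of instantaneous firing rates. Assuming each patch covers a fixed number of spectrogram frames (Figure 2 quotes patches spanning 0–250 ms and 62.5–4000 Hz), the mapping reduces to one matrix product once patches and STRFs are flattened in the same order. The helpers `extract_patches` and `ensemble_rates`, the hop size, and the frequency-major flattening are hypothetical.

```python
import numpy as np


def extract_patches(spec: np.ndarray, patch_len: int, hop: int = 1) -> np.ndarray:
    """Slice a (n_freq, n_time) spectrogram into flattened patches of patch_len frames.

    Returns (n_patches, n_freq * patch_len); each row is one patch flattened in
    row-major (frequency-major) order, with columns running forward in time.
    """
    n_freq, n_time = spec.shape
    if n_time < patch_len:
        raise ValueError("spectrogram is shorter than one patch")
    starts = range(0, n_time - patch_len + 1, hop)
    return np.stack([spec[:, s : s + patch_len].ravel() for s in starts])


def ensemble_rates(strfs: np.ndarray, patches: np.ndarray) -> np.ndarray:
    """Instantaneous rates of an ensemble, one linear STRF per model neuron.

    strfs: (n_neurons, n_freq, patch_len); patches: (n_patches, n_freq * patch_len).
    Returns (n_patches, n_neurons).
    """
    W = strfs.reshape(strfs.shape[0], -1)  # one flattened STRF per row
    return patches @ W.T


spec = np.arange(12, dtype=float).reshape(3, 4)  # 3 channels x 4 frames
patches = extract_patches(spec, patch_len=2)
rates = ensemble_rates(np.ones((2, 3, 2)), patches)
print(patches.shape, rates.shape)
```

The only real requirement is that STRFs and patches are flattened with the same `ravel` order, so that each weight multiplies the matching time-frequency bin.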

**Figure 2. Examples of emergent STRFs.**
Shown are STRFs learned by optimizing (A) the sustained objective function for and (B) the sparsity objective function . The examples shown here were drawn at random from ensembles of 400 neurons. The sustained STRFs are shown in order of decreasing contribution to the overall objective function whereas the sparse STRFs are shown randomly ordered. Each spectro-temporal patch spans 0–250 ms in time and 62.5–4000 Hz in frequency. For these examples the dynamic range of the STRFs was compressed using a nonlinearity.
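
The caption contrasts a sustained objective with a sparsity objective and orders the sustained STRFs by their contribution to the objective. The captions do not give the exact objectives, so this sketch substitutes two stand-ins: a lagged-product persistence score (a temporal-coherence style measure, assumed here, not the paper's equation) and the −Σ log(1 + r²) sparseness penalty familiar from Olshausen and Field. The names `sustained_contributions`, `sparse_objective` and `order_by_contribution` are hypothetical.

```python
import numpy as np


def sustained_contributions(rates: np.ndarray, lag: int) -> np.ndarray:
    """Hypothetical per-neuron persistence score: mean of r_k(t) * r_k(t + lag).

    rates: (n_time, n_neurons) instantaneous firing rates; lag in time bins.
    Large positive values mean a neuron's response tends to persist over `lag` bins.
    """
    if not 0 < lag < rates.shape[0]:
        raise ValueError("lag must satisfy 0 < lag < n_time")
    return np.mean(rates[:-lag] * rates[lag:], axis=0)


def sparse_objective(rates: np.ndarray) -> float:
    """Olshausen-Field style sparseness, -sum log(1 + r^2); larger means sparser."""
    return float(-np.sum(np.log1p(rates ** 2)))


def order_by_contribution(rates: np.ndarray, lag: int) -> np.ndarray:
    """Neuron indices sorted by decreasing contribution to the persistence score."""
    return np.argsort(sustained_contributions(rates, lag))[::-1]


t = np.arange(8)
rates = np.column_stack([np.ones(8), (-1.0) ** t])  # persistent vs. flickering neuron
print(order_by_contribution(rates, lag=1))
# Equal energy (sum r^2 = 4): the concentrated response scores as sparser.
print(sparse_objective(np.array([2.0, 0.0, 0.0, 0.0])), sparse_objective(np.ones(4)))
```

Both scores depend on the overall scale of the responses (the sparseness score, for instance, is maximized trivially by silencing every neuron), so in practice objectives of this kind need a constraint, such as the response and shape (orthonormality) constraints named in later figure captions.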

**Figure 3. Spectral clustering results.**
Shown are nine clusters obtained by pooling STRFs from the sparse as well as sustained ensembles for 10, 25, 50, 125, 250, 500, 1000, and 2500 ms. Shown in the center is a stacked bar chart where segment color corresponds to class label and segment width is proportional to the number of STRFs assigned to a particular class in a given ensemble. The surrounding panels show examples of STRFs drawn from six illustrative classes, namely, *noisy*, *localized*, *spectral*, *complex*, *temporal*, and *directional*.
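
Figure 3 pools STRFs across ensembles and groups them by spectral clustering. The sketch below shows one standard variant, assuming an absolute-correlation affinity between flattened STRFs, a normalized-Laplacian embedding in the style of Ng, Jordan and Weiss, and k-means with a deterministic farthest-point start. The paper's affinity, number of eigenvectors and initialization may differ; `spectral_cluster` is a hypothetical name.

```python
import numpy as np


def spectral_cluster(strfs: np.ndarray, n_clusters: int, n_iter: int = 100) -> np.ndarray:
    """Cluster STRFs of shape (n, n_freq, n_lags) by shape; one integer label per STRF.

    Assumes no STRF is constant (zero variance) and every STRF has some nonzero affinity.
    """
    X = strfs.reshape(len(strfs), -1).astype(float)
    X -= X.mean(axis=1, keepdims=True)
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    A = np.abs(X @ X.T)                           # affinity: |correlation| between STRFs
    np.fill_diagonal(A, 0.0)
    d = A.sum(axis=1)
    L_sym = np.eye(len(A)) - A / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(L_sym)               # eigenvalues in ascending order
    U = vecs[:, :n_clusters]
    U /= np.linalg.norm(U, axis=1, keepdims=True)

    # Deterministic farthest-point initialization, then Lloyd's k-means iterations.
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        new = np.array([U[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                        for k in range(n_clusters)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


# Two orthogonal "shapes", each with scaled and slightly perturbed copies.
a = np.array([1, -1, 1, -1, 1, -1.0])
b = np.array([1, 1, -1, -1, 0, 0.0])
e = np.array([0.1, 0, 0, 0, 0, -0.1])
f = np.array([0, 0, 0, 0, 0.1, -0.1])
strfs = np.stack([a, 2 * a, a + e, b, 3 * b, b + f]).reshape(6, 2, 3)
print(spectral_cluster(strfs, 2))
```

Absolute correlation treats an STRF and its sign-flipped copy as the same shape; a signed correlation would place them in different clusters.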

**Figure 4. Analysis of the temporal activations of emergent ensembles.**
Panel (A) shows the median activation time of individual neurons (solid lines, sorted in decreasing order) for 10 and 125 ms, respectively, for STRFs that optimize the sustained objective function. The shaded region illustrates the corresponding interquartile range. Panel (B) shows the distributions (as boxplots) of median activation times of the top 10% “most persistent” neurons for sparse and sustained ensembles for increasing .
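
Figure 4 summarizes each neuron by the median and interquartile range of its activation times. The caption does not define an activation, so this sketch assumes one: a run of consecutive time bins in which the response exceeds a threshold, measured in milliseconds. The threshold, bin width and helper names are hypothetical; ranking neurons by their median then selects the top 10% "most persistent" neurons.

```python
import numpy as np


def activation_durations(rate: np.ndarray, threshold: float, dt_ms: float) -> np.ndarray:
    """Durations (ms) of contiguous runs where `rate` exceeds `threshold`."""
    active = np.concatenate(([False], rate > threshold, [False])).astype(int)
    edges = np.flatnonzero(np.diff(active))      # alternating run starts and run ends
    starts, stops = edges[::2], edges[1::2]
    return (stops - starts) * dt_ms


def activation_summary(rates: np.ndarray, threshold: float, dt_ms: float) -> np.ndarray:
    """Per-neuron [median, 25th, 75th percentile] of activation durations; NaN if never active.

    rates: (n_time, n_neurons).
    """
    out = np.full((rates.shape[1], 3), np.nan)
    for k in range(rates.shape[1]):
        d = activation_durations(rates[:, k], threshold, dt_ms)
        if d.size:
            out[k] = np.percentile(d, [50, 25, 75])
    return out


def most_persistent(summary: np.ndarray, fraction: float = 0.1) -> np.ndarray:
    """Indices of the top `fraction` of neurons ranked by median activation time."""
    medians = np.where(np.isnan(summary[:, 0]), -np.inf, summary[:, 0])
    n = max(1, int(round(fraction * len(medians))))
    return np.argsort(medians)[::-1][:n]


rates = np.array([[1, 0], [1, 1], [1, 0], [0, 1], [1, 0]], dtype=float)
summary = activation_summary(rates, threshold=0.5, dt_ms=10.0)
print(summary)
print(most_persistent(summary, fraction=0.5))
```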

**Figure 5. Comparison of emergent STRFs learned according to the sustained objective function with examples estimated from ferret auditory cortex.**

**Figure 6. Cluster analysis of neural STRFs.**
Illustration of the overlap between the eMTFs of neural STRF clusters and that of the response-constrained sustained objective model STRFs; class 9 comprised mostly noisy STRFs with an exceedingly broad eMTF and its contour is omitted here for clarity. The white contour corresponds to the model eMTF at the 65% level.

**Figure 7. Ensemble analysis of STRFs learned under the sustained objective function for .**
In panels (A), (B), (C) and (E), the histograms show the distribution of model parameters whereas the thin green lines show the distribution of the physiological data. The black and green dashed vertical lines show population means for the model and neural data, respectively. In panels (D) and (F), the black and green lines correspond to the model and neural STRFs, respectively, with the dashed lines indicating 6-dB upper cutoff frequencies. Refer to the text for more details.
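
Panels (D) and (F) mark 6-dB upper cutoff frequencies. One common way to compute such a cutoff from a sampled magnitude profile (temporal or spectral) is to find the first point above the peak that lies 6 dB below it, interpolating linearly between samples. The sketch assumes an amplitude profile (hence the 20·log10 convention) and searches only above the peak; the paper may define the cutoff differently, and `upper_cutoff` is a hypothetical name.

```python
import numpy as np


def upper_cutoff(freqs: np.ndarray, profile: np.ndarray, drop_db: float = 6.0) -> float:
    """First frequency above the peak where a magnitude profile falls drop_db below its peak.

    Interpolates linearly between samples; returns freqs[-1] if the profile never
    drops that far above the peak.
    """
    profile = np.asarray(profile, dtype=float)
    peak = int(np.argmax(profile))
    level = profile[peak] * 10 ** (-drop_db / 20)   # amplitude (20 log10) convention
    for i in range(peak + 1, len(profile)):
        if profile[i] <= level:
            f0, f1 = freqs[i - 1], freqs[i]
            p0, p1 = profile[i - 1], profile[i]      # p0 > level >= p1 here
            return float(f0 + (p0 - level) / (p0 - p1) * (f1 - f0))
    return float(freqs[-1])


freqs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])          # e.g. Hz or cycles/octave
profile = np.array([1.0, 0.8, 0.4, 0.2, 0.1])
print(upper_cutoff(freqs, profile))                  # lies between 1 and 2
```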

**Figure 8. Average population response histograms for STRFs learned under the sustained and sparse objectives subject to response constraints.**

**Figure 9. Examples of STRFs learned under the sustained objective function () subject to orthonormality constraints on the shapes of the filters.**
The examples shown here were drawn at random from an ensemble of 400 neurons, and the STRFs are shown in order of decreasing contribution to the overall objective function. Each spectro-temporal patch spans 0–250 ms in time and 62.5–4000 Hz in frequency. For these examples the dynamic range of the STRFs was compressed using a nonlinearity.
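
Figure 9 imposes orthonormality constraints on the filter shapes. A widely used way to enforce such a constraint during gradient-based learning (for example, the symmetric decorrelation step of FastICA) is to project the filter matrix back onto the set of matrices with orthonormal rows after each update. Whether the paper uses exactly this projection is an assumption; `symmetric_orthonormalize` is a hypothetical name.

```python
import numpy as np


def symmetric_orthonormalize(W: np.ndarray) -> np.ndarray:
    """Map W (n_neurons, n_dims), n_neurons <= n_dims, full row rank, to (W W^T)^(-1/2) W.

    With the thin SVD W = U S V^T this equals U V^T: the matrix with orthonormal
    rows that is closest to W in Frobenius norm.
    """
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt


rng = np.random.default_rng(0)
W = rng.standard_normal((4, 10))   # e.g. 4 flattened STRFs of 10 pixels each
Q = symmetric_orthonormalize(W)
print(np.round(Q @ Q.T, 6))
```

Unlike Gram-Schmidt, the symmetric form treats all filters alike, so no STRF is privileged by its position in the ensemble.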

**Figure 10. Spectro-temporal modulations in the stimulus are fully captured by STRFs that promote sustained responses subject to response and shape constraints.**
Here, the average MTF of the stimulus is overlaid with contours (at the 65% level) of the ensemble MTFs for both constraints for . For each ensemble we also show the constellations for best rate vs. best scale (marked by ‘’ and ‘’ for response and shape constraints, respectively). For the response constraints, we show the contour line and BR/BS constellations for STRFs that contribute to 99% of the objective function.
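
The caption overlays ensemble MTFs on the stimulus MTF and, for the response constraints, keeps only STRFs that account for 99% of the objective function. A sketch of both steps, assuming the MTF is the magnitude of the centered 2D DFT (consistent with Figure 11) and that the ensemble MTF is a plain average of peak-normalized MTFs. The normalization, the nonnegativity of the contributions, and the helper names are assumptions.

```python
import numpy as np


def mtf(strf: np.ndarray) -> np.ndarray:
    """Magnitude of the 2D DFT of an STRF, shifted so zero modulation is centered."""
    return np.abs(np.fft.fftshift(np.fft.fft2(strf)))


def ensemble_mtf(strfs: np.ndarray) -> np.ndarray:
    """Average of peak-normalized MTFs over STRFs of shape (n, n_freq, n_lags)."""
    mtfs = np.array([mtf(s) for s in strfs])
    mtfs /= mtfs.max(axis=(1, 2), keepdims=True)   # assumes no all-zero STRF
    return mtfs.mean(axis=0)


def top_contributors(contributions: np.ndarray, fraction: float = 0.99) -> np.ndarray:
    """Smallest set of indices (largest first) whose nonnegative contributions reach `fraction`."""
    order = np.argsort(contributions)[::-1]
    share = np.cumsum(contributions[order]) / contributions.sum()
    n = int(np.searchsorted(share, fraction)) + 1
    return order[:n]


s = np.tile(np.cos(2 * np.pi * 2 * np.arange(8) / 8), (4, 1))  # 2 cycles per 8 time bins
m = mtf(s)
print(m.shape, top_contributors(np.array([5.0, 1.0, 3.0, 1.0]), 0.75))
```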

**Figure 11. Extracting basic spectro-temporal parameters for an individual STRF.**
Panel (A) shows a typical STRF, with solid contour lines indicating those regions that exceed one standard deviation. The dashed red line shows the projected 10-dB ellipse from which we estimated spectral bandwidth. As indicated, the STRF is rather elongated with no strong directional preference, and the pattern is highly separable. Panel (B) shows the MTF computed from the magnitude of the 2D Fourier Transform of the STRF in (A); from here we estimate the best rate (BR) and best scale (BS). Panel (C) shows the normalized temporal and spectral modulation profiles obtained from the MTF.
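
Figure 11 derives modulation parameters from the magnitude of the STRF's 2D Fourier transform. The sketch collapses the MTF along one axis to get temporal and spectral modulation profiles, reads the best rate and best scale off their peaks, and adds a common SVD-based separability index to quantify the caption's "highly separable" remark. The axis spacings, the collapse-by-summing step and the separability definition are assumptions, not the paper's stated procedure.

```python
import numpy as np


def modulation_profiles(strf: np.ndarray, dt_s: float, dx_oct: float):
    """Normalized temporal/spectral modulation profiles plus best rate and best scale.

    strf rows are frequency channels spaced dx_oct octaves apart; columns are time
    bins dt_s seconds apart. Rates are in Hz and scales in cycles/octave; the sign of
    the rate (sweep direction) is discarded when reporting the best rate.
    """
    n_freq, n_time = strf.shape
    m = np.abs(np.fft.fft2(strf))
    rates = np.fft.fftfreq(n_time, d=dt_s)
    scales = np.fft.fftfreq(n_freq, d=dx_oct)
    temporal = m.sum(axis=0)    # collapse the MTF over scale
    spectral = m.sum(axis=1)    # collapse the MTF over rate
    temporal /= temporal.max()
    spectral /= spectral.max()
    best_rate = abs(rates[np.argmax(temporal)])
    best_scale = abs(scales[np.argmax(spectral)])
    return temporal, spectral, best_rate, best_scale


def separability_index(strf: np.ndarray) -> float:
    """Fraction of STRF energy in its first singular component (1.0 = fully separable)."""
    s = np.linalg.svd(strf, compute_uv=False)
    return float(s[0] ** 2 / np.sum(s ** 2))


dt_s, dx_oct = 1 / 32, 1 / 8
t = np.arange(32) * dt_s
x = np.arange(16) * dx_oct
strf = np.outer(np.cos(2 * np.pi * 1.0 * x), np.cos(2 * np.pi * 4.0 * t))  # 1 cyc/oct, 4 Hz
_, _, br, bs = modulation_profiles(strf, dt_s, dx_oct)
print(br, bs, separability_index(strf))
```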

Cited by

Modeling attention-driven plasticity in auditory cortical receptive fields.
Carlin MA, Elhilali M. Front Comput Neurosci. 2015 Aug 19;9:106. doi: 10.3389/fncom.2015.00106. PMID: 26347643.
Sensory cortex is optimized for prediction of future input.
Singer Y, Teramoto Y, Willmore BD, Schnupp JW, King AJ, Harper NS. Elife. 2018 Jun 18;7:e31557. doi: 10.7554/eLife.31557. PMID: 29911971.
STRFs in primary auditory cortex emerge from masking-based statistics of natural sounds.
Sheikh AS, Harper NS, Drefs J, Singer Y, Dai Z, Turner RE, Lücke J. PLoS Comput Biol. 2019 Jan 17;15(1):e1006595. doi: 10.1371/journal.pcbi.1006595. PMID: 30653497.
Auditory and visual scene analysis: an overview.
Kondo HM, van Loon AM, Kawahara JI, Moore BC. Philos Trans R Soc Lond B Biol Sci. 2017 Feb 19;372(1714):20160099. doi: 10.1098/rstb.2016.0099. PMID: 28044011.
A Framework for Speech Activity Detection Using Adaptive Auditory Receptive Fields.
Carlin MA, Elhilali M. IEEE/ACM Trans Audio Speech Lang Process. 2015 Dec;23(12):2422-2433. doi: 10.1109/TASLP.2015.2481179. PMID: 29904642.
