This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2024 May 26:2024.05.24.595822.

doi: 10.1101/2024.05.24.595822.

A hierarchy of processing complexity and timescales for natural sounds in human auditory cortex

Kyle M Rupp¹, Jasmine L Hect¹, Emily E Harford¹, Lori L Holt², Avniel Singh Ghuman¹, Taylor J Abel¹,³

Affiliations

¹ Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America.
² Department of Psychology, The University of Texas at Austin, Austin, Texas, United States of America.
³ Department of Bioengineering, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America.

PMID: 38826304
PMCID: PMC11142240
DOI: 10.1101/2024.05.24.595822

Kyle M Rupp et al. bioRxiv. 2024.

Update in

A hierarchy of processing complexity and timescales for natural sounds in the human auditory cortex.
Rupp KM, Hect JL, Harford EE, Holt LL, Ghuman AS, Abel TJ. Proc Natl Acad Sci U S A. 2025 May 6;122(18):e2412243122. doi: 10.1073/pnas.2412243122. Epub 2025 Apr 28. PMID: 40294254

Abstract

Efficient behavior is supported by humans' ability to rapidly recognize acoustically distinct sounds as members of a common category. Within auditory cortex, there are critical unanswered questions regarding the organization and dynamics of sound categorization. Here, we performed intracerebral recordings in the context of epilepsy surgery as 20 patient-participants listened to natural sounds. We built encoding models to predict neural responses using features of these sounds extracted from different layers within a sound-categorization deep neural network (DNN). This approach yielded highly accurate models of neural responses throughout auditory cortex. The complexity of a cortical site's representation (measured by the depth of the DNN layer that produced the best model) was closely related to its anatomical location, with shallow, middle, and deep layers of the DNN associated with core (primary auditory cortex), lateral belt, and parabelt regions, respectively. Smoothly varying gradients of representational complexity also existed within these regions, with complexity increasing along a posteromedial-to-anterolateral direction in core and lateral belt, and along posterior-to-anterior and dorsal-to-ventral dimensions in parabelt. When we estimated the time window over which each recording site integrates information, we found shorter integration windows in core relative to lateral belt and parabelt. Lastly, we found a relationship between the length of the integration window and the complexity of information processing within core (but not lateral belt or parabelt). These findings suggest hierarchies of timescales and processing complexity, and their interrelationship, represent a functional organizational principle of the auditory stream that underlies our perception of complex, abstract auditory information.
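The layer-wise encoding approach described in the abstract can be illustrated with a minimal sketch. It assumes ridge regression, a single train/test split, and synthetic data; the abstract does not state the regression method or the cross-validation scheme, so `ridge_fit`, `layer_accuracy`, the stimulus count, and the feature dimensions are all hypothetical choices made for illustration.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge weights; inputs are assumed to be mean-centered."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def layer_accuracy(features, response, n_train, alpha=1.0):
    """Fit on the first n_train stimuli; return Pearson r on the held-out rest."""
    X_train, X_test = features[:n_train], features[n_train:]
    y_train, y_test = response[:n_train], response[n_train:]
    mu_x, mu_y = X_train.mean(axis=0), y_train.mean()
    w = ridge_fit(X_train - mu_x, y_train - mu_y, alpha)
    pred = (X_test - mu_x) @ w + mu_y
    return float(np.corrcoef(pred, y_test)[0, 1])

rng = np.random.default_rng(0)
n_stimuli = 160  # arbitrary number of synthetic stimuli

# Synthetic stand-ins for DNN layer activations: 3 layers, 20 features each.
layers = [rng.standard_normal((n_stimuli, 20)) for _ in range(3)]

# Synthetic neural response driven by the middle layer (index 1) plus noise.
true_w = rng.standard_normal(20)
response = layers[1] @ true_w + 0.5 * rng.standard_normal(n_stimuli)

# One encoding model per layer; the best layer indexes representational complexity.
accuracies = [layer_accuracy(f, response, n_train=120) for f in layers]
best_layer = int(np.argmax(accuracies))
```

In this toy setup the middle layer should yield the most accurate model, mirroring how the depth of the best-predicting layer is used as a measure of a site's representational complexity.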

Conflict of interest statement

Conflict of Interest: The authors have declared that no competing interests exist.

Figures

**Fig. 1. Methods.**
(A) Patients performed an auditory 1-back task using Natural Sounds stimuli. The dashed black box in the auditory spectrogram represents the 975 ms input window for the DNN (see panel C). (B) Broadband high gamma activity (HGA) from an example channel. (C) YAMNet deep neural network model architecture. Arrow colors represent different blocks of DNN layer operations. Depthwise separable convolutions were also used between the grouped layers in the figure (L3-4, L5-6, L7-12, and L13-14). Using this pre-trained DNN, layer activations for each stimulus were extracted and used to build encoding models to predict HGA.

**Fig. 2. Long-window encoding model results.**
(A) Example channels in core (blue), lateral belt (orange), and parabelt (red) used for panels B-D. (B) Mean HGA (±1 SEM). Gray box shows analysis window for long-window models (see Methods). (C) Predicted vs. observed responses for the best-performing encoding models. (D) Model accuracies across DNN layers. Points show best (peak) model plotted in (C), and dashed lines show the weighted DNN layer, which is the weighted mean of each curve. (E) Encoding model results across patients and channels. Neural prediction accuracy for the best model is shown by marker size. Color represents the weighted DNN layer (the dashed lines from panel D).
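The weighted DNN layer in panels D and E is described as the weighted mean of each accuracy-by-layer curve. A minimal sketch of one way to compute it follows, assuming the weights are the per-layer model accuracies and that negative accuracies are clipped to zero; both are assumptions of this sketch rather than details given in the caption, and the function name and accuracy values are hypothetical.

```python
def weighted_dnn_layer(accuracies):
    """Accuracy-weighted mean of 1-based layer indices.

    Negative accuracies are clipped to zero (an assumption of this sketch).
    """
    weights = [max(a, 0.0) for a in accuracies]
    total = sum(weights)
    if total == 0:
        raise ValueError("no layer has positive accuracy")
    return sum(i * w for i, w in enumerate(weights, start=1)) / total

# Synthetic accuracy curves over five hypothetical layers.
shallow_channel = [0.6, 0.5, 0.3, 0.1, 0.0]  # best predicted by early layers
deep_channel = [0.0, 0.1, 0.3, 0.5, 0.6]     # best predicted by late layers

shallow_score = weighted_dnn_layer(shallow_channel)
deep_score = weighted_dnn_layer(deep_channel)
```

Compared with taking only the peak layer, a weighted mean uses the whole curve, so it varies continuously and is less sensitive to small differences between neighboring layers.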

**Fig. 3. Complexity gradients within ROIs.**
(A) A gradient of increasing representational complexity (indexed by weighted DNN layer) was found along a posteromedial-anterolateral axis in both core and lateral belt, but only in the right hemisphere. This axis was defined using the best fit line through core voxels. For lateral belt, we used an axis with the same direction but shifted to the lateral belt centroid. (B) Representational complexity gradients were also found in parabelt along the posterior-anterior and ventral-dorsal axes, with complexity increasing in the anterior and ventral directions. Again, this relationship was only observed in the right hemisphere. Results are across all patients and channels with long-window R² > 0.1. *** p < .001, ** p < .01, * p < .05, Bonferroni corrected.

**Fig. 4. Integration windows.**
(A) Method for estimating integration windows. Using the best DNN layer’s model for each channel, spectrograms were increasingly truncated and input to the DNN. Predicted HGA was calculated for each truncation window, and correlation was calculated between predicted and observed HGA. The elbow of the correlation curve represents the shortest stimulus window that accurately predicts HGA without appreciable information loss. The right panel shows a core (blue) and lateral belt (orange) channel with integration windows of 115 ms and 425 ms respectively. (B) Integration windows across all patients and channels (with short-window R² > 0.1) are shown, with shorter windows observed in core and longer windows in lateral belt and parabelt regions.
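The caption does not say how the elbow of the correlation curve is located. As an illustration only, the sketch below uses a common heuristic: rescale both axes to [0, 1] and pick the point farthest from the chord joining the first and last points. The function name `elbow_index` and the window and correlation values are hypothetical and synthetic; only the 975 ms upper bound is taken from the DNN input window in Fig. 1.

```python
def elbow_index(x, y):
    """Index of the point farthest from the chord joining the curve's endpoints,
    measured after rescaling both axes to [0, 1]."""
    x_span = x[-1] - x[0]
    y_min, y_max = min(y), max(y)
    if x_span == 0 or y_max == y_min:
        raise ValueError("curve is degenerate; no elbow to find")
    xs = [(v - x[0]) / x_span for v in x]
    ys = [(v - y_min) / (y_max - y_min) for v in y]
    # Chord from the first to the last rescaled point.
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = (dx * dx + dy * dy) ** 0.5
    dists = [abs(dy * (px - xs[0]) - dx * (py - ys[0])) / norm
             for px, py in zip(xs, ys)]
    return max(range(len(dists)), key=dists.__getitem__)

# Synthetic truncation windows (ms) and predicted-vs-observed correlations.
windows_ms = [25, 75, 125, 175, 225, 325, 425, 525, 725, 975]
correlations = [0.10, 0.35, 0.58, 0.60, 0.61, 0.62, 0.62, 0.63, 0.63, 0.63]

integration_window_ms = windows_ms[elbow_index(windows_ms, correlations)]
```

For this synthetic curve the correlation saturates after the third window, so the heuristic selects 125 ms as the shortest window that predicts the response without appreciable loss.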

**Fig. 5. Complexity vs. integration window.**
The core region showed a strong positive correlation between representational complexity (weighted DNN layer) and integration window length, while lateral belt and parabelt showed no such relationship. Marginal distributions show differences in representational complexity between all three regions (top); integration windows (marginal distribution, right) only differed between core and lateral belt as well as core and parabelt. No differences were observed between lateral belt and parabelt. *** p < 10⁻³, Bonferroni corrected.
