[Preprint]. 2024 May 26:2024.05.24.595822.
doi: 10.1101/2024.05.24.595822.

A hierarchy of processing complexity and timescales for natural sounds in human auditory cortex

Kyle M Rupp et al. bioRxiv. 2024.

Abstract

Efficient behavior is supported by humans' ability to rapidly recognize acoustically distinct sounds as members of a common category. Within auditory cortex, there are critical unanswered questions regarding the organization and dynamics of sound categorization. Here, we performed intracerebral recordings in the context of epilepsy surgery as 20 patient-participants listened to natural sounds. We built encoding models to predict neural responses using features of these sounds extracted from different layers within a sound-categorization deep neural network (DNN). This approach yielded highly accurate models of neural responses throughout auditory cortex. The complexity of a cortical site's representation (measured by the depth of the DNN layer that produced the best model) was closely related to its anatomical location, with shallow, middle, and deep layers of the DNN associated with core (primary auditory cortex), lateral belt, and parabelt regions, respectively. Smoothly varying gradients of representational complexity also existed within these regions, with complexity increasing along a posteromedial-to-anterolateral direction in core and lateral belt, and along posterior-to-anterior and dorsal-to-ventral dimensions in parabelt. When we estimated the time window over which each recording site integrates information, we found shorter integration windows in core relative to lateral belt and parabelt. Lastly, we found a relationship between the length of the integration window and the complexity of information processing within core (but not lateral belt or parabelt). These findings suggest that hierarchies of timescales and processing complexity, and their interrelationship, represent a functional organizational principle of the auditory stream that underlies our perception of complex, abstract auditory information.

Conflict of interest statement

Conflict of Interest: The authors have declared that no competing interests exist.

Figures

Fig. 1. Methods.
(A) Patients performed an auditory 1-back task using Natural Sounds stimuli. The dashed black box in the auditory spectrogram represents the 975 ms input window for the DNN (see panel C). (B) Broadband high gamma activity (HGA) from an example channel. (C) YAMNet deep neural network model architecture. Arrow colors represent different blocks of DNN layer operations. Depthwise separable convolutions were also used between the grouped layers in the figure (L3-4, L5-6, L7-12, and L13-14). Using this pre-trained DNN, layer activations for each stimulus were extracted and used to build encoding models to predict HGA.
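The final step of panel C, predicting HGA from DNN layer activations, can be sketched as follows. This is only an illustration on synthetic data: the caption does not state which regression method, regularization, or cross-validation scheme was used, so ridge regression with a fixed penalty and a single train/test split are assumptions, and every name here (`fit_ridge`, `prediction_accuracy`, `alpha`, the array shapes) is hypothetical.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def prediction_accuracy(X_train, y_train, X_test, y_test, alpha=1.0):
    """Pearson correlation between predicted and observed held-out responses."""
    w, b = fit_ridge(X_train, y_train, alpha)
    y_pred = X_test @ w + b
    return float(np.corrcoef(y_pred, y_test)[0, 1])

# Synthetic stand-ins: one row per stimulus, one column per DNN feature.
rng = np.random.default_rng(0)
n_stimuli, n_features = 160, 20          # made-up sizes
layer_activations = rng.standard_normal((n_stimuli, n_features))
true_w = rng.standard_normal(n_features)
hga = layer_activations @ true_w + 0.5 * rng.standard_normal(n_stimuli)

train, test = slice(0, 120), slice(120, None)
r = prediction_accuracy(layer_activations[train], hga[train],
                        layer_activations[test], hga[test])
```

Repeating this fit once per DNN layer would give one accuracy value per layer for a channel, which is the kind of curve the later figures summarize.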
Fig. 2. Long-window encoding model results.
(A) Example channels in core (blue), lateral belt (orange), and parabelt (red) used for panels B-D. (B) Mean HGA (±1 SEM). Gray box shows analysis window for long-window models (see Methods). (C) Predicted vs. observed responses for the best-performing encoding models. (D) Model accuracies across DNN layers. Points show best (peak) model plotted in (C), and dashed lines show the weighted DNN layer, which is the weighted mean of each curve. (E) Encoding model results across patients and channels. Neural prediction accuracy for the best model is shown by marker size. Color represents the weighted DNN layer (the dashed lines from panel D).
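The "weighted DNN layer" in panels D and E can be illustrated with a short sketch. It assumes the weights are the per-layer model accuracies, with negative accuracies clipped to zero; the caption only says "weighted mean of each curve", so the exact weighting is an assumption, and the function name and example values are hypothetical.

```python
import numpy as np

def weighted_dnn_layer(accuracies, layers=None):
    """Accuracy-weighted mean layer index for one channel."""
    acc = np.asarray(accuracies, dtype=float)
    if layers is None:
        layers = np.arange(1, acc.size + 1)   # layers numbered from 1
    layers = np.asarray(layers, dtype=float)
    weights = np.clip(acc, 0.0, None)         # assumption: no negative weights
    if weights.sum() == 0:
        return float("nan")
    return float(np.sum(weights * layers) / np.sum(weights))

# Made-up accuracy curves across DNN layers for two channels.
symmetric_curve = [0.1, 0.3, 0.5, 0.3, 0.1]
late_curve = [0.0, 0.0, 0.2, 0.6]

peak_layer = int(np.argmax(symmetric_curve)) + 1   # best (peak) layer
weighted_symmetric = weighted_dnn_layer(symmetric_curve)
weighted_late = weighted_dnn_layer(late_curve)
```

Unlike the peak layer, the weighted value uses the whole accuracy curve, so it varies continuously and is less sensitive to small differences between neighboring layers.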
Fig. 3. Complexity gradients within ROIs.
(A) A gradient of increasing representational complexity (indexed by weighted DNN layer) was found along a posteromedial-anterolateral axis in both core and lateral belt, but only in the right hemisphere. This axis was defined using the best-fit line through core voxels. For lateral belt, we used an axis with the same direction but shifted to the lateral belt centroid. (B) Representational complexity gradients were also found in parabelt along the posterior-anterior and ventral-dorsal axes, with complexity increasing in the anterior and ventral directions. Again, this relationship was only observed in the right hemisphere. Results are across all patients and channels with long-window R² > 0.1. *** p < .001, ** p < .01, * p < .05, Bonferroni corrected.
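The axis construction in panel A can be sketched as below. The sketch assumes the best-fit line is the first principal axis of the core voxel coordinates and that the relationship is summarized with a Pearson correlation; neither detail is given in the caption. The coordinates, the reference vector used to fix the sign of the axis, and the use of the channel mean as the lateral belt centroid are all made up for illustration.

```python
import numpy as np

def best_fit_axis(points):
    """Unit direction of the best-fit line through 3-D points (first principal axis)."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def orient(direction, reference):
    """Flip the axis if needed so it points the same way as a reference vector."""
    return direction if direction @ reference >= 0 else -direction

def project_onto_axis(points, direction, origin):
    """Signed position of each point along the axis, measured from origin."""
    return (np.asarray(points, dtype=float) - origin) @ direction

# Made-up core voxel coordinates lying along one line.
core_voxels = np.array([[t, 2.0 * t, 0.0] for t in np.linspace(-1, 1, 11)])
axis = orient(best_fit_axis(core_voxels), reference=np.array([1.0, 1.0, 0.0]))

# Lateral belt: same direction, origin shifted to a (stand-in) belt centroid.
belt_channels = np.array([[3.0, 0.0, 0.0], [3.5, 1.0, 0.0], [4.0, 2.0, 0.0]])
belt_position = project_onto_axis(belt_channels, axis,
                                  origin=belt_channels.mean(axis=0))
belt_complexity = np.array([4.0, 6.0, 9.0])   # made-up weighted DNN layers
r = float(np.corrcoef(belt_position, belt_complexity)[0, 1])
```

Because the sign of a principal axis is arbitrary, the `orient` step matters: without it, a positive gradient could be reported as negative.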
Fig. 4. Integration windows.
(A) Method for estimating integration windows. Using the best DNN layer’s model for each channel, spectrograms were increasingly truncated and input to the DNN. Predicted HGA was calculated for each truncation window, and correlation was calculated between predicted and observed HGA. The elbow of the correlation curve represents the shortest stimulus window that accurately predicts HGA without appreciable information loss. The right panel shows a core (blue) and lateral belt (orange) channel with integration windows of 115 ms and 425 ms, respectively. (B) Integration windows across all patients and channels (with short-window R² > 0.1) are shown, with shorter windows observed in core and longer windows in lateral belt and parabelt regions.
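One way to locate the elbow of such a correlation curve is sketched below. The caption does not say which elbow criterion was used, so this substitutes a common heuristic: after rescaling both axes to [0, 1], pick the point farthest above the straight line joining the first and last points. It assumes the curve rises and then saturates. The window grid and the synthetic curve are made up.

```python
import numpy as np

def elbow_index(x, y):
    """Index of the point farthest above the chord joining the curve's endpoints."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x[-1] == x[0] or y[-1] == y[0]:
        raise ValueError("curve must span a nonzero range on both axes")
    # Rescale so the chord becomes the diagonal of the unit square.
    xn = (x - x[0]) / (x[-1] - x[0])
    yn = (y - y[0]) / (y[-1] - y[0])
    # Height above the diagonal is proportional to distance from the chord.
    return int(np.argmax(yn - xn))

# Synthetic curve: correlation grows with window length, then plateaus at 200 ms.
windows_ms = np.arange(25, 1000, 25)                       # 25, 50, ..., 975 ms
correlation = 0.6 * np.minimum(windows_ms / 200.0, 1.0)

integration_window_ms = int(windows_ms[elbow_index(windows_ms, correlation)])
```

On noisy curves this heuristic can be unstable, so smoothing the curve first or fitting a saturating function may be preferable.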
Fig. 5. Complexity vs. integration window.
The core region showed a strong positive correlation between representational complexity (weighted DNN layer) and integration window length, while lateral belt and parabelt showed no such relationship. Marginal distributions (top) show that representational complexity differed between all three regions. Integration windows (marginal distribution, right) differed only between core and lateral belt and between core and parabelt, with no difference between lateral belt and parabelt. *** p < 10⁻³, Bonferroni corrected.
