Curr Biol. 2022 Apr 11;32(7):1470-1484.e12. doi: 10.1016/j.cub.2022.01.069. Epub 2022 Feb 22.

A neural population selective for song in human auditory cortex


Sam V Norman-Haignere et al. Curr Biol. 2022.

Erratum in

  • Norman-Haignere SV, Feather J, Boebinger D, Brunner P, Ritaccio A, McDermott JH, Schalk G, Kanwisher N. A neural population selective for song in human auditory cortex. Curr Biol. 2022 Mar 28;32(6):1454-1455. doi: 10.1016/j.cub.2022.03.016. PMID: 35349804.

Abstract

How is music represented in the brain? While neuroimaging has revealed some spatial segregation between responses to music versus other sounds, little is known about the neural code for music itself. To address this question, we developed a method to infer canonical response components of human auditory cortex using intracranial responses to natural sounds, and further used the superior coverage of fMRI to map their spatial distribution. The inferred components replicated many prior findings, including distinct neural selectivity for speech and music, but also revealed a novel component that responded nearly exclusively to music with singing. Song selectivity was not explainable by standard acoustic features, was located near speech- and music-selective responses, and was also evident in individual electrodes. These results suggest that representations of music are fractionated into subpopulations selective for different types of music, one of which is specialized for the analysis of song.

Keywords: ECoG; auditory cortex; component; electrocorticography; fMRI; music; natural sounds; song; speech; voice.

Conflict of interest statement

Declaration of interests N.K. was recently on the Current Biology advisory board. The other authors declare no competing interests.

Figures

Figure 1. Overview of experiment and decomposition method.
A, The sound set consisted of 165 commonly heard sounds (each 2 s long). B, Electrodes were selected based on the split-half reliability of their broadband gamma response timecourse (70–140 Hz) to natural sounds (correlation between odd vs. even repetitions). This panel plots reliability maps for six example subjects (of 15 total), illustrating the sparse and variable coverage. Subjects were numbered based on the number of reliable electrodes in their dataset. Blue circles outline the example electrodes shown in panel C. C, The broadband gamma response timecourse of several example electrodes to all 165 sounds, plotted as a raster. The time-averaged response to each sound is plotted to the right of the raster. The sounds have been grouped and colored based on membership in one of 12 sound categories. Below each raster, we plot the average response timecourse to each category with more than five exemplars. Error bars plot the median and central 68% of the sampling distribution (equivalent to one standard error for a Gaussian), computed via bootstrapping across sounds. D, Electrode timecourses were compiled in a matrix, where each row contains the full response timecourse of one electrode (from 0 to 3 seconds post-stimulus onset), concatenated across all 165 sounds tested. The data matrix was approximated as the product of a response timecourse matrix, which contains a small number of canonical response timecourses that are shared across all electrodes, with an electrode weight matrix that expresses the contribution of each component timecourse to each electrode (see Figures S1 & S7 for additional modeling and methods details). E, Cross-validation was used to compare models (Figure S1G) and determine the number of components. The data matrix was divided into cells, with one cell containing the response timecourse of a single electrode to a single sound.
The model was trained on a randomly chosen subset of 80% of cells and was then tested on the remaining 20% of cells. This panel plots the squared test correlation between measured and predicted responses for different numbers of components (averaged across all electrodes). The correlation has been noise-corrected using the test-retest reliability of the electrode responses so as to provide an estimate of explainable variance. Error bars plot the median and central 68% of the sampling distribution, computed via bootstrapping across subjects.
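The decomposition in panels D–E can be sketched as a masked matrix factorization: approximate the data matrix as (electrode weights) × (component timecourses), fitting on a random 80% of cells and scoring the held-out 20%. The sketch below uses a simple NMF-style multiplicative update on synthetic data; all sizes and the fitting procedure are illustrative assumptions, not the paper's actual model (which is more elaborate; see Figure S1).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the ECoG data matrix in panel D: each row is one
# electrode's response timecourse concatenated across sounds. All names and
# sizes here are illustrative, not the paper's.
n_elec, n_time, K = 40, 300, 3
true_R = np.abs(rng.standard_normal((K, n_time)))   # component timecourses
true_W = np.abs(rng.standard_normal((n_elec, K)))   # electrode weights
D = np.clip(true_W @ true_R + 0.1 * rng.standard_normal((n_elec, n_time)), 0, None)

# Hold out a random 20% of cells for testing, as in panel E.
train = rng.random(D.shape) > 0.2

def fit_components(D, mask, K, n_iter=300, eps=1e-9):
    """Masked NMF-style factorization via multiplicative updates --
    a simplified sketch of the paper's decomposition, not its actual model."""
    W = np.abs(rng.standard_normal((D.shape[0], K)))
    R = np.abs(rng.standard_normal((K, D.shape[1])))
    Dm = np.where(mask, D, 0.0)
    for _ in range(n_iter):
        P = np.where(mask, W @ R, 0.0)        # prediction on training cells only
        W *= (Dm @ R.T) / (P @ R.T + eps)
        P = np.where(mask, W @ R, 0.0)
        R *= (W.T @ Dm) / (W.T @ P + eps)
    return W, R

W, R = fit_components(D, train, K)
test = ~train
r = np.corrcoef(D[test], (W @ R)[test])[0, 1]
print(f"held-out correlation: {r:.3f}")
```

In the paper this held-out correlation is additionally noise-corrected by the electrodes' test-retest reliability, so that the reported quantity estimates explainable variance rather than raw fit.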
Figure 2. Category-selective components.
A, The response timecourse of four components that responded selectively to speech or music sounds. The format is the same as the example electrodes shown in Figure 1C. Figure S2A plots component responses from a simpler NMF model. Figure S3 shows additional acoustic analyses of the speech-selective components. B, The anatomical distribution of weights for each component. We used fMRI responses to the same sounds from 30 non-overlapping subjects to get a second estimate of each component’s anatomical distribution (top panel). The fMRI weights were computed by regressing the time-averaged response of the ECoG-derived components against the response of each fMRI voxel. The bottom panel overlays the electrode weights computed directly from the ECoG data (each circle corresponding to one electrode). The orange outlines show the approximate location of primary auditory cortex, defined tonotopically in our prior fMRI study. The weight scale is arbitrary. The upper limit of the color scale was set to 99% (fMRI) or 95% (ECoG) of the weight distribution for each component (a higher threshold for fMRI because of its greater coverage). The lower limit was set to 0 (ECoG weights were constrained to be non-negative, and the fMRI weights were in practice mostly positive). Figures S2B and S2C show how the electrode weights are distributed across subjects. C, This panel quantifies the similarity of the fMRI and ECoG weight maps (panel B) relative to the maximum possible similarity given the across-subject reliability of each modality. The leftmost two matrices show the correlation between all pairs of component weight maps, measured using two non-overlapping sets of subjects from the same modality (left matrix: ECoG; middle: fMRI). The right matrix plots the correlation between ECoG and fMRI weight maps. The bar plots at right show the average correlation for corresponding (matrix diagonal) and non-corresponding (off-diagonal) components.
If the modalities are consistent, the correlation should be higher for corresponding components. The dashed line shows an estimate of the maximum possible correlation between ECoG and fMRI maps given the reliability of the two modalities. All 10 reliable components are shown, including those without strong category selectivity (see Figure 5). The components were arranged by the similarity of their response profiles since components with more similar response profiles also tended to have more similar anatomical distributions. ECoG electrode weights were resampled to standard anatomical coordinates (using a 5 mm FWHM smoothing kernel) so that they could be compared across subjects and with the fMRI maps (smoothed with a 5 mm FWHM kernel). Error bars show the central 68% of the sampling distribution, computed by bootstrapping across fMRI subjects. Bootstrapping across ECoG subjects was not feasible because of variable coverage.
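The fMRI weight maps described in panel B come from a simple regression: each voxel's responses across the sound set are regressed against the time-averaged ECoG component responses, and each component's row of coefficients forms its spatial map; comparing modalities then reduces to correlating weight vectors. A minimal numpy sketch, with all sizes and noise levels made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins: time-averaged component responses to the 165 sounds
# (C) and fMRI voxel responses to the same sounds (V). Sizes are illustrative.
n_sounds, K, n_vox = 165, 10, 500
C = rng.standard_normal((n_sounds, K))
true_maps = rng.standard_normal((K, n_vox))
V = C @ true_maps + 0.5 * rng.standard_normal((n_sounds, n_vox))

# Regressing the component responses against every voxel at once solves
# V ~= C @ B; row k of B is component k's fMRI weight map.
B, *_ = np.linalg.lstsq(C, V, rcond=None)

# Comparing an ECoG map with an fMRI map for the same component then
# reduces to a spatial correlation of two weight vectors.
r = np.corrcoef(B[0], true_maps[0])[0, 1]
print(f"map correlation: {r:.3f}")
```

In the paper this correlation is computed after resampling electrode weights to standard anatomical coordinates and smoothing both maps, and is evaluated against a reliability-based ceiling.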
Figure 3. Hypothesis-driven component analysis.
In contrast to our data-driven decomposition, here we used category labels to explicitly search for components that showed selectivity for speech, music, or song. Specifically, we attempted to learn a weighted sum of the electrodes (via regularized regression) that came as close as possible to a binary response to speech (English or foreign speech), music (instrumental or sung music), or sung music. Cross-validation across sounds was used to prevent over-fitting. Sung music was excluded when estimating the electrode weights for the speech-selective component, since it contains an intermediate amount of speech. Format is the same as Figure 2A.
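The hypothesis-driven analysis amounts to regularized regression from electrode responses to a binary category target, cross-validated across sounds. A minimal sketch using closed-form ridge regression on synthetic data (the sizes, regularization value, and fold scheme are assumptions for illustration, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-ins: time-averaged electrode responses (X) to 165 sounds
# and a binary label marking, say, the speech sounds. None of these names or
# parameter values come from the paper.
n_sounds, n_elec = 165, 60
y = (rng.random(n_sounds) < 0.3).astype(float)
profile = rng.standard_normal(n_elec)           # electrodes carrying the signal
X = np.outer(y, profile) + rng.standard_normal((n_sounds, n_elec))

def ridge_fit(X, y, lam):
    # Closed-form ridge regression: w = (X'X + lam*I)^-1 X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# 5-fold cross-validation across sounds guards against over-fitting:
# weights are always evaluated on sounds they were not trained on.
folds = np.arange(n_sounds) % 5
pred = np.empty(n_sounds)
for f in range(5):
    tr, te = folds != f, folds == f
    w = ridge_fit(X[tr], y[tr], lam=10.0)
    pred[te] = X[te] @ w

r = np.corrcoef(pred, y)[0, 1]
print(f"cross-validated correlation: {r:.3f}")
```

The cross-validated prediction for each sound is the learned "component" response plotted in the figure; a high correlation with the binary target indicates that some weighted sum of electrodes is genuinely category-selective.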
Figure 4. Component responses to natural and modulation-matched synthetic sounds.
A, Cochleagrams of example natural sounds and corresponding synthetic sounds with matched spectrotemporal modulation statistics. Cochleagrams plot energy as a function of time and frequency, similar to a spectrogram, but measured from filters designed to mimic cochlear frequency tuning (stimuli lasted 4 seconds, but to facilitate inspection, only the first 2 seconds of each cochleagram are plotted). The natural sounds tested in the modulation-matching experiment were distinct from the 165 natural sounds used to identify components. B, The response of the speech-, music-, and song-selective components to natural and modulation-matched sounds. The sounds have been grouped into four categories: instrumental music (blue), music with singing (red), speech (green, both English and foreign), and all other sounds (black/gray). Each line shows the response timecourse (first 2 seconds) to a single natural sound (lighter colors) or modulation-matched synthetic sound (darker colors). C, The time-averaged component response to each pair of natural and modulation-matched sounds (lines connect pairs), along with the grand mean response across all natural (lighter bars) and modulation-matched (darker bars) sounds from each category.
Figure 5. Components selective for standard acoustic features.
A–B, Responses and anatomical distributions for 6 components whose responses suggested selectivity for standard acoustic features (see Figure S2D for responses from other, less reliable components). Same format as Figure 2A–B. C, Component responses to natural and modulation-matched synthetic sounds. Same format as Figure 4C. D–E, Correlations between component responses and measures of audio frequency (panel D) and spectrotemporal modulation energy (panel E), computed from a cochleagram representation of sound. See text for details. Figure S5 shows the overall prediction accuracy of standard acoustic features and category labels in each component.
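The panel-D analysis boils down to correlating a component's time-averaged responses across sounds with an acoustic feature of each sound, such as its energy in a frequency band. The sketch below uses a plain FFT band energy in place of the paper's cochleagram-derived features; the sounds, sampling rate, band edges, and simulated component are all made-up illustrations:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative stand-ins: 50 one-second "sounds" of noise at a made-up
# sampling rate (the paper uses cochleagram features of real stimuli).
sr, n_sounds = 8000, 50
sounds = rng.standard_normal((n_sounds, sr))

def band_energy(x, sr, lo, hi):
    """Total power of signal x between lo and hi Hz (simple FFT version)."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, 1 / sr)
    return spec[(freqs >= lo) & (freqs < hi)].sum()

energy = np.array([band_energy(s, sr, 200, 400) for s in sounds])
# Fake component response that tracks that band's energy, plus noise.
resp = energy + 0.2 * energy.std() * rng.standard_normal(n_sounds)
r = np.corrcoef(resp, energy)[0, 1]
print(f"frequency-feature correlation: {r:.3f}")
```

Repeating this across frequency bands (or spectrotemporal modulation rates for panel E) yields the tuning profiles plotted in the figure.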
Figure 6. The response of individual electrodes selective for speech, music, or song.
We selected speech- (top), music- (middle), and song-selective (bottom) electrodes, and then measured their responses in independent data. A, The six electrodes that showed the most significant response preference for each category in the subset of data used for electrode selection. For speech-selective electrodes, the top 6 electrodes came from just 2 subjects (2 from S1 and 4 from S2), so we instead plot the top electrode from 6 different subjects to show the consistency and diversity across subjects. Same format as Figure 2A. B, The average response (in independent data) across all electrodes identified as speech-, music-, or song-selective. C, The average response of speech-, music-, and song-selective electrodes to natural and modulation-matched synthetic sounds. Same format as Figure 4C. Figure S6 shows the effect of excluding song-selective electrodes, as well as individual subjects, on the inference of a song-selective component.
Figure 7. Prediction of speech and music selectivity detected with fMRI.
We attempted to predict the response of the speech- and music-selective fMRI components inferred in our prior study as a weighted combination of the ECoG components identified here (using ridge regression, cross-validated across sounds). The ECoG component responses were time-averaged for this analysis. This figure plots the measured and predicted response of each component. Figure S4 shows the result of attempting to predict the ECoG song-selective component from fMRI components and voxels.
