J Neurosci. 2010 Aug 18;30(33):11210-21. doi: 10.1523/JNEUROSCI.2239-10.2010.

A temporal hierarchy for conspecific vocalization discrimination in humans

Marzia De Lucia et al. J Neurosci. 2010.

Abstract

The ability to discriminate conspecific vocalizations is observed across species and early during development. However, its neurophysiologic mechanism remains controversial, particularly regarding whether it involves specialized processes with dedicated neural machinery. We identified spatiotemporal brain mechanisms for conspecific vocalization discrimination in humans by applying electrical neuroimaging analyses to auditory evoked potentials (AEPs) in response to acoustically and psychophysically controlled nonverbal human and animal vocalizations as well as sounds of man-made objects. AEP strength modulations in the absence of topographic modulations indicate that statistically indistinguishable brain networks were active to different degrees. First, responses to human vocalizations were significantly stronger than, but topographically indistinguishable from, responses to animal vocalizations starting at 169-219 ms after stimulus onset, within regions of the right superior temporal sulcus and superior temporal gyrus. This effect correlated with another AEP strength modulation occurring at 291-357 ms that was localized within the left inferior prefrontal and precentral gyri. Temporally segregated and spatially distributed stages of vocalization discrimination are thus functionally coupled, demonstrating that conventional views of functional specialization must incorporate network dynamics. Second, vocalization discrimination is not subject to facilitated processing in time; rather, it lags more general categorization by approximately 100 ms, indicative of hierarchical processing during object discrimination. Third, although differences between human and animal vocalizations persisted when analyses were performed at a single-object level or extended to include additional (man-made) sound categories, at no latency were responses to human vocalizations stronger than those to all other categories. Vocalization discrimination thus occurs at latencies comparable to those of face discrimination but is not functionally specialized.

Figures

Figure 1.
Statistical comparison of stimuli. Left, The spectrogram of each stimulus was generated and comparisons (nonparametric t tests) were performed across groups of sounds for each ∼5 ms and ∼80 Hz time–frequency bin. Right, Bins meeting the statistical criterion of at least eight spatially contiguous significant bins (equivalent to a cluster-level value of p < 0.00625) are displayed in red.
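The bin-wise stimulus comparison described in this caption can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code: the sampling rate, window length, and the choice of a Mann-Whitney U test as the nonparametric two-sample test are assumptions, and the helper names are hypothetical.

```python
# Illustrative sketch of the Figure 1 analysis (not the authors' code).
# Assumes each sound is a 1-D numpy array of equal duration sampled at FS Hz.
import numpy as np
from scipy.signal import spectrogram
from scipy.stats import mannwhitneyu
from scipy.ndimage import label

FS = 22050        # assumed sampling rate
NPERSEG = 256     # ~86 Hz frequency bins; with 50% overlap, ~5.8 ms time steps

def tf_power(sound):
    """Time-frequency power (spectrogram) of one sound."""
    _, _, sxx = spectrogram(sound, fs=FS, nperseg=NPERSEG, noverlap=NPERSEG // 2)
    return sxx

def binwise_significance(group_a, group_b, alpha=0.05):
    """Nonparametric two-sample test in every time-frequency bin."""
    spec_a = np.stack([tf_power(s) for s in group_a])   # (n_sounds, n_freq, n_time)
    spec_b = np.stack([tf_power(s) for s in group_b])
    n_freq, n_time = spec_a.shape[1:]
    pvals = np.ones((n_freq, n_time))
    for i in range(n_freq):
        for j in range(n_time):
            pvals[i, j] = mannwhitneyu(spec_a[:, i, j], spec_b[:, i, j]).pvalue
    return pvals < alpha

def contiguity_filter(sig_map, min_bins=8):
    """Keep only clusters of at least `min_bins` contiguous significant bins."""
    labels, n_clusters = label(sig_map)
    keep = np.zeros_like(sig_map, dtype=bool)
    for k in range(1, n_clusters + 1):
        cluster = labels == k
        if cluster.sum() >= min_bins:
            keep |= cluster
    return keep
```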
Figure 2.
Power spectra of each vocalization. Each line is the power spectrum for a single exemplar of a given vocalization. The two leftmost columns display the spectra for human vocalizations, and the three rightmost columns, the spectra for animal vocalizations. The x-axis is frequency in kilohertz, and the y-axis is in arbitrary units. The dots indicate the lowest frequency peak in each power spectrum for each of the sounds (i.e., f0). These f0 values were not significantly different between the two groups of vocalizations either when considered separately (t(52.6) = 0.71; p = 0.48) or when first averaged across the exemplars of a given object (t(16.3) = 0.41; p = 0.69).
Figure 3.
a, Exemplar waveforms from a frontocentral midline electrode (FCz). These group-averaged waveforms exhibit prototypical AEP peaks. Response modulations are visually apparent from 160 ms after stimulus onset. b, The results of millisecond-by-millisecond paired t tests at each of the scalp electrodes from the group-averaged AEP waveforms are shown (only effects meeting p < 0.05 for at least 25 consecutive milliseconds are shown).
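A minimal sketch of an electrode-wise, millisecond-by-millisecond paired t test with a 25 ms temporal criterion is given below. The array shapes, the 1 kHz sampling rate, and the function name are assumptions for illustration, not the published pipeline.

```python
# Sketch of the Figure 3b analysis under assumed shapes and sampling rate.
# AEPs per condition: numpy arrays of shape (n_subjects, n_electrodes, n_times).
import numpy as np
from scipy.stats import ttest_rel

def pointwise_paired_ttest(aep_a, aep_b, alpha=0.05, min_ms=25, sfreq=1000.0):
    """Paired t test at every electrode and time point, with a temporal criterion."""
    _, p = ttest_rel(aep_a, aep_b, axis=0)          # p has shape (n_electrodes, n_times)
    sig = p < alpha
    min_samples = int(round(min_ms * sfreq / 1000.0))
    out = np.zeros_like(sig)
    for ch in range(sig.shape[0]):
        start = None
        for t in range(sig.shape[1] + 1):
            on = t < sig.shape[1] and sig[ch, t]
            if on and start is None:
                start = t
            elif not on and start is not None:
                if t - start >= min_samples:        # keep runs lasting >= 25 ms
                    out[ch, start:t] = True
                start = None
    return out
```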
Figure 4.
a, Modulations in response strength were identified using GFP. Group-averaged GFP waveforms are displayed along with the results of millisecond-by-millisecond paired t tests. b, Topographic modulations between conditions were assessed using global dissimilarity. The results of the TANOVA procedure are illustrated as a function of time (in both panels, 1 minus the p value is shown after applying the p < 0.05 and 25 ms temporal criteria, as in Fig. 3).
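The two reference-free measures named in this caption have standard definitions that a short sketch can make concrete: GFP is the spatial standard deviation of the average-referenced map at each time point, and global dissimilarity is the root-mean-square difference between GFP-normalized maps. The code below illustrates those formulas only; it is not the authors' analysis software, and the TANOVA itself (a nonparametric randomization test on the dissimilarity values) is not shown.

```python
# Standard definitions of GFP and global dissimilarity (DISS), illustrated on a
# group-averaged AEP of shape (n_electrodes, n_times). Not the authors' software.
import numpy as np

def gfp(aep):
    """Global field power: spatial SD of the average-referenced map per time point."""
    ref = aep - aep.mean(axis=0, keepdims=True)     # re-reference to the average
    return np.sqrt((ref ** 2).mean(axis=0))

def dissimilarity(aep_a, aep_b):
    """Global dissimilarity: RMS difference between GFP-normalized maps."""
    a = aep_a - aep_a.mean(axis=0, keepdims=True)
    b = aep_b - aep_b.mean(axis=0, keepdims=True)
    return np.sqrt((((a / gfp(aep_a)) - (b / gfp(aep_b))) ** 2).mean(axis=0))
```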
Figure 5.
a, b, Group-averaged distributed linear source estimations were calculated over the 169–219 ms poststimulus period for each experimental condition. Results are rendered on the average MNI brain. The axial slice shows the activations for each of the two conditions at the location of the maximal t value (47, −22, 6 mm). c, The mean difference in source estimations included a distributed set of regions. The scaling for this difference is one-half the maximum of the estimations in response to animal vocalizations. d, Results of the statistical contrast of source estimations between AEPs to human and animal vocalizations are displayed in the same manner as in a and b.
Figure 6.
a, b, Group-averaged distributed linear source estimations were calculated over the 291–357 ms poststimulus period for each experimental condition (scale indicated). Results are rendered on the average MNI brain. The axial slice shows the activation for each of the two conditions at the location of the maximal t value (−53, −3, 40 mm). c, Results of the statistical contrast of source estimations between AEPs to human and animal vocalizations are displayed in the same manner as in a and b.
Figure 7.
Linear correlation of the activation difference between responses to human and animal vocalizations across the two clusters shown in Figures 5d and 6c (x-axis and y-axis, respectively).
Figure 8.
a, Group-averaged GFP waveforms in response to vocalizations as well as two classes of sounds of man-made objects. b–d, GFP area measures (SEM indicated) over selected poststimulus intervals.
Figure 9.
Results of single-object AEP analyses. a, Group-averaged GFP waveforms are displayed for each vocalization (left panel, human vocalizations; right panel, animal vocalizations) along with the mean across vocalizations (thicker lines). b, GFP area taken over the 169–219 ms poststimulus interval for each single-object AEP (light gray bars) as well as the average for each category of vocalizations (black bar, human vocalizations; dark gray bar, animal vocalizations; SEM indicated). The inset displays the main effect of object category after conducting a univariate ANCOVA (see Results). c, Scatterplot comparing GFP area over the 169–219 ms period and the corresponding f0 value for each object (black dots refer to human vocalizations and white dots to animal vocalizations). There was no evidence for a systematic relationship between these measures (R2 = 0.0006).
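The GFP area and f0 comparison in panels b and c can likewise be sketched. The interval bounds come from the caption; the sampling rate, epoch origin, and helper names are assumptions, and gfp refers to the function sketched after the Figure 4 caption.

```python
# Sketch of the Figure 9b/c measures (assumed 1 kHz sampling, stimulus onset at sample 0).
import numpy as np

def gfp_area(aep, start_ms=169, stop_ms=219, sfreq=1000.0):
    """Area under the GFP curve over a poststimulus interval (trapezoidal rule)."""
    g = gfp(aep)                                    # gfp() as defined above
    i0 = int(round(start_ms * sfreq / 1000.0))
    i1 = int(round(stop_ms * sfreq / 1000.0))
    return np.trapz(g[i0:i1], dx=1000.0 / sfreq)

def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    return np.corrcoef(x, y)[0, 1] ** 2
```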
Figure 10.
Schematic representation of a temporal hierarchy in auditory object discrimination summarizing the results of this study. Categorical effects on GFP are shown as a function of time relative to when they are first observed (subsequent effects not shown for simplicity). In a hierarchical fashion over time, general sound processing (initial 70 ms) is followed by living versus man-made discrimination (70–119 ms), then by human versus animal vocalization discrimination (169–219 ms), and finally by the discrimination of musical instruments versus other man-made objects (291–357 ms).
