Proc Natl Acad Sci U S A. 2000 Oct 24;97(22):11843-9.
doi: 10.1073/pnas.97.22.11843.

On cortical coding of vocal communication sounds in primates


X Wang. Proc Natl Acad Sci U S A. 2000.

Abstract

Understanding how the brain processes vocal communication sounds is one of the most challenging problems in neuroscience. Our understanding of how the cortex accomplishes this unique task should greatly facilitate our understanding of cortical mechanisms in general. Perception of species-specific communication sounds is an important aspect of the auditory behavior of many animal species and is crucial for their social interactions, reproductive success, and survival. The principles of neural representations of these behaviorally important sounds in the cerebral cortex have direct implications for the neural mechanisms underlying human speech perception. Our progress in this area has been relatively slow, compared with our understanding of other auditory functions such as echolocation and sound localization. This article discusses previous and current studies in this field, with emphasis on nonhuman primates, and proposes a conceptual platform to further our exploration of this frontier. It is argued that the prerequisite condition for understanding cortical mechanisms underlying communication sound perception and production is an appropriate animal model. Three issues are central to this work: (i) neural encoding of statistical structure of communication sounds, (ii) the role of behavioral relevance in shaping cortical representations, and (iii) sensory-motor interactions between vocal production and perception systems.


Figures

Figure 1
(A) A real-time recording of vocal exchanges between a pair of marmosets (one male, one female), shown in the form of a spectrogram. This typical vocal exchange contains trill and twitter calls. Call type and caller identity of each call are indicated. (B and C) Distribution of phrase frequency of twitter calls from a male (B, M346) and female (C, M403) marmoset. Both marmosets lived in the same colony and engaged in frequent vocal exchanges such as the example shown in A. A twitter call contains several discrete upward FM sweeps, each of which is referred to as a phrase. The intervals between phrases are relatively constant in each twitter call. Fourier analysis of a twitter call's envelope, obtained by using the Hilbert transform, revealed a local maximum reflecting the repetition frequency of the phrases. The frequency at this maximum is defined as the phrase frequency (27). (D) The results of a multidimensional clustering analysis of twitter calls of the pair of marmoset monkeys in B and C. Four parameters were calculated for each twitter call sample and used in the analysis. These parameters included (i) number of phrases, (ii) phrase frequency, (iii) spectral-peak frequency of the first phrase, and (iv) spectral-peak frequency of the second phrase. The spectral-peak frequency is computed from the magnitude spectrum of each phrase (27). Two ellipsoids are drawn by using the mean and standard deviation of a distance measure d_ij (see definition below). The open circle marks the mean distances of calls made by the male monkey (M346) with respect to its own group mean (abscissa) and the female monkey's group mean (ordinate); the open ellipsoid (male monkey, M346) outlines the standard deviations along both axes. The filled circle and shaded ellipsoid are calculations for calls from the female monkey M403.
[Formula image defining d_ij], where i and j designate the animals (1: M346, 2: M403); N_i is the number of call samples from the ith animal (N_1 = 330, N_2 = 198); P_ik(n) is the kth parameter in the nth call of the ith animal; and m̄_ik is the mean value of the kth parameter of call samples from the ith animal.
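The phrase-frequency measurement described in B and C (Hilbert envelope followed by Fourier analysis) can be sketched as follows. This is a minimal illustration, not the analysis code behind the figure: the function names, the 2 to 20 Hz search band, and the synthetic test signal are all assumptions introduced here. The envelope is computed with the standard FFT-based analytic-signal construction.

```python
import numpy as np

def hilbert_envelope(x):
    """Amplitude envelope of x via the analytic signal (FFT-based Hilbert transform)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies, zero negatives.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * h))

def phrase_frequency(call, fs, fmin=2.0, fmax=20.0):
    """Frequency (Hz) of the largest envelope-spectrum peak inside [fmin, fmax].

    fmin and fmax are hypothetical search limits, not values from the source.
    """
    env = hilbert_envelope(call)
    env = env - env.mean()                      # remove DC so it cannot win the peak search
    mag = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(mag[band])]

# Synthetic stand-in for a twitter call: a 7-kHz carrier gated at 8 phrases per second.
fs = 20000
t = np.arange(fs) / fs                          # 1 s of signal
gate = 0.5 * (1.0 + np.cos(2 * np.pi * 8.0 * t))
synthetic_call = gate * np.sin(2 * np.pi * 7000.0 * t)
print(phrase_frequency(synthetic_call, fs))     # should be close to 8 Hz for this signal
```

With a real recording, the frequency resolution of the estimate is the reciprocal of the call duration, so short calls may need zero-padding or peak interpolation.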
Figure 2
Averaged temporal discharge patterns of responses to a natural twitter call and its time-reversed version recorded from the primary auditory cortex of a marmoset (27). For each of 138 sampled units, the discharge rate in response to the natural twitter call was compared with that in response to the reversed twitter call. The sampled units were divided into two subpopulations based on this analysis. Units included in the selective population (A, B) responded more strongly to the natural twitter call than to the reversed twitter call, whereas the units included in the nonselective population (C, D) responded more strongly to the reversed twitter call than to the natural twitter call. In A–D, a mean poststimulus histogram (PSTH) is shown for each neuronal population under one of the two stimulus conditions (bin width = 2.0 ms).
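The two steps in this caption, splitting units by their rate to the natural versus the reversed call and computing a PSTH with 2.0-ms bins, can be sketched as below. The function names and the spike-time input format (seconds from stimulus onset, one array per repetition) are assumptions for illustration. The caption does not say how ties are handled, so here a unit with equal rates falls into the nonselective group.

```python
import numpy as np

def psth(spike_trains, duration_s, bin_s=0.002):
    """Firing rate (spikes/s) per bin, averaged over repetitions.

    spike_trains: list of arrays of spike times in seconds, one per repetition.
    """
    n_bins = int(round(duration_s / bin_s))
    edges = np.arange(n_bins + 1) * bin_s
    counts = np.zeros(n_bins)
    for spikes in spike_trains:
        counts += np.histogram(spikes, bins=edges)[0]
    return counts / (len(spike_trains) * bin_s)

def split_by_selectivity(rate_natural, rate_reversed):
    """Boolean masks (selective, nonselective) over units.

    A unit is 'selective' if its rate to the natural call exceeds its rate to
    the reversed call; ties go to 'nonselective' (an assumption made here).
    """
    rate_natural = np.asarray(rate_natural, dtype=float)
    rate_reversed = np.asarray(rate_reversed, dtype=float)
    selective = rate_natural > rate_reversed
    return selective, ~selective

# Toy example: two repetitions of one unit over a 4-ms window.
example_psth = psth([np.array([0.001, 0.003]), np.array([0.001])], duration_s=0.004)
selective, nonselective = split_by_selectivity([5.0, 1.0], [2.0, 3.0])
```

A population PSTH like those in A–D would then be the mean of the per-unit PSTHs within each mask.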
Figure 3
Population representation of the spectral shape of marmoset vocalizations. Comparison is made between the short-term call spectrum of one phrase of the twitter call and rate–CF (discharge rate vs. characteristic frequency) profiles computed over a corresponding time period. Data shown were obtained from the primary auditory cortex of one marmoset (27). (A) Magnitude spectrum of the first phrase of a natural twitter call. The magnitude spectrum of this call phrase in the time-reversed call is the same; only the phase spectrum is different. (B) Rate–CF profiles were constructed based on responses to a natural twitter call from 140 sampled units and were computed by using a triangular weighting window whose base was 0.25 octave wide. The centers of adjacent windows were 0.125 octave apart. Only averages that had at least 3 units in the window were included. Three profiles are shown: all units (n = 140, black solid line), selective subpopulation (n = 102, red solid line with triangle), and nonselective subpopulation (n = 38, green dashed line with circle). The definitions of the two subpopulations of units are given in Fig. 2. (C) Rate–CF profiles are shown for cortical responses to the same call phrase as analyzed in B but delivered in the time-reversed call. The same analytic method and display format are used as in B.
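The rate–CF smoothing described in B can be sketched as a weighted average along a log-frequency axis. The window base (0.25 octave), center spacing (0.125 octave), and 3-unit minimum come from the caption. Everything else is an assumption made for this sketch: the function name, starting the window centers at the lowest CF, normalizing by the sum of weights, and not counting units that sit exactly on a window edge (zero weight).

```python
import numpy as np

def rate_cf_profile(cf_hz, rates, base_oct=0.25, step_oct=0.125, min_units=3):
    """Triangular-window average of discharge rate along log2(CF).

    Returns (center_frequencies_hz, smoothed_rates) for windows that contain
    at least min_units units with nonzero weight.
    """
    log_cf = np.log2(np.asarray(cf_hz, dtype=float))
    rates = np.asarray(rates, dtype=float)
    half = base_oct / 2.0
    centers = np.arange(log_cf.min(), log_cf.max() + step_oct / 2.0, step_oct)
    out_centers, out_rates = [], []
    for c in centers:
        # Weight is 1 at the window center and falls linearly to 0 at +/- half.
        w = np.clip(1.0 - np.abs(log_cf - c) / half, 0.0, None)
        if np.count_nonzero(w) >= min_units:
            out_centers.append(2.0 ** c)
            out_rates.append(np.sum(w * rates) / np.sum(w))
    return np.array(out_centers), np.array(out_rates)

# Toy data: three units at 1 kHz and one isolated unit two octaves higher.
centers_hz, profile = rate_cf_profile([1000, 1000, 1000, 4000], [10, 20, 30, 99])
```

The isolated 4-kHz unit never appears in the profile because no window around it reaches the 3-unit minimum.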
Figure 4
Comparison between the spectrotemporal acoustic pattern of marmoset vocalizations and the corresponding spectrotemporal discharge patterns recorded in the primary auditory cortex of marmosets. In each plot (A–D), Upper shows population responses to a vocalization, and Lower shows the corresponding spectrogram of the stimulus. Discharges as they occurred in time (abscissa) from individual cortical units are aligned along the ordinate according to their objectively defined CF. The display of discharges was based on PSTHs computed for each unit (bin width = 2.0 ms). All three vocalizations were delivered at a sound level of 60 dB SPL during the experiments. (A) Population responses to a marmoset phee call. An outline of the trajectory (solid line in red) of the call's time-varying spectral peak is drawn on Upper and Lower for comparison. (B) Population responses to a marmoset multiple-phee call. An outline of the trajectory (solid line in red) of the call's time-varying spectral peak is drawn on Upper and Lower for comparison. (C) Population responses to a marmoset twitter call. The second call phrase is indicated by a vertical arrowhead (red). (D) An expanded view of cortical responses to the second phrase of the twitter call shown in C (Upper) and the corresponding spectrogram of the second call phrase (Lower), with a time mark indicated by an arrowhead (red) as in C. Responses of the same group of cortical units shown in C are included but displayed as a dot raster. In Upper, each recorded spike occurrence within the time period shown is marked as a dot. Spike times from 10 repetitions in each unit are aligned along 10 lines centered at the CF of the unit, shifted by 10 Hz for each repetition for display purposes (i.e., positioned from CF −50 Hz to CF +40 Hz, in 10-Hz steps).
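The row placement used for the dot raster in D is simple arithmetic: repetition r (counting from 0) is drawn at CF + 10 × (r − 5) Hz, which gives rows from CF − 50 Hz to CF + 40 Hz. A short sketch, with a hypothetical function name and a made-up CF of 7000 Hz:

```python
def raster_rows(cf_hz, n_reps=10, step_hz=10.0):
    """Ordinate (Hz) at which each repetition's spikes are plotted for one unit."""
    return [cf_hz + step_hz * (rep - n_reps // 2) for rep in range(n_reps)]

rows = raster_rows(7000.0)  # 10 rows, lowest at CF - 50 Hz, highest at CF + 40 Hz
```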

References

    1. Penfield W, Roberts L. Speech and Brain-Mechanisms. Princeton, NJ: Princeton Univ. Press; 1959.
    2. Heffner H E, Heffner R S. J Neurophysiol. 1986;56:683–701.
    3. Brodmann K. Vergleichende Lokalisationslehre der Großhirnrinde. Leipzig: Barth; 1909.
    4. Jones E G, Powell T P S. Brain. 1970;93:793–820.
    5. Pandya D N, Yeterian E H. In: Cerebral Cortex. Peters A, Jones E G, editors. Vol. 4. New York: Plenum; 1985. pp. 3–61.
