Proc Natl Acad Sci U S A. 2000 Oct 24;97(22):11843-9.

doi: 10.1073/pnas.97.22.11843.

On cortical coding of vocal communication sounds in primates

X Wang¹

Affiliation

¹ Laboratory of Auditory Neurophysiology, Department of Biomedical Engineering, Johns Hopkins University School of Medicine, 720 Rutland Avenue, Ross 424, Baltimore, MD 21205, USA. xwang@bme.jhu.edu

PMID: 11050218
PMCID: PMC34358
DOI: 10.1073/pnas.97.22.11843

X Wang. Proc Natl Acad Sci U S A. 2000.

Abstract

Understanding how the brain processes vocal communication sounds is one of the most challenging problems in neuroscience. Our understanding of how the cortex accomplishes this unique task should greatly facilitate our understanding of cortical mechanisms in general. Perception of species-specific communication sounds is an important aspect of the auditory behavior of many animal species and is crucial for their social interactions, reproductive success, and survival. The principles of neural representations of these behaviorally important sounds in the cerebral cortex have direct implications for the neural mechanisms underlying human speech perception. Our progress in this area has been relatively slow, compared with our understanding of other auditory functions such as echolocation and sound localization. This article discusses previous and current studies in this field, with emphasis on nonhuman primates, and proposes a conceptual platform to further our exploration of this frontier. It is argued that the prerequisite condition for understanding cortical mechanisms underlying communication sound perception and production is an appropriate animal model. Three issues are central to this work: (i) neural encoding of statistical structure of communication sounds, (ii) the role of behavioral relevance in shaping cortical representations, and (iii) sensory-motor interactions between vocal production and perception systems.

Figures

**Figure 1**
(A) A real-time recording of vocal exchanges between a pair of marmosets (one male, one female), shown in the form of a spectrogram. This typical vocal exchange contains trill and twitter calls. Call type and caller identity of each call are indicated. (B and C) Distribution of *phrase frequency* of twitter calls from a male (B, M346) and female (C, M403) marmoset. Both marmosets lived in the same colony and engaged in frequent vocal exchanges such as the example shown in A. A twitter call contains several discrete upward FM sweeps, each of which is referred to as a phrase. The intervals between phrases are relatively constant in each twitter call. Fourier analysis of a twitter call's envelope, obtained by using the Hilbert transform, revealed a local maximum reflecting the repetition frequency of the phrases. The frequency at this maximum is defined as the phrase frequency (27). (D) The results of a multidimensional clustering analysis of twitter calls of the pair of marmoset monkeys in B and C. Four parameters were calculated for each twitter call sample and used in the analysis. These parameters included (i) number of phrases, (ii) phrase frequency, (*iii*) spectral-peak frequency of the first phrase, and (iv) spectral-peak frequency of the second phrase. The spectral-peak frequency is computed from the magnitude spectrum of each phrase (27). Two ellipsoids are drawn by using the mean and standard deviation of a distance measure d_ij (see definition below). The open circle marks the mean distances of calls made by the male monkey (M346) with respect to its own group mean (abscissa) and the female monkey's group mean (ordinate); the open ellipsoid (male monkey, M346) outlines the standard deviations along both axes. The filled circle and shaded ellipsoid are calculations for calls from the female monkey M403. 
where i, j are animal designation (1: M346, 2: M403); N_i is number of call samples from ith animal (N₁ = 330, N₂ = 198); P_ik(n) is kth parameter in the nth call of the ith animal; and m̄_ik is mean value of the kth parameter of call samples from ith animal.
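The phrase-frequency measurement described in this caption (Hilbert envelope, then Fourier analysis of the envelope, then the frequency at the local maximum) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure of ref. 27: the search band `f_lo`/`f_hi`, the function names, and the toy signal from `synthetic_twitter` (a gated 1-kHz tone rather than real upward FM sweeps) are all hypothetical choices made for the example.

```python
import numpy as np

def analytic_envelope(x):
    """Envelope as the magnitude of the analytic signal (FFT-based Hilbert transform)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep DC
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep Nyquist
        gain[1:n // 2] = 2.0           # double positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * gain))

def phrase_frequency(x, fs, f_lo=2.0, f_hi=30.0):
    """Frequency (Hz) of the largest envelope-spectrum peak inside an assumed search band."""
    env = analytic_envelope(x)
    env = env - env.mean()             # remove DC so it cannot dominate the spectrum
    mag = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return freqs[band][np.argmax(mag[band])]

def synthetic_twitter(fs=4000, duration_s=2.0, phrase_hz=8.0,
                      phrase_len_s=0.04, carrier_hz=1000.0):
    """Toy stand-in for a twitter call: short tone bursts repeated at phrase_hz."""
    t = np.arange(int(fs * duration_s)) / fs
    tau = t % (1.0 / phrase_hz)        # time since the start of the current phrase
    gate = np.where(tau < phrase_len_s,
                    np.sin(np.pi * tau / phrase_len_s) ** 2, 0.0)
    return gate * np.sin(2 * np.pi * carrier_hz * t)

fs = 4000
call = synthetic_twitter(fs=fs, phrase_hz=8.0)
print(phrase_frequency(call, fs))
```

Restricting the peak search to a band is a design choice for the sketch: it keeps slow amplitude drift and the carrier itself from being picked up as the phrase rate.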

**Figure 2**
Averaged temporal discharge patterns of responses to a natural twitter call and its time-reversed version recorded from the primary auditory cortex of a marmoset (27). For each of 138 sampled units, the discharge rate to the natural twitter call was compared with that to the reversed twitter call. The sampled units were divided into two subpopulations based on this analysis. Units included in the *selective population* (A, B) responded more strongly to the natural twitter call than to the reversed twitter call, whereas the units included in the *nonselective population* (C, D) responded more strongly to the reversed twitter call than to the natural twitter call. In *A–D*, a mean poststimulus histogram (PSTH) is shown for each neuronal population under one of the two stimulus conditions (bin width = 2.0 ms).
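As a rough illustration of the analysis in this caption, the sketch below bins spike times into a PSTH with 2.0-ms bins and splits units by which stimulus drove the higher discharge rate. The data layout (lists of spike times in milliseconds per trial), the function names, and the handling of ties (assigned to the nonselective group) are assumptions for the example; the caption does not specify them.

```python
def psth(spike_trains, duration_ms, bin_ms=2.0):
    """Trial-averaged firing rate (spikes/s) per bin.

    spike_trains: list of trials, each a list of spike times in ms.
    """
    n_bins = int(round(duration_ms / bin_ms))
    counts = [0] * n_bins
    for trial in spike_trains:
        for t in trial:
            idx = int(t // bin_ms)
            if 0 <= idx < n_bins:          # ignore spikes outside the window
                counts[idx] += 1
    scale = 1000.0 / (bin_ms * len(spike_trains))
    return [c * scale for c in counts]

def mean_rate(spike_trains, duration_ms):
    """Mean discharge rate (spikes/s) over the whole analysis window."""
    total = sum(1 for trial in spike_trains for t in trial if 0 <= t < duration_ms)
    return 1000.0 * total / (duration_ms * len(spike_trains))

def split_by_preference(units, duration_ms):
    """units: dict mapping unit name -> (natural_trials, reversed_trials)."""
    selective, nonselective = [], []
    for name, (natural, reversed_) in units.items():
        if mean_rate(natural, duration_ms) > mean_rate(reversed_, duration_ms):
            selective.append(name)
        else:                              # assumption: ties fall in this group
            nonselective.append(name)
    return selective, nonselective

def mean_psth(histograms):
    """Average several equally binned PSTHs into one population PSTH."""
    return [sum(col) / len(histograms) for col in zip(*histograms)]
```

A population PSTH for one panel would then be `mean_psth([psth(trials, duration_ms) for each unit in the group])`, computed separately for the natural and reversed call.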

**Figure 3**
Population representation of the spectral shape of marmoset vocalizations. Comparison is made between short-term call spectrum of one phrase of the twitter call and rate–CF (discharge rate vs. characteristic frequency) profiles computed over a corresponding time period. Data shown were obtained from the primary auditory cortex of one marmoset (27). (A) Magnitude spectrum of the first phrase of a natural twitter call. The magnitude spectrum of this call phrase in the time-reversed call is the same; only the phase spectrum is different. (B) Rate–CF profiles were constructed based on responses to a natural twitter call from 140 sampled units and were computed by using a triangular weighting window whose base was 0.25 octave wide. The centers of adjacent windows were 0.125 octave apart. Only averages that had at least 3 units in the window were included. Three profiles are shown: all units (n = 140, black solid line), selective subpopulation (n = 102, red solid line with triangle), and nonselective subpopulation (n = 38, green dashed line with circle). The definitions of the two subpopulations of units are given in Fig. 2. (C) Rate–CF profiles are shown for cortical responses to the same call phrase as analyzed in B but delivered in the time-reversed call. The same analytic method and display format are used as in B.
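The rate–CF smoothing described in B can be sketched as a weighted average on a log-frequency axis. The window shape (triangular, 0.25-octave base), spacing (0.125 octave), and minimum of 3 units come from the caption; everything else is an assumption for illustration, including the function name, starting the first window at the lowest CF, and treating a unit exactly on the window edge (weight 0) as outside the window.

```python
import math

def rate_cf_profile(cfs_hz, rates, base_oct=0.25, step_oct=0.125, min_units=3):
    """Return [(center_hz, weighted_mean_rate), ...] for windows with enough units."""
    log_cf = [math.log2(cf) for cf in cfs_hz]
    half = base_oct / 2.0
    lo, hi = min(log_cf), max(log_cf)
    n_steps = int(math.floor((hi - lo) / step_oct)) + 1
    profile = []
    for i in range(n_steps):
        center = lo + i * step_oct         # assumption: grid anchored at the lowest CF
        # Triangular weight: 1 at the center, falling linearly to 0 at +/- half base.
        weights = [max(0.0, 1.0 - abs(x - center) / half) for x in log_cf]
        n_in_window = sum(1 for w in weights if w > 0.0)
        if n_in_window >= min_units:
            weighted = sum(w * r for w, r in zip(weights, rates))
            profile.append((2.0 ** center, weighted / sum(weights)))
    return profile
```

Because adjacent window centers are half a base-width apart, each unit contributes to at most two neighboring windows, which is what gives the profile its smoothing.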

**Figure 4**
Comparison between the spectrotemporal acoustic pattern of marmoset vocalizations and the corresponding spectrotemporal discharge patterns recorded in the primary auditory cortex of marmosets. In each plot (*A–D*), *Upper* shows population responses to a vocalization, and *Lower* shows the corresponding spectrogram of the stimulus. Discharges as they occurred in time (abscissa) from individual cortical units are aligned along the ordinate according to their objectively defined CF. The display of discharges was based on PSTHs computed for each unit (bin width = 2.0 ms). All three vocalizations were delivered at the sound level of 60 dB SPL during the experiments. (A) Population responses to a marmoset *phee* call. An outline of the trajectory (solid line in red) of the call's time-varying spectral peak is drawn on *Upper* and *Lower* for comparison. (B) Population responses to a marmoset *multiple-phee* call. An outline of the trajectory (solid line in red) of the call's time-varying spectral peak is drawn on *Upper* and *Lower* for comparison. (C) Population responses to a marmoset *twitter* call. The second call phrase is indicated by a vertical arrowhead (red). (D) An expanded view of cortical responses to the second phrase of the twitter call shown in C (*Upper*) and the corresponding spectrogram of the second call phrase (*Lower*), with a time mark indicated by an arrowhead (red) as in C. Responses of the same group of cortical units shown in C are included but displayed in the form of a dot raster. In *Upper*, each recorded spike occurrence within the time period shown is marked as a dot. Spike times from 10 repetitions in each unit are aligned along 10 lines centered at the CF of the unit, shifted by 10 Hz for each repetition for display purposes (i.e., positioned from CF −50 Hz to CF +40 Hz, in 10-Hz steps).
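The raster layout in D is simple arithmetic: 10 repetitions, 10 Hz apart, give ordinate positions from CF −50 Hz to CF +40 Hz. A small sketch of that placement follows; the function names and the input layout (one list of spike times per repetition) are hypothetical.

```python
def raster_offsets_hz(n_reps=10, step_hz=10.0):
    """Vertical offset (Hz) of each repetition's line relative to the unit's CF."""
    first = -step_hz * (n_reps // 2)
    return [first + step_hz * k for k in range(n_reps)]

def raster_points(cf_hz, spikes_by_rep, step_hz=10.0):
    """Return (time, ordinate_hz) dots for one unit, one line per repetition."""
    offsets = raster_offsets_hz(len(spikes_by_rep), step_hz)
    return [(t, cf_hz + off)
            for off, spikes in zip(offsets, spikes_by_rep)
            for t in spikes]

print(raster_offsets_hz())
# [-50.0, -40.0, -30.0, -20.0, -10.0, 0.0, 10.0, 20.0, 30.0, 40.0]
```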

Cited by

Balance or imbalance: inhibitory circuits for direction selectivity in the auditory system.
Rabang CF, Lin J, Wu GK. Cell Mol Life Sci. 2015 May;72(10):1893-906. doi: 10.1007/s00018-015-1841-2. Epub 2015 Feb 1. PMID: 25638210. Free PMC article. Review.
Evolutionary continuity and divergence of auditory dorsal and ventral pathways in primates revealed by ultra-high field diffusion MRI.
Zhang Y, Shen SX, Bibic A, Wang X. Proc Natl Acad Sci U S A. 2024 Feb 27;121(9):e2313831121. doi: 10.1073/pnas.2313831121. Epub 2024 Feb 20. PMID: 38377216. Free PMC article.
An operant conditioning method for studying auditory behaviors in marmoset monkeys.
Remington ED, Osmanski MS, Wang X. PLoS One. 2012;7(10):e47895. doi: 10.1371/journal.pone.0047895. Epub 2012 Oct 24. PMID: 23110123. Free PMC article.
The laminar and temporal structure of stimulus information in the phase of field potentials of auditory cortex.
Szymanski FD, Rabinowitz NC, Magri C, Panzeri S, Schnupp JW. J Neurosci. 2011 Nov 2;31(44):15787-801. doi: 10.1523/JNEUROSCI.1416-11.2011. PMID: 22049422. Free PMC article.
Development of inhibitory mechanisms underlying selectivity for the rate and direction of frequency-modulated sweeps in the auditory cortex.
Razak KA, Fuzessery ZM. J Neurosci. 2007 Feb 14;27(7):1769-81. doi: 10.1523/JNEUROSCI.3851-06.2007. PMID: 17301184. Free PMC article.

References

1. Penfield W, Roberts L. Speech and Brain-Mechanisms. Princeton, NJ: Princeton Univ. Press; 1959.
2. Heffner H E, Heffner R S. J Neurophysiol. 1986;56:683–701.
3. Brodmann K. Vergleichende Lokalisationslehre der Grosshirnrinde. Leipzig: Barth; 1909.
4. Jones E G, Powell T P S. Brain. 1970;93:793–820.
5. Pandya D N, Yeterian E H. In: Cerebral Cortex. Peters A, Jones E G, editors. Vol. 4. New York: Plenum; 1985. pp. 3–61.
