Review
Philos Trans R Soc Lond B Biol Sci. 2006 Dec 29;361(1476):2091-107.
doi: 10.1098/rstb.2006.1933.

Voice processing in human and non-human primates

Pascal Belin. Philos Trans R Soc Lond B Biol Sci. 2006.

Abstract

Humans share with non-human primates a number of voice perception abilities of crucial importance in social interactions, such as the ability to identify a conspecific individual from its vocalizations. Speech perception is likely to have evolved in our ancestors on the basis of pre-existing neural mechanisms involved in extracting behaviourally relevant information from conspecific vocalizations (CVs). Studying the neural bases of voice perception in primates thus not only has the potential to shed light on cerebral mechanisms that may be, unlike those involved in speech perception, directly homologous between species, but also has direct implications for our understanding of how speech appeared in humans. In this comparative review, we focus on behavioural and neurobiological evidence relating to two issues central to voice perception in human and non-human primates: (i) are CVs 'special', i.e. are they analysed using dedicated cerebral mechanisms not used for other sound categories, and (ii) to what extent, and using what neural mechanisms, do primates identify conspecific individuals from their vocalizations?


Figures

Figure 1
Voice production mechanism in primates. (a) Sagittal views depicting vocal tract anatomy in (i) an orang-utan, (ii) a chimpanzee and (iii) a human. Red, the tongue body; yellow, the larynx; blue, the air sacs (apes only). Note the longer oral cavity and much lower larynx in humans, with concomitant distortion of tongue shape compared with orang-utans and chimpanzees. These differences allow a much greater range of sounds to be produced by humans, which would have been significant in the evolution of speech (Fitch 2000). Adapted with permission from Fitch (2000). (b) The source/filter theory. The source/filter theory of vocal production, originally proposed for speech, appears to apply to vocal production in all mammals studied so far. The theory holds that vocalizations result from a sound source (typically produced at the larynx) combined with a vocal tract filter (which consists of a number of formants). This filtering action applies regardless of the type(s) of sound produced at the larynx. Reproduced with permission from Fitch (2000). (c)–(e) Spectrograms (0–5500 Hz) of a rhesus coo (c), a chimpanzee pant-hoot excerpt (d) and human speech (e). Note the similarities in structure, with harmonics and formants visible in each case.
Figure 2
STS voice-selective areas in humans. (a) Spectrograms (0–5000 Hz) of examples of (i) non-vocal and (ii) vocal sounds used by Belin et al. (2000). Note their similar apparent complexity. (b) Cortical rendering of regions showing greater response to vocal compared with non-vocal sounds in eight subjects, located in the anterior part of the STS. Reproduced with permission from Belin et al. (2004).
Figure 3
Cortical sensitivity to vocal identity. (a) Spectrograms (0–5000 Hz) of examples of auditory blocks used by Belin & Zatorre (2003). Adapt-speaker: different syllables spoken by the same speaker. Adapt-syllable: the same syllable spoken by several different speakers. (b) Cortical regions showing a decrease in neuronal activity with repetition of the speaker's voice, shown in colour scale on axial (top) and sagittal (middle) slices through the subjects’ mean anatomical image. Reproduced with permission from Belin & Zatorre (2003).

