Review

Nat Neurosci. 2009 Jun;12(6):718-24. doi: 10.1038/nn.2331. Epub 2009 May 26.

Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing

Josef P Rauschecker et al.

Abstract

Speech and language are considered uniquely human abilities: animals have communication systems, but they do not match human linguistic skills in terms of recursive structure and combinatorial power. Yet, in evolution, spoken language must have emerged from neural mechanisms at least partially available in animals. In this paper, we will demonstrate how our understanding of speech perception, one important facet of language, has profited from findings and theory in nonhuman primate studies. Chief among these are physiological and anatomical studies showing that primate auditory cortex, across species, shows patterns of hierarchical structure, topographic mapping and streams of functional processing. We will identify roles for different cortical areas in the perceptual processing of speech and review functional imaging work in humans that bears on our understanding of how the brain decodes and monitors speech. A new model connects structures in the temporal, frontal and parietal lobes linking speech perception and production.


Figures

Figure 1
Dual processing scheme for ‘what’ and ‘where’, proposed for nonhuman primates on anatomical and physiological grounds. V1, primary visual cortex; A1, primary auditory cortex; IT, inferior temporal region; ST, superior temporal region; PPC, posterior parietal cortex; VLPFC, ventrolateral prefrontal cortex; DLPFC, dorsolateral prefrontal cortex. (Simplified from refs. 4, and combined with an existing scheme for the visual system from ref. 6.)
Figure 2
Communication calls consist of elementary features, such as bandpass noise bursts or frequency-modulated (FM) sweeps. Harmonic calls, such as the vocal scream from the rhesus monkey repertoire depicted here by its spectrogram and time signal amplitude (A, measured as output voltage of a sound meter), consist of fundamental frequencies and higher harmonics. The neural circuitry for processing such calls is thought to consist of small hierarchical networks. At the lowest level, there are neurons serving as FM detectors tuned to the rate and direction of FM sweeps; these detectors extract each FM component (shown in cartoon spectrograms) in the upward and downward sweeps of the scream. The output of these FM detectors is combined nonlinearly at the next level: the target neurons T1 and T2 possess a high threshold and fire only if all inputs are activated. At the final level, a ‘tonal-scream detector’ is created by again combining output from neurons T1 and T2 nonlinearly. Temporal integration is accomplished by having the output of T1 pass through a delay line with a latency Δt1 sufficient to hold up the input to the top neuron long enough that all inputs arrive at the same time. Early processing of human speech sounds in the antero-lateral auditory belt and parabelt cortex is thought to be accomplished in a similar way.
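For readers who think in code, the hierarchical coincidence-detection scheme described in the Figure 2 legend can be sketched in a few lines of Python. This sketch is purely illustrative and not from the paper: the similarity measure, the firing threshold, the delay handling and all names (fm_detector, target_neuron, scream_detector) are assumptions chosen to mirror the legend's description of high-threshold combination of FM-detector outputs and a delay line on T1's output.

import numpy as np

# Toy sketch (not from the paper) of the Figure 2 circuit: FM-detector outputs
# are combined nonlinearly by high-threshold "target" neurons (T1, T2), and a
# delay line aligns T1's earlier output with T2's so that the top-level
# 'tonal-scream detector' receives all of its inputs at the same time.

def fm_detector(spectrogram_slice, template):
    """All-or-none FM detector: responds only when the input matches its
    preferred rate/direction template closely enough (illustrative threshold)."""
    similarity = np.dot(spectrogram_slice, template) / (
        np.linalg.norm(spectrogram_slice) * np.linalg.norm(template) + 1e-9)
    return 1.0 if similarity > 0.8 else 0.0

def target_neuron(inputs, threshold):
    """High-threshold unit: fires only if (nearly) all inputs are active."""
    return 1.0 if sum(inputs) >= threshold else 0.0

def scream_detector(t1_history, t2_output, delay_steps):
    """Top-level unit: T1's output is read through a delay line of
    `delay_steps` samples so that both inputs arrive simultaneously."""
    delayed_t1 = t1_history[-delay_steps] if len(t1_history) >= delay_steps else 0.0
    return target_neuron([delayed_t1, t2_output], threshold=2)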
Figure 3
Multiple parallel input modules advocated by some as an alternative to the dual-stream model. According to this model, sensory information at the cortical level originates from primary-like areas (A1 and R in the auditory system; R is also referred to as A2 by analogy to visual area V2) and splits into multiple early processing streams: an object stream (green) originating from the antero-lateral belt (AL; or “A4” by analogy to area V4, involved in processing visual form); a spatial stream (red) originating from the caudo-lateral belt (CL; or “A5” by analogy to visual motion area V5); and other streams or streamlets originating from either area ML between AL and CL (“A3” by analogy to visual area V3) or from the medial belt (MB). RPB and CPB, rostral and caudal parabelt; T2 and T3, temporal cortical areas as defined by Burton and Jones; TPO, polymodal cortex in the upper bank of superior temporal sulcus; Tpt, parieto-temporal area.
Figure 4
Invariance in the perception of auditory objects (including vocalizations and speech) against transpositions in frequency, time or both. (a) Frequency-shifted monkey calls are behaviorally classified as the same by monkeys, presumably reflecting the response of higher-order neurons in anterior superior temporal cortex, even though the frequency contents of the monkey calls are markedly different. The example shows spectrograms of a tonal scream from a rhesus monkey frequency-shifted in steps of one octave. (b) Spectrograms of clear human speech (top) and of a six-channel noise-vocoded transformation of it (bottom). The noise-vocoded version of the sentence (“They're buying some bread.”) is easily comprehensible after short training, even though the sound is very impoverished in the spectral domain.
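The noise-vocoding manipulation mentioned in Figure 4b (splitting speech into a small number of frequency bands, extracting each band's slow amplitude envelope, and using it to modulate band-limited noise) can be sketched roughly as follows. This is a generic sketch, not the authors' code: the band edges, filter order and the use of a Hilbert envelope are illustrative assumptions, and scipy is assumed to be available.

import numpy as np
from scipy.signal import butter, filtfilt, hilbert

# Rough sketch (not from the paper) of a six-channel noise vocoder in the
# spirit of Figure 4b. Band edges below are illustrative and assume a
# sampling rate of at least 16 kHz.

def bandpass(signal, low_hz, high_hz, fs, order=4):
    b, a = butter(order, [low_hz / (fs / 2), high_hz / (fs / 2)], btype="band")
    return filtfilt(b, a, signal)

def noise_vocode(speech, fs, band_edges_hz=(100, 300, 700, 1500, 3000, 5000, 8000)):
    """Return a noise-vocoded version of `speech` with len(band_edges_hz)-1 channels."""
    rng = np.random.default_rng(0)
    vocoded = np.zeros_like(speech, dtype=float)
    for low, high in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        band = bandpass(speech, low, high, fs)
        envelope = np.abs(hilbert(band))             # slow amplitude envelope of this band
        carrier = bandpass(rng.standard_normal(len(speech)), low, high, fs)
        vocoded += envelope * carrier                # envelope-modulated band-limited noise
    return vocoded / (np.max(np.abs(vocoded)) + 1e-9)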
Figure 5
Dual auditory processing scheme of the human brain and the role of internal models in sensory systems. This expanded scheme closes the loop between speech perception and production and proposes a common computational structure for space processing and speech control in the postero-dorsal auditory stream. (a) Antero-ventral (green) and postero-dorsal (red) streams originating from the auditory belt. The postero-dorsal stream interfaces with premotor areas and pivots around inferior parietal cortex, where a quick sketch of sensory event information is compared with a predictive efference copy of motor plans. (b) In one direction, the model performs a forward mapping: object information, such as speech, is decoded in the antero-ventral stream all the way to category-invariant inferior frontal cortex (area 45), and is transformed into motor-articulatory representations (area 44 and ventral PMC), whose activation is transmitted to the IPL (and posterior superior temporal cortex) as an efference copy. (c) In the reverse direction, the model performs an inverse mapping, whereby attention- or intention-related changes in the IPL influence the selection of context-dependent action programs in PFC and PMC. Both types of dynamic model are testable using techniques with high temporal precision (for example, magnetoencephalography in humans or single-unit studies in monkeys) that allow determination of the order of events in the respective neural systems. AC, auditory cortex; STS, superior temporal sulcus; IFC, inferior frontal cortex; PMC, premotor cortex; IPL, inferior parietal lobule; CS, central sulcus. Numbers correspond to Brodmann areas.
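Computationally, the forward/inverse mappings described in the Figure 5 legend amount to an internal-model loop: an efference copy of the motor plan is run through a forward model to predict its sensory consequences, and the prediction is compared with actual feedback. The toy sketch below is an assumption-laden caricature of that idea, not a model from the paper; the linear motor-to-sensory mapping, the numbers and all names are hypothetical.

import numpy as np

# Toy sketch (not from the paper) of the internal-model idea in Figure 5:
# a forward model predicts the sensory consequence of a motor plan (panel b),
# and a pseudo-inverse of the same mapping stands in for the inverse mapping
# from a desired sensory goal to a motor command (panel c).

class InternalModel:
    def __init__(self, motor_to_sensory):
        self.motor_to_sensory = motor_to_sensory  # assumed linear mapping (illustrative)

    def forward(self, efference_copy):
        """Forward mapping: motor plan -> predicted sensory event."""
        return self.motor_to_sensory @ efference_copy

    def inverse(self, desired_sensory):
        """Inverse mapping: desired sensory goal -> motor command (pseudo-inverse)."""
        return np.linalg.pinv(self.motor_to_sensory) @ desired_sensory

model = InternalModel(np.array([[1.0, 0.2], [0.1, 0.9]]))
plan = np.array([0.5, 1.0])                   # efference copy of a motor plan
predicted = model.forward(plan)               # expected sensory consequence
actual = predicted + np.array([0.05, -0.02])  # actual (perturbed) sensory feedback
prediction_error = actual - predicted         # mismatch available for online correction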

References

    1. Broca P. Remarques sur le siège de la faculté du langage articulé: suivies d'une observation d'aphémie (perte de la parole). Bull Soc Anat Paris. 1861;6:330–357.
    2. Wernicke C. Der aphasische Symptomencomplex: Eine psychologische Studie auf anatomischer Basis. Cohn & Weigert; Breslau, Germany: 1874.
    3. Wise RJ. Language systems in normal and aphasic human subjects: functional imaging studies and inferences from animal studies. Br Med Bull. 2003;65:95–119. - PubMed
    4. Rauschecker JP. Cortical processing of complex sounds. Curr Opin Neurobiol. 1998;8:516–521. - PubMed
    5. Rauschecker JP, Tian B. Mechanisms and streams for processing of “what” and “where” in auditory cortex. Proc Natl Acad Sci USA. 2000;97:11800–11806. - PMC - PubMed
