Review

Neural specializations for speech and pitch: moving beyond the dichotomies

Robert J Zatorre et al. Philos Trans R Soc Lond B Biol Sci. 2008 Mar 12;363(1493):1087–104. doi: 10.1098/rstb.2007.2161.

Abstract

The idea that speech processing relies on unique, encapsulated, domain-specific mechanisms has been around for some time. Another well-known idea, often espoused as being in opposition to the first proposal, is that processing of speech sounds entails general-purpose neural mechanisms sensitive to the acoustic features that are present in speech. Here, we suggest that these dichotomous views need not be mutually exclusive. Specifically, there is now extensive evidence that spectral and temporal acoustical properties predict the relative specialization of right and left auditory cortices, and that this is a parsimonious way to account not only for the processing of speech sounds, but also for non-speech sounds such as musical tones. We also point out that there is equally compelling evidence that neural responses elicited by speech sounds can differ depending on more abstract, linguistically relevant properties of a stimulus (such as whether it forms part of one's language or not). Tonal languages provide a particularly valuable window onto the interplay between these processes. The key to reconciling these phenomena probably lies in understanding the interactions between afferent pathways that carry stimulus information and top-down processing mechanisms that modulate these processes. Although we are still far from the point of having a complete picture, we argue that moving forward will require us to abandon the dichotomy argument in favour of a more integrated approach.


Figures

Figure 1
Hemispheric differences in auditory cortex elicited by noise stimuli. (a,b) Illustration of how the noise stimuli were constructed; each matrix illustrates stimuli with different bandwidths (on the ordinate) and different temporal modulation rates (on the abscissa). (c) fMRI results indicating bilateral recruitment of distinct cortical areas with increasing rates of temporal or spectral modulation. (d) Effect sizes in selected areas of right (r) and left (l) auditory cortices. Note the significant interaction: the left anterolateral region responds more to temporal than to spectral modulation, whereas the right anterolateral region shows the opposite pattern. HG, Heschl's gyrus (Schönwiesner et al. 2005).
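To make concrete what it means to vary spectral and temporal modulation independently, the sketch below generates band-limited noise bursts whose bandwidth and amplitude-modulation rate are set separately, yielding a stimulus "matrix" like that illustrated in panels (a,b). This is a minimal illustration only; the centre frequency, bandwidths, modulation rates and sampling rate are assumptions and do not reproduce the stimuli of Schönwiesner et al. (2005).

    import numpy as np

    def make_noise_stimulus(bandwidth_hz, mod_rate_hz, fs=44100, dur=1.0,
                            centre_hz=1000.0, seed=0):
        """Band-limited noise whose spectral extent (bandwidth) and temporal
        envelope (sinusoidal amplitude-modulation rate) vary independently.
        Illustrative parameter values only."""
        rng = np.random.default_rng(seed)
        n = int(fs * dur)
        t = np.arange(n) / fs

        # Spectral dimension: band-pass white noise via frequency-domain masking.
        spectrum = rng.standard_normal(n) + 1j * rng.standard_normal(n)
        freqs = np.fft.fftfreq(n, 1 / fs)
        lo, hi = centre_hz - bandwidth_hz / 2, centre_hz + bandwidth_hz / 2
        spectrum[(np.abs(freqs) < lo) | (np.abs(freqs) > hi)] = 0
        carrier = np.real(np.fft.ifft(spectrum))

        # Temporal dimension: sinusoidal amplitude modulation at the given rate.
        envelope = 0.5 * (1 + np.sin(2 * np.pi * mod_rate_hz * t))
        stim = carrier * envelope
        return stim / np.max(np.abs(stim))  # normalise peak amplitude

    # A 3x3 stimulus matrix: bandwidth on one axis, modulation rate on the other
    # (values are assumptions chosen for illustration).
    stimuli = {(bw, mr): make_noise_stimulus(bw, mr)
               for bw in (100, 400, 1600)   # bandwidth in Hz
               for mr in (2, 8, 32)}        # temporal modulation rate in Hz

Crossing the two parameters in this way is what allows the fMRI contrast in panels (c,d) to attribute activation changes to spectral versus temporal modulation separately.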
Figure 2
(a) Illustrations of consonant–vowel (CV) speech and non-speech sounds containing similar acoustical properties. (b) fMRI images illustrating the overlap in left auditory cortex between responses to syllables and responses to non-speech gap stimuli. Similar results are obtained at two distinct time points (TPA1 and TPA2; Zaehle et al. 2004).
Figure 3
Magnetic-evoked potentials to a non-speech sound presented in different contexts. A left-hemisphere advantage is observed only when the target sound is embedded in (a,b) a real word (verb or noun, respectively), but not when it is presented in either (c) a non-speech context or (d) a pseudoword context, where it is perceived as speech but not as a recognizable word (Shtyrov et al. 2005).
Figure 4
Cortical activation maps comparing discrimination of pitch and duration patterns in a speech relative to a non-speech condition for groups of Thai and Chinese listeners. (a) Spectral contrast between Thai tones (speech) and pitch (non-speech) and (b) temporal contrast between Thai vowel length (speech) and duration (non-speech). Left-sided activation foci in frontal and temporo-occipital regions occur in the Thai group only. T, tone; P, pitch; VL, vowel length; D, duration (Gandour et al. 2002a,b).
Figure 5
Cortical areas activated in response to discrimination of Chinese and Thai tones. A common focus of activation is indicated by the overlap (yellow) between Chinese and Thai groups in the functional activation maps. Green cross-hair lines mark the stereotactic centre coordinates for the overlapping region in the left planum temporale ((a) coronal section, y=−25; (b) sagittal section, x=−44; (c) axial section, z=+7). A double dissociation ((d) bar charts) between tonal processing and language experience reveals that for the Thai group, Thai tones elicit stronger activity relative to Chinese tones, whereas for the Chinese group, stronger activity is elicited by Chinese tones relative to Thai tones. CC, Chinese tones superimposed on Chinese syllables, i.e. Chinese words; CT, Thai tones superimposed on Chinese syllables, i.e. tonal chimeras; L/LH, left hemisphere; R/RH, right hemisphere; ROI, region of interest (Xu et al. 2006).
Figure 6
Laterality effects for ROIs in (a) the Chinese group only and in (b) both Chinese and English groups, as rendered on a three-dimensional LH template for common reference. (a) In the Chinese group, the ventral aspects of the inferior parietal lobule and the anterior and posterior STG are lateralized to the LH across tone and intonation tasks; the anterior MFG and intraparietal sulcus are LH-lateralized for only a subset of tasks. (b) In both groups, the middle portions of the MFG and STS are lateralized to the RH across tasks. This right-sided fronto-temporal network subserves pitch processing regardless of its linguistic function. Other ROIs do not show laterality effects. MFG, middle frontal gyrus; STG, superior temporal gyrus; STS, superior temporal sulcus (Gandour et al. 2004).
Figure 7
Grand-average f0 contours of Mandarin tone 2 derived from the FFR waveforms of all subjects across both ears in the Chinese and English groups. The f0 contour of the original speech stimulus is displayed in black. The enlarged inset shows that the f0 contour derived from the FFR waveforms of the Chinese group more closely approximates that of the original stimulus (yi2 ‘aunt’) than does that of the English group (Krishnan et al. 2005).
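The contours in this figure are obtained by tracking the fundamental frequency over time in the averaged FFR waveform. As a general illustration of that kind of analysis, the sketch below estimates a running f0 contour from a periodic waveform by short-time autocorrelation; the frame length, hop size, f0 search range and sampling rate are assumptions for illustration, not the analysis parameters of Krishnan et al. (2005).

    import numpy as np

    def f0_contour(x, fs, frame_ms=40.0, hop_ms=10.0, fmin=80.0, fmax=300.0):
        """Estimate a running f0 contour from a periodic waveform (e.g. an
        averaged FFR) by short-time autocorrelation. Illustrative parameters."""
        frame = int(fs * frame_ms / 1000)
        hop = int(fs * hop_ms / 1000)
        lag_min, lag_max = int(fs / fmax), int(fs / fmin)
        times, f0s = [], []
        for start in range(0, len(x) - frame, hop):
            seg = x[start:start + frame]
            seg = seg - seg.mean()
            ac = np.correlate(seg, seg, mode='full')[frame - 1:]  # non-negative lags
            lag = lag_min + np.argmax(ac[lag_min:lag_max])        # best period in range
            times.append((start + frame / 2) / fs)                # frame-centre time (s)
            f0s.append(fs / lag)                                  # period -> frequency
        return np.array(times), np.array(f0s)

    # Usage (hypothetical data): compare the contour from a grand-average FFR
    # with that of the original /yi2/ stimulus, both assumed loaded as 1-D arrays.
    # t_ffr, f0_ffr = f0_contour(ffr_waveform, fs=20000)
    # t_stim, f0_stim = f0_contour(stimulus_waveform, fs=20000)

Comparing the two contours point by point (e.g. by correlation or root-mean-square error) is one simple way to quantify how faithfully the brainstem response tracks the pitch contour of the stimulus, which is the kind of group difference the figure depicts.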

