Review

Sensorimotor integration in speech processing: computational basis and neural organization

Gregory Hickok et al. Neuron. 2011 Feb 10;69(3):407-22. doi: 10.1016/j.neuron.2011.01.019

Abstract

Sensorimotor integration is an active domain of speech research and is characterized by two main ideas: that the auditory system is critically involved in speech production, and that the motor system is critically involved in speech perception. Despite the complementarity of these ideas, there is little crosstalk between the two literatures. We propose an integrative model of the speech-related "dorsal stream" in which sensorimotor interaction primarily supports speech production, in the form of a state feedback control architecture. A critical component of this control system is forward sensory prediction, which affords a natural mechanism for a limited motor influence on perception, as recent perceptual research has suggested. Evidence shows that this influence is modulatory but not necessary for speech perception. The neuroanatomy of the proposed circuit is discussed, as well as some probable clinical correlates including conduction aphasia, stuttering, and aspects of schizophrenia.

Figures

Figure 1. Models of speech processing
A. State feedback control (SFC) model of speech production. The vocal tract is controlled by a motor controller, or set of controllers (Haruno et al., 2001). Motor commands issued to the vocal tract are also sent, as an efference copy, to an internal model of the vocal tract, which maintains an estimate of the vocal tract's current dynamic state. The sensory consequences of an issued motor command are predicted by a function that translates the current dynamic state estimate into an auditory representation. Predicted auditory consequences can be compared against both the intended and the actual auditory targets of a motor command. Deviations between the predicted and the intended/actual targets result in the generation of an error correction signal that feeds back into the internal model and ultimately to the motor controllers. See text for details. B. A psycholinguistic model of speech production. Although details vary, psycholinguistic models of speech production agree on a multistage process that minimally includes a lexical/conceptual system, a phonological system, and an articulatory process that generates the motor code to produce speech (Dell et al., 1997; Levelt et al., 1999). C. A neurolinguistic model of speech processing. Research on patients with language disorders has documented dissociations in the ability to access phonological codes for receptive and expressive speech, leading to the idea that phonological processes have separable but linked motor and sensory components (Jacquemot et al., 2007).
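For readers who find the control loop in panel A easier to follow as code, below is a minimal sketch, not the authors' model, of one such cycle for a single vocal-tract parameter treated here as a formant frequency in Hz. The plant dynamics, all gains, the noise level, the identity sensory mapping, and the neglect of sensory delays are illustrative assumptions.

```python
# Minimal sketch of a state feedback control (SFC) loop for one vocal-tract
# parameter. All numerical values are illustrative assumptions, not
# parameters from the paper.
import random

def simulate_sfc(target_hz=500.0, steps=50, seed=0):
    random.seed(seed)
    true_state = 300.0        # actual vocal-tract state (the "plant")
    estimated_state = 300.0   # internal model's estimate of that state
    plant_gain = 0.8          # how strongly a command moves the plant (assumed)
    model_gain = 0.8          # internal model's belief about that gain (assumed)
    feedback_gain = 0.3       # weight on the sensory prediction error (assumed)

    for _ in range(steps):
        # Controller: the motor command is computed from the *estimated*
        # state, not from slow sensory feedback.
        command = target_hz - estimated_state

        # Efference copy updates the internal model's forward prediction.
        estimated_state += model_gain * command
        predicted_auditory = estimated_state   # identity sensory mapping (assumed)

        # Plant: the actual vocal tract responds imperfectly and noisily.
        true_state += plant_gain * command + random.gauss(0.0, 5.0)
        actual_auditory = true_state

        # Mismatch between predicted and actual auditory consequences yields
        # an error signal that corrects the internal model (delays ignored).
        prediction_error = actual_auditory - predicted_auditory
        estimated_state += feedback_gain * prediction_error

    return true_state, estimated_state

print(simulate_sfc())  # both values should settle near the 500 Hz target
```

The point mirrored from the caption is that the command is driven by the internally maintained state estimate, while the comparison of predicted versus actual auditory consequences serves only to correct that estimate and, through it, subsequent commands.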
Figure 2. Location and functional properties of area Spt
A. Activation map for covert speech articulation (rehearsal of a set of nonwords), from (Hickok and Buchsbaum, 2003). B. Activation timecourse (fMRI signal amplitude) in Spt during a sensorimotor task for speech and music. A trial comprises 3s of auditory stimulation, followed by 15s of covert rehearsal/humming of the heard stimulus, followed by 3s of auditory stimulation, followed by 15s of rest. The two humps represent the sensory responses, the valley between the humps is the motor (covert rehearsal) response, and the baseline values at the onset and offset of the trial reflect resting activity levels. Note the similar response to speech and music. Adapted from (Hickok et al., 2003). C. Activation timecourse in Spt in three conditions: continuous speech (15s, blue curve), listen+rest (3s speech, 12s rest, red curve), and listen+covert rehearse (3s speech, 12s rehearse, green curve). The pattern of activity within Spt (inset) differed for listening to speech versus rehearsing speech, assessed at the end of the continuous-listen versus listen+rehearse conditions, despite the lack of a significant signal amplitude difference at that time point. Adapted from (Hickok et al., 2009a). D. Activation timecourse in Spt in skilled pianists performing a sensorimotor task involving listening to novel melodies and then covertly humming them (blue curve) vs. listening to novel melodies and imagining playing them on a keyboard (red curve). This indicates that Spt is relatively selective for vocal tract actions. Reprinted with permission from (Hickok, 2009b).
Figure 3. Dual stream model of speech processing
The dual stream model (Hickok and Poeppel, 2000, 2004, 2007) holds that early stages of speech processing occur bilaterally in auditory regions on the dorsal STG (spectrotemporal analysis; green) and STS (phonological access/representation; yellow). Processing then diverges into two broad streams: a temporal lobe ventral stream supports speech comprehension (lexical access and combinatorial processes; pink), whereas a strongly left-dominant dorsal stream supports sensorimotor integration and involves structures at the parietal-temporal junction (Spt) and the frontal lobe. The conceptual network (gray box) is assumed to be widely distributed throughout cortex. IFG, inferior frontal gyrus; ITS, inferior temporal sulcus; MTG, middle temporal gyrus; PM, premotor; Spt, Sylvian parietal-temporal; STG, superior temporal gyrus; STS, superior temporal sulcus. Figure reprinted with permission from (Hickok and Poeppel, 2007).
Figure 4. An integrated state feedback control (SFC) model of speech production
Speech models derived from the feedback control, psycholinguistic, and neurolinguistic literatures are integrated into the framework presented here. The architecture is fundamentally that of an SFC system with a controller, or set of controllers (Haruno et al., 2001), localized to primary motor cortex, which generates motor commands to the vocal tract and sends a corollary discharge to an internal model that makes forward predictions about both the dynamic state of the vocal tract and the sensory consequences of those states. Deviations between predicted auditory states and the intended targets or actual sensory feedback generate an error signal that is used to correct and update the internal model of the vocal tract. The internal model of the vocal tract is instantiated as a "motor phonological system", which corresponds to the neurolinguistically elucidated phonological output lexicon and is localized to premotor cortex. Auditory targets and forward predictions of sensory consequences are encoded in the same network, namely the "auditory phonological system", which corresponds to the neurolinguistically elucidated phonological input lexicon and is localized to the STG/STS. Motor and auditory phonological systems are linked via an auditory-motor translation system, localized to area Spt. The system is activated via parallel inputs from the lexical-conceptual system to the motor and auditory phonological systems.
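As a compact restatement of this proposed mapping, the caption's components, localizations, and information flow can be written down as a small directed graph. The component names and region labels come from the caption; the explicit edge list is an illustrative simplification, not a claim from the paper.

```python
# Component-to-region mapping and information flow as described in the
# Figure 4 caption; the graph encoding itself is an illustrative sketch.
SFC_COMPONENTS = {
    "controller": "primary motor cortex",
    "motor phonological system": "premotor cortex (internal model / phonological output lexicon)",
    "auditory phonological system": "STG/STS (targets and forward predictions / phonological input lexicon)",
    "auditory-motor translation": "area Spt",
    "lexical-conceptual system": "widely distributed cortex",
}

SFC_CONNECTIONS = [
    ("lexical-conceptual system", "motor phonological system"),      # parallel input
    ("lexical-conceptual system", "auditory phonological system"),   # parallel input
    ("controller", "vocal tract"),                                   # motor commands (plant, not cortex)
    ("controller", "motor phonological system"),                     # corollary discharge
    ("motor phonological system", "auditory-motor translation"),     # forward prediction out
    ("auditory-motor translation", "auditory phonological system"),  # predicted auditory state
    ("auditory phonological system", "auditory-motor translation"),  # error signal back
    ("auditory-motor translation", "motor phonological system"),     # correction to internal model
]

for src, dst in SFC_CONNECTIONS:
    print(f"{src} -> {dst}  [{SFC_COMPONENTS.get(src, 'plant')}]")
```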
Figure 5. Top-down modulation of perceptual response functions
(A) A graph replicated qualitatively from Figure 2 of Boynton (Boynton, 2005) illustrating attentional effects on sensory response functions based on a 'feature-similarity gain model' (Martinez-Trujillo and Treue, 2004). The effects include enhancement of responses to attended features and suppression of responses to unattended features (red dashed curve vs. blue solid curve, modulated vs. baseline). (B) Increased discrimination capacity. An inward shift of the boundaries (vertical dashed lines) makes it more likely for other perceptual 'channels' (green solid curves) to respond to stimuli with features different from the attended one, owing to the sharpened response profile of the 'attended channel'. (C) Enhancement of perceptual selectivity between features that differ substantially from each other, achieved by increasing the response to the attended feature and decreasing the response to the unattended feature. (D) For features similar to the focus of attention, the contrast between responses to attended and unattended features is also increased, even though both responses are enhanced.
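To make the modulation in panels A and B concrete, here is a small numerical sketch of the feature-similarity gain idea as described in the caption: a channel's Gaussian response profile is multiplied by a gain that exceeds 1 near the attended feature value and falls below 1 far from it. The tuning widths, gain limits, and feature values are arbitrary illustrative choices, not parameters from Boynton (2005) or Martinez-Trujillo and Treue (2004).

```python
# Illustrative feature-similarity gain modulation of a single perceptual
# channel; all numbers are assumptions chosen only to show the qualitative
# pattern (enhancement near the attended feature, suppression far from it).
import math

def tuning(feature, preferred, width=20.0):
    """Baseline Gaussian response of a channel preferring `preferred`."""
    return math.exp(-((feature - preferred) ** 2) / (2 * width ** 2))

def attention_gain(feature, attended, width=15.0, max_gain=1.5, min_gain=0.7):
    """Gain rises toward max_gain near the attended feature, falls toward min_gain far away."""
    similarity = math.exp(-((feature - attended) ** 2) / (2 * width ** 2))
    return min_gain + (max_gain - min_gain) * similarity

preferred, attended = 100.0, 100.0
for feature in (60, 80, 100, 120, 140):
    base = tuning(feature, preferred)
    modulated = base * attention_gain(feature, attended)
    print(f"feature={feature:>3}: baseline={base:.3f}  attended={modulated:.3f}")
```

With these numbers, responses near the attended value (100) are boosted while responses well away from it are suppressed, which both sharpens the attended channel's profile and increases the contrast between responses to attended and unattended features, the qualitative pattern the figure illustrates.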
Figure 6. Dysfunctional states of the SFC system for speech
A. Proposed source of the deficit in conduction aphasia: damage to the auditory-motor translation system. Input from the lexical-conceptual system to the motor and auditory phonological systems is unaffected, allowing fluent output and accurate activation of sensory targets. However, internal forward sensory predictions are not possible, leading to an increased error rate. Further, errors detected as mismatches between sensory targets and actual sensory feedback cannot be used to correct motor commands. B. Proposed source of the dysfunction in stuttering: noisy auditory-motor translation. Because the sensorimotor mapping is noisy, motor commands sometimes yield inaccurate sensory predictions, which trigger error correction signals that are themselves noisy, further exacerbating the problem and resulting in stuttering.

References

    1. Aliu SO, Houde JF, Nagarajan SS. Motor-induced suppression of the auditory cortex. J Cogn Neurosci. 2009;21:791–802.
    2. Andersen R. Multimodal integration for the representation of space in the posterior parietal cortex. Philos Trans R Soc Lond B Biol Sci. 1997;352:1421–1428.
    3. Anderson JM, Gilmore R, Roper S, Crosson B, Bauer RM, Nadeau S, Beversdorf DQ, Cibula J, Rogish M III, Kortencamp S, et al. Conduction aphasia and the arcuate fasciculus: A reexamination of the Wernicke-Geschwind model. Brain and Language. 1999;70:1–12.
    4. Baldo JV, Klostermann EC, Dronkers NF. It's either a cook or a baker: patients with conduction aphasia get the gist but lose the trace. Brain Lang. 2008;105:134–140.
    5. Benson DF, Sheremata WA, Bouchard R, Segarra JM, Price D, Geschwind N. Conduction aphasia: A clinicopathological study. Archives of Neurology. 1973;28:339–346.