Speech production as state feedback control

John F Houde et al. Front Hum Neurosci. 2011 Oct 25;5:82. doi: 10.3389/fnhum.2011.00082. eCollection 2011.

Abstract

Spoken language exists because of a remarkable neural process. Inside a speaker's brain, an intended message gives rise to neural signals activating the muscles of the vocal tract. The process is remarkable because these muscles are activated in just the right way that the vocal tract produces sounds a listener understands as the intended message. What is the best approach to understanding the neural substrate of this crucial motor control process? One of the key recent modeling developments in neuroscience has been the use of state feedback control (SFC) theory to explain the role of the CNS in motor control. SFC postulates that the CNS controls motor output by (1) estimating the current dynamic state of the thing (e.g., arm) being controlled, and (2) generating controls based on this estimated state. SFC has successfully predicted a great range of non-speech motor phenomena, but as yet has not received attention in the speech motor control community. Here, we review some of the key characteristics of speech motor control and what they say about the role of the CNS in the process. We then discuss prior efforts to model the role of CNS in speech motor control, and argue that these models have inherent limitations - limitations that are overcome by an SFC model of speech motor control which we describe. We conclude by discussing a plausible neural substrate of our model.

Keywords: models of neural processes; models of speech production; sensory feedback; speech motor control; speech neurophysiology.


Figures

Figure 1
Schematic of DIVA. This diagram differs from published diagrams of the model that show both the auditory and somatosensory feedback control subsystems (Guenther et al., 2006); here, for simplicity, we show only one generic feedback control subsystem and sensory cortex that represents two similar subsystems (auditory and somatosensory) with different but analogous anatomical substrates. In addition, here we focus on the operation of the feedback control subsystem (red). This discrete-time depiction shows the model at time t, when the desired articulatory position $u_{t-1}$ has previously been applied to the vocal tract, causing it to produce sensory feedback $y_t$. This sensory feedback is seen N msec later in the sensory cortices as $y_{t-N}$, where it is compared with the target representation $\hat{y}_{t-N}$. If $y_{t-N}$ strays outside the bounds of $\hat{y}_{t-N}$, a non-zero sensory feedback error $\tilde{y}_{t-N}$ is generated in the feedback control subsystem (red), which is converted to an articulatory position error $\Delta M_{fb}(t)$ by the learned inverse Jacobian $J^{-1}(u_{t-1})$ and added in (weighted by $\alpha_{fb}$) as the feedback control subsystem's contribution to the next desired articulatory position $u_t$ to be applied to the vocal tract. Task-dependent modulation of the control system (blue) is provided by speech sound units $P_t$ in frontal cortex making time-varying synapses $Z_{PM}(t)$ and $Z_{Py}(t)$ onto units in the motor and higher-order sensory cortices, respectively. [Note: Until recently, the DIVA model had the sensory cortices linked directly to motor cortex, where all operations of the feedback control subsystem occurred. However, recent neuroimaging work (Tourville et al., 2008) has shown that right ventral premotor cortex (vPMC) appears to be involved in at least the auditory part of this subsystem (Guenther, 2008).]
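
To make the caption's discrete-time update concrete, the following is a minimal Python sketch of the DIVA-style feedback correction it describes: a delayed sensory signal is compared against a target region, and any out-of-bounds error is mapped through a learned inverse Jacobian and weighted by $\alpha_{fb}$ to give the feedback contribution to the next articulatory command. The array shapes, gain value, and function name are illustrative assumptions, not values from the published model.

```python
import numpy as np

# Minimal sketch of the DIVA-style feedback correction described above.
# All names and values (alpha_fb, J_inv, target bounds) are illustrative
# assumptions, not the published implementation.

def diva_feedback_correction(y_delayed, target_lo, target_hi, J_inv, alpha_fb):
    """Convert a delayed sensory error into an articulatory correction.

    y_delayed            : sensory feedback y_{t-N}, seen N msec after production
    target_lo, target_hi : bounds of the target region for y_{t-N}
    J_inv                : learned inverse Jacobian (sensory error -> articulator error)
    alpha_fb             : weight of the feedback control subsystem's contribution
    """
    # Non-zero sensory error only where feedback strays outside the target bounds
    err = np.where(y_delayed > target_hi, y_delayed - target_hi,
          np.where(y_delayed < target_lo, y_delayed - target_lo, 0.0))
    # Articulatory position error Delta M_fb(t), weighted by alpha_fb
    return alpha_fb * (J_inv @ err)

# Example: a 2-D auditory error (e.g., two formants) mapped onto 3 articulators
J_inv = np.array([[0.5, 0.0],
                  [0.1, 0.4],
                  [0.0, 0.3]])
delta_m = diva_feedback_correction(np.array([510.0, 1490.0]),   # y_{t-N}
                                   np.array([480.0, 1500.0]),   # lower target bounds
                                   np.array([500.0, 1600.0]),   # upper target bounds
                                   J_inv, alpha_fb=0.3)
u_next = np.zeros(3) + delta_m   # feedforward contribution omitted for brevity
```
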
Figure 2
The control problem in speech motor control. The figure shows a snapshot at time t, when the vocal tract has produced output $y_t$ in response to the previously applied control $u_{t-1}$.
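
The setup in this figure can be summarized as a toy discrete-time simulation: the vocal tract has a hidden state that evolves under the applied control, while the controller only ever receives a sensory output delayed by N steps. The matrices and delay below are arbitrary illustrative choices, not quantities from the paper; the sketch is only meant to make the "snapshot at time t" concrete, and later sketches reuse the same toy plant.

```python
import numpy as np
from collections import deque

# Toy plant: hidden vocal tract state x_t evolves under control u_{t-1};
# the controller only receives the sensory output y_{t-N}, delayed by N steps.
# A, B, C, and N are arbitrary illustrative values (hypothetical).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])        # state dynamics
B = np.array([[0.0],
              [1.0]])             # effect of the control on the state
C = np.array([[1.0, 0.0]])        # mapping from state to sensory output
N = 5                             # sensory transduction delay, in time steps

x = np.zeros((2, 1))              # true internal state (hidden from the CNS)
delay_line = deque([C @ x] * N)   # sensory outputs still "in transit"

def plant_step(u_prev):
    """Apply control u_{t-1}; return the delayed observation y_{t-N}."""
    global x
    x = A @ x + B @ u_prev        # x_t: the state the controller cannot see
    y_delayed = delay_line.popleft()
    delay_line.append(C @ x)      # y_t will reach the controller N steps later
    return y_delayed
```
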
Figure 3
Ideal state feedback control. If the controller in the CNS had access to the full internal state $x_t$ of the vocal tract system (red path), it could ignore feedback $y_{t-N}$ and formulate a state feedback control law $U_t(x_t)$ that would optimally guide the vocal tract articulators to produce the desired speech output $y_t$. However, as discussed in the text, the internal vocal tract state $x_t$ is, by definition, not directly available.
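
If the true state were readable, the control law $U_t(x_t)$ in this figure could be, for instance, a simple state-feedback rule driving the state toward a reference that realizes the desired output. The gain and reference state below are illustrative assumptions; the paper does not commit to a particular control law.

```python
import numpy as np

# Idealized (unrealizable) state feedback: control computed directly from the
# true hidden state x_t, ignoring delayed feedback. K and x_ref are
# hypothetical illustrative values.
K = np.array([[0.4, 0.6]])          # state feedback gain
x_ref = np.array([[1.0],
                  [0.0]])           # reference state realizing the desired output

def ideal_control_law(x_true):
    """U_t(x_t): proportional state feedback toward the reference state."""
    return K @ (x_ref - x_true)
```
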
Figure 4
A more realizable model of state feedback control based on an estimate $\hat{x}_t$ of the true internal vocal tract state $x_t$. If the CNS had an internal model of the vocal tract, $\widehat{\text{vocal tract}}$ (comprising the dynamics model $\widehat{vt}_{dyn}(u_{t-1}, \hat{x}_{t-1})$ and the sensory feedback model $\widehat{vt}_{out}(\hat{x}_t)$), it could send an efference copy (green path) of the vocal tract controls $u_{t-1}$ to the internal model, whose state $\hat{x}_t$ is accessible and could be used in place of $x_t$ in the controller's feedback control law $U_t(\hat{x}_t)$ (red path). However, this scheme only works if $\hat{x}_t$ always closely tracks $x_t$, which is not a realistic assumption.
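
One way to read this figure is as an observer run open loop: the CNS integrates its own dynamics model forward using only the efference copy of the outgoing controls, and the controller acts on that estimate. The sketch below reuses the toy values from the Figure 2 sketch as the CNS's (assumed perfectly learned) model; it is an illustrative assumption, not the paper's implementation. Because nothing ever corrects $\hat{x}_t$, any mismatch between the internal model and the real vocal tract accumulates without bound, which is the limitation the caption points out.

```python
import numpy as np

# Open-loop internal model of the vocal tract: the state estimate is updated
# purely from the efference copy of u_{t-1}, with no sensory correction.
# A_hat and B_hat are the CNS's (assumed perfectly learned) dynamics model,
# here simply copies of the toy plant from the Figure 2 sketch.
A_hat = np.array([[0.9, 0.1],
                  [0.0, 0.8]])
B_hat = np.array([[0.0],
                  [1.0]])
K = np.array([[0.4, 0.6]])          # control gain, as in the Figure 3 sketch
x_ref = np.array([[1.0],
                  [0.0]])

x_hat = np.zeros((2, 1))            # estimated state (accessible to the CNS)

def internal_model_step(u_prev):
    """Predict the next state from efference copy alone; no correction term."""
    global x_hat
    x_hat = A_hat @ x_hat + B_hat @ u_prev
    return x_hat

def control_from_estimate():
    """U_t applied to the estimate instead of the inaccessible true state."""
    return K @ (x_ref - x_hat)
```
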
Figure 5
State feedback control (SFC) model of speech motor control. The model is similar to that depicted in Figure 4 (i.e., the forward models $\widehat{vt}_{dyn}(u_{t-1}, \hat{x}_{t-1})$ and $\widehat{vt}_{out}(\hat{x}_t)$ constitute the internal model of the vocal tract, $\widehat{\text{vocal tract}}$, shown in Figure 4), but here sensory feedback is used to keep the state estimate $\hat{x}_t$ tracking the true vocal tract state $x_t$. This is accomplished with a prediction/correction process in which, in the prediction (green) direction, an efference copy of the vocal motor commands $u_{t-1}$ is passed through the dynamics model $\widehat{vt}_{dyn}(u_{t-1}, \hat{x}_{t-1})$ to generate the next-state prediction $\hat{x}_{t|t-1}$, which is delayed by $z^{-\hat{N}}$. $z^{-\hat{N}}$ outputs the next-state prediction $\hat{x}_{(t|t-1)-\hat{N}}$ from $\hat{N}$ seconds ago, in order to match the sensory transduction delay of $N$ seconds. $\hat{x}_{(t|t-1)-\hat{N}}$ is passed through the sensory feedback model $\widehat{vt}_{out}(\hat{x}_t)$ to generate the feedback prediction $\hat{y}_{t-\hat{N}}$. Then, in the correction (red) direction, incoming sensory feedback $y_{t-N}$ is compared with the prediction $\hat{y}_{t-\hat{N}}$, resulting in the sensory feedback prediction error $\tilde{y}_{t-\hat{N}}$. $\tilde{y}_{t-\hat{N}}$ is converted by the Kalman gain function $K_t(\tilde{y})$ into a state correction $\hat{e}_t$, which is added to $\hat{x}_{t|t-1}$ to make the corrected state estimate $\hat{x}_t$. Finally, as in Figure 4, $\hat{x}_t$ is used by the state feedback control law $U_t(\hat{x}_t)$ in the controller to generate the controls $u_t$ that will be applied at the next timestep to the vocal tract.
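
The prediction/correction cycle described in this caption corresponds to a delay-compensated, Kalman-style observer. The self-contained sketch below wires the toy plant from the Figure 2 sketch to such an observer: the efference copy drives the state prediction, the prediction is delayed by $\hat{N}$ steps and converted into a feedback prediction, and the resulting prediction error, scaled by a fixed gain standing in for $K_t(\tilde{y})$, corrects the state estimate that the control law acts on. The matrices, the gain L, and the control law are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from collections import deque

# --- Toy vocal tract (same illustrative plant as the Figure 2 sketch) -------
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
N = 5                                   # sensory transduction delay (steps)
x = np.zeros((2, 1))
sensory_delay = deque([C @ x] * N)

def plant_step(u_prev):
    global x
    x = A @ x + B @ u_prev              # true state x_t (hidden from the CNS)
    y_delayed = sensory_delay.popleft() # y_{t-N} reaching the sensory cortices
    sensory_delay.append(C @ x)
    return y_delayed

# --- SFC observer/controller (prediction "green", correction "red") ---------
A_hat, B_hat, C_hat = A, B, C           # assume an accurate learned model
N_hat = N                               # assume the delay estimate matches N
L = np.array([[0.3], [0.2]])            # fixed gain standing in for K_t(y~)
K = np.array([[0.4, 0.6]])              # control gain for U_t(x^)
x_ref = np.array([[1.0], [0.0]])        # hypothetical reference state

x_hat = np.zeros((2, 1))                # corrected state estimate x^_t
pred_delay = deque([x_hat] * N_hat)     # z^{-N^}: delayed state predictions

def sfc_step(u_prev, y_delayed):
    """One predict/correct cycle of the SFC sketch; returns the next control."""
    global x_hat
    # Prediction: efference copy of u_{t-1} through the dynamics model
    x_pred = A_hat @ x_hat + B_hat @ u_prev          # x^_{t|t-1}
    x_pred_old = pred_delay.popleft()                # x^_{(t|t-1)-N^}
    pred_delay.append(x_pred)
    y_pred = C_hat @ x_pred_old                      # y^_{t-N^}
    # Correction: prediction error scaled by the (here fixed) Kalman gain
    y_err = y_delayed - y_pred                       # y~_{t-N^}
    x_hat = x_pred + L @ y_err                       # x^_t = x^_{t|t-1} + e^_t
    # Control law U_t(x^_t)
    return K @ (x_ref - x_hat)

# Closed loop: the estimate tracks the plant despite the sensory delay.
u = np.zeros((1, 1))
for _ in range(50):
    u = sfc_step(u, plant_step(u))
```
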
Figure 6
State feedback control (SFC) model of speech motor control with putative neural substrate. The figure depicts the same operations as those shown in Figure 5, but with suggested cortical locations of the operations (motor areas are in yellow, while sensory areas are in pink). The current model is largely agnostic regarding hemispheric specialization for these operations. Also, for diagrammatic simplicity, the operations in the auditory and somatosensory cortices are depicted in the single area marked "sensory cortex," with the understanding that it represents analogous operations occurring in both of these sensory cortices: i.e., the delayed state estimate $\hat{x}_{(t|t-1)-\hat{N}}$ is sent to both high-order somatosensory and auditory cortex, each with its own feedback prediction module ($\widehat{vt}_{out}(\hat{x}_t)$ for predicting auditory feedback in high-order auditory cortex and $\widehat{vt}_{out}(\hat{x}_t)$ for predicting somatosensory feedback in high-order somatosensory cortex). The feedback prediction errors $\tilde{y}_{t-\hat{N}}$ generated in auditory and somatosensory cortex are converted into separate auditory- and somatosensory-based state corrections $\hat{e}_t$ by the auditory and somatosensory Kalman gain functions $K_t(\tilde{y})$ in the high-order auditory and somatosensory cortices, respectively. The auditory- and somatosensory-based state corrections are then added to $\hat{x}_{t|t-1}$ in premotor cortex to make the next state estimate $\hat{x}_t$. Finally, the key operations depicted in blue are all postulated to be modulated by the current speech task goals (e.g., what speech sound is currently meant to be produced) that are expressed in other areas of frontal cortex.
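
The caption's point that auditory and somatosensory corrections are computed separately and then combined in premotor cortex can be sketched as two gain functions whose outputs are summed into a single state correction. The gains and example error values below are purely illustrative assumptions.

```python
import numpy as np

# Two sensory modalities, each with its own Kalman-style gain; their state
# corrections are summed (in "premotor cortex") to update the state estimate.
# All numerical values are hypothetical.
L_aud = np.array([[0.25], [0.10]])      # auditory gain, standing in for K_t^aud
L_som = np.array([[0.05], [0.30]])      # somatosensory gain, standing in for K_t^som

def combined_state_correction(y_err_aud, y_err_som):
    """e^_t = auditory-based correction + somatosensory-based correction."""
    return L_aud @ y_err_aud + L_som @ y_err_som

x_pred = np.zeros((2, 1))                              # x^_{t|t-1}
e_t = combined_state_correction(np.array([[5.0]]),     # e.g., a formant error (Hz)
                                np.array([[-0.2]]))    # e.g., a proprioceptive error
x_hat = x_pred + e_t                                   # corrected estimate x^_t
```
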
Figure 7
Cortical substrate of the SFC model. (A) Anatomical locations of candidate cortical areas and white matter tracts comprising the network of the core SFC model. The same color scheme used in Figure 6 is used here: motor areas are in yellow, while sensory areas are in pink; connections conveying predictive information are in green, while those conveying corrective information are in red. Here, however, the single depiction of sensory cortex made up of primary and higher-level areas shown in Figure 6 is shown in more detail as a parallel organization of primary (A1, S1) and higher-level (Spt/PT, S2/PV) auditory and somatosensory cortices. The main white matter tracts that bidirectionally connect premotor cortex with the higher auditory and somatosensory cortices are hypothesized to be the arcuate and longitudinal fasciculi, respectively. Note that although, for simplicity, only the neural substrate in the left hemisphere is shown here, we would expect the full network of the neural substrate to include analogous areas in the right hemisphere as well. At this point, the SFC model is agnostic regarding hemispheric dominance in the proposed neural substrate. (B) Cortical connections in the prediction (green) direction: An efference copy of the neuromuscular controls $u_{t-1}$ generated in motor cortex (M1) and sent to the vocal tract motor neurons is also sent to premotor cortex (vPMC), which uses it to generate the state prediction $\hat{x}_{(t|t-1)-\hat{N}}$ that it sends to both the high-level auditory (Spt/PT) and somatosensory (S2/PV) cortices. These higher-level sensory areas in turn use $\hat{x}_{(t|t-1)-\hat{N}}$ to generate feedback predictions $\hat{y}_{t-\hat{N}}$, which they send to their associated primary sensory areas (A1, S1), where these predictions are compared with incoming feedback. (C) Cortical connections in the correction (red) direction: By comparing feedback predictions with incoming feedback, the primary sensory areas (A1, S1) compute feedback prediction errors $\tilde{y}_{t-\hat{N}}$ that are sent back to the higher-level sensory areas (Spt/PT, S2/PV), where they are converted into state estimate corrections $\hat{e}_t$ that are sent back to premotor cortex (vPMC). Finally, in premotor cortex these corrections are added to the state prediction, making the corrected state estimate $\hat{x}_t$, which is sent back to motor cortex (M1); motor cortex uses $\hat{x}_t$ along with the current task goals to generate further neuromuscular commands sent to the vocal tract motor neurons.
