Speech production as state feedback control

John F Houde et al. Front Hum Neurosci. 2011 Oct 25;5:82. doi: 10.3389/fnhum.2011.00082. eCollection 2011.

Abstract

Spoken language exists because of a remarkable neural process. Inside a speaker's brain, an intended message gives rise to neural signals activating the muscles of the vocal tract. The process is remarkable because these muscles are activated in just the right way that the vocal tract produces sounds a listener understands as the intended message. What is the best approach to understanding the neural substrate of this crucial motor control process? One of the key recent modeling developments in neuroscience has been the use of state feedback control (SFC) theory to explain the role of the CNS in motor control. SFC postulates that the CNS controls motor output by (1) estimating the current dynamic state of the thing (e.g., arm) being controlled, and (2) generating controls based on this estimated state. SFC has successfully predicted a great range of non-speech motor phenomena, but as yet has not received attention in the speech motor control community. Here, we review some of the key characteristics of speech motor control and what they say about the role of the CNS in the process. We then discuss prior efforts to model the role of CNS in speech motor control, and argue that these models have inherent limitations - limitations that are overcome by an SFC model of speech motor control which we describe. We conclude by discussing a plausible neural substrate of our model.

Keywords: models of neural processes; models of speech production; sensory feedback; speech motor control; speech neurophysiology.


Figures

Figure 1
Schematic of DIVA. This diagram differs from published diagrams of the model that show both the auditory and somatosensory feedback control subsystems (Guenther et al., 2006); here, for simplicity, we show only one generic feedback control subsystem and sensory cortex that represents two similar subsystems (auditory and somatosensory) with different but analogous anatomical substrates. In addition, here we focus on the operation of the feedback control subsystem (red). This discrete-time depiction shows the model at time t, when the desired articulatory position $u_{t-1}$ has previously been applied to the vocal tract, causing it to produce sensory feedback $y_t$. This sensory feedback is seen N msec later in the sensory cortices as $y_{t-N}$, where it is compared with the target representation $\hat{y}_{t-N}$. If $y_{t-N}$ strays outside the bounds of $\hat{y}_{t-N}$, a non-zero sensory feedback error $\tilde{y}_{t-N}$ is generated in the feedback control subsystem (red), which is converted to an articulatory position error $\Delta M_{fb}(t)$ by the learned inverse Jacobian $J^{-1}(u_{t-1})$ and added in (weighted by $\alpha_{fb}$) as the feedback control subsystem's contribution to the next desired articulatory position $u_t$ to be applied to the vocal tract. Task-dependent modulation of the control system (blue) is provided by speech sound units $P_t$ in frontal cortex making time-varying synapses $Z_{PM}(t)$ and $Z_{Py}(t)$ onto units in the motor and higher-order sensory cortices, respectively. [Note: Until recently, the DIVA model had the sensory cortices linked directly to motor cortex, where all operations of the feedback control subsystem occurred. However, recent neuroimaging work (Tourville et al., 2008) has shown that right ventral premotor cortex (vPMC) appears to be involved in at least the auditory part of this subsystem (Guenther, 2008).]
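
To make the caption's discrete-time update concrete, the following is a minimal Python sketch of the DIVA-style feedback correction it describes: a delayed sensory signal is compared against a target region, and any out-of-bounds error is mapped through a learned inverse Jacobian and weighted by $\alpha_{fb}$ to give the feedback contribution to the next articulatory command. The array shapes, gain value, and function name are illustrative assumptions, not values from the published model.

```python
import numpy as np

# Minimal sketch of the DIVA-style feedback correction described above.
# All names and values (alpha_fb, J_inv, target bounds) are illustrative
# assumptions, not the published implementation.

def diva_feedback_correction(y_delayed, target_lo, target_hi, J_inv, alpha_fb):
    """Convert a delayed sensory error into an articulatory correction.

    y_delayed            : sensory feedback y_{t-N}, seen N msec after production
    target_lo, target_hi : bounds of the target region for y_{t-N}
    J_inv                : learned inverse Jacobian (sensory error -> articulator error)
    alpha_fb             : weight of the feedback control subsystem's contribution
    """
    # Non-zero sensory error only where feedback strays outside the target bounds
    err = np.where(y_delayed > target_hi, y_delayed - target_hi,
          np.where(y_delayed < target_lo, y_delayed - target_lo, 0.0))
    # Articulatory position error Delta M_fb(t), weighted by alpha_fb
    return alpha_fb * (J_inv @ err)

# Example: a 2-D auditory error (e.g., two formants) mapped onto 3 articulators
J_inv = np.array([[0.5, 0.0],
                  [0.1, 0.4],
                  [0.0, 0.3]])
delta_m = diva_feedback_correction(np.array([510.0, 1490.0]),   # y_{t-N}
                                   np.array([480.0, 1500.0]),   # lower target bounds
                                   np.array([500.0, 1600.0]),   # upper target bounds
                                   J_inv, alpha_fb=0.3)
u_next = np.zeros(3) + delta_m   # feedforward contribution omitted for brevity
```
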
Figure 2
The control problem in speech motor control. The figure shows a snapshot at time t, when the vocal tract has produced output $y_t$ in response to the previously applied control $u_{t-1}$.
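
The setup in this figure can be summarized as a toy discrete-time simulation: the vocal tract has a hidden state that evolves under the applied control, while the controller only ever receives a sensory output delayed by N steps. The matrices and delay below are arbitrary illustrative choices, not quantities from the paper; the sketch is only meant to make the "snapshot at time t" concrete, and later sketches reuse the same toy plant.

```python
import numpy as np
from collections import deque

# Toy plant: hidden vocal tract state x_t evolves under control u_{t-1};
# the controller only receives the sensory output y_{t-N}, delayed by N steps.
# A, B, C, and N are arbitrary illustrative values (hypothetical).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])        # state dynamics
B = np.array([[0.0],
              [1.0]])             # effect of the control on the state
C = np.array([[1.0, 0.0]])        # mapping from state to sensory output
N = 5                             # sensory transduction delay, in time steps

x = np.zeros((2, 1))              # true internal state (hidden from the CNS)
delay_line = deque([C @ x] * N)   # sensory outputs still "in transit"

def plant_step(u_prev):
    """Apply control u_{t-1}; return the delayed observation y_{t-N}."""
    global x
    x = A @ x + B @ u_prev        # x_t: the state the controller cannot see
    y_delayed = delay_line.popleft()
    delay_line.append(C @ x)      # y_t will reach the controller N steps later
    return y_delayed
```
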
Figure 3
Ideal state feedback control. If the controller in the CNS had access to the full internal state $x_t$ of the vocal tract system (red path), it could ignore feedback $y_{t-N}$ and formulate a state feedback control law $U_t(x_t)$ that would optimally guide the vocal tract articulators to produce the desired speech output $y_t$. However, as discussed in the text, the internal vocal tract state $x_t$ is, by definition, not directly available.
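
If the true state were readable, the control law $U_t(x_t)$ in this figure could be, for instance, a simple state-feedback rule driving the state toward a reference that realizes the desired output. The gain and reference state below are illustrative assumptions; the paper does not commit to a particular control law.

```python
import numpy as np

# Idealized (unrealizable) state feedback: control computed directly from the
# true hidden state x_t, ignoring delayed feedback. K and x_ref are
# hypothetical illustrative values.
K = np.array([[0.4, 0.6]])          # state feedback gain
x_ref = np.array([[1.0],
                  [0.0]])           # reference state realizing the desired output

def ideal_control_law(x_true):
    """U_t(x_t): proportional state feedback toward the reference state."""
    return K @ (x_ref - x_true)
```
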
Figure 4
A more realizable model of state feedback control based on an estimate $\hat{x}_t$ of the true internal vocal tract state $x_t$. If the CNS had an internal model of the vocal tract, $\widehat{\text{vocal tract}}$ (comprising the dynamics model $\widehat{vt}_{dyn}(u_{t-1}, \hat{x}_{t-1})$ and the sensory feedback model $\widehat{vt}_{out}(\hat{x}_t)$), it could send an efference copy (green path) of the vocal tract controls $u_{t-1}$ to the internal model, whose state $\hat{x}_t$ is accessible and could be used in place of $x_t$ in the controller's feedback control law $U_t(\hat{x}_t)$ (red path). However, this scheme only works if $\hat{x}_t$ always closely tracks $x_t$, which is not a realistic assumption.
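
One way to read this figure is as an observer run open loop: the CNS integrates its own dynamics model forward using only the efference copy of the outgoing controls, and the controller acts on that estimate. The sketch below reuses the toy values from the Figure 2 sketch as the CNS's (assumed perfectly learned) model; it is an illustrative assumption, not the paper's implementation. Because nothing ever corrects $\hat{x}_t$, any mismatch between the internal model and the real vocal tract accumulates without bound, which is the limitation the caption points out.

```python
import numpy as np

# Open-loop internal model of the vocal tract: the state estimate is updated
# purely from the efference copy of u_{t-1}, with no sensory correction.
# A_hat and B_hat are the CNS's (assumed perfectly learned) dynamics model,
# here simply copies of the toy plant from the Figure 2 sketch.
A_hat = np.array([[0.9, 0.1],
                  [0.0, 0.8]])
B_hat = np.array([[0.0],
                  [1.0]])
K = np.array([[0.4, 0.6]])          # control gain, as in the Figure 3 sketch
x_ref = np.array([[1.0],
                  [0.0]])

x_hat = np.zeros((2, 1))            # estimated state (accessible to the CNS)

def internal_model_step(u_prev):
    """Predict the next state from efference copy alone; no correction term."""
    global x_hat
    x_hat = A_hat @ x_hat + B_hat @ u_prev
    return x_hat

def control_from_estimate():
    """U_t applied to the estimate instead of the inaccessible true state."""
    return K @ (x_ref - x_hat)
```
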
Figure 5
State feedback control (SFC) model of speech motor control. The model is similar to that depicted in Figure 4 (i.e., the forward models $\widehat{vt}_{dyn}(u_{t-1}, \hat{x}_{t-1})$ and $\widehat{vt}_{out}(\hat{x}_t)$ constitute the internal model of the vocal tract, $\widehat{\text{vocal tract}}$, shown in Figure 4), but here sensory feedback is used to keep the state estimate $\hat{x}_t$ tracking the true vocal tract state $x_t$. This is accomplished with a prediction/correction process in which, in the prediction (green) direction, an efference copy of the vocal motor commands $u_{t-1}$ is passed through the dynamics model $\widehat{vt}_{dyn}(u_{t-1}, \hat{x}_{t-1})$ to generate the next-state prediction $\hat{x}_{t|t-1}$, which is delayed by $z^{-\hat{N}}$. $z^{-\hat{N}}$ outputs the next-state prediction $\hat{x}_{(t|t-1)-\hat{N}}$ from $\hat{N}$ seconds ago, in order to match the sensory transduction delay of $N$ seconds. $\hat{x}_{(t|t-1)-\hat{N}}$ is passed through the sensory feedback model $\widehat{vt}_{out}(\hat{x}_t)$ to generate the feedback prediction $\hat{y}_{t-\hat{N}}$. Then, in the correction (red) direction, incoming sensory feedback $y_{t-N}$ is compared with the prediction $\hat{y}_{t-\hat{N}}$, resulting in the sensory feedback prediction error $\tilde{y}_{t-\hat{N}}$. $\tilde{y}_{t-\hat{N}}$ is converted by the Kalman gain function $K_t(\tilde{y})$ into a state correction $\hat{e}_t$, which is added to $\hat{x}_{t|t-1}$ to make the corrected state estimate $\hat{x}_t$. Finally, as in Figure 4, $\hat{x}_t$ is used by the state feedback control law $U_t(\hat{x}_t)$ in the controller to generate the controls $u_t$ that will be applied at the next timestep to the vocal tract.
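
The prediction/correction cycle described in this caption corresponds to a delay-compensated, Kalman-style observer. The self-contained sketch below wires the toy plant from the Figure 2 sketch to such an observer: the efference copy drives the state prediction, the prediction is delayed by $\hat{N}$ steps and converted into a feedback prediction, and the resulting prediction error, scaled by a fixed gain standing in for $K_t(\tilde{y})$, corrects the state estimate that the control law acts on. The matrices, the gain L, and the control law are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from collections import deque

# --- Toy vocal tract (same illustrative plant as the Figure 2 sketch) -------
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
N = 5                                   # sensory transduction delay (steps)
x = np.zeros((2, 1))
sensory_delay = deque([C @ x] * N)

def plant_step(u_prev):
    global x
    x = A @ x + B @ u_prev              # true state x_t (hidden from the CNS)
    y_delayed = sensory_delay.popleft() # y_{t-N} reaching the sensory cortices
    sensory_delay.append(C @ x)
    return y_delayed

# --- SFC observer/controller (prediction "green", correction "red") ---------
A_hat, B_hat, C_hat = A, B, C           # assume an accurate learned model
N_hat = N                               # assume the delay estimate matches N
L = np.array([[0.3], [0.2]])            # fixed gain standing in for K_t(y~)
K = np.array([[0.4, 0.6]])              # control gain for U_t(x^)
x_ref = np.array([[1.0], [0.0]])        # hypothetical reference state

x_hat = np.zeros((2, 1))                # corrected state estimate x^_t
pred_delay = deque([x_hat] * N_hat)     # z^{-N^}: delayed state predictions

def sfc_step(u_prev, y_delayed):
    """One predict/correct cycle of the SFC sketch; returns the next control."""
    global x_hat
    # Prediction: efference copy of u_{t-1} through the dynamics model
    x_pred = A_hat @ x_hat + B_hat @ u_prev          # x^_{t|t-1}
    x_pred_old = pred_delay.popleft()                # x^_{(t|t-1)-N^}
    pred_delay.append(x_pred)
    y_pred = C_hat @ x_pred_old                      # y^_{t-N^}
    # Correction: prediction error scaled by the (here fixed) Kalman gain
    y_err = y_delayed - y_pred                       # y~_{t-N^}
    x_hat = x_pred + L @ y_err                       # x^_t = x^_{t|t-1} + e^_t
    # Control law U_t(x^_t)
    return K @ (x_ref - x_hat)

# Closed loop: the estimate tracks the plant despite the sensory delay.
u = np.zeros((1, 1))
for _ in range(50):
    u = sfc_step(u, plant_step(u))
```
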
Figure 6
State feedback control (SFC) model of speech motor control with putative neural substrate. The figure depicts the same operations as those shown in Figure 5, but with suggested cortical locations of the operations (motor areas are in yellow, while sensory areas are in pink). The current model is largely agnostic regarding hemispheric specialization for these operations. Also, for diagrammatic simplicity, the operations in the auditory and somatosensory cortices are depicted in the single area marked "sensory cortex," with the understanding that it represents analogous operations occurring in both of these sensory cortices: i.e., the delayed state estimate $\hat{x}_{(t|t-1)-\hat{N}}$ is sent to both high-order somatosensory and auditory cortex, each with its own feedback prediction module ($\widehat{vt}_{out}(\hat{x}_t)$ for predicting auditory feedback in high-order auditory cortex and $\widehat{vt}_{out}(\hat{x}_t)$ for predicting somatosensory feedback in high-order somatosensory cortex). The feedback prediction errors $\tilde{y}_{t-\hat{N}}$ generated in auditory and somatosensory cortex are converted into separate auditory- and somatosensory-based state corrections $\hat{e}_t$ by the auditory and somatosensory Kalman gain functions $K_t(\tilde{y})$ in the high-order auditory and somatosensory cortices, respectively. The auditory- and somatosensory-based state corrections are then added to $\hat{x}_{t|t-1}$ in premotor cortex to make the next state estimate $\hat{x}_t$. Finally, the key operations depicted in blue are all postulated to be modulated by the current speech task goals (e.g., what speech sound is currently meant to be produced) that are expressed in other areas of frontal cortex.
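
The caption's point that auditory and somatosensory corrections are computed separately and then combined in premotor cortex can be sketched as two gain functions whose outputs are summed into a single state correction. The gains and example error values below are purely illustrative assumptions.

```python
import numpy as np

# Two sensory modalities, each with its own Kalman-style gain; their state
# corrections are summed (in "premotor cortex") to update the state estimate.
# All numerical values are hypothetical.
L_aud = np.array([[0.25], [0.10]])      # auditory gain, standing in for K_t^aud
L_som = np.array([[0.05], [0.30]])      # somatosensory gain, standing in for K_t^som

def combined_state_correction(y_err_aud, y_err_som):
    """e^_t = auditory-based correction + somatosensory-based correction."""
    return L_aud @ y_err_aud + L_som @ y_err_som

x_pred = np.zeros((2, 1))                              # x^_{t|t-1}
e_t = combined_state_correction(np.array([[5.0]]),     # e.g., a formant error (Hz)
                                np.array([[-0.2]]))    # e.g., a proprioceptive error
x_hat = x_pred + e_t                                   # corrected estimate x^_t
```
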
Figure 7
Cortical substrate of the SFC model. (A) Anatomical locations of candidate cortical areas and white matter tracts comprising the network of the core SFC model. The same color scheme used in Figure 6 is used here: motor areas are in yellow, while sensory areas are in pink; connections conveying predictive information are in green, while those conveying corrective information are in red. Here, however, the single depiction of sensory cortex made up of primary and higher-level areas shown in Figure 6 is shown in more detail as a parallel organization of primary (A1, S1) and higher-level (Spt/PT, S2/PV) auditory and somatosensory cortices. The main white matter tracts that bidirectionally connect premotor cortex with the higher auditory and somatosensory cortices are hypothesized to be the arcuate and longitudinal fasciculi, respectively. Note that although, for simplicity, only the neural substrate in the left hemisphere is shown here, we would expect the full network of the neural substrate to include analogous areas in the right hemisphere as well. At this point, the SFC model is agnostic regarding hemispheric dominance in the proposed neural substrate. (B) Cortical connections in the prediction (green) direction: An efference copy of the neuromuscular controls $u_{t-1}$ generated in motor cortex (M1) and sent to the vocal tract motor neurons is also sent to premotor cortex (vPMC), which uses it to generate the state prediction $\hat{x}_{(t|t-1)-\hat{N}}$ that it sends to both the high-level auditory (Spt/PT) and somatosensory (S2/PV) cortices. These higher-level sensory areas in turn use $\hat{x}_{(t|t-1)-\hat{N}}$ to generate feedback predictions $\hat{y}_{t-\hat{N}}$, which they send to their associated primary sensory areas (A1, S1), where these predictions are compared with incoming feedback. (C) Cortical connections in the correction (red) direction: By comparing feedback predictions with incoming feedback, the primary sensory areas (A1, S1) compute feedback prediction errors $\tilde{y}_{t-\hat{N}}$ that are sent back to the higher-level sensory areas (Spt/PT, S2/PV), where they are converted into state estimate corrections $\hat{e}_t$ that are sent back to premotor cortex (vPMC). Finally, in premotor cortex these corrections are added to the state prediction, making the corrected state estimate $\hat{x}_t$, which is sent back to motor cortex (M1); motor cortex uses $\hat{x}_t$ along with the current task goals to generate further neuromuscular commands sent to the vocal tract motor neurons.
