J Cogn Neurosci. 2010 Jul;22(7):1504-29.
doi: 10.1162/jocn.2009.21306.

Neural representations and mechanisms for the performance of simple speech sequences

Jason W Bohland et al. J Cogn Neurosci. 2010 Jul.

Abstract

Speakers plan the phonological content of their utterances before their release as speech motor acts. Using a finite alphabet of learned phonemes and a relatively small number of syllable structures, speakers are able to rapidly plan and produce arbitrary syllable sequences that fall within the rules of their language. The class of computational models of sequence planning and performance termed competitive queuing models has followed K. S. Lashley [The problem of serial order in behavior. In L. A. Jeffress (Ed.), Cerebral mechanisms in behavior (pp. 112-136). New York: Wiley, 1951] in assuming that inherently parallel neural representations underlie serial action, and this idea is increasingly supported by experimental evidence. In this article, we develop a neural model that extends the existing DIVA model of speech production in two complementary ways. The new model includes paired structure and content subsystems [cf. MacNeilage, P. F. The frame/content theory of evolution of speech production. Behavioral and Brain Sciences, 21, 499-511, 1998] that provide parallel representations of a forthcoming speech plan, as well as mechanisms for interfacing these phonological planning representations with learned sensorimotor programs to enable stepping through multisyllabic speech plans. On the basis of previous reports, the model's components are hypothesized to be localized to specific cortical and subcortical structures, including the left inferior frontal sulcus, the medial premotor cortex, the basal ganglia, and the thalamus. The new model, called gradient order DIVA (GODIVA), thus fills a void in current speech research by providing formal mechanistic hypotheses about both phonological and phonetic processes that are grounded in neuroanatomy and physiology. This framework also generates predictions that can be tested in future neuroimaging and clinical case studies.


Figures

Figure 1
Competitive queuing (CQ) model architecture for the representation and performance of the letter sequence “diva.” The serial position of each letter is encoded by its strength of representation (height of bar) in the planning layer (top). The choice layer (bottom) realizes a competitive (winner-take-all) process that allows only the strongest input to remain active, in this case “d.” Upon selection of “d,” its representation in the planning layer would be suppressed, leaving “i” as the most active node. This entire process iterates through time, enabling performance of the entire letter sequence.
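The CQ readout described for Figure 1 can be sketched in a few lines. This is a discrete-step illustration under stated assumptions, not the model's continuous-time dynamics: the function name `cq_perform` and the activation values are hypothetical, chosen only to show that serial order emerges from a parallel activation gradient plus winner-take-all choice and suppression.

```python
def cq_perform(plan: list[tuple[str, float]]) -> list[str]:
    """Perform a planned sequence with a competitive-queuing readout (sketch).

    plan: hypothetical (item, activation) pairs held in parallel in the
    planning layer. Returns items in the order the choice layer selects them.
    Items with activation <= 0 are never selected.
    """
    activation = [a for _, a in plan]  # copy so the caller's plan is untouched
    performed = []
    # Iterate until every positively planned item has been chosen and suppressed.
    while any(a > 0.0 for a in activation):
        # Choice layer: winner-take-all lets only the strongest input through.
        k = max(range(len(activation)), key=lambda i: activation[i])
        performed.append(plan[k][0])
        # Feedback: suppress the winner's planning representation.
        activation[k] = 0.0
    return performed


# Order comes from activation strength, not from list position.
plan = [("v", 0.6), ("a", 0.4), ("d", 1.0), ("i", 0.8)]
print("".join(cq_perform(plan)))  # diva
```

Because each selection removes the current maximum, a primacy gradient (earlier items more active) is read out in serial order; in this sketch, changing the relative strengths changes the performed order.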
Figure 2
Schematic of the primary GODIVA model components and their hypothesized cortical and subcortical correlates. Lines with arrows represent excitatory pathways, and lines with filled circles represent inhibitory pathways. Lines ending in both arrowheads and filled circles indicate that connectivity between these modules features top-down excitatory connections and bottom-up inhibitory connections. The inhibitory pathways shown in the cortical portion of the model are feedback pathways that suppress planning representations after their corresponding action has been taken.
Figure 3
Schematic illustration of the structure of the GODIVA model’s inferior frontal sulcus phonological content representation. The region is hypothesized to consist of a layer of plan cells (p; top) and a layer of choice cells (q; bottom), arranged into columns, each of which codes for a planned phoneme in a given syllable position. The plan cells are loaded in parallel from other cortical or cerebellar regions. Choice cells, whose input from plan cells is gated by a syllable position-specific signal from the anterior thalamus, undergo a winner-take-all process within each gated zone. The activation of a choice cell suppresses its corresponding plan cell. This process results in the activation of a phonological syllable in the IFS choice field that can activate potentially matching syllable motor programs in the Speech Sound Map. Choice cell activations can be suppressed upon the selection of a specific Speech Sound Map motor program.
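A single IFS choice step from Figure 3 can be approximated as follows. This is a hypothetical discrete sketch, not the model's equations: `choose_syllable`, the `Plan` layout, and the zero threshold are assumptions, and the list of open zones, which the model derives from the thalamic gating signal, is passed in by hand.

```python
Plan = dict[int, dict[str, float]]  # zone (syllable position) -> phoneme -> plan activation


def choose_syllable(plan: Plan, open_zones: list[int]) -> dict[int, str]:
    """One IFS choice step (hypothetical sketch; modifies `plan` in place).

    open_zones: zones whose thalamic gate is open for the current syllable.
    Returns the phoneme chosen in each open zone.
    """
    chosen: dict[int, str] = {}
    for zone in open_zones:
        cells = plan.get(zone, {})
        # Only positively active plan cells compete (assumed threshold of 0).
        candidates = {ph: a for ph, a in cells.items() if a > 0.0}
        if not candidates:
            continue  # nothing is planned for this position
        # Winner-take-all within the gated zone.
        winner = max(candidates, key=candidates.get)
        chosen[zone] = winner
        # The active choice cell suppresses its corresponding plan cell.
        cells[winner] = 0.0
    return chosen


# Idealized plan for "go.di.və": onsets in zone 3, vowels in zone 4.
plan: Plan = {3: {"g": 1.0, "d": 0.8, "v": 0.6}, 4: {"o": 1.0, "i": 0.8, "ə": 0.6}}
for _ in range(3):
    print(choose_syllable(plan, open_zones=[3, 4]))
# {3: 'g', 4: 'o'}
# {3: 'd', 4: 'i'}
# {3: 'v', 4: 'ə'}
```

Zones that are not gated open are left untouched, which is how this sketch keeps the plan for later syllables intact while the current syllable is chosen.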
Figure 4
Illustration of the layout of cells in the IFS phonological content representation. Both the plan and choice layers in the region use the same representation; shown here is the plan layer, whose dynamics allow multiple items to be active in parallel. The long axis of the IFS map corresponds to specific phonemes, and the short axis corresponds to abstract serial positions in a generic syllable template. Cells compete with one another through lateral inhibition along the long axis. This map illustrates an idealized plan that corresponds to the syllable sequence “go.di.və.” The height of the vertical bar at a particular entry in the map corresponds to that cell’s activation level. Entries of the same color code for the same syllable position; in this representation, three cells are active in each of syllable positions 3 and 4 of the template, corresponding to three [CV] syllables.
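The phoneme-by-position layout in Figure 4 can be sketched as a small matrix loaded in parallel. Several details are assumptions for illustration only: the phoneme list is a hypothetical slice of the long axis, the template is taken to have seven slots (CCCVCCC, consistent with the captions placing [V] at position 4), and later syllables are loaded more weakly via a primacy gradient. Lateral inhibition along the phoneme axis is not modeled here.

```python
PHONEMES = ["g", "d", "v", "o", "i", "ə"]  # hypothetical slice of the long (phoneme) axis
N_POSITIONS = 7  # assumed CCCVCCC template; [V] falls at position 4


def encode_plan(syllables: list[list[tuple[int, str]]], decay: float = 0.8) -> list[list[float]]:
    """Load a syllable sequence in parallel into a phoneme x position map (sketch).

    Each syllable is a list of (template position, phoneme) pairs, positions 1-based.
    Assumption: a primacy gradient, so syllable k is loaded at decay ** k.
    """
    grid = [[0.0] * N_POSITIONS for _ in PHONEMES]
    for k, syllable in enumerate(syllables):
        for position, phoneme in syllable:
            row, col = PHONEMES.index(phoneme), position - 1
            # One cell per (phoneme, position); if a phoneme recurs in the
            # same slot, this sketch keeps the stronger (earlier) entry.
            grid[row][col] = max(grid[row][col], decay ** k)
    return grid


godiva = [[(3, "g"), (4, "o")], [(3, "d"), (4, "i")], [(3, "v"), (4, "ə")]]
grid = encode_plan(godiva)
for phoneme, row in zip(PHONEMES, grid):
    print(phoneme, [round(a, 2) for a in row])
```

Printing the grid shows six nonzero cells, three in the position-3 column and three in the position-4 column, mirroring the three [CV] syllables described in the caption.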
Figure 5
Schematic illustration of the structure and function of model cells hypothesized to exist in the pre-SMA. This region consists of a layer of plan cells (top) and a layer of choice cells, arranged into columns; the cells in each column correspond to the same abstract syllable frame. When a pre-SMA choice cell is activated (i.e., the forthcoming frame is chosen), the cell gives input to a chain of cells, each of which corresponds to a position within the abstract syllable frame. These cells fire rapidly and in order, according to the vertical arrow labeled “time” (bottom left). In this schematic, the first pre-SMA cortical column codes for the syllable frame type [CVC], and the second column codes for the frame type [VC]. Note that the inputs to the caudate are aligned such that the [V] position in both cases gives input to the same caudate channel (corresponding to positional zone 4). Cell w gates the pre-SMA frame choice process.
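The alignment of frame positions onto caudate channels in Figure 5 can be sketched as a mapping from a frame string to zone numbers. Only the rule that [V] drives zone 4 comes from the caption; the function name `frame_to_zones`, the single-vowel restriction, and the placement of consonants immediately around the vowel are assumptions made for illustration.

```python
VOWEL_ZONE = 4  # from the caption: the [V] slot always drives caudate zone 4


def frame_to_zones(frame: str) -> list[int]:
    """Return the caudate zones driven by a syllable frame, in firing order (sketch).

    Assumes a frame with exactly one vowel; consonants before it are
    right-aligned against zone 4 and consonants after it follow zone 4.
    """
    if frame.count("V") != 1 or set(frame) - {"C", "V"}:
        raise ValueError(f"expected C/V symbols with exactly one V, got {frame!r}")
    v = frame.index("V")
    # Frame slot i lands in zone VOWEL_ZONE + (i - v), so the vowel is always zone 4.
    # The list order stands in for the rapid, ordered firing of the chain cells.
    return [VOWEL_ZONE + (i - v) for i in range(len(frame))]


for frame in ["CVC", "VC", "CV"]:
    print(frame, frame_to_zones(frame))
# CVC [3, 4, 5]
# VC [4, 5]
# CV [3, 4]
```

Anchoring every frame at the vowel is what lets [CVC] and [VC] share the zone-4 channel, as the caption notes.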
Figure 6
Schematic illustration of the “channel” architecture of the basal ganglia planning loop. Each channel corresponds to an abstract serial position in the generic syllable template. The modeled caudate consists of one projection neuron (b) and one inhibitory interneuron (b̃) in each channel. The channels compete via feedforward inhibition in the caudate. Caudate projection neurons send inhibitory projections to a modeled GPi cell (c). The GPi cell, in turn, inhibits the anterior thalamic cell (d). The successful activation of a channel disinhibits its specific thalamic cell, which in turn “opens the gate” to a zone in the inferior frontal sulcus phonological choice layer through a multiplicative interaction.
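The disinhibitory chain in Figure 6 (caudate inhibits GPi, GPi inhibits thalamus) can be illustrated with a steady-state rate sketch. The rectified-linear form, the feedforward weight `w_ff`, the tonic GPi level, and the unit excitatory drive to thalamic cells are all hypothetical simplifications, not the model's published equations.

```python
def caudate_output(drive: list[float], w_ff: float = 1.0) -> list[float]:
    """Caudate projection neurons after feedforward inhibition (sketch).

    Each channel's interneuron is assumed to relay that channel's drive,
    scaled by w_ff, onto every other channel's projection neuron.
    """
    total = sum(drive)
    return [max(0.0, x - w_ff * (total - x)) for x in drive]


def thalamic_gates(drive: list[float], gpi_tonic: float = 1.0) -> list[float]:
    """Caudate inhibits GPi, GPi inhibits thalamus; one cell of each per channel."""
    caudate = caudate_output(drive)
    # GPi is tonically active unless inhibited by its caudate channel.
    gpi = [max(0.0, gpi_tonic - b) for b in caudate]
    # Assumed unit excitatory drive to each thalamic cell; GPi output opposes it,
    # so releasing GPi inhibition (disinhibition) opens the gate.
    return [max(0.0, 1.0 - c) for c in gpi]


def gated_input(plan_to_choice: list[float], gates: list[float]) -> list[float]:
    """Multiplicative gating of IFS plan-to-choice input, zone by zone."""
    return [p * g for p, g in zip(plan_to_choice, gates)]


gates = thalamic_gates([0.25, 1.0, 0.25])
print(gates)  # [0.0, 0.5, 0.0]
print(gated_input([0.8, 0.6, 0.9], gates))
```

In this sketch only the most strongly driven channel survives feedforward inhibition, and because the gate multiplies the plan-to-choice input, a closed gate (0) blocks input to that IFS zone entirely.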
Figure 7
Illustration of the functional architecture of the model’s Speech Sound Map (SSM) module. Columns consisting of a plan cell and a choice cell code for specific phonetic targets (for phonemes and syllables). Each IFS phonological choice cell gives input to the SSM plan cells whose targets contain the phoneme it codes for. System dynamics allow only one SSM choice cell to remain active at a time. SSM choice cells give strong inhibitory input (not shown for simplicity) back to IFS choice cells to quench their constituent phonemes following their activation.
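The SSM selection and quenching loop can be sketched as repeated best-match selection over stored programs. The scoring rule is an assumption standing in for the model's competitive dynamics: a program competes only if all of its (position, phoneme) units are active in the IFS choice layer, larger matches win, and ties go to the earliest position. Under that rule, the same loop covers both the whole-syllable case and the piecewise fallback to phoneme programs described for the simulations. All names are hypothetical.

```python
Unit = tuple[int, str]  # (syllable position, phoneme)


def select_program(active: set[Unit], ssm: list[frozenset[Unit]]) -> frozenset[Unit]:
    """SSM choice: pick one stored program supported by the active IFS choice cells.

    Assumption: a program competes only if every unit it codes for is active,
    and programs covering more units receive more input and win.
    Ties go to the program whose first unit is earliest in the syllable.
    """
    eligible = [p for p in ssm if p and p <= active]
    if not eligible:
        raise LookupError("no stored program matches the active phonemes")
    return max(eligible, key=lambda p: (len(p), -min(pos for pos, _ in p)))


def perform_syllable(chosen: dict[int, str], ssm: list[frozenset[Unit]]) -> list[str]:
    """Launch programs until every chosen phoneme has been quenched."""
    active = set(chosen.items())
    launched = []
    while active:
        program = select_program(active, ssm)
        launched.append("".join(ph for _, ph in sorted(program)))
        active -= program  # the SSM choice cell quenches its constituent IFS choice cells
    return launched


phonemes = [frozenset({(3, "g")}), frozenset({(4, "o")})]
with_syllable = phonemes + [frozenset({(3, "g"), (4, "o")})]
print(perform_syllable({3: "g", 4: "o"}, with_syllable))  # ['go']
print(perform_syllable({3: "g", 4: "o"}, phonemes))       # ['g', 'o']
```

When a whole-syllable program is stored, it quenches both phonemes in one step; without it, the loop falls back to launching phoneme programs one at a time.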
Figure 8
An algorithmic summary of the steps that the GODIVA model takes to perform a syllable sequence.
Figure 9
Simulation result showing the production of the three-syllable sequence “go.di.və.” In this simulation, each of the three syllables has a corresponding stored Speech Sound Map representation. Each plot shows time courses of cell activity in different model components. The x-axis in each plot is time, and the y-axis is activation level (both in arbitrary model units). The arrows in each plot indicate the onset of the external input at the start of the simulation. See text for details.
Figure 10
Simulation result showing the production of the syllable sequence “go.di.və.” using piecewise sensorimotor programs. In this simulation, only the second syllable (“di”) has a corresponding representation in the Speech Sound Map. The model must therefore perform the first and third syllables by sequentially activating targets for their constituent phonemes. Each plot shows time courses of cell activity in different model components. The x-axis in each plot is time, and the y-axis is activation level (both in arbitrary model units). The arrows in each plot indicate the onset of the external input at the start of the simulation. See text for details.
Figure 11
Simulated results in the IFS zone 3 plan and choice cell layers for a simulation of the intended syllable sequence “go.di.və” with Gaussian noise added to IFS plan cells. The simulation was chosen from multiple stochastic versions to illustrate how the model can produce phoneme exchange errors that obey syllable position constraints (cf. MacKay, 1970). Because of noise, the plan representation for /v/ (blue) becomes greater than that for /d/ (red), and is thus selected as part of the second syllable in the sequence. The plan for /d/ remains active and is chosen as the onset of the third syllable. Thus, the model produces the sequence “go.vi.də” in error.
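The position-constrained exchange errors in Figure 11 can be reproduced qualitatively with a stochastic sketch. Assumptions, all hypothetical: a per-zone primacy gradient of 1.0, 0.8, 0.6, Gaussian noise added once when the plan is loaded, and one winner-take-all per zone per syllable. Because competition never crosses zones, a noisy plan can swap two onsets but cannot move a consonant into a vowel slot.

```python
import random


def perform_noisy(onsets: list[str], vowels: list[str], sigma: float, rng: random.Random) -> str:
    """Serially read out CV syllables from noisy zone-3 and zone-4 plan cells (sketch).

    Assumptions: primacy gradient 1.0, 0.8, 0.6, ... per zone, Gaussian noise
    added once at load time, and one winner-take-all per zone per syllable.
    """
    if len(onsets) != len(vowels):
        raise ValueError("this sketch handles CV syllables only")
    if len(set(onsets)) != len(onsets) or len(set(vowels)) != len(vowels):
        raise ValueError("phonemes within a zone must be distinct in this sketch")

    def load(items: list[str]) -> dict[str, float]:
        return {ph: 1.0 - 0.2 * k + rng.gauss(0.0, sigma) for k, ph in enumerate(items)}

    zones = [load(onsets), load(vowels)]
    syllables = []
    for _ in onsets:
        picks = []
        for plan in zones:
            # Competition is confined to one zone, so a consonant can only
            # trade places with another consonant planned for that position.
            winner = max(plan, key=plan.get)
            picks.append(winner)
            del plan[winner]  # the chosen plan cell is suppressed
        syllables.append("".join(picks))
    return ".".join(syllables)


rng = random.Random(1)
outputs = [perform_noisy(["g", "d", "v"], ["o", "i", "ə"], sigma=0.15, rng=rng) for _ in range(1000)]
errors = [o for o in outputs if o != "go.di.və"]
print(len(errors), sorted(set(errors))[:5])
```

Any errors printed by this sketch are reorderings within a zone, such as onset exchanges of the "go.vi.də" kind, never onset/vowel swaps; with `sigma=0.0` the intended sequence is always produced.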
