Cereb Cortex. 2017 May 1;27(5):3064-3079.
doi: 10.1093/cercor/bhx056.

Vocal Tract Images Reveal Neural Representations of Sensorimotor Transformation During Speech Imitation

Daniel Carey et al. Cereb Cortex.

Abstract

Imitating speech necessitates the transformation from sensory targets to vocal tract motor output, yet little is known about the representational basis of this process in the human brain. Here, we address this question using real-time MR imaging (rtMRI) of the vocal tract and functional MRI (fMRI) of the brain in a speech imitation paradigm. Participants trained on imitating a native vowel and a similar nonnative vowel that required lip rounding. Later, participants imitated these vowels and an untrained vowel pair during separate fMRI and rtMRI runs. Univariate fMRI analyses revealed that regions including left inferior frontal gyrus were more active during sensorimotor transformation (ST) and production of nonnative vowels, compared with native vowels; further, ST for nonnative vowels activated somatomotor cortex bilaterally, compared with ST of native vowels. Using representational similarity analysis (RSA) test models constructed from participants' vocal tract images and from stimulus formant distances, searchlight analyses of the fMRI data showed that either type of model could be represented in somatomotor, temporal, cerebellar, and hippocampal neural activation patterns during ST. We thus provide the first evidence of widespread and robust cortical and subcortical neural representation of vocal tract and/or formant parameters during prearticulatory ST.

Keywords: fMRI; learning; rtMRI; sensorimotor transformation; speech.


Figures

Figure 1.
Overview of experimental paradigm and analysis framework. Upper row: (1) Participants trained on imitating one native and one nonnative vowel in blocks; all 10 tokens from a single category (e.g., /i/ or /y/) were imitated in randomized order in a given block (stimulus F1 and F2 are plotted in mel space; see lower inset). (2) Training was followed by scanning, during which participants imitated the trained pair and a further untrained pair. Scans comprised 3 fMRI blocks (140 trials, ~12 min), each preceded by a pair of rtMRI blocks (40 trials, ~3 min). Data analyses (a–d): (a) rtMRI data were analyzed with the Matlab toolbox of Kim et al. (2014), yielding measures of lip position per vowel (red trace on panels). (b) fMRI data were first analyzed with SPM, with contrasts specified for main effects (imitation > rest; listen preimitate > rest) (surface shown presents the all imitation > rest second-level contrast, for illustrative purposes). Further contrasts were specified for each vowel > rest, for the listen preimitation and imitation stages of the task. ROIs were defined with a jackknifed "leave-one-out" procedure using the listen preimitate or imitation main effects (all vowels > rest). (c) rtMRI image frames were first averaged within a single trial (using the method of Scott et al. 2013). Images were then masked with the RSA toolbox, restricting the FOV to the vocal tract. Masked images were cross-correlated on a trial-wise basis, creating three 40 × 40 RDMs (one per rtMRI block pair). After converting RDMs from correlation distance to z-scores (Fisher transform), each RDM was reduced to a 4 × 4 matrix, and the 4 × 4 matrices were averaged to give a single 4 × 4 matrix per subject. Single-subject 4 × 4 models were averaged to produce a full-cohort 4 × 4 average model. Single-subject and full-cohort models were used in searchlight analyses. (d) Schematic of the RSA searchlight procedure.
Jackknifed ROIs constrained the searchlight analyses to regions active for imitation (all imitation > rest) and ST (all listen preimitate > rest). In each searchlight, the RDM pattern from the t-maps for the vowel conditions was correlated with the vocal tract image-derived model, the stimulus PSD acoustic-derived model, or the stimulus F1–F2 2D Euclidean distance derived model (see Materials and Methods).
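The RDM construction described in panel (c) can be sketched in Python/NumPy as follows. This is an illustrative reconstruction, not the authors' code: the function names, and the exact interplay of the Fisher transform with the block averaging, are assumptions.

```python
import numpy as np

def vocal_tract_rdm(trial_images):
    """Trial-wise representational dissimilarity matrix (RDM).

    trial_images: (n_trials, n_pixels) masked, trial-averaged rtMRI
    frames. Returns correlation distance (1 - Pearson r) between all
    trial pairs, analogous to the 40 x 40 RDMs in the caption.
    """
    return 1.0 - np.corrcoef(trial_images)

def reduce_rdm(rdm, labels):
    """Collapse a trial-wise RDM to a condition-wise (e.g. 4 x 4) RDM.

    Correlations are Fisher z-transformed before averaging within each
    pair of vowel conditions (self-comparisons excluded on the
    diagonal), then converted back to correlation distance.
    """
    labels = np.asarray(labels)
    conds = sorted(set(labels.tolist()))
    # back to r, clipped for arctanh, then Fisher z
    z = np.arctanh(np.clip(1.0 - rdm, -0.999, 0.999))
    out = np.zeros((len(conds), len(conds)))
    for i, ci in enumerate(conds):
        for j, cj in enumerate(conds):
            block = z[np.ix_(labels == ci, labels == cj)]
            if ci == cj:
                block = block[~np.eye(block.shape[0], dtype=bool)]
            out[i, j] = block.mean()
    return 1.0 - np.tanh(out)  # condition-pair correlation distance
```

Per the caption, the resulting single-subject 4 × 4 matrices would then be averaged across block pairs and again across the cohort to give the full-cohort average model.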
Figure 2.
Left: Example lip position traces (red) as measured from rtMRI images in a single subject. Right: Lip protrusion difference metrics per group (unrounded − rounded lip x-coordinate). Positive values indicate relatively greater protrusion for the rounded than the unrounded vowel; note that all means are significantly greater than 0 (all P < 0.005; see Results). See Results for description of the statistical interaction.
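A minimal sketch of the difference metric, assuming a hypothetical image coordinate frame in which the lip x-coordinate decreases as the lips protrude forward (the actual sign convention depends on the rtMRI image orientation, which is not specified here):

```python
import numpy as np

def protrusion_difference(unrounded_x, rounded_x):
    """Lip protrusion difference (unrounded minus rounded lip
    x-coordinate), averaged over trials. Under the assumed frame
    (x decreasing anteriorly), positive values indicate greater
    protrusion for the rounded vowel."""
    return float(np.mean(unrounded_x) - np.mean(rounded_x))
```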
Figure 3.
Univariate 2 × 2 ANOVA results (factors: training, native/nonnative) for ST (blue) and imitation (green) fMRI data. (a) Native/nonnative 2 × 2 main effect results for ST (blue) and imitation (green), significant at cluster-corrected FDR level (q < 0.05). Bar plots display mean beta parameter estimates (adjusted response) for cluster peak voxels (peak co-ordinates in parentheses). Conditions: NT, native trained; NU, native untrained; NnT, nonnative trained; NnU, nonnative untrained. (b) Training main effect results for ST (blue), significant at P < 0.0001 (k = 30) (did not survive at cluster-level FDR for voxel-height threshold of P < 0.0015, k = 50; q > 0.05).
Figure 4.
RSA searchlight results. (a) Vocal tract group average RDM model pattern correlates with fMRI activation patterns in bilateral somatomotor, left superior temporal, bilateral medial temporal and right cerebellar regions for ST. The stimulus acoustic-derived RDM pattern did not correlate robustly with fMRI t-map RDMs; tests of the correlation coefficients from both analyses showed significantly more robust correlations for the vocal tract model than the stimulus model (note that this overlapped with all voxels where significant vocal tract model and fMRI t-map correlations emerged; q < 0.05, FDR-corrected). (b) Vocal tract group average model correlates nonrobustly with fMRI activation patterns for imitation in left ventral M1/lateral Heschl's gyrus, and right lateral somatomotor cortex. Transparent underlays in (a) and (b) show the boundaries of the searchlight ROI volume—blue: ST ROI; green: imitation ROI. Scale bar minimum in (a) shows the equivalent uncorrected threshold at which voxel-height FDR correction (q < 0.05) is achieved; for consistency, the same scale bar range is used in (b), but note that (b) correlations are nonsignificant with FDR correction (q > 0.05).
Figure 5.
RSA searchlight results using the F1–F2 2D Euclidean distance RDM test model. The F1–F2 2D Euclidean distance model reveals correlations that overlap most of the regions that showed significant searchlight correlations for the group average vocal tract model (see Fig. 4). Voxel-wise Wilcoxon signed-rank tests comparing the correlation maps derived from the vocal tract average model with those from the F1–F2 2D Euclidean distance model did not reveal any significant differences in the robustness of the correlations across the 2 analyses (all FDR q > 0.05). All other parameters as per Figure 4.
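As an illustration, a pairwise F1–F2 distance model of this kind can be computed as below. The Hz-to-mel formula shown (2595·log10(1 + f/700)) is a standard choice, but the paper's exact mel conversion is an assumption here:

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to mel (a common formula; the paper's
    exact conversion is assumed, not confirmed)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def formant_distance_rdm(f1_hz, f2_hz):
    """Pairwise 2D Euclidean distances between stimuli in mel F1-F2
    space, analogous to the Figure 5 test model."""
    pts = np.column_stack([hz_to_mel(f1_hz), hz_to_mel(f2_hz)])
    diff = pts[:, None, :] - pts[None, :, :]   # broadcast all pairs
    return np.sqrt((diff ** 2).sum(axis=-1))
```

Stimuli with identical formants get zero distance, and the matrix is symmetric with a zero diagonal, as required of an RDM.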


References

    1. Arsenault JS, Buchsbaum BR. 2015. Distributed neural representations of phonological features during speech perception. J Neurosci. 35(2):634–642.
    2. Boersma P, Weenink D. 2016. Praat: doing phonetics by computer. Version 6.0.13.
    3. Bohland JW, Bullock D, Guenther FH. 2010. Neural representations and mechanisms for the performance of simple speech sequences. J Cogn Neurosci. 22(7):1504–1529.
    4. Bouchard KE, Mesgarani N, Johnson K, Chang EF. 2013. Functional organization of human sensorimotor cortex for speech articulation. Nature. 495:327–332.
    5. Bouchard KE, Conant DF, Anumanchipalli GK, Dichter B, Chaisanguanthum K, Johnson K, Chang EF. 2016. High-resolution, non-invasive imaging of upper vocal tract articulators compatible with human brain recordings. PLoS One. doi:10.1371/journal.pone.0151327.
