Cereb Cortex. 2017 May 1;27(5):3064-3079.
doi: 10.1093/cercor/bhx056.

Vocal Tract Images Reveal Neural Representations of Sensorimotor Transformation During Speech Imitation

Daniel Carey et al. Cereb Cortex.

Abstract

Imitating speech necessitates the transformation from sensory targets to vocal tract motor output, yet little is known about the representational basis of this process in the human brain. Here, we address this question using real-time MR imaging (rtMRI) of the vocal tract and functional MRI (fMRI) of the brain in a speech imitation paradigm. Participants trained on imitating a native vowel and a similar nonnative vowel that required lip rounding. Later, participants imitated these vowels and an untrained vowel pair during separate fMRI and rtMRI runs. Univariate fMRI analyses revealed that regions including left inferior frontal gyrus were more active during sensorimotor transformation (ST) and production of nonnative vowels, compared with native vowels; further, ST for nonnative vowels activated somatomotor cortex bilaterally, compared with ST of native vowels. Using representational similarity analysis (RSA) test models constructed from participants' vocal tract images and from stimulus formant distances, searchlight analyses of the fMRI data showed that either type of model could be represented in somatomotor, temporal, cerebellar, and hippocampal neural activation patterns during ST. We thus provide the first evidence of widespread and robust cortical and subcortical neural representation of vocal tract and/or formant parameters during prearticulatory ST.

Keywords: fMRI; learning; rtMRI; sensorimotor transformation; speech.


Figures

Figure 1.
Overview of experimental paradigm and analysis framework. Upper row: (1) Participants trained on imitating one native and one nonnative vowel in blocks; all 10 tokens from a single category (e.g., /i/ or /y/) were imitated in randomized order in a given block (stimulus F1 and F2 are plotted in mel space; see lower inset). (2) Training was followed by scanning, during which participants imitated the trained pair and a further untrained pair. Scans comprised 3 fMRI blocks (140 trials, ~12 min), each preceded by a pair of rtMRI blocks (40 trials, ~3 min). Data analyses (a–d): (a) rtMRI data were analyzed with the Matlab toolbox of Kim et al. (2014), yielding measures of lip position per vowel (red trace on panels). (b) fMRI data were first analyzed with SPM, with contrasts specified for main effects (imitation > rest; listen preimitate > rest) (surface shown presents the all imitation > rest second-level contrast, for illustrative purposes). Further contrasts were specified for each vowel > rest, for the listen preimitation and imitation stages of the task. ROIs were defined with a jackknifed "leave-one-out" procedure using the listen preimitate or imitation main effects (all vowels > rest). (c) rtMRI image frames were first averaged within a single trial (using the method of Scott et al. 2013). Images were then masked with the RSA toolbox, restricting the FOV to the vocal tract. Masked images were cross-correlated on a trial-wise basis, creating three 40 × 40 RDMs (one per rtMRI block pair). After converting RDMs from correlation distance to z-scores (Fisher transform), each RDM was reduced to a 4 × 4 matrix, and the 4 × 4 matrices were averaged to give a single 4 × 4 matrix per subject. Single-subject 4 × 4 models were averaged to produce a full-cohort 4 × 4 average model. Single-subject and full-cohort models were used in searchlight analyses. (d) Schematic of the RSA searchlight procedure.
Jackknifed ROIs constrained the searchlight analyses to regions active for imitation (all imitation > rest) and ST (all listen preimitate > rest). In each searchlight, the RDM pattern from the t-maps for the vowel conditions was correlated with the vocal tract image-derived model, the stimulus PSD acoustic-derived model, or the stimulus F1–F2 2D Euclidean distance derived model (see Materials and Methods).
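The RDM construction described in panel (c) can be sketched in Python/NumPy as follows. This is an illustrative reconstruction, not the authors' code: the function names, and the exact interplay of the Fisher transform with the block averaging, are assumptions.

```python
import numpy as np

def vocal_tract_rdm(trial_images):
    """Trial-wise representational dissimilarity matrix (RDM).

    trial_images: (n_trials, n_pixels) masked, trial-averaged rtMRI
    frames. Returns correlation distance (1 - Pearson r) between all
    trial pairs, analogous to the 40 x 40 RDMs in the caption.
    """
    return 1.0 - np.corrcoef(trial_images)

def reduce_rdm(rdm, labels):
    """Collapse a trial-wise RDM to a condition-wise (e.g. 4 x 4) RDM.

    Correlations are Fisher z-transformed before averaging within each
    pair of vowel conditions (self-comparisons excluded on the
    diagonal), then converted back to correlation distance.
    """
    labels = np.asarray(labels)
    conds = sorted(set(labels.tolist()))
    # back to r, clipped for arctanh, then Fisher z
    z = np.arctanh(np.clip(1.0 - rdm, -0.999, 0.999))
    out = np.zeros((len(conds), len(conds)))
    for i, ci in enumerate(conds):
        for j, cj in enumerate(conds):
            block = z[np.ix_(labels == ci, labels == cj)]
            if ci == cj:
                block = block[~np.eye(block.shape[0], dtype=bool)]
            out[i, j] = block.mean()
    return 1.0 - np.tanh(out)  # condition-pair correlation distance
```

Per the caption, the resulting single-subject 4 × 4 matrices would then be averaged across block pairs and again across the cohort to give the full-cohort average model.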
Figure 2.
Left: Example lip position traces (red) as measured from rtMRI images in a single subject. Right: Lip protrusion difference metrics per group (unrounded − rounded lip x-coordinate). Positive values indicate relatively greater protrusion for the rounded than the unrounded vowel; note that all means are significantly greater than 0 (all P < 0.005; see Results). See Results for description of the statistical interaction.
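A minimal sketch of the difference metric, assuming a hypothetical image coordinate frame in which the lip x-coordinate decreases as the lips protrude forward (the actual sign convention depends on the rtMRI image orientation, which is not specified here):

```python
import numpy as np

def protrusion_difference(unrounded_x, rounded_x):
    """Lip protrusion difference (unrounded minus rounded lip
    x-coordinate), averaged over trials. Under the assumed frame
    (x decreasing anteriorly), positive values indicate greater
    protrusion for the rounded vowel."""
    return float(np.mean(unrounded_x) - np.mean(rounded_x))
```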
Figure 3.
Univariate 2 × 2 ANOVA results (factors: training, native/nonnative) for ST (blue) and imitation (green) fMRI data. (a) Native/nonnative 2 × 2 main effect results for ST (blue) and imitation (green), significant at cluster-corrected FDR level (q < 0.05). Bar plots display mean beta parameter estimates (adjusted response) for cluster peak voxels (peak co-ordinates in parentheses). Conditions: NT, native trained; NU, native untrained; NnT, nonnative trained; NnU, nonnative untrained. (b) Training main effect results for ST (blue), significant at P < 0.0001 (k = 30) (did not survive at cluster-level FDR for voxel-height threshold of P < 0.0015, k = 50; q > 0.05).
Figure 4.
RSA searchlight results. (a) Vocal tract group average RDM model pattern correlates with fMRI activation patterns in bilateral somatomotor, left superior temporal, bilateral medial temporal and right cerebellar regions for ST. The stimulus acoustic-derived RDM pattern did not correlate robustly with fMRI t-map RDMs; tests of the correlation coefficients from both analyses showed significantly more robust correlations for the vocal tract model than the stimulus model (note that this overlapped with all voxels where significant vocal tract model and fMRI t-map correlations emerged; q < 0.05, FDR-corrected). (b) Vocal tract group average model correlates nonrobustly with fMRI activation patterns for imitation in left ventral M1/lateral Heschl's gyrus, and right lateral somatomotor cortex. Transparent underlays in (a) and (b) show the boundaries of the searchlight ROI volume—blue: ST ROI; green: imitation ROI. Scale bar minimum in (a) shows the equivalent uncorrected threshold at which voxel-height FDR correction (q < 0.05) is achieved; for consistency, the same scale bar range is used in (b), but note that (b) correlations are nonsignificant with FDR correction (q > 0.05).
Figure 5.
RSA searchlight results using the F1–F2 2D Euclidean distance RDM test model. The F1–F2 2D Euclidean distance model reveals correlations that overlap most of the regions that showed significant searchlight correlations for the group average vocal tract model (see Fig. 4). Voxel-wise Wilcoxon signed-rank tests comparing the correlation maps derived from the vocal tract average model with those from the F1–F2 2D Euclidean distance model did not reveal any significant differences in the robustness of the correlations across the 2 analyses (all FDR q > 0.05). All other parameters as per Figure 4.
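As an illustration, a pairwise F1–F2 distance model of this kind can be computed as below. The Hz-to-mel formula shown (2595·log10(1 + f/700)) is a standard choice, but the paper's exact mel conversion is an assumption here:

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to mel (a common formula; the paper's
    exact conversion is assumed, not confirmed)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def formant_distance_rdm(f1_hz, f2_hz):
    """Pairwise 2D Euclidean distances between stimuli in mel F1-F2
    space, analogous to the Figure 5 test model."""
    pts = np.column_stack([hz_to_mel(f1_hz), hz_to_mel(f2_hz)])
    diff = pts[:, None, :] - pts[None, :, :]   # broadcast all pairs
    return np.sqrt((diff ** 2).sum(axis=-1))
```

Stimuli with identical formants get zero distance, and the matrix is symmetric with a zero diagonal, as required of an RDM.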


References

    1. Arsenault JS, Buchsbaum BR. 2015. Distributed neural representations of phonological features during speech perception. J Neurosci. 35(2):634–642.
    2. Boersma P, Weenink D. 2016. Praat: doing phonetics by computer. Version 6.0.13.
    3. Bohland JW, Bullock D, Guenther FH. 2010. Neural representations and mechanisms for the performance of simple speech sequences. J Cogn Neurosci. 22(7):1504–1529.
    4. Bouchard KE, Mesgarani N, Johnson K, Chang EF. 2013. Functional organization of human sensorimotor cortex for speech articulation. Nature. 495:327–332.
    5. Bouchard KE, Conant DF, Anumanchipalli GK, Dichter B, Chaisanguanthum K, Johnson K, Chang EF. 2016. High-resolution, non-invasive imaging of upper vocal tract articulators compatible with human brain recordings. PLoS One. doi:10.1371/journal.pone.0151327.
