Magn Reson Med. 2015 May;73(5):1820-32.

doi: 10.1002/mrm.25302. Epub 2014 Jun 9.

High-resolution dynamic speech imaging with joint low-rank and sparsity constraints

Maojing Fu¹, Bo Zhao, Christopher Carignan, Ryan K Shosted, Jamie L Perry, David P Kuehn, Zhi-Pei Liang, Bradley P Sutton

Affiliation

¹ Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA; Beckman Institute of Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.

PMID: 24912452
PMCID: PMC4261062
DOI: 10.1002/mrm.25302

Maojing Fu et al. Magn Reson Med. 2015 May.

Abstract

Purpose: To enable dynamic speech imaging with high spatiotemporal resolution and full-vocal-tract spatial coverage, leveraging recent advances in sparse sampling.

Methods: An imaging method is developed to enable high-speed dynamic speech imaging by exploiting the low-rank structure and sparsity of dynamic images of articulatory motion during speech. The proposed method includes: (a) a novel data acquisition strategy that collects spiral navigators at a high temporal frame rate and (b) an image reconstruction method that derives temporal subspaces from the navigators and reconstructs high-resolution images from sparsely sampled data with joint low-rank and sparsity constraints.
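The low-rank part of the Methods can be illustrated with a minimal sketch. It assumes the data are arranged as a Casorati matrix (k-space locations × time frames), estimates a temporal basis from fully sampled navigator rows by SVD, and then fits spatial coefficients to the sparsely sampled rows by least squares. The function names, array layouts, and the plain least-squares fit are assumptions made for illustration; the sparsity constraint described in the abstract is omitted, so this is not the authors' reconstruction algorithm.

```python
import numpy as np

def estimate_temporal_basis(navigators, rank):
    """Return the leading `rank` temporal basis functions.

    navigators: complex array (n_nav_samples, n_frames), sampled at every frame.
    """
    # Rows of vh are orthonormal temporal basis functions, ordered by singular value.
    _, _, vh = np.linalg.svd(navigators, full_matrices=False)
    return vh[:rank]  # shape (rank, n_frames)

def fit_spatial_coefficients(data, mask, basis):
    """Least-squares fit of spatial coefficients, one k-space location at a time.

    data:  complex array (n_k, n_frames), arbitrary where mask is False
    mask:  bool array (n_k, n_frames), True where a sample was acquired
    basis: complex array (rank, n_frames)
    """
    n_k = data.shape[0]
    rank = basis.shape[0]
    coeffs = np.zeros((n_k, rank), dtype=complex)
    for k in range(n_k):
        sampled = mask[k]
        if sampled.sum() < rank:
            continue  # too few samples to fit this location; leave zeros
        design = basis[:, sampled].T  # (n_sampled, rank)
        solution = np.linalg.lstsq(design, data[k, sampled], rcond=None)[0]
        coeffs[k] = solution
    return coeffs
```

Multiplying the fitted coefficients by the basis (`coeffs @ basis`) gives a rank-limited estimate of the full (k, t)-space data under these assumptions; a full method would add the sparsity penalty and solve for all locations jointly.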

Results: The proposed method has been systematically evaluated and validated through several dynamic speech experiments. A nominal imaging speed of 102 frames per second (fps) was achieved for a single-slice imaging protocol with a spatial resolution of 2.2 × 2.2 × 6.5 mm³. An eight-slice imaging protocol covering the entire vocal tract achieved a nominal imaging speed of 12.8 fps with the identical spatial resolution. The effectiveness of the proposed method and its practical utility were also demonstrated in a phonetic investigation.
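The two nominal frame rates are consistent with each other if one assumes (this is not stated in the abstract) that the eight slices share the acquisition time equally, so that each slice receives one eighth of the single-slice rate:

```latex
\frac{102\ \text{fps}}{8\ \text{slices}} = 12.75\ \text{fps per slice} \approx 12.8\ \text{fps}
```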

Conclusion: High spatiotemporal resolution with full-vocal-tract spatial coverage can be achieved for dynamic speech imaging experiments with low-rank and sparsity constraints.

Keywords: dynamic speech imaging; low-rank approximation; partial separability modeling; sparsity; spiral navigation.

PubMed Disclaimer

Figures

**Figure 1**
A simplified pulse sequence diagram for the proposed PS model-based data acquisition strategy with illustration of (k, t)-space sampling patterns. The navigator data set is acquired using a spiral trajectory. The imaging data set is acquired using a Cartesian trajectory with random phase encoding.

**Figure 2**
Mid-sagittal reconstructions of the upper vocal tract during the production of /loo/-/lee/-/la/-/za/-/na/-/za/ syllables. The directions of the movement of the mid-tongue are indicated with arrows during (a) /l/ of the /lee/ syllable; (b) /l/ of the /loo/ syllable; (c) /l/ of the /la/ syllable; (d) /a/ of the /za/ syllable.

**Figure 3**
Multi-slice mid-sagittal reconstructions covering the entire vocal tract. The left column shows the positions and orientations of the resliced oblique-coronal planes. The middle and right columns show resliced images. 3D articulatory motion is observed during the production of /loo/-/lee/-/la/-/za/-/na/-/za/ syllables: (a) an oblique plane across the lower incisor teeth and the alveolar ridge; (b) an oblique plane across the body of the lower jaw and the velopharyngeal closure point.

**Figure 4**
Strip plots of the production of /za/-/na/-/za/ syllables at fast, medium and slow speaking paces. These strip plots demonstrate the temporal dynamics along reference lines that are taken as: (a) a vertical line across the upper and lower lip; (b) a vertical line across the roof of the mouth and the mid-tongue; (c) a horizontal line across the bottom of the upper lip and the upper pharyngeal wall.

**Figure 5**
Mid-sagittal reconstructions and strip plots of the production of a reading passage that contains no repetitions of words or phrases: (a) representative articulatory motion at four different time instances; (b) temporal profile taken along a vertical strip across the tongue tip; (c) temporal profile taken along a vertical strip across the mid-tongue; (d) temporal profile taken along a horizontal strip across the velum.

**Figure 6**
Velar movement within an 18 pixel × 18 pixel region of interest as illustrated in (a). (b) An air passage is formed between the velum body and the pharyngeal wall during the production of the nasal vowel /α̃/. (c) The relaxed velum creates maximum velopharyngeal opening during the breathing period. (d) The velum seals the velopharyngeal port during the production of the plosive /t/.

**Figure 7**
Comparison of oblique coronal reconstructions for nasal vowels /α̃/, /ε̃/ and /ɔ̃/. (a) /α̃/ has the largest distance between the median portion of the tongue and the palate. (b) /α̃/ has the largest velopharyngeal opening size. (c) /α̃/ has the smallest opening between the root of the tongue and the pharynx. (d) The three vowels have nearly identical openings between the epiglottis and the pharyngeal wall.

**Figure 8**
Linguistic analysis integrating imaging information with acoustic properties. Colored rectangles represent temporal windows corresponding to the production of different sounds. (a) Illustration of a 5 pixel × 5 pixel region of interest where the average image intensity (API) in (b) is calculated. (b) Temporal evolution of API in the square region of (a). (c) The recorded acoustic signal. (d) The spectrogram of the acoustic signal calculated with a window width of 10 ms.
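The API curve in panel (b) is a per-frame mean over a small square region, which can be sketched as follows. The function name, the (frames, height, width) array layout, and the top-left-corner convention for placing the region are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def average_roi_intensity(frames, row, col, size=5):
    """Mean intensity inside a square region of interest for every frame.

    frames: real array (n_frames, height, width) of image magnitudes
    row, col: top-left corner of the region (hypothetical convention)
    size: side length of the region in pixels
    """
    _, height, width = frames.shape
    if row < 0 or col < 0 or row + size > height or col + size > width:
        raise ValueError("region of interest falls outside the image")
    roi = frames[:, row:row + size, col:col + size]
    # Average over the two spatial axes, leaving one value per frame.
    return roi.mean(axis=(1, 2))
```

The returned array has one value per time frame and can be plotted against the frame times alongside the acoustic signal.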

**Figure 9**
Reconstruction of an experimental data set using different model orders. The first column shows representative mid-sagittal reconstructions with model orders: (a) L = 20, (b) L = 40, (c) L = 80 and (d) L = 120. The second column shows corresponding strip plots taken from a vertical line across the roof of the mouth and the mid-tongue. Variations of spatiotemporal dynamics on the strip plots are indicated by arrows.
