Magn Reson Med. 2015 May;73(5):1820-32.
doi: 10.1002/mrm.25302. Epub 2014 Jun 9.

High-resolution dynamic speech imaging with joint low-rank and sparsity constraints

Maojing Fu et al. Magn Reson Med. 2015 May.

Abstract

Purpose: To enable dynamic speech imaging with high spatiotemporal resolution and full-vocal-tract spatial coverage, leveraging recent advances in sparse sampling.

Methods: An imaging method is developed to enable high-speed dynamic speech imaging by exploiting the low-rank structure and sparsity of dynamic images of articulatory motion during speech. The proposed method includes: (a) a novel data acquisition strategy that collects spiral navigators at a high temporal frame rate and (b) an image reconstruction method that derives temporal subspaces from the navigators and reconstructs high-resolution images from sparsely sampled data with joint low-rank and sparsity constraints.
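
The two-step idea in the Methods paragraph (estimate a temporal subspace from densely sampled navigators, then fit the remaining unknowns from sparse data) can be illustrated with a toy example. This is only a sketch under stated assumptions, not the authors' reconstruction: it uses synthetic noiseless data, treats each matrix row as one sampling location, omits the sparsity penalty and any coil or Fourier encoding, and all function and variable names are hypothetical.

```python
import numpy as np

def estimate_temporal_subspace(navigator, rank):
    """Return the top-`rank` right singular vectors (rank x T) of the navigator rows."""
    _, _, vh = np.linalg.svd(navigator, full_matrices=False)
    return vh[:rank]

def fit_spatial_coefficients(samples, mask, v_t):
    """Least-squares fit of the spatial coefficients, one row per location.

    samples: (N, T) data with arbitrary values where unobserved
    mask:    (N, T) boolean array, True where a sample was acquired
    v_t:     (rank, T) temporal basis
    """
    n_loc = samples.shape[0]
    rank = v_t.shape[0]
    u_s = np.zeros((n_loc, rank))
    for i in range(n_loc):
        obs = mask[i]
        # Solve samples[i, obs] ~= u_s[i] @ v_t[:, obs] in the least-squares sense.
        coef, *_ = np.linalg.lstsq(v_t[:, obs].T, samples[i, obs], rcond=None)
        u_s[i] = coef
    return u_s

# Synthetic rank-4 space-time matrix (assumption: exactly low rank, no noise).
rng = np.random.default_rng(0)
n_loc, n_frames, rank = 64, 200, 4
truth = rng.standard_normal((n_loc, rank)) @ rng.standard_normal((rank, n_frames))

navigator = truth[:8]                          # a few rows observed at every frame
mask = rng.random((n_loc, n_frames)) < 0.2     # about 20% of the other samples

v_t = estimate_temporal_subspace(navigator, rank)
u_s = fit_spatial_coefficients(truth * mask, mask, v_t)
recon = u_s @ v_t
rel_error = np.linalg.norm(recon - truth) / np.linalg.norm(truth)
```

With noiseless, exactly low-rank data the fit recovers the full matrix from a small fraction of its entries. Real data would be noisy and only approximately low rank, which is where an additional sparsity constraint such as the one named in the abstract would come in.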

Results: The proposed method has been systematically evaluated and validated through several dynamic speech experiments. A nominal imaging speed of 102 frames per second (fps) was achieved for a single-slice imaging protocol with a spatial resolution of 2.2 × 2.2 × 6.5 mm³. An eight-slice imaging protocol covering the entire vocal tract achieved a nominal imaging speed of 12.8 fps with the same spatial resolution. The effectiveness of the proposed method and its practical utility were also demonstrated in a phonetic investigation.
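
The reported numbers can be cross-checked with simple arithmetic. The sketch below assumes that the eight-slice rate follows from sharing the single-slice acquisition time equally among slices; the abstract does not state this explicitly, so treat it as an interpretation rather than the authors' derivation.

```python
single_slice_fps = 102.0   # nominal single-slice frame rate from the abstract
n_slices = 8               # slices in the full-vocal-tract protocol

# Assumption: slices are acquired in turn, so each gets 1/8 of the frame rate.
per_slice_fps = single_slice_fps / n_slices      # 12.75, reported as 12.8 fps

# Nominal time between frames in the single-slice protocol, about 9.8 ms.
frame_interval_ms = 1000.0 / single_slice_fps

# Voxel volume for 2.2 x 2.2 x 6.5 mm resolution, about 31.46 mm^3.
voxel_volume_mm3 = 2.2 * 2.2 * 6.5
```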

Conclusion: High spatiotemporal resolution with full-vocal-tract spatial coverage can be achieved for dynamic speech imaging experiments with low-rank and sparsity constraints.

Keywords: dynamic speech imaging; low-rank approximation; partial separability modeling; sparsity; spiral navigation.

Figures

Figure 1
A simplified pulse sequence diagram for the proposed PS model-based data acquisition strategy with illustration of (k, t)-space sampling patterns. The navigator data set is acquired using a spiral trajectory. The imaging data set is acquired using a Cartesian trajectory with random phase encoding.
Figure 2
Mid-sagittal reconstructions of the upper vocal tract during the production of /loo/-/lee/-/la/-/za/-/na/-/za/ syllables. The directions of the movement of the mid-tongue are indicated with arrows during (a) /l/ of the /lee/ syllable; (b) /l/ of the /loo/ syllable; (c) /l/ of the /la/ syllable; (d) /a/ of the /za/ syllable.
Figure 3
Multi-slice mid-sagittal reconstructions covering the entire vocal tract. The left column shows the positions and orientations of the resliced oblique-coronal planes. The middle and right columns show resliced images. 3D articulatory motion is observed during the production of /loo/-/lee/-/la/-/za/-/na/-/za/ syllables: (a) an oblique plane across the lower incisor teeth and the alveolar ridge; (b) an oblique plane across the body of the lower jaw and the velopharyngeal closure point.
Figure 4
Strip plots of the production of /za/-/na/-/za/ syllables at fast, medium and slow speaking paces. These strip plots show the temporal dynamics along reference lines taken as: (a) a vertical line across the upper and lower lip; (b) a vertical line across the roof of the mouth and the mid-tongue; (c) a horizontal line across the bottom of the upper lip and the upper pharyngeal wall.
Figure 5
Mid-sagittal reconstructions and strip plots of the production of a reading passage that contains no repetitions of words or phrases: (a) representative articulatory motion at four different time instants; (b) temporal profile taken along a vertical strip across the tongue tip; (c) temporal profile taken along a vertical strip across the mid-tongue; (d) temporal profile taken along a horizontal strip across the velum.
Figure 6
Velar movement within an 18 pixel × 18 pixel region of interest as illustrated in (a). (b) An air passage is formed between the velum body and the pharyngeal wall during the production of the nasal vowel /α̃/. (c) The relaxed velum creates maximum velopharyngeal opening during the breathing period. (d) The velum seals the velopharyngeal port during the production of the plosive /t/.
Figure 7
Comparison of oblique coronal reconstructions for nasal vowels /α̃/, /ε̃/ and /ɔ̃/. (a) /α̃/ has the largest distance between the median portion of the tongue and the palate. (b) /α̃/ has the largest velopharyngeal opening size. (c) /α̃/ has the smallest opening between the root of the tongue and the pharynx. (d) The three vowels have nearly identical openings between the epiglottis and the pharyngeal wall.
Figure 8
Linguistic analysis integrating imaging information with acoustic properties. Colored rectangles represent temporal windows corresponding to the production of different sounds. (a) Illustration of a 5 pixel × 5 pixel region of interest where the average image intensity (API) in (b) is calculated. (b) Temporal evolution of API in the square region of (a). (c) The recorded acoustic signal. (d) The spectrogram of the acoustic signal calculated with a window width of 10 ms.
Figure 9
Reconstruction of an experimental data set using different model orders. The first column shows representative mid-sagittal reconstructions with model orders: (a) L = 20, (b) L = 40, (c) L = 80 and (d) L = 120. The second column shows corresponding strip plots taken from a vertical line across the roof of the mouth and the mid-tongue. Variations of spatiotemporal dynamics on the strip plots are indicated by arrows.
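
Several captions above refer to strip plots (intensity along a fixed reference line, stacked over time) and to the average image intensity inside a square region of interest. The sketch below shows one plausible way to compute both from a reconstructed image series. It assumes the frames are stored as a NumPy array of shape (T, rows, columns) and uses a vertical reference line; the function names and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np

def strip_plot(frames, column):
    """Stack pixel column `column` of every frame into a (rows, T) image."""
    return frames[:, :, column].T

def roi_mean_intensity(frames, top, left, size):
    """Mean intensity inside a size x size square region, one value per frame."""
    roi = frames[:, top:top + size, left:left + size]
    return roi.mean(axis=(1, 2))

# Synthetic series: one bright pixel in column 5 moves down one row per frame.
n_frames, height, width = 6, 8, 10
frames = np.zeros((n_frames, height, width))
for t in range(n_frames):
    frames[t, t, 5] = 1.0

# Temporal profile along the vertical line through column 5.
strip = strip_plot(frames, column=5)

# Average intensity in a 5 x 5 region (rows 0-4, columns 3-7) over time.
intensity = roi_mean_intensity(frames, top=0, left=3, size=5)
```

In the strip plot the moving pixel appears as a diagonal trace, and the region-of-interest curve drops to zero once the pixel leaves the square, which is the kind of temporal signature that the captions describe for articulator motion.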
