Review

J Magn Reson Imaging. 2016 Jan;43(1):28-44.

doi: 10.1002/jmri.24997. Epub 2015 Jul 14.

Recommendations for real-time speech MRI

Sajan Goud Lingala¹, Brad P Sutton², Marc E Miquel³, Krishna S Nayak¹

Affiliations

¹ University of Southern California, Los Angeles, California, USA.
² University of Illinois at Urbana-Champaign, Urbana-Champaign, Illinois, USA.
³ Barts Health NHS Trust, London, UK.

PMID: 26174802
PMCID: PMC5079859
DOI: 10.1002/jmri.24997

Sajan Goud Lingala et al. J Magn Reson Imaging. 2016 Jan.

Abstract

Real-time magnetic resonance imaging (RT-MRI) is increasingly used for speech and vocal production research. Several imaging protocols have emerged based on advances in RT-MRI acquisition, reconstruction, and audio-processing methods. This review summarizes the state of the art, discusses technical considerations, and provides specific guidance for new groups entering this field. We provide recommendations for performing RT-MRI of the upper airway. This is a consensus statement stemming from the ISMRM-endorsed Speech MRI summit held in Los Angeles, February 2014. A major unmet need identified at the summit was consensus on protocols that can be easily adapted by researchers equipped with conventional MRI systems. To this end, we discuss tradeoffs in RT-MRI in terms of acquisition requirements, a priori assumptions, artifacts, computational load, and performance for different speech tasks. We provide four recommended protocols and identify appropriate acquisition and reconstruction tools. We list pointers to open-source software that facilitates implementation. We conclude by discussing current open challenges in the methodological aspects of RT-MRI of speech.

Keywords: rapid imaging; real time MRI; recommendations; speech imaging.

Figures

**FIGURE 1**
Spatiotemporal resolution requirements for various speech tasks. The placement of these “zones” reflects the current consensus opinion among speech imaging researchers in attendance at the 2014 Speech MRI Summit. Boundaries are approximate due to the lack of gold-standard imaging techniques, and are being refined through the more widespread adoption of noninvasive techniques such as RT-MRI. The plots depict the spatiotemporal resolutions that could be realized by the four recommended protocols in Table 1. Note that each of the protocols has tradeoffs beyond spatial and temporal resolution, which require careful consideration and are discussed in detail in the text. Examples include SNR losses at high spatial resolutions (all protocols), off-resonance artifacts (Protocol 2), and long acquisition times (Protocol 4).

**FIGURE 2**
Comparison of image quality from different receiver coils in one adult subject. Images were acquired using Protocol 2 (spiral gradient echo imaging) at 2.4 × 2.4 mm² spatial resolution. Compared to a standard head coil (left), custom airway coils (middle, right) provide improved SNR in all vocal tract articulators, including the lips, velum, epiglottis, and tongue. The SNR gain ranges from 50% to 1600% and is greatest for articulators closest to the coil elements (the lips in this case); also see. Custom array coils can also provide improved parallel imaging performance (not shown).

**FIGURE 3**
Comparison of GRE and SSFP image quality at 3T. Note that the SSFP images have higher SNR but are sensitive to off-resonance and its associated banding artifacts. In this example, the banding artifacts manifest as signal voids on the velum and portions of the tongue (see arrows).

**FIGURE 4**
Example images of velopharyngeal closure obtained at 1.5T using Protocol 1. (**a,b**) The placement of the shim volume (green box) centered around the velum (yellow arrow) for a 39-year-old male volunteer. Despite careful shimming, images can degrade substantially in some subjects during the speech sample acquisition, as demonstrated by images (**c,d**) in a 28-year-old patient with repaired cleft lip and palate. The velum (yellow arrow) almost completely disappears in the elevated position (**d**). Although h-EPI sequences have a lower SNR (**e,f**), it is possible to track the velum (yellow arrow) throughout the speech sample.

**FIGURE 5**
Simulated aliasing artifacts for spiral, radial, and Cartesian acquisitions. Simulations are based on 2.4 × 2.4 mm² resolution with a 20 × 20 cm² FOV of a mid-sagittal slice, and are realized by retrospectively subsampling, with different trajectories, a reconstructed image formed from a 34-interleave spiral dataset. All trajectories were simulated with realistic TRs obtained from a 1.5T GE scanner with modern gradient specifications (40 mT/m amplitude and 150 mT/m/ms slew rate). The left, middle, and right columns correspond to Protocols 2, 3, and 1, respectively; all are based on gradient echo imaging. In comparison to Cartesian aliases, spiral and radial aliasing artifacts are less coherent and less detrimental to visualization of the vocal tract air space and articulators.
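
The difference between coherent and incoherent aliasing can be illustrated with a toy one-dimensional analog of retrospective subsampling. This is only a sketch: it uses a 16-sample signal and a naive DFT rather than the 2D spiral, radial, and Cartesian trajectories of the figure, and the irregular sampling pattern is a hypothetical choice made for illustration.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for a tiny demo."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def retrospective_subsample(image, kept):
    """Zero every k-space sample not in `kept`, then reconstruct magnitudes."""
    kspace = dft(image)
    masked = [v if k in kept else 0 for k, v in enumerate(kspace)]
    return [abs(v) for v in dft(masked, inverse=True)]

N = 16
point = [0.0] * N
point[3] = 1.0  # single bright pixel at index 3

uniform = set(range(0, N, 2))            # regular 2x undersampling (Cartesian-like)
irregular = {0, 1, 3, 4, 7, 9, 12, 14}   # hypothetical irregular pattern, same count

alias_uniform = retrospective_subsample(point, uniform)
alias_irregular = retrospective_subsample(point, irregular)

# Largest artifact away from the true pixel location.
worst_uniform = max(v for i, v in enumerate(alias_uniform) if i != 3)
worst_irregular = max(v for i, v in enumerate(alias_irregular) if i != 3)
print(f"uniform:   worst alias {worst_uniform:.3f}")
print(f"irregular: worst alias {worst_irregular:.3f}")
```

Regular undersampling folds the pixel into a single replica as strong as the retained signal, while the irregular pattern spreads the same aliasing energy over many weaker, noise-like sidelobes.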

**FIGURE 6**
Effect of spiral readout duration on image quality. Each image represents a single frame captured when the subject sustained the nasal sound /n/. The spiral readout durations were (a) 2.520 msec, (b) 3.584 msec, (c) 4.576 msec, (d) 6.368 msec, and (e) 10.560 msec. Identical shim values were used for (a–e). Blurring artifacts are most prominent near air-tissue interfaces of the tongue, lips, and velum. (Figure courtesy: Y-C. Kim, Samsung Research, Seoul, South Korea.)
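
One way to see why longer readouts blur more is that an off-resonant spin accrues phase in proportion to the readout duration (phase in cycles = frequency offset × time). The sketch below applies this to the five readout durations listed in the caption; the 100 Hz offset is a hypothetical value chosen for illustration and is not taken from the source.

```python
def phase_cycles(off_resonance_hz, readout_ms):
    """Phase accrued by the end of the readout, in cycles (1 cycle = 2*pi rad)."""
    return off_resonance_hz * readout_ms / 1000.0

ASSUMED_OFF_RESONANCE_HZ = 100.0  # hypothetical offset; not stated in the caption
READOUTS_MS = [2.520, 3.584, 4.576, 6.368, 10.560]  # panels (a) to (e)

cycles = [phase_cycles(ASSUMED_OFF_RESONANCE_HZ, t) for t in READOUTS_MS]
for label, t, c in zip("abcde", READOUTS_MS, cycles):
    print(f"({label}) {t:6.3f} ms readout -> {c:.3f} cycles of accrued phase")
```

Under this assumption the longest readout accrues roughly four times the phase of the shortest.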

**FIGURE 7**
Off-resonance blurring in spiral MRI can be mitigated by real-time adjustment of the center frequency. Note that air-tissue interfaces (yellow arrows) are the locations of largest variation in resonant frequency.

**FIGURE 8**
Redundancy in dynamic speech data: A reconstructed time series with spatial resolution of 2.4 mm² and temporal resolution of 12 msec/frame, obtained from an ~6.5-fold accelerated reconstruction, is used for demonstration. (a) A representative spatial frame of the time series, and (b) the time profile along the arrow specified in (a). (c) The temporal finite difference of the time series, obtained by taking the pixel-wise finite difference of the dynamic data along time. The sparse representation under the temporal finite difference transform was exploited to realize the accelerated reconstruction in (a,b). The dynamic data also have sparse representations in the spatial-spectral domain (d); this representation is obtained by evaluating the Fourier transform of the data along the temporal dimension. The dynamic data can be rearranged as a Casorati matrix (each frame vectorized and stored as a column). A singular value decomposition of this matrix reveals the redundancies among the pixel time profiles, i.e., the fast decay of the singular values depicted in (e), which is exploited in the low rank and partially separable models.
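
The two kinds of redundancy described in this caption can be sketched on a small synthetic series. This is a minimal illustration under stated assumptions: the six-pixel, eight-frame data and the weights `static_map` and `moving_map` are invented, and exact matrix rank is used as a stand-in for inspecting singular value decay.

```python
from fractions import Fraction

def temporal_finite_difference(series):
    """series[t][p] is pixel p at frame t; returns frame-to-frame differences."""
    return [[b - a for a, b in zip(prev, curr)]
            for prev, curr in zip(series, series[1:])]

def casorati(series):
    """Casorati matrix: each column is one vectorized frame (rows are pixels)."""
    return [list(profile) for profile in zip(*series)]

def matrix_rank(matrix):
    """Exact rank by Gaussian elimination over rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            if factor != 0:
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

# Hypothetical data: a static background plus one abrupt "articulator" movement.
static_map = [5, 5, 3, 0, 2, 4]        # per-pixel weight of the constant component
moving_map = [0, 0, 2, 4, -1, 0]       # per-pixel weight of the step component
step = [0, 0, 0, 1, 1, 1, 1, 1]        # movement happens between frames 2 and 3
series = [[s + m * step[t] for s, m in zip(static_map, moving_map)]
          for t in range(len(step))]

diff = temporal_finite_difference(series)
nonzeros = sum(1 for frame in diff for v in frame if v != 0)
total = sum(len(frame) for frame in diff)
rank = matrix_rank(casorati(series))
print(f"temporal difference: {nonzeros} of {total} entries are nonzero")
print(f"Casorati matrix: {len(static_map)} x {len(step)}, rank {rank}")
```

In real data the matrix is only approximately low rank, which is why the caption refers to fast decay of the singular values rather than to an exact rank.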

**FIGURE 9**
Subsampling in combination with efficient constrained reconstruction allows for improved time resolution in comparison to Nyquist sampling; a spiral-based acquisition as depicted in Protocol 2 is used. Images reconstructed using (a) Nyquist sampling and online gridding reconstruction have a time resolution of 78 msec/frame, while (b) sub-Nyquist sampling and online gridding reconstruction allows for significantly improved time resolution, at the expense of aliasing artifacts. (c) Offline constrained reconstruction addresses this tradeoff by resolving the aliasing at the native time resolution of 12 msec/frame. Note the apparent advantage of the increased time resolution of (c) vs. (a) in terms of crispness along the time axis. The task was to repeatedly count “one-two-three-four-five” at a rapid pace (the subject spoke ~4 times faster than his normal speech pace).
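
The two time resolutions quoted in this caption follow from simple arithmetic if each frame is formed from a fixed number of spiral interleaves. The sketch below assumes a 6 msec TR, 13 interleaves for Nyquist sampling, and 2 interleaves per subsampled frame; these three values are assumptions chosen to be consistent with the 78 and 12 msec/frame figures, not numbers stated in the caption.

```python
def frame_time_ms(tr_ms, interleaves_per_frame):
    """Time to acquire one frame when each interleaf takes one TR."""
    return tr_ms * interleaves_per_frame

ASSUMED_TR_MS = 6.0            # assumption, not stated in the caption
NYQUIST_INTERLEAVES = 13       # assumption: interleaves for a fully sampled frame
SUBSAMPLED_INTERLEAVES = 2     # assumption: interleaves per sub-Nyquist frame

nyquist_ms = frame_time_ms(ASSUMED_TR_MS, NYQUIST_INTERLEAVES)
native_ms = frame_time_ms(ASSUMED_TR_MS, SUBSAMPLED_INTERLEAVES)
acceleration = nyquist_ms / native_ms
frames_per_second = 1000.0 / native_ms

print(f"Nyquist frame time: {nyquist_ms:.0f} msec")
print(f"Native frame time:  {native_ms:.0f} msec ({frames_per_second:.1f} frames/sec)")
print(f"Acceleration:       {acceleration:.1f}-fold")
```

The resulting ratio matches the ~6.5-fold acceleration quoted for the dataset in Figure 8.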

**FIGURE 10**
Example of acquisition with Protocol 3: The raw data were acquired with a radial gradient echo sequence with golden angle rotation. Images were reconstructed using a conjugate gradient-SENSE method that applies regularization using a spatial total variation (TV) operator. Twenty-five subsequent echoes were used to calculate low resolution sensitivity maps for each coil and a complete image with a spatial resolution of 1.8 × 1.8 × 10 mm³, leading to a native temporal resolution of 55 msec that was further accelerated to 40 msec by applying a sliding window. (Figure courtesy: M. Burdumy, University Medical Center Freiburg, Germany.)
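
Golden-angle ordering and sliding-window frame formation can be sketched as follows. The golden angle of about 111.25° is the standard value for radial MRI. The 2.2 msec TR is inferred from 25 spokes per 55 msec frame, and the 18-spoke stride (about 40 msec between frames) is an assumption, since the caption does not state how the window was advanced.

```python
import math

GOLDEN_ANGLE_DEG = 180.0 / ((1 + math.sqrt(5)) / 2)  # about 111.25 degrees

def spoke_angles(n_spokes):
    """Angle of each radial spoke, wrapped to [0, 180) degrees."""
    return [(i * GOLDEN_ANGLE_DEG) % 180.0 for i in range(n_spokes)]

def sliding_windows(n_spokes, window, stride):
    """(start, stop) spoke index ranges; consecutive frames overlap if stride < window."""
    return [(start, start + window)
            for start in range(0, n_spokes - window + 1, stride)]

ASSUMED_TR_MS = 55.0 / 25   # inferred: 25 spokes span the 55 msec native frame
WINDOW = 25                 # spokes per reconstructed frame (from the caption)
STRIDE = 18                 # assumption: spokes advanced between frames

frames = sliding_windows(61, WINDOW, STRIDE)
print(f"frame spacing: {STRIDE * ASSUMED_TR_MS:.1f} msec")
for start, stop in frames:
    print(f"spokes {start:2d} to {stop - 1:2d}")
```

Because golden-angle spokes never repeat an angle, any contiguous window covers k-space fairly uniformly, which is what makes overlapping windows usable. The overlap raises the displayed frame rate without shortening the 55 msec of data behind each frame.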

**FIGURE 11**
Strip plot through a single slice at the tip of the tongue (indicated by the dotted line) during an acquisition with Protocol 4, using spatial-spectral regularization and the PS model for reconstruction. Eight sagittal slices were acquired simultaneously at 102 frames per second while the subject uttered a simple speech sample.

Cited by

Martin J, Ruthven M, Boubertakh R, Miquel ME. Realistic Dynamic Numerical Phantom for MRI of the Upper Vocal Tract. J Imaging. 2020 Aug 27;6(9):86. doi: 10.3390/jimaging6090086. PMID: 34460743.
Nayak KS, Lim Y, Campbell-Washburn AE, Steeden J. Real-Time Magnetic Resonance Imaging. J Magn Reson Imaging. 2022 Jan;55(1):81-99. doi: 10.1002/jmri.27411. Epub 2020 Dec 9. PMID: 33295674. Review.
Fischer J, Özen AC, Ilbey S, Traser L, Echternach M, Richter B, Bock M. Sub-millisecond 2D MRI of the vocal fold oscillation using single-point imaging with rapid encoding. MAGMA. 2022 Apr;35(2):301-310. doi: 10.1007/s10334-021-00959-4. Epub 2021 Sep 20. PMID: 34542771.
Lingala SG, Zhu Y, Lim Y, Toutios A, Ji Y, Lo WC, Seiberlich N, Narayanan S, Nayak KS. Feasibility of through-time spiral generalized autocalibrating partial parallel acquisition for low latency accelerated real-time MRI of speech. Magn Reson Med. 2017 Dec;78(6):2275-2282. doi: 10.1002/mrm.26611. Epub 2017 Feb 10. PMID: 28185301.
Erattakulangara S, Kelat K, Meyer D, Priya S, Lingala SG. Automatic Multiple Articulator Segmentation in Dynamic Speech MRI Using a Protocol Adaptive Stacked Transfer Learning U-NET Model. Bioengineering (Basel). 2023 May 22;10(5):623. doi: 10.3390/bioengineering10050623. PMID: 37237693.

Grants and funding

R01 DC007124/DC/NIDCD NIH HHS/United States
