Review
Diagnostics (Basel). 2022 Nov 15;12(11):2811.
doi: 10.3390/diagnostics12112811.

Tongue Contour Tracking and Segmentation in Lingual Ultrasound for Speech Recognition: A Review

Khalid Al-Hammuri et al. Diagnostics (Basel).

Abstract

Lingual ultrasound imaging is essential in linguistic research and speech recognition. It is widely used as visual feedback to enhance language learning for non-native speakers, and in the study and remediation of speech-related disorders, articulation research and analysis, swallowing studies, 3D tongue modelling, and silent speech interfaces. This article provides a comparative review, based on quantitative and qualitative criteria, of the two main streams of tongue contour segmentation from ultrasound images. The first stream uses traditional computer vision and image processing algorithms; the second uses machine learning and deep learning algorithms. The results show that machine learning-based tongue tracking is superior to traditional techniques in both performance and generalization ability. Traditional techniques, meanwhile, remain helpful for implementing interactive image segmentation to extract valuable features during training and postprocessing. We recommend a hybrid approach that combines machine learning and traditional techniques to implement a real-time tongue segmentation tool.

Keywords: computer vision; image segmentation; lingual ultrasound; machine learning; medical imaging analysis; tongue contour tracking.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Overview of ultrasound probe placement beneath the chin. The ultrasound wave is shown as a black arc generated by the acoustic probe and propagating towards the tongue. The hyoid and mandible bones block part of the ultrasound wave; the blocked regions are shown in black. The head and oral cavity picture was modified from the original picture for the case, courtesy of Associate Professor Frank Gaillard, Radiopaedia.org, rID: 35836, [86].
Figure 2
Ultrasound image of the tongue showing the tongue tip and root in the sagittal plane. The ultrasound probe at the bottom and the shadowing caused by the mandible and hyoid bone are visible. The copyright for this ultrasound picture belongs to the author of this article, Khalid Al-hammuri [5].
Figure 3
Ultrasound image acquisition system used in speech analysis. The system is also configured with a microphone and head-transducer stability system. The copyright for the ultrasound and head-transducer support system picture belongs to the author of this article, Khalid Al-hammuri [5].
Figure 4
Shape-based evaluation measure. Point (A) is on the dorsal part of the tongue, point (B) is on the tongue tip, and point (C) is the apex. Point (D) is the projection of point (C) onto the line (AB). The copyright for this ultrasound picture belongs to the author of this article, Khalid Al-hammuri [5].
Figure 5
K-fold cross-validation process. (A) The K iterations of the cross-validation. (B) The training fold data and labels. (C) Evaluation of model performance on the validation fold data.
Figure 6
The process of labelling ultrasound images and extracting the tongue contour using a deep belief neural network. Panels (A–D) are ordered horizontally. (A) Ultrasound image before processing. (B) Manually labelled ground truth data. (C) Features extracted from ultrasound images using a translational deep belief neural network. (D) Extracted tongue contour overlaid on the original ultrasound image [104].
Figure 7
Quality evaluation matrix. Usability, image quality, and shape consistency are scored on a 0–5 scale (0 is the lowest and 5 is the highest). The final quality score is shown on a percentile scale and a satisfaction rate from low to high.
Figure 8
Bar chart for the total qualitative score of tongue image segmentation categories. The Y-axis is the qualitative score probability, and the X-axis is the quality score category for each image segmentation technique.
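
The geometry described in the Figure 4 caption can be illustrated with a short sketch. It computes point (D) as the orthogonal projection of the apex (C) onto the line through (A) and (B), and then the lengths of (CD) and (AB). The function names and the pixel coordinates are hypothetical, and the caption does not say how these lengths are combined into the final evaluation measure, so that step is left out.

```python
import math


def project_point_on_line(a, b, c):
    """Return D, the orthogonal projection of point c onto the line through a and b."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    abx, aby = bx - ax, by - ay
    ab_len_sq = abx * abx + aby * aby
    if ab_len_sq == 0:
        raise ValueError("A and B must be distinct points")
    # Scalar position of D along AB (0 at A, 1 at B).
    t = ((cx - ax) * abx + (cy - ay) * aby) / ab_len_sq
    return (ax + t * abx, ay + t * aby)


def distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


# Hypothetical contour landmarks in pixel coordinates (not taken from the article).
A = (0.0, 0.0)  # point on the dorsal part of the tongue
B = (4.0, 0.0)  # point on the tongue tip
C = (1.0, 3.0)  # apex of the contour

D = project_point_on_line(A, B, C)
apex_height = distance(C, D)      # length of CD
baseline_length = distance(A, B)  # length of AB
print(D, apex_height, baseline_length)
```

The projection is taken onto the infinite line through (A) and (B), not clamped to the segment, which matches the caption's wording "projection of point (C) onto the line (AB)".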

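The K-fold cross-validation process named in the Figure 5 caption can be sketched in plain Python as follows. This is a minimal illustration of the general technique, not the protocol of any study covered by the review: the helper name `k_fold_indices` is hypothetical, the samples are split in their original order without shuffling, and the model training and scoring steps are omitted.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of the k iterations."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]                     # validation fold
        train = indices[:start] + indices[start + size:]      # remaining training folds
        yield train, val
        start += size


# Example: 10 labelled images split into 5 folds.
folds = list(k_fold_indices(10, 5))
for iteration, (train, val) in enumerate(folds, start=1):
    print(iteration, train, val)
```

Each sample appears in exactly one validation fold, so every labelled image is used for evaluation once and for training in the other K - 1 iterations.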
References

    1. Palmatier R.W., Houston M.B., Hulland J. Review articles: Purpose, process, and structure. J. Acad. Mark. Sci. 2018;46:1–5. doi: 10.1007/s11747-017-0563-4.
    2. Li M., Kambhamettu C., Stone M. Automatic contour tracking in ultrasound images. Clin. Linguist. Phon. 2005;19:545–554. doi: 10.1080/02699200500113616.
    3. Tang L., Bressmann T., Hamarneh G. Tongue contour tracking in dynamic ultrasound via higher-order MRFs and efficient fusion moves. Med. Image Anal. 2012;16:1503–1520. doi: 10.1016/j.media.2012.07.001.
    4. Laporte C., Ménard L. Multi-hypothesis tracking of the tongue surface in ultrasound video recordings of normal and impaired speech. Med. Image Anal. 2018;44:98–114. doi: 10.1016/j.media.2017.12.003.
    5. Al-hammuri K. Ph.D. Thesis. University of Victoria; Victoria, BC, Canada: 2019. Computer Vision-Based Tracking and Feature Extraction for Lingual Ultrasound.
