Comput Biol Med. 2022 May:144:105339.
doi: 10.1016/j.compbiomed.2022.105339. Epub 2022 Feb 28.

LARNet-STC: Spatio-temporal orthogonal region selection network for laryngeal closure detection in endoscopy videos


Yang Yang Wang et al. Comput Biol Med. 2022 May.

Abstract

The vocal folds (VFs) are a pair of muscles in the larynx that play a critical role in breathing, swallowing, and speaking. VF function can be adversely affected by various medical conditions, including head or neck injuries, stroke, tumors, and neurological disorders. In this paper, we propose a deep learning system for automated detection of laryngeal adductor reflex (LAR) events in laryngeal endoscopy videos to enable objective, quantitative analysis of VF function. The proposed system incorporates our novel orthogonal region selection network and temporal context. The network learns to map its input directly to a VF open/close state without first segmenting or tracking the VF region. This one-step approach drastically reduces manual annotation needs, from labor-intensive segmentation masks or VF motion tracks to frame-level class labels. The proposed spatio-temporal network with an orthogonal region selection subnetwork integrates local image features, global image features, and VF state information over time for robust LAR event detection. The proposed network is evaluated against several network variations that incorporate temporal context and is shown to perform better. The experimental results show promising performance for automated, objective, and quantitative analysis of LAR events from laryngeal endoscopy videos, with F1 scores above 90% for LAR frames and above 99% for non-LAR frames.
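The per-class F1 scores quoted above are computed from frame-level labels. The following is a minimal sketch of that metric, not the authors' evaluation code; the function name `f1_per_class` and the toy label sequences are assumptions made for illustration only.

```python
def f1_per_class(y_true, y_pred, labels):
    """Return {label: F1}, computed one-vs-rest from frame-level labels."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical toy data: 8 non-LAR frames followed by 2 LAR frames.
y_true = ["non-LAR"] * 8 + ["LAR"] * 2
y_pred = ["non-LAR"] * 7 + ["LAR"] * 3
scores = f1_per_class(y_true, y_pred, ["LAR", "non-LAR"])
```

Because LAR frames are much rarer than non-LAR frames in this toy sequence, one misclassified frame lowers the LAR score far more than the non-LAR score, which is why the two classes are reported separately.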

Keywords: Deep learning; Laryngeal adductor reflex; Laryngeal closure detection; Laryngeal endoscopy; Vocal folds.

Figures

Figure 1:
Sample images for the three vocal fold state classes.
Figure 2:
Sample laryngoscopy video frames illustrating different processing challenges. (a-c) Left images show the original video frames; right images show the corresponding histogram-equalized images.
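Histogram equalization, the preprocessing step shown in the right-hand images, remaps intensities through the cumulative distribution so that a low-contrast frame uses the full intensity range. The sketch below is a generic textbook version on a grayscale image stored as nested lists. It is not taken from the paper, and the function name `equalize_histogram` is an assumption.

```python
def equalize_histogram(image, levels=256):
    """Equalize a grayscale image given as rows of ints in [0, levels)."""
    pixels = [v for row in image for v in row]
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1

    # Cumulative distribution of intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        return [row[:] for row in image]  # constant image: nothing to stretch

    # Lookup table mapping each input level to its equalized level.
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[v] for v in row] for row in image]


# Hypothetical low-contrast patch: all values lie between 50 and 53.
patch = [[50, 50, 51], [51, 52, 53]]
equalized = equalize_histogram(patch)
```

After equalization the darkest input level maps to 0 and the brightest to 255, while the ordering of intensities is preserved.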
Figure 3:
Architecture of the proposed VF state estimation network.
Figure 4:
Subregion cropping and the Orthogonal Region Selection (ORS) subnetwork. Inputs to the network are five cropped subregions (marked with yellow squares) from the preprocessed image. The output of the network is a 1-D feature vector corresponding to the selected subregion. This vector is selected from F by the index j* of the minimum value in O. “FC” denotes a fully connected layer.
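The selection step described in this caption (take the row of F at the index j* of the minimum value in O) can be sketched as follows. This covers only the final indexing step. How F and O are produced by the network is not specified in the caption, so here F is assumed to be a list of per-subregion feature vectors and O a list of one scalar per subregion; the function name `select_region_feature` is hypothetical.

```python
def select_region_feature(F, O):
    """Return (j_star, F[j_star]), where j_star is the index of min(O)."""
    if not O or len(F) != len(O):
        raise ValueError("F and O must be non-empty and have equal length")
    # argmin over the per-subregion scores; ties resolve to the first index.
    j_star = min(range(len(O)), key=O.__getitem__)
    return j_star, F[j_star]


# Hypothetical values for five subregions, each with a 3-element feature vector.
F = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6],
     [0.7, 0.8, 0.9],
     [1.0, 1.1, 1.2],
     [1.3, 1.4, 1.5]]
O = [0.9, 0.4, 0.05, 0.6, 0.7]
j_star, feature = select_region_feature(F, O)
```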
Figure 5:
Architecture of the proposed spatio-temporal context-based orthogonal region selection network. A set of fully convolutional layers is added on top of the VF state estimation networks to incorporate temporal context. “Conv” denotes a convolution operation.
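To illustrate what a convolution over time contributes, the sketch below applies a single 1-D filter along a sequence of per-frame scores. It is only an illustration of the operation: the actual network learns its filter weights and operates on feature vectors, whereas the fixed averaging kernel, the edge-replication padding, and the name `temporal_conv1d` are assumptions.

```python
def temporal_conv1d(sequence, kernel):
    """Same-length 1-D convolution over time, replicating the edge frames."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    if not sequence:
        return []
    half = len(kernel) // 2
    last = len(sequence) - 1
    out = []
    for t in range(len(sequence)):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Clamp the index so frames beyond the ends reuse the edge value.
            idx = min(max(t + k - half, 0), last)
            acc += w * sequence[idx]
        out.append(acc)
    return out


# Hypothetical per-frame scores with one isolated spike at frame 2.
frame_scores = [0.1, 0.1, 0.9, 0.1, 0.1]
smoothed = temporal_conv1d(frame_scores, [1 / 3, 1 / 3, 1 / 3])
```

Each output value depends on the neighboring frames, so an isolated single-frame spike is damped, which is the sense in which temporal context can stabilize frame-level predictions.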
Figure 6:
Four different architectures of spatio-temporal context-based networks.
Figure 7:
Boxplot of the F1 scores for the five-fold cross-validation of three proposed networks. The green triangle is the mean across five folds.
Figure 8:
Quantitative evaluation of LAR event durations (number of frames). (a) Histogram of the distribution of ground truth LAR event durations. (b) Cumulative distribution of the frame error of LAR event prediction. (c) Comparison of the ground truth and prediction of VF states for a single video. (d) Sample original video frames at timestamps A, B, C, and D in (c).
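Event durations such as those in panel (a) can be derived from frame-level labels by finding maximal runs of LAR frames. The sketch below does this and then compares durations between ground truth and prediction. The caption does not define "frame error", so the absolute duration difference between events paired in order is an assumption, as are the function names and the toy sequences.

```python
def lar_events(frame_labels, event_label="LAR"):
    """Return (start_frame, duration) for each maximal run of event_label."""
    events, start = [], None
    for i, label in enumerate(frame_labels):
        if label == event_label and start is None:
            start = i                          # a run begins
        elif label != event_label and start is not None:
            events.append((start, i - start))  # the run ended at frame i - 1
            start = None
    if start is not None:                      # run reaches the last frame
        events.append((start, len(frame_labels) - start))
    return events


def duration_errors(true_events, pred_events):
    """Absolute duration difference for events paired in order (assumed pairing)."""
    return [abs(t[1] - p[1]) for t, p in zip(true_events, pred_events)]


# Hypothetical label sequences: "L" marks a LAR frame, "N" a non-LAR frame.
truth = list("NNLLLNNLLN")
pred = list("NNNLLNNLLL")
true_events = lar_events(truth, event_label="L")
pred_events = lar_events(pred, event_label="L")
errors = duration_errors(true_events, pred_events)
```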
Figure 9:
Segmentation-derived LAR/non-LAR classification results: average F1 scores for non-LAR frames, comparing VF segmentation algorithms (U-LSTM [23], FCRN [21], and FCRN [21] + histogram equalization + ORS) with the proposed LARNet-STC.
Figure 10:
Segmentation-derived LAR/non-LAR classification results: average F1 scores for LAR frames, comparing VF segmentation algorithms (U-LSTM [23], FCRN [21], and FCRN [21] + histogram equalization + ORS) with the proposed LARNet-STC.
Figure 11:
Visual explanation of the LARNet-STC network output using Grad-CAM [66]. Top row: subregions automatically selected by the Orthogonal Region Selection (ORS) subnetwork. Bottom row: regions contributing to the score of the predicted class, with highlights ranging from red (higher impact) to blue (lower impact).
Figure 12:
The confusion matrix of the results from the proposed context-based orthogonal region selection network (LARNet-STC).
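A confusion matrix of this kind tabulates, for each true class, how many frames were assigned to each predicted class. The following is a minimal sketch of how such a matrix is built from frame-level labels. The class names "A", "B", and "C" are placeholders, because the class names are not given in this text, and `confusion_matrix` is a hypothetical helper rather than the authors' code.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Return matrix[i][j] = frames with true class labels[i] predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Placeholder class names and toy labels for illustration only.
labels = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
matrix = confusion_matrix(y_true, y_pred, labels)
```

Rows correspond to the true class and columns to the predicted class, so correct predictions accumulate on the diagonal.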
Figure 13:
Sample outputs from the proposed system. Red label represents ground truth. Green label represents prediction.
Figure 14:
Sampled non-LAR sequential video frames (frames 6–10) with visual occlusion from laryngoscopy videos.
