Sensors (Basel). 2022 Feb 23;22(5):1751.
doi: 10.3390/s22051751.

Segmentation of Glottal Images from High-Speed Videoendoscopy Optimized by Synchronous Acoustic Recordings

Bartosz Kopczynski et al. Sensors (Basel). 2022.

Abstract

Laryngeal high-speed videoendoscopy (LHSV) is an imaging technique that offers a new quality of visualization of the vibratory activity of the vocal folds. However, most image analysis methods require interaction from medical personnel and access to ground-truth annotations to detect the edges of the vocal folds accurately. In our fully automatic method, we combine video and acoustic data recorded synchronously during laryngeal endoscopy. We show that the segmentation algorithm for the glottal area can be optimized by matching the Fourier spectra of the pre-processed video with the spectra of the acoustic recording made during phonation of the sustained vowel /i:/. We verify our method on a set of LHSV recordings from subjects with normophonic voices and patients with voice disorders due to glottal insufficiency. We show that the computed geometric indices of the glottal area make it possible to discriminate between normal and pathological voices. The median Open Quotient and Minimal Relative Glottal Area values were 0.69 and 0.06, respectively, for healthy subjects, and 1 and 0.35, respectively, for dysphonic subjects. We also validate these results with independent phoniatrician experts.
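The spectral matching idea can be illustrated with a small sketch: if a segmentation is good, the glottal area waveform extracted from the video should oscillate at the same frequencies as the voice recorded at the same time. This is a minimal sketch under stated assumptions, not the authors' cost function. The mean removal, unit-peak normalization, the 0 to 1000 Hz comparison band (`f_max`), the interpolation grid and the RMS distance are choices made here for illustration, and `spectral_distance` is a hypothetical name. The video rate of 4000 frames per second matches the camera maximum given for the recording system; the signals are synthetic.

```python
import numpy as np


def magnitude_spectrum(signal, fs):
    """One-sided magnitude spectrum of a mean-removed signal, scaled to unit peak."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # drop DC so the constant area offset does not dominate
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    peak = mag.max()
    return freqs, (mag / peak if peak > 0 else mag)


def spectral_distance(gaw, fs_video, audio, fs_audio, f_max=1000.0, n_bins=512):
    """Hypothetical cost: RMS difference of normalized spectra on a shared grid."""
    grid = np.linspace(0.0, f_max, n_bins)
    f_v, s_v = magnitude_spectrum(gaw, fs_video)
    f_a, s_a = magnitude_spectrum(audio, fs_audio)
    # Video and audio use different sampling rates, so resample both spectra
    # onto the same frequency grid before comparing them bin by bin.
    s_v = np.interp(grid, f_v, s_v)
    s_a = np.interp(grid, f_a, s_a)
    return float(np.sqrt(np.mean((s_v - s_a) ** 2)))


# Illustrative synthetic signals (not data from the study).
fs_video, fs_audio, duration = 4000, 16000, 0.5
t_v = np.arange(int(fs_video * duration)) / fs_video
t_a = np.arange(int(fs_audio * duration)) / fs_audio
audio = np.sin(2 * np.pi * 200 * t_a)                       # 200 Hz "voice"
gaw_good = np.maximum(np.sin(2 * np.pi * 200 * t_v), 0.0)  # area pulses at 200 Hz
gaw_bad = np.maximum(np.sin(2 * np.pi * 300 * t_v), 0.0)   # wrong periodicity
print(spectral_distance(gaw_good, fs_video, audio, fs_audio)
      < spectral_distance(gaw_bad, fs_video, audio, fs_audio))  # True
```

A waveform whose periodicity matches the voice scores lower than one that does not, which is the property a segmentation search can exploit.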

Keywords: acoustic recordings of voice; image segmentation; laryngeal high-speed video; multimodal sensing; signal processing; vocal disorders.

Conflict of interest statement

Authors declare no conflict of interest.

Figures

Figure A1
Example glottal width waveforms for a normophonic subject (a) and a patient with glottal insufficiency (b). The indicated time intervals are explained in the definitions of the parameters.
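The parameter definitions referred to in this caption are not reproduced on this page. Assuming the common definitions, with the Open Quotient (OQ) as the fraction of the glottal cycle during which the glottis is open and the Minimal Relative Glottal Area (MRGA) as the minimum glottal area divided by the maximum, both indices can be computed from a sampled waveform as sketched below. The function names and the `closed_threshold` parameter are hypothetical, and the toy waveforms are illustrative, not study data.

```python
import numpy as np


def open_quotient(waveform, closed_threshold=0.0):
    """Fraction of samples in which the glottis is open (value above threshold).

    Assumes the waveform covers a whole number of glottal cycles, so the
    sample fraction equals the open-phase duration divided by the period.
    """
    w = np.asarray(waveform, dtype=float)
    return float(np.mean(w > closed_threshold))


def minimal_relative_glottal_area(waveform):
    """Smallest glottal area divided by the largest (assumed definition)."""
    w = np.asarray(waveform, dtype=float)
    peak = w.max()
    if peak <= 0:
        raise ValueError("the glottis never opens in this waveform")
    return float(w.min() / peak)


# Toy waveforms for one cycle (illustrative values, not study data).
complete_closure = [0, 0, 1, 2, 1, 0, 0, 0]
incomplete_closure = [1, 2, 3, 2, 1, 1]
print(open_quotient(complete_closure), minimal_relative_glottal_area(complete_closure))
# 0.375 0.0
print(open_quotient(incomplete_closure), minimal_relative_glottal_area(incomplete_closure))
# 1.0 0.3333333333333333
```

An incomplete closure keeps the waveform above zero throughout the cycle, which drives OQ to 1 and MRGA above 0, the same direction as the medians reported for the dysphonic subjects.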
Figure 1
Images of the glottis of normophonic subject N3 at maximum opening (a) and maximum closure (b) of the vocal folds.
Figure 2
Classification of glottal closure types: (A) rectangle/longitudinal, (B) hourglass, (C) triangle, (D) V-shaped, (E) spindle-shaped.
Figure 3
Images of the glottis of dysphonic patient D8 with severe glottal insufficiency at maximum opening (a) and maximum closure (b) of the vocal folds.
Figure 4
Images of the glottis of dysphonic patient D5 with longitudinal glottal insufficiency at maximum opening (a) and maximum closure (b) of the vocal folds.
Figure 5
Images of the glottis of dysphonic patient D4 with minimal spindle-shaped glottal insufficiency at maximum opening (a) and maximum closure (b) of the vocal folds.
Figure 6
(a) Photograph of the LHSV recording system with the 70-degree rigid scope, attached light source, and microphone; the box on the lowest shelf of the rack is the endoscope’s light source, and the box on the middle shelf is a high-speed camera capable of acquiring up to 4000 images per second. (b) Diagram showing the position of the laryngoscope during laryngeal examination.
Figure 7
The image of the glottis of a normophonic subject (a) and the corresponding total variation image (b), as defined in Equation (1), represented as a heat map (the larger the variation, the warmer the color of the map).
Figure 8
Processing pipeline for the recorded LHSV images and the synchronously recorded voice signal during sustained phonation of the vowel /i:/.
Figure 9
Representations of the LHSV image: (a) laryngeal image of the glottis, (b) detected contour of the glottal boundary, (c) glottal area, (d) the glottovibrogram, (e) the glottal area waveform.
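The glottal area waveform in panel (e) follows directly from the segmented glottal area in panel (c): count the glottis pixels in every frame. The sketch below assumes the segmentation produces one binary mask per frame; `pixel_area` is a hypothetical calibration factor (1.0 keeps the result in pixels). The `glottovibrogram` function is a simplification that assumes the glottal axis runs vertically in the image, so the number of glottis pixels in each row approximates the local glottal width.

```python
import numpy as np


def glottal_area_waveform(masks, pixel_area=1.0):
    """Glottal area per frame from a stack of binary masks of shape (frames, H, W)."""
    m = np.asarray(masks).astype(bool)
    if m.ndim != 3:
        raise ValueError("expected masks of shape (n_frames, height, width)")
    # Count glottis pixels in every frame and scale by the (assumed) pixel area.
    return m.reshape(m.shape[0], -1).sum(axis=1) * pixel_area


def glottovibrogram(masks):
    """Glottal width per image row over time, shape (height, n_frames).

    Simplification: assumes a vertical glottal axis, so each row's pixel count
    approximates the glottal width at that position along the folds.
    """
    m = np.asarray(masks).astype(bool)
    return m.sum(axis=2).T


masks = np.zeros((3, 4, 4), dtype=bool)
masks[1, 1:3, 1:3] = True  # frame 1: a 2x2 opening
masks[2] = True            # frame 2: fully open toy frame
print(glottal_area_waveform(masks))  # [ 0.  4. 16.]
print(glottovibrogram(masks).shape)  # (4, 3)
```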
Figure 10
Diagram of the proposed segmentation method for glottal images: in the search for the best segmentation result, the Fourier spectra derived from the pool of segmented LHSVs are compared with the Fourier spectrum of the acoustic recording.
Figure 11
Plot of the cost function map $d_{\alpha,\beta}$ (left panel) and example image segmentation results (right panel) obtained for normophonic subject N2. Each segmentation result corresponds to a parameter pair $(\alpha, \beta)$ and is labeled with a number in the cost function plot. The best segmentation result is shown in a thick box on the left side of the right panel and is labeled 1.
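A cost function map over two parameters suggests an exhaustive search over a parameter grid, keeping the pair with the lowest cost. The sketch below assumes α and β are scalar segmentation parameters; their meaning and the exact cost function are not given on this page, so `cost` is left as any callable (for example, a spectral distance between the glottal area waveform segmented with a given pair and the acoustic recording). `grid_search` is a hypothetical name, and the toy cost only demonstrates the mechanics.

```python
import itertools

import numpy as np


def grid_search(alphas, betas, cost):
    """Evaluate cost(alpha, beta) on a grid; return the cost map and its minimizer."""
    cost_map = np.empty((len(alphas), len(betas)))
    for (i, a), (j, b) in itertools.product(enumerate(alphas), enumerate(betas)):
        cost_map[i, j] = cost(a, b)
    # Locate the smallest entry and map it back to the parameter values.
    i_best, j_best = np.unravel_index(np.argmin(cost_map), cost_map.shape)
    return cost_map, (alphas[i_best], betas[j_best])


# Toy cost with a known minimum at (0.3, 2); a real cost would segment the video.
cost_map, best = grid_search([0.1, 0.2, 0.3, 0.4], [1, 2, 3],
                             lambda a, b: (a - 0.3) ** 2 + (b - 2) ** 2)
print(cost_map.shape, best)  # (4, 3) (0.3, 2)
```

An exhaustive grid is simple and makes the whole cost landscape available for plotting, at the price of one full segmentation per grid point.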
Figure 12
Example segmentations of the glottal images selected by the phoniatricians: images (a-c) are for normophonic subject N10, and images (d-f) are for patient I5 with glottal insufficiency. Images (a) and (d) are the results obtained for the optimal segmentation parameter set $(\alpha^*, \beta^*)$ minimizing cost function (5).
Figure 13
Box-and-whisker plots of the calculated quotients for normophonic and dysphonic subjects. The lower and upper boundaries of each box indicate the first quartile Q1 and the third quartile Q3, respectively, while the lower whisker extends to the minimum value in the data set and the upper whisker to the maximum value.
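Because the whiskers here span the full data range rather than following the common 1.5 × IQR rule, each plotted box reduces to a five-number summary. A minimal sketch, assuming NumPy's default linear-interpolation percentiles; other quartile conventions can shift Q1 and Q3 slightly for small samples. `box_summary` is a hypothetical name and the input values are illustrative.

```python
import numpy as np


def box_summary(values):
    """Quartiles plus min/max whiskers, as described in the box-plot caption."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])  # linear interpolation
    return {"min": float(v.min()), "q1": float(q1), "median": float(median),
            "q3": float(q3), "max": float(v.max())}


print(box_summary([1, 2, 3, 4, 5]))
# {'min': 1.0, 'q1': 2.0, 'median': 3.0, 'q3': 4.0, 'max': 5.0}
```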
Figure 14
3D plot of the indices MRGA, OQ, and CQ, illustrating good discrimination between normophonic subjects (green dots) and patients with glottal insufficiency (red dots).
