A study of lip movements during spontaneous dialog and its application to voice activity detection
- PMID: 19206891
- DOI: 10.1121/1.3050257
Abstract
This paper presents a quantitative and comprehensive study of the lip movements of a given speaker in different speech/nonspeech contexts, with a particular focus on silences (i.e., when no sound is produced by the speaker). The aim is to characterize the relationship between "lip activity" and "speech activity" and then to use visual speech information as a voice activity detector (VAD). To this end, an original audiovisual corpus was recorded with two speakers engaged in a face-to-face spontaneous dialog while located in separate rooms. Each speaker communicated with the other using a microphone, a camera, a screen, and headphones. This system was used to capture separate audio stimuli for each speaker and to synchronously monitor the speaker's lip movements. A comprehensive analysis was carried out on the lip shapes and lip movements in either silence or nonsilence (i.e., speech plus nonspeech audible events). A single visual parameter, defined to characterize the lip movements, was shown to be efficient for the detection of silence sections. The result is a visual VAD that can be used in any kind of environmental noise, including intricate and highly nonstationary noises, e.g., multiple and/or moving noise sources or competing speech signals.
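The abstract does not define the single visual parameter, so the following is only an illustrative sketch of how a lip-movement-based VAD of this general kind might work. It assumes, hypothetically, that per-frame lip width and height are available, that "lip activity" is approximated by the smoothed sum of their absolute frame-to-frame changes, and that a fixed threshold separates silence from nonsilence. The function names, the moving-average window, and the threshold value are all assumptions, not the paper's method.

```python
from typing import List, Sequence


def lip_activity(width: Sequence[float], height: Sequence[float],
                 window: int = 5) -> List[float]:
    """Hypothetical lip-activity parameter (an assumption, not the paper's).

    Computes the absolute frame-to-frame change of lip width plus lip height,
    then smooths it with a trailing moving average over `window` frames.
    """
    if len(width) != len(height):
        raise ValueError("width and height must have the same number of frames")
    if window < 1:
        raise ValueError("window must be at least 1")

    # Frame 0 has no predecessor, so its motion is defined as zero.
    motion = [0.0] + [
        abs(width[i] - width[i - 1]) + abs(height[i] - height[i - 1])
        for i in range(1, len(width))
    ]

    smoothed = []
    for i in range(len(motion)):
        start = max(0, i - window + 1)
        chunk = motion[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def visual_vad(width: Sequence[float], height: Sequence[float],
               threshold: float = 0.5, window: int = 5) -> List[bool]:
    """Label each frame True (nonsilence) when lip activity exceeds `threshold`.

    The threshold is an arbitrary illustrative value.
    """
    return [a > threshold for a in lip_activity(width, height, window)]


# Usage: six frames of still lips followed by four frames of moving lips.
width = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 12.0, 10.0, 12.0, 10.0]
height = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 4.0, 2.0, 4.0, 2.0]
decisions = visual_vad(width, height, threshold=0.5, window=2)
```

Because a detector of this kind uses only visual input, its decisions do not depend on the acoustic noise level, which is the property the abstract highlights. In practice the smoothing window and threshold would need to be tuned on labeled data.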