J Acoust Soc Am. 2011 Jul;130(1):283-91. doi: 10.1121/1.3592223.

The effect of lip-reading on primary stream segregation


Aymeric Devergie et al. J Acoust Soc Am. 2011 Jul.

Abstract

Lip-reading has been shown to improve the intelligibility of speech in multitalker situations, where auditory stream segregation naturally takes place. This study investigated whether the benefit of lip-reading is a result of a primary audiovisual interaction that enhances the obligatory streaming mechanism. Two behavioral experiments were conducted involving sequences of French vowels that alternated in fundamental frequency. In Experiment 1, subjects attempted to identify the order of items in a sequence. In Experiment 2, subjects attempted to detect a disruption to temporal isochrony across alternate items. Both tasks are disrupted by streaming, thus providing a measure of primary or obligatory streaming. Visual lip gestures articulating alternate vowels were synchronized with the auditory sequence. Overall, the results were consistent with the hypothesis that visual lip gestures enhance segregation by affecting primary auditory streaming. Moreover, increasing the naturalness of the visual lip gestures and auditory vowels, and hence their audiovisual congruence, may increase the effect of visual lip gestures on streaming.


Figures

Figure 1
Schematic representation of an audiovisual sequence. Lip gestures were presented that articulated either the three high-pitch vowels or the three low-pitch vowels, selected randomly across trials [except that in some cases, F0(1) = F0(2)].
Figure 2
Synchronization of the audio and visual streams. The figure shows lip movements congruent with the F0(2) vowels. The maximum mouth opening is centered on the cued auditory vowel, but the lip gesture begins slightly before and extends slightly beyond each cued auditory vowel. Each picture of the lips was extracted from a lip gesture movie that started 67 ms (i.e., two frames) before the corresponding auditory vowel and ended 100 ms (i.e., three frames) after its offset.
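
The alignment described above can be made concrete with a short sketch. The Python snippet below is illustrative only: the 30 fps frame rate is inferred from the caption's frame-to-millisecond conversion (2 frames ≈ 67 ms, 3 frames ≈ 100 ms), and the vowel onset time and duration in the example are hypothetical, not values from the paper.

    # Sketch of the audiovisual alignment in Figure 2 (illustrative values).
    FRAME_MS = 1000 / 30  # ~33.3 ms per video frame, inferred from 2 frames ~ 67 ms

    def lip_clip_window(vowel_onset_ms, vowel_dur_ms, lead_frames=2, lag_frames=3):
        """Return (start, end) of the lip-gesture clip, in ms.

        The clip starts `lead_frames` (~67 ms) before the cued auditory vowel
        and ends `lag_frames` (~100 ms) after its offset, so the maximum mouth
        opening falls roughly on the vowel itself.
        """
        start = vowel_onset_ms - lead_frames * FRAME_MS
        end = vowel_onset_ms + vowel_dur_ms + lag_frames * FRAME_MS
        return start, end

    # Example: a hypothetical 100-ms vowel starting 500 ms into the sequence
    print(lip_clip_window(500, 100))  # -> (~433.3, 700.0)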
Figure 3
Results from Experiment 1. Percent correct report of the order of presentation of the six vowels is shown as a function of the F0 difference between alternate vowels and of the visual condition. Error bars represent standard errors.
Figure 4
Schematic representation of an audiovisual sequence. The sequence is first regular; the audio-only stream (in black) is then gradually delayed until it reaches the final value dT that the listener has to detect. The audiovisual stream (in gray) is always regular. Vertical dotted lines mark the transition of the audio-only stream from a regular to an irregular rhythm.
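
As a rough illustration of this manipulation, the sketch below generates onset times for the audio-only stream. The inter-onset interval, number of items, and linear shape of the delay ramp are assumed for illustration and are not taken from the paper; only the idea of a regular portion followed by a gradual delay up to dT comes from the caption.

    # Sketch of the gradual-delay manipulation in Figure 4 (assumed parameters).
    def audio_only_onsets(n_items=12, ioi_ms=250, dt_ms=40, ramp_items=4):
        """Onset times (ms) of the audio-only stream.

        The first items are regularly spaced by `ioi_ms`; over the last
        `ramp_items` items the delay grows linearly until it reaches the
        final value `dt_ms` that the listener must detect.
        """
        ramp_start = n_items - ramp_items
        onsets = []
        for i in range(n_items):
            delay = 0.0 if i < ramp_start else dt_ms * (i - ramp_start + 1) / ramp_items
            onsets.append(i * ioi_ms + delay)
        return onsets

    print(audio_only_onsets())
    # First 8 onsets are regular; the last 4 drift by 10, 20, 30, then 40 ms.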
Figure 5
Results of Experiment 2. The threshold for detecting a temporal offset between streams is plotted for each F0 difference and each visual condition across the eight participants. Error bars represent one standard error.
Figure 6
Visual lip gestures in Experiment 1.
Figure 7
Visual lip gestures in Experiment 2.
