Timing in audiovisual speech perception: A mini review and new psychophysical data
- PMID: 26669309
- PMCID: PMC4744562
- DOI: 10.3758/s13414-015-1026-y
Abstract
Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models rest on the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied on psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that, randomly across trials, obscured visual speech in some frames but not others. Variability in participants' responses (~35% identification of /apa/ compared to ~5% in the absence of the mask) served as the basis for classification analysis. The outcome was a high-resolution spatiotemporal map of perceptually relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). In brief, temporally leading (~130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content.
Keywords: Audiovisual speech; Classification image; McGurk; Multisensory integration; Prediction; Speech kinematics; Timing.
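For readers unfamiliar with the classification (reverse-correlation) analysis described in the abstract, the sketch below illustrates the general logic of deriving a spatiotemporal classification image from trial-wise transparency masks and yes/no responses. The array layout, variable names, and the simple difference-of-means estimator are illustrative assumptions made here; they do not reproduce the authors' actual analysis pipeline.

```python
import numpy as np

# Minimal sketch of a classification-image (reverse-correlation) analysis.
# Assumed data layout (not taken from the paper):
#   masks:     (n_trials, n_frames, height, width) transparency values applied
#              to the mouth region on each trial (1 = fully visible).
#   responses: (n_trials,) boolean, True where the participant reported the
#              auditory syllable /apa/ (i.e., the visual stimulus did not
#              change perception on that trial).

def classification_image(masks: np.ndarray, responses: np.ndarray) -> np.ndarray:
    """Return a spatiotemporal map of visual-feature influence.

    Frames/pixels that were, on average, more visible on trials where
    perception was altered (non-/apa/ responses) than on trials where it was
    not (/apa/ responses) receive positive weights.
    """
    visible_when_influenced = masks[~responses].mean(axis=0)
    visible_when_uninfluenced = masks[responses].mean(axis=0)
    return visible_when_influenced - visible_when_uninfluenced


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    masks = rng.random((500, 30, 16, 16))    # toy data: 500 trials, 30 video frames
    responses = rng.random(500) < 0.35       # ~35% /apa/ responses, as in the abstract
    cimg = classification_image(masks, responses)
    print(cimg.shape)                        # (30, 16, 16) spatiotemporal map
```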
