Review

Philos Trans R Soc Lond B Biol Sci. 2008 Mar 12;363(1493):1001-10. doi: 10.1098/rstb.2007.2155

The processing of audio-visual speech: empirical and neural bases

Ruth Campbell

Abstract

In this selective review, I outline a number of ways in which seeing the talker affects auditory perception of speech, including, but not confined to, the McGurk effect. To date, studies suggest that all linguistic levels are susceptible to visual influence, and that two main modes of processing can be described: a complementary mode, whereby vision provides information more efficiently than hearing for some under-specified parts of the speech stream, and a correlated mode, whereby vision partially duplicates information about dynamic articulatory patterning.

Cortical correlates of seen speech suggest that auditory processing of speech is affected by vision at the neurological as well as the perceptual level, so that 'auditory speech regions' are activated by seen speech. The processing of natural speech, whether it is heard, seen, or heard and seen, activates the perisylvian language regions (left>right), and activation very probably occurs in a specific order: first superior temporal, then inferior parietal, and finally inferior frontal regions (left>right). There is some differentiation of the visual input stream to the core perisylvian language system, suggesting that complementary seen speech information makes special use of the ventral visual processing stream, while correlated visual speech may rely relatively more on the dorsal processing stream, which is sensitive to visual movement.


Figures

Figure 1
(a) Speech images used in the fMRI experiment reported by Capek et al. (2005). Vowels are shown in the top row and consonants in the bottom row. (b) Rendered group activation maps for the task of distinguishing vowel and consonant lip shapes. Images were presented singly for decision. The baseline task was to detect the movement of a cross on a blank background (Capek et al. 2005). The posterior superior temporal sulcus (pSTS; black circle) was not activated. Significant foci of activation (x, y, z coordinates) included: (i) inferior temporal cortex/fusiform gyrus (−29, −78, −17); (ii) right inferior frontal cortex extending into dorsolateral prefrontal cortex (DLPFC) (47, 11, 26); (iii) left inferior frontal cortex extending into DLPFC (−47, 7, 33); (iv) left inferior parietal lobule (−25, −63, 43); and (v) caudal anterior cingulate gyrus (−3, 11, 50).
Figure 2
Schematic of the left hemisphere showing locations and activation sequence for the processing of visual speech (adapted from Nishitani & Hari 2002). Participants in this MEG study of silent speech-reading identified vowel forms from video clips. The following regions were activated in sequence: (a) visual cortex, including visual movement regions; (b) superior temporal gyrus (secondary auditory cortex); (c) pSTS and inferior parietal lobule; (d) inferior frontal cortex; and (e) premotor cortex. Auditory inputs (primary auditory cortex, A1) are hypothesized to access this system at stage (b).

