Review

Philos Trans R Soc Lond B Biol Sci. 2008 Mar 12;363(1493):1001-10. doi: 10.1098/rstb.2007.2155

The processing of audio-visual speech: empirical and neural bases

Ruth Campbell

Abstract

In this selective review, I outline a number of ways in which seeing the talker affects auditory perception of speech, including, but not confined to, the McGurk effect. To date, studies suggest that all linguistic levels are susceptible to visual influence, and that two main modes of processing can be described: a complementary mode, whereby vision provides information more efficiently than hearing for some under-specified parts of the speech stream, and a correlated mode, whereby vision partially duplicates information about dynamic articulatory patterning.

Cortical correlates of seen speech suggest that auditory processing of speech is affected by vision at the neurological as well as the perceptual level, so that 'auditory speech regions' are activated by seen speech. The processing of natural speech, whether it is heard, seen, or heard and seen, activates the perisylvian language regions (left>right), and activation very probably occurs in a specific order: first superior temporal, then inferior parietal, and finally inferior frontal regions (left>right). There is some differentiation of the visual input stream to the core perisylvian language system, suggesting that complementary seen speech information makes special use of the ventral visual processing stream, while correlated visual speech may rely relatively more on the dorsal processing stream, which is sensitive to visual movement.


Figures

Figure 1
(a) Speech images used in the fMRI experiment reported by Capek et al. (2005). Vowels are shown in the top row and consonants in the bottom row. (b) Rendered group activation maps for the task of distinguishing vowel and consonant lip shapes. Images were presented singly for decision. The baseline task was to detect the movement of a cross on a blank background (Capek et al. 2005). The posterior superior temporal sulcus (pSTS; black circle) was not activated. Significant foci of activation (x, y, z coordinates) included: (i) inferior temporal cortex/fusiform gyrus (−29, −78, −17); (ii) right inferior frontal cortex extending into dorsolateral prefrontal cortex (DLPFC) (47, 11, 26); (iii) left inferior frontal cortex extending into DLPFC (−47, 7, 33); (iv) left inferior parietal lobule (−25, −63, 43); and (v) caudal anterior cingulate gyrus (−3, 11, 50).
Figure 2
Schematic of the left hemisphere showing locations and activation sequence for the processing of visual speech (adapted from Nishitani & Hari 2002). Participants in this MEG study of silent speech-reading identified vowel forms from video clips. The following regions were activated in sequence: (a) visual cortex, including visual movement regions; (b) superior temporal gyrus (secondary auditory cortex); (c) pSTS and inferior parietal lobule; (d) inferior frontal cortex; and (e) premotor cortex. Auditory inputs (primary auditory cortex, A1) are hypothesized to access this system at stage (b).

