Hearing lips and seeing voices: how cortical areas supporting speech production mediate audiovisual speech perception


Jeremy I Skipper et al. Cereb Cortex. 2007 Oct;17(10):2387-99. doi: 10.1093/cercor/bhl147. Epub 2007 Jan 11.

Abstract

Observing a speaker's mouth profoundly influences speech perception. For example, listeners perceive an "illusory" "ta" when the video of a face producing /ka/ is dubbed onto an audio /pa/. Here, we show how cortical areas supporting speech production mediate this illusory percept and audiovisual (AV) speech perception more generally. Specifically, cortical activity during AV speech perception occurs in many of the same areas that are active during speech production. We find that different perceptions of the same syllable and the perception of different syllables are associated with different distributions of activity in frontal motor areas involved in speech production. Activity patterns in these frontal motor areas resulting from the illusory "ta" percept are more similar to the activity patterns evoked by AV(/ta/) than they are to patterns evoked by AV(/pa/) or AV(/ka/). In contrast to the activity in frontal motor areas, stimulus-evoked activity for the illusory "ta" in auditory and somatosensory areas and visual areas initially resembles activity evoked by AV(/pa/) and AV(/ka/), respectively. Ultimately, though, activity in these regions comes to resemble activity evoked by AV(/ta/). Together, these results suggest that AV speech elicits in the listener a motor plan for the production of the phoneme that the speaker might have been attempting to produce, and that feedback in the form of efference copy from the motor system ultimately influences the phonetic interpretation.

Conflict of interest statement

Conflict of Interest: None declared.

Figures

Figure 1
Neurally specified model of AV speech perception as presented in the text. A multisensory description in the form of a hypothesis about the observed talker’s mouth movements and speech sounds (in STp areas) results in the specification (solid lines) of the motor goals of that hypothesis (in the POp, the suggested human homologue of macaque area F5, where mirror neurons have been found). These motor goals are mapped to a motor plan that can be used to reach those goals (in PMv and primary motor cortices [M1]). This results in the prediction, through efference copy (dashed lines), of the auditory and somatosensory states associated with executing those motor commands. Auditory (in STp areas) and somatosensory (in the SMG and primary and secondary somatosensory cortices [SI/SII]) predictions are compared with the current description of the sensory state of the listener. The result is an improvement in speech perception in AV contexts due to a reduction in ambiguity of the intended message of the observed talker.
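The caption describes a predict-and-compare (efference copy) loop rather than a specific algorithm, but its logic can be sketched as a toy computation. The Python fragment below is only an illustration of that logic under stated assumptions; the data structures, the forward_model function, and the exponential weighting are hypothetical and are not anything specified by the authors.

```python
# Illustrative toy of the efference-copy loop in Figure 1 (not the authors'
# implementation; all names and representations here are hypothetical).
import numpy as np

def toy_av_speech_loop(sensory_obs, motor_plans, forward_model, n_iters=5):
    """Refine a phonetic hypothesis by comparing sensory predictions
    (efference copy) against the observed AV input.

    sensory_obs   : observed audiovisual feature vector (STp-like description)
    motor_plans   : dict mapping candidate phonemes to motor-plan vectors
    forward_model : function motor_plan -> predicted sensory feature vector
    """
    # Start with a uniform belief over candidate phonemes (the "hypothesis").
    belief = {ph: 1.0 / len(motor_plans) for ph in motor_plans}
    for _ in range(n_iters):
        for ph, plan in motor_plans.items():
            predicted = forward_model(plan)                   # efference-copy prediction
            error = np.linalg.norm(sensory_obs - predicted)   # prediction error
            belief[ph] *= np.exp(-error)                      # downweight poor predictions
        total = sum(belief.values())
        belief = {ph: b / total for ph in belief}             # renormalize
    return max(belief, key=belief.get)                        # most supported phoneme
```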
Figure 2
Logical conjunction analyses. Orange indicates regions where activation associated with speaking syllables overlaps with activation associated with passively (A) listening to and watching the same congruent AV syllables; (B) watching only the video of these syllables without the accompanying audio track (V); and (C) listening to the syllables without the accompanying video track (A). Overlap images were created from images each thresholded at P < 0.05 corrected and then logically conjoined. Blue indicates additional regions activated by passive perception alone and not activated by speech production (P < 0.05 corrected).
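As a rough illustration of how a conjunction map of this kind can be built, the sketch below intersects two already-thresholded activation maps; the file names and the boolean encoding are assumptions for illustration, not the study's actual pipeline.

```python
# Minimal sketch of a logical conjunction of thresholded activation maps,
# assuming each saved map is a voxel array already thresholded at a corrected
# P < 0.05 (file names below are hypothetical).
import numpy as np

production_sig = np.load("speak_syllables_p05.npy") > 0   # production task
perception_sig = np.load("listen_watch_av_p05.npy") > 0   # passive AV perception

overlap = production_sig & perception_sig           # voxels active in BOTH tasks ("orange")
perception_only = perception_sig & ~production_sig  # perception without production ("blue")
```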
Figure 3
Correlation analyses. Correlation of the distributions of activation associated with passively listening to and watching the incongruent AV syllable made from an audio /pa/ and a visual /ka/ (denoted as ApVk) and the distributions of activation for AV/pa/ (i.e., “ApVk = AV/pa/” in gray), AV/ka/ (i.e., “ApVk = AV/ka/” in blue), or AV/ta/ (i.e., “ApVk = AV/ta/” in orange) in regions that overlapped speech production. The ApVk stimulus elicited the McGurk-MacDonald effect, perceived as “ta” in this group of participants. (A) Correlation analysis collapsed over the entire time course of activation in all frontal, auditory and somatosensory, and occipital regions that overlap speech production (Friedman test on pairwise correlations, P values < 0.004; Nemenyi post hoc tests on resulting ranks, *P values < 0.002). This analysis was also conducted at each time point following stimulus onset in the frontal and auditory and somatosensory regions that overlap speech production (see Experimental Procedures). The entire time course of activation is shown for an example (B) motor region, PMv cortex in the right hemisphere; (C) auditory and somatosensory region, the SMG in the left hemisphere; and (D) visual region, the middle occipital gyrus in the right hemisphere (P values < 0.05).
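A minimal sketch of this kind of pairwise-correlation comparison, assuming per-participant voxel patterns for each condition, is shown below; the variable names, array shapes, and the choice of Pearson correlation followed by a Friedman test are illustrative stand-ins, not the authors' code.

```python
# Hedged sketch of the Figure 3 comparison: correlate the ApVk pattern with
# each congruent-syllable pattern per participant, then test whether the three
# correlations differ. Data layout is an assumption for illustration.
import numpy as np
from scipy.stats import pearsonr, friedmanchisquare

# activation[condition] : array of shape (n_participants, n_voxels) restricted
# to the regions that overlapped speech production.
def correlation_comparison(activation):
    r_pa, r_ka, r_ta = [], [], []
    for s in range(activation["ApVk"].shape[0]):
        apvk = activation["ApVk"][s]
        r_pa.append(pearsonr(apvk, activation["AVpa"][s])[0])  # "ApVk = AV/pa/"
        r_ka.append(pearsonr(apvk, activation["AVka"][s])[0])  # "ApVk = AV/ka/"
        r_ta.append(pearsonr(apvk, activation["AVta"][s])[0])  # "ApVk = AV/ta/"
    # Nonparametric test across participants; post hoc rank comparisons
    # (e.g., Nemenyi) would follow a significant result.
    stat, p = friedmanchisquare(r_pa, r_ka, r_ta)
    return {"r_pa": r_pa, "r_ka": r_ka, "r_ta": r_ta, "friedman_p": p}
```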
Figure 4
Analysis of the classification condition (i.e., run 5). Contrast (P < 0.05 corrected) of activation when participants heard a syllable made from an audio /pa/ and a visual /ka/ (denoted as ApVk) and classified it in one of 2 ways. Blue and orange indicate regions showing differential activation when participants classified ApVk as “ka” or “ta,” respectively, in a 3AFC task. Activation when ApVk was classified as “ka” is seen in the middle and inferior frontal gyri and insula. Activation when ApVk was classified as “ta” or “ka” falls in spatially adjacent but distinct areas in the right inferior and superior parietal lobules, left somatosensory cortices, left PMv cortex, and left primary motor cortex.
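One plausible way to compute such a classification contrast, assuming per-participant activation maps for the two response outcomes, is a voxel-wise paired test; everything in the sketch below (file names, shapes, the uncorrected threshold) is an assumption for illustration rather than the study's actual contrast procedure.

```python
# Rough sketch of a Figure 4-style contrast, assuming per-participant maps for
# ApVk trials classified as "ka" vs. "ta" (file names below are hypothetical).
import numpy as np
from scipy.stats import ttest_rel

ka_maps = np.load("apvk_classified_ka.npy")  # shape (n_participants, n_voxels)
ta_maps = np.load("apvk_classified_ta.npy")  # shape (n_participants, n_voxels)

t_vals, p_vals = ttest_rel(ka_maps, ta_maps, axis=0)  # voxel-wise paired contrast
ka_gt_ta = (p_vals < 0.05) & (t_vals > 0)  # "blue": stronger when classified as "ka"
ta_gt_ka = (p_vals < 0.05) & (t_vals < 0)  # "orange": stronger when classified as "ta"
# A corrected threshold (e.g., cluster or FDR correction) would replace the
# raw P < 0.05 in a real analysis.
```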
