The role of speech production system in audiovisual speech perception

Iiro P Jääskeläinen

Open Neuroimag J. 2010 Jul 8;4:30–6. doi: 10.2174/1874440001004020030.
Abstract

Seeing the articulatory gestures of the speaker significantly enhances speech perception. Findings from recent neuroimaging studies suggest that activation of the speech motor system during lipreading enhances speech perception by tuning, in a top-down fashion, speech-sound processing in the superior aspects of the posterior temporal lobe. Anatomically, the superior-posterior temporal lobe areas receive connections from the auditory, visual, and speech motor cortical areas. Thus, it is possible that neuronal receptive fields are shaped during development to respond to speech-sound features that coincide with visual and motor speech cues, in contrast to the anterior/lateral temporal lobe areas that might process speech sounds predominantly based on acoustic cues. The superior-posterior temporal lobe areas have also been consistently associated with auditory spatial processing. Thus, the involvement of these areas in audiovisual speech perception might partly be explained by the spatial processing requirements when associating sounds, seen articulations, and one's own motor movements. Tentatively, it is possible that the anterior "what" and posterior "where/how" auditory cortical processing pathways are parts of an interacting network, the instantaneous state of which determines what one ultimately perceives, as potentially reflected in the dynamics of oscillatory activity.

Keywords: Audiovisual speech perception; electroencephalography; functional MRI; magnetoencephalography; speech motor theory.


Figures

Fig. (1)
Lipreading suppresses auditory-cortex ~100 ms responses in a speech-sound formant-specific manner. TOP: Sinusoidal sound-sweep analogs of the first-formant transition common to the /ba/, /ga/, and /da/ sounds, and a continuum of second-formant transitions ranging from the second-formant sweep contained in /ba/ to that contained in /ga/, were presented to subjects while they watched a sequence of short video clips of a person articulating /ba/ or /ga/, or a still-face control picture. BOTTOM: Comparison of MEG responses to the second-formant sound sweep contained in /ga/ in the still-face baseline, /ba/, and /ga/ lipreading conditions disclosed suppressed responses ~100 ms from sound onset when the subjects were lipreading /ga/ (adapted with permission from [9]).
Fig. (2)
Seeing the place of articulation enhances activations to speech sounds in the superior-posterior temporal lobe. TOP: The unfiltered articulating face contained all visual speech-gesture motion, the spatial mid-frequency wavelet band-pass filtered condition retained the place-of-articulation information, and the spatial low-frequency wavelet band-pass filtered condition conveyed only the gross movements of the lips, jaw, and head. BOTTOM: Sites of multisensory integration selectively induced by auditory and visual correspondence of place-of-articulation information, revealed by contrasting the activity during both the mid-frequency and unfiltered conditions with the activations during the low-frequency condition, were localized predominantly to the left middle temporal gyrus and left superior temporal sulcus (adapted with permission from [30]).
Fig. (3)
An example of activation of speech motor areas, including Broca’s area, motor cortex, and parietal cortical areas, during audiovisual speech perception in a recent study (adapted with permission from [79]).
Fig. (4)
Correlations, as a function of time, of the distributions of activations elicited by an incongruent audiovisual syllable (auditory /pa/ and visual /ka/) with the distributions of activations elicited by congruent audiovisual /pa/ (gray), /ka/ (blue), and /ta/ (orange) in A) ventral premotor cortical areas, B) the left supramarginal gyrus, and C) visual cortical areas. Note that the activity patterns in the premotor areas correlated with those elicited by /ta/ at a shorter latency than in the temporo-parietal and visual cortical areas, suggesting that an efference copy from the speech motor system shapes phonetic perception at the sensory-cortical level (adapted with permission from [34]).


References

    1. Sumby WH, Pollack I. Visual contribution to speech intelligibility in noise. J Acoust Soc Am. 1954;26:212–15.
    2. Ross LA, Saint-Amour D, Leavitt VM, Javitt DC, Foxe JJ. Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments. Cereb Cortex. 2007;17:1147–53.
    3. Ma WJ, Zhou X, Ross LA, Foxe JJ, Parra LC. Lip-reading aids word recognition most in moderate noise: a Bayesian explanation using high-dimensional feature space. PLoS One. 2009;4:e4638.
    4. MacLeod A, Summerfield AQ. Quantifying the contribution of vision to speech perception in noise. Br J Audiol. 1987;21:131–41.
    5. McGurk H, MacDonald J. Hearing lips and seeing voices. Nature. 1976;264:746–48.
