Audio-visual speech cue combination

Derek H Arnold et al. PLoS One. 2010 Apr 16;5(4):e10217. doi: 10.1371/journal.pone.0010217.

Abstract

Background: Different sources of sensory information can interact, often shaping what we think we have seen or heard. This can enhance the precision of perceptual decisions relative to those made on the basis of a single source of information. From a computational perspective, there are multiple reasons why this might happen, and each predicts a different degree of enhanced precision. Relatively slight improvements can arise when perceptual decisions are made on the basis of multiple independent sensory estimates, as opposed to just one; these improvements can arise as a consequence of probability summation. Greater improvements can occur if two initially independent estimates are summed to form a single integrated code, especially if the summation is weighted according to the variance associated with each independent estimate. This form of combination is often described as a Bayesian maximum likelihood estimate. Still greater improvements are possible if the two sources of information are encoded via a common physiological process.
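The variance-weighted combination described above follows the standard maximum-likelihood cue-combination formula, in which each estimate is weighted by its reliability (inverse variance). A minimal sketch, with illustrative numbers rather than the paper's data:

```python
# Sketch of variance-weighted (maximum likelihood) cue combination.
# Each unimodal estimate is weighted by its reliability (1 / variance);
# the combined variance is always smaller than either input variance.

def mle_combine(est_a, var_a, est_v, var_v):
    """Combine two independent estimates by inverse-variance weighting."""
    w_a = (1 / var_a) / (1 / var_a + 1 / var_v)
    w_v = 1 - w_a
    combined_est = w_a * est_a + w_v * est_v
    combined_var = (var_a * var_v) / (var_a + var_v)
    return combined_est, combined_var

# With equal variances the combined variance is halved, so the standard
# deviation (and hence the discrimination threshold) improves by sqrt(2),
# the maximum gain this scheme allows for two equally reliable cues.
est, var = mle_combine(1.0, 0.5, 1.0, 0.5)
```

This ceiling on the predicted gain is what allows the authors to test whether observed audio-visual sensitivity exceeds a maximum likelihood account.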

Principal findings: Here we show that the provision of simultaneous audio and visual speech cues can result in substantial sensitivity improvements, relative to decisions based on a single sensory modality. The magnitude of the improvements is greater than can be predicted on the basis of either a Bayesian maximum likelihood estimate or probability summation.

Conclusion: Our data suggest that primary estimates of speech content are determined by a physiological process that takes input from both visual and auditory processing, resulting in greater sensitivity than would be possible if initially independent audio and visual estimates were formed and then subsequently combined.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1
Figure 1. Bar plots showing d' sensitivities.
(a) Bar plot showing sensitivities for AUD, VIS and AV presentations during Simultaneous runs of trials. Data are shown for each of six observers, along with the average performance across observers. Error bars depict +/− 1 SEM. Subjects 3, 4 & 5 are authors; note that their data do not differ qualitatively from those of the other participants. (b) Data from Sequential runs of trials. Details are as above.
Figure 2
Figure 2. Bar plot depicting sensitivity during Simultaneous AUD-VIS trials (red) and AV sensitivities predicted on the basis of different magnitudes of summation.
Predictions are based on AUD and VIS sensitivities during Simultaneous trial runs (see Figure 1a). k = 1 corresponds to a linear integration prediction, k = 2 to quadratic summation, and k = 3 to probability summation (see main text for details). Error bars depict +/− 1 SEM.
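The family of predictions in the Figure 2 caption can be expressed as Minkowski (power-law) summation of the single-modality d' values. This is a common framing of such predictions, sketched here with illustrative d' values; the exact formulation the authors use is given in their main text:

```python
# Sketch of Minkowski (power-law) summation of single-modality d' values:
# d_av = (d_aud**k + d_vis**k) ** (1 / k).
# k = 1 gives linear summation, k = 2 quadratic summation, and k = 3
# approximates probability summation. Input d' values are illustrative,
# not the observers' data.

def minkowski_prediction(d_aud, d_vis, k):
    return (d_aud ** k + d_vis ** k) ** (1.0 / k)

d_aud, d_vis = 1.2, 0.9
predictions = {k: minkowski_prediction(d_aud, d_vis, k) for k in (1, 2, 3)}
# Larger exponents predict smaller combined gains: k=1 > k=2 > k=3,
# which is why linear summation sets the most generous benchmark.
```

Observed AV sensitivity exceeding even the k = 1 prediction is what motivates the paper's common-physiological-process conclusion.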
Figure 3
Figure 3. Bar plots depicting observed and predicted sensitivities.
(a) Bar plot depicting sensitivity during Sequential AUD-AUD trials (red) and AUD-AUD sensitivities predicted on the basis of different magnitudes of summation. Predictions are based on AUD sensitivities during Simultaneous trial runs. (b) Bar plot depicting sensitivity during Sequential VIS-VIS trials (red) and VIS-VIS sensitivities predicted on the basis of different magnitudes of summation. Predictions are based on VIS sensitivities during Simultaneous trial runs. (c) Bar plot depicting sensitivity during Sequential AUD-VIS trials (red) and AUD-VIS sensitivities predicted on the basis of different magnitudes of summation. Predictions are based on AUD and VIS sensitivities during Simultaneous trial runs. Error bars depict +/− 1 SEM.
