Audio-visual speech cue combination

Derek H Arnold et al. PLoS One. 2010 Apr 16;5(4):e10217. doi: 10.1371/journal.pone.0010217.

Abstract

Background: Different sources of sensory information can interact, often shaping what we think we have seen or heard. This can enhance the precision of perceptual decisions relative to those made on the basis of a single source of information. From a computational perspective, there are multiple reasons why this might happen, and each predicts a different degree of enhanced precision. Relatively slight improvements can arise when perceptual decisions are made on the basis of multiple independent sensory estimates, as opposed to just one; these improvements can arise as a consequence of probability summation. Greater improvements can occur if two initially independent estimates are summed to form a single integrated code, especially if the summation is weighted according to the variance associated with each independent estimate. This form of combination is often described as a Bayesian maximum likelihood estimate. Still greater improvements are possible if the two sources of information are encoded via a common physiological process.
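The variance-weighted combination described above follows the standard maximum-likelihood cue-combination formula, in which each estimate is weighted by its reliability (inverse variance). A minimal sketch, with illustrative numbers rather than the paper's data:

```python
# Sketch of variance-weighted (maximum likelihood) cue combination.
# Each unimodal estimate is weighted by its reliability (1 / variance);
# the combined variance is always smaller than either input variance.

def mle_combine(est_a, var_a, est_v, var_v):
    """Combine two independent estimates by inverse-variance weighting."""
    w_a = (1 / var_a) / (1 / var_a + 1 / var_v)
    w_v = 1 - w_a
    combined_est = w_a * est_a + w_v * est_v
    combined_var = (var_a * var_v) / (var_a + var_v)
    return combined_est, combined_var

# With equal variances the combined variance is halved, so the standard
# deviation (and hence the discrimination threshold) improves by sqrt(2),
# the maximum gain this scheme allows for two equally reliable cues.
est, var = mle_combine(1.0, 0.5, 1.0, 0.5)
```

This ceiling on the predicted gain is what allows the authors to test whether observed audio-visual sensitivity exceeds a maximum likelihood account.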

Principal findings: Here we show that the provision of simultaneous audio and visual speech cues can result in substantial sensitivity improvements, relative to decisions based on a single sensory modality. The magnitude of the improvements is greater than can be predicted on the basis of either a Bayesian maximum likelihood estimate or probability summation.

Conclusion: Our data suggest that primary estimates of speech content are determined by a physiological process that takes input from both visual and auditory processing, resulting in greater sensitivity than would be possible if initially independent audio and visual estimates were formed and then subsequently combined.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1
Figure 1. Bar plots showing d' sensitivities.
(a) Bar plot showing sensitivities for AUD, VIS and AV presentations during Simultaneous runs of trials. Data are shown for each of six observers, along with the average performance across observers. Error bars depict +/− 1 SEM. Subjects 3, 4 & 5 are authors; note that their data do not differ qualitatively from those of the other participants. (b) Data from Sequential runs of trials. Details are as above.
Figure 2
Figure 2. Bar plot depicting sensitivity during Simultaneous AUD-VIS trials (red) and AV sensitivities predicted on the basis of different magnitudes of summation.
Predictions are based on AUD and VIS sensitivities during Simultaneous trial runs (see Figure 1a). k = 1 corresponds to a linear integration prediction, k = 2 to quadratic summation, and k = 3 to probability summation (see main text for details). Error bars depict +/− 1 SEM.
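The family of predictions in the Figure 2 caption can be expressed as Minkowski (power-law) summation of the single-modality d' values. This is a common framing of such predictions, sketched here with illustrative d' values; the exact formulation the authors use is given in their main text:

```python
# Sketch of Minkowski (power-law) summation of single-modality d' values:
# d_av = (d_aud**k + d_vis**k) ** (1 / k).
# k = 1 gives linear summation, k = 2 quadratic summation, and k = 3
# approximates probability summation. Input d' values are illustrative,
# not the observers' data.

def minkowski_prediction(d_aud, d_vis, k):
    return (d_aud ** k + d_vis ** k) ** (1.0 / k)

d_aud, d_vis = 1.2, 0.9
predictions = {k: minkowski_prediction(d_aud, d_vis, k) for k in (1, 2, 3)}
# Larger exponents predict smaller combined gains: k=1 > k=2 > k=3,
# which is why linear summation sets the most generous benchmark.
```

Observed AV sensitivity exceeding even the k = 1 prediction is what motivates the paper's common-physiological-process conclusion.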
Figure 3
Figure 3. Bar plots depicting observed and predicted sensitivities.
(a) Bar plot depicting sensitivity during Sequential AUD-AUD trials (red) and AUD-AUD sensitivities predicted on the basis of different magnitudes of summation. Predictions are based on AUD sensitivities during Simultaneous trial runs. (b) Bar plot depicting sensitivity during Sequential VIS-VIS trials (red) and VIS-VIS sensitivities predicted on the basis of different magnitudes of summation. Predictions are based on VIS sensitivities during Simultaneous trial runs. (c) Bar plot depicting sensitivity during Sequential AUD-VIS trials (red) and AUD-VIS sensitivities predicted on the basis of different magnitudes of summation. Predictions are based on AUD and VIS sensitivities during Simultaneous trial runs. Error bars depict +/− 1 SEM.
