Proc Natl Acad Sci U S A. 2005 Feb 8;102(6):2244-7.
doi: 10.1073/pnas.0407034102. Epub 2005 Jan 24.

Synchronizing to real events: subjective audiovisual alignment scales with perceived auditory depth and speed of sound

David Alais et al. Proc Natl Acad Sci U S A.

Abstract

Because of the slow speed of sound relative to light, acoustic and visual signals from a distant event often will be received asynchronously. Here, using acoustic signals with a robust cue to sound source distance, we show that judgments of perceived temporal alignment with a visual marker depend on the depth simulated in the acoustic signal. For distant sounds, a large delay of sound relative to vision is required for the signals to be perceived as temporally aligned. For nearer sources, the time lag corresponding to audiovisual alignment is smaller and scales at a rate approximating the speed of sound. Thus, when robust cues to auditory distance are present, the brain can synchronize disparate audiovisual signals to external events despite considerable differences in time of arrival at the perceiver. This ability is functionally important as it allows auditory and visual signals to be synchronized to the external event that caused them.

Figures

Fig. 1.
Illustrations of the stimuli and procedures used in these experiments. (A) The impulse response function on the top row (5 m) was the original function recorded in the Sydney Opera House convolved with white noise. The direct sound is the initial portion of high amplitude. The long tail is the reverberant signal, which lasted 1,350 ms and was identical for all four stimuli. Because the ratio of direct-to-reverberant energy is a very strong cue to auditory source distance, attenuating the direct portion by 6 dB (a halving of amplitude) simulates a source distance of 10 m (see Methods). Further 6-dB attenuations simulated auditory distances of 20 and 40 m. (B) The visual stimulus was similar to that shown (Left), a circular luminance patch that was presented for 13 ms. The spatial profile of the stimulus (Right) was Gaussian with a full half-width of 4° of visual angle. (C) The onset of the auditory stimulus (Upper) was varied by an adaptive procedure to find the point of subjective alignment with the visual stimulus (Lower). A variable random period preceded the stimuli after the subject initiated each trial.
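The relation between the 6-dB attenuation steps and the simulated distances in the caption can be checked with a short calculation. This is a minimal sketch, not the authors' method (the caption defers to Methods, which are not reproduced here). It assumes that the direct-sound amplitude falls off as 1/distance while the reverberant tail stays fixed, so attenuating the direct portion by a given number of decibels maps onto a proportionally larger distance. The function names are hypothetical.

```python
# Sketch under an assumed inverse-distance (1/r) law for direct-sound amplitude.
# The 5 m base distance and the 6 dB steps come from the figure caption;
# everything else here is illustrative.

BASE_DISTANCE_M = 5.0  # distance of the original recording, per the caption


def amplitude_ratio(attenuation_db: float) -> float:
    """Linear amplitude ratio corresponding to an attenuation in decibels."""
    return 10 ** (-attenuation_db / 20)


def simulated_distance(attenuation_db: float, base_m: float = BASE_DISTANCE_M) -> float:
    """Distance implied by attenuating the direct sound, assuming amplitude ~ 1/r."""
    return base_m / amplitude_ratio(attenuation_db)


if __name__ == "__main__":
    for db in (0, 6, 12, 18):
        print(f"{db:2d} dB: amplitude x{amplitude_ratio(db):.3f}, "
              f"simulated distance ~{simulated_distance(db):.1f} m")
```

Under this assumption, 6 dB corresponds to an amplitude ratio of about 0.5, so successive 6-dB steps give roughly 10, 20, and 40 m, matching the values in the caption.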
Fig. 2.
Data from the experimental conditions. (A) Psychometric functions for one observer at each of the four simulated auditory distances plotting the proportion of trials in which the visual stimulus was judged to have occurred before the auditory stimulus, as a function of the delay of the auditory stimulus. From left to right, the curves represent the 5-, 10-, 20-, and 40-m conditions. The abscissa shows time measured from the onset of the visual stimulus. (B) The same data as in A, replotted on a logarithmic scale. It is clear from the linear plot (A) that the temporal precision of the audiovisual temporal order judgement decreases with auditory distance. However, the slopes are very similar when plotted on a logarithmic scale, indicating that the precision limit is a constant proportion of auditory distance (i.e., a Weber fraction). (C) The points of subjective audiovisual alignment (the half-height of the psychometric functions) for four observers at each of the four auditory distances. As auditory distance simulated by the direct-to-reverberant energy ratio increased, the auditory stimulus was perceptually aligned with earlier visual events, consistent with subjects using the energy ratio in their alignment judgements. The slopes of the best-fitting linear functions are shown for each observer. The average slope of 3.43 ms·m⁻¹ is approximately consistent with the speed of sound.
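To see what "approximately consistent with the speed of sound" means numerically, the reported average slope can be converted into an implied propagation speed and compared with the physical travel time of sound. The sketch below assumes a speed of sound of 343 m/s (roughly the value for air at 20 °C; the caption does not state a value). The constant and function names are hypothetical.

```python
# Sketch comparing the reported slope with physical sound travel times.
# REPORTED_SLOPE_MS_PER_M is taken from the caption; SPEED_OF_SOUND_M_PER_S
# is an assumption (air at about 20 degrees C), not a value from the source.

SPEED_OF_SOUND_M_PER_S = 343.0
REPORTED_SLOPE_MS_PER_M = 3.43


def travel_time_ms(distance_m: float, speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Time in milliseconds for sound to cover distance_m at the given speed."""
    return 1000.0 * distance_m / speed_m_per_s


def implied_speed_m_per_s(slope_ms_per_m: float) -> float:
    """Propagation speed implied by a delay-versus-distance slope."""
    return 1000.0 / slope_ms_per_m


if __name__ == "__main__":
    physical_slope = travel_time_ms(1.0)  # ms of delay per metre at the assumed speed
    print(f"physical slope: {physical_slope:.2f} ms/m")
    print(f"reported slope: {REPORTED_SLOPE_MS_PER_M:.2f} ms/m "
          f"(implied speed ~{implied_speed_m_per_s(REPORTED_SLOPE_MS_PER_M):.0f} m/s)")
    for d in (5, 10, 20, 40):
        print(f"{d:2d} m: physical delay ~{travel_time_ms(d):.1f} ms")
```

Under this assumption the physical slope is about 2.9 ms per metre, so the reported 3.43 ms per metre corresponds to an implied speed somewhat below 343 m/s, which is in line with the caption's wording of approximate rather than exact agreement.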
Fig. 3.
The results of three control conditions shown for four observers. ▵ indicate audiovisual alignment for the first control in which only the onset burst of the four auditory stimuli was presented (the reverberant tail was removed). The slope of the best-fitting straight line to the averaged data was not significantly different from zero, showing that the reverberant tail is necessary to produce the shifts in subjective alignment seen in the first experiment. ▪ indicate the results of a second condition designed to control for loudness differences in the original experiment by scaling the amplitude of the original 5-m stimulus to match that of the other depths over the first 200 ms (see text). The best-fitting line to the averaged data does not differ significantly from zero, indicating that loudness differences between the four original stimuli did not determine the shifts in subjective alignment. ○ show results from a speeded attention condition. The best-fitting line to the averaged data was not significantly different from zero in slope, showing that focusing attention on the early part of the signal to make a speeded response will lead to the discounting of the direct-to-reverberant energy ratio. This finding suggests that use of this depth cue in audiovisual alignment is not mandatory and must be task-relevant.
