Front Psychol. 2013 Jan 22;3:619. doi: 10.3389/fpsyg.2012.00619. eCollection 2012.

Temporal structure and complexity affect audio-visual correspondence detection


Rachel N Denison et al. Front Psychol. 2013.

Abstract

Synchrony between events in different senses has long been considered the critical temporal cue for multisensory integration. Here, using rapid streams of auditory and visual events, we demonstrate how humans can use temporal structure (rather than mere temporal coincidence) to detect multisensory relatedness. We find psychophysically that participants can detect matching auditory and visual streams via shared temporal structure for crossmodal lags of up to 200 ms. Performance on this task reproduced features of past findings based on explicit timing judgments but did not show any special advantage for perfectly synchronous streams. Importantly, the complexity of temporal patterns influences sensitivity to correspondence. Stochastic, irregular streams (with richer temporal pattern information) led to higher audio-visual matching sensitivity than predictable, rhythmic streams. Our results reveal that temporal structure and its complexity are key determinants for human detection of audio-visual correspondence. The distinctive emphasis of our new paradigms on temporal patterning could be useful for studying special populations with suspected abnormalities in audio-visual temporal perception and multisensory integration.

Keywords: crossmodal; information theory; multisensory; perceptual correspondence; perceptual timing; synchrony.


Figures

Figure 1
Audio-visual matching task with temporally complex event streams. Participants listened binaurally through headphones to one rapid stream of auditory tone pips (see illustrative train at center bottom) while viewing, on a gray screen, two streams of visual gratings in diagonally opposite quadrants. The gratings independently flipped between left- and right-tilted 45° orientations over the course of a trial. One of the two visual streams (the matching stream) corresponded to the auditory stream in its temporal pattern of orientation flips, and was either presented in synchrony with the auditory stream or delayed relative to it by specific temporal lags. The other visual stream (the non-matching stream) had a different temporal pattern of flips (see main text). After each trial (which lasted 2.4–4.8 s depending on the experiment and condition), participants reported which of the two visual streams had matched the auditory stream in temporal pattern. Example timelines (truncated here for brevity) of auditory and visual streams show auditory events (tone pips) and visual events (orientation flips) as trains of vertical bars. A key aspect of the procedure was that for all lagged conditions, the proportion of (coincidentally) synchronous audio-visual events was equated (see main text).
Figure 2
Experiment 1: lagged stochastic event streams with varied presentation rate. (A) Auditory streams (Aud) were generated using a stochastic point process with a 1/3 probability that an auditory event would occur in each fixed-duration timebin. Matching visual streams (Lag 0–3) were identical to the auditory stream in temporal pattern and were either synchronous with the auditory stream (Lag 0) or delayed or advanced by 1–3 timebins (Lag 1–3; audition-leading-vision (AV) condition shown). Lagged events that exceeded the trial duration were looped back to the beginning of the stream to eliminate any gaps in the stimulus caused by lagging and to equate the number of events across streams. Non-matching visual streams (not shown) were generated by shuffling the inter-stimulus intervals (ISIs) of matching streams (see text). All streams contained added initial and final events to equate stream onset and offset across streams. All lagged conditions were equated for the number of (coincidentally) synchronous events generated by the binned stochastic procedure (see main text). Example event streams are shown truncated here for brevity. (B) In the faster-rate condition, timebins were 50 ms and streams lasted 2.4 s; in the slower-rate condition (presented at half that rate), timebins were 100 ms and streams lasted 4.8 s. Thus, a 100 ms time lag corresponded to a lag of two timebins in a faster-rate trial but only one timebin in a slower-rate trial. Preserving the number of possible events across rate conditions equated the information (entropy) carried by the two types of streams. Altogether, these stimuli allowed investigation of the effect of stream presentation rate for lags of identical duration and for streams with identical levels of information (see main text). Example auditory (upper stream) and matching visual (lower stream) timelines are illustrated here for each rate condition, truncated for brevity. (C) Mutual information between auditory and matching visual streams (AVm, black line) and between auditory and non-matching visual streams (AVnm, gray line) was calculated from first-order event entropy for a single event bin (see main text). Since this measure reflects simultaneous events only, mutual information is present solely for AVm streams at lag zero (i.e., perfectly synchronous streams). The plot shows means and standard deviations across trials, calculated from the actual streams presented to the first subject of the AV experiment. Since faster-rate and slower-rate streams have the same number of bins, mutual information is the same for both stream types, so we pool all streams for the analysis. This quantification of mutual information confirms that, to achieve above-chance performance on lagged trials, observers must rely on information beyond merely simultaneous events.
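To make the generation logic concrete, here is a minimal Python sketch (not the authors' code) of the Experiment 1 procedure as described in this caption: Bernoulli event generation per timebin, circular lagging, ISI shuffling for non-matching streams, and the single-bin mutual information of panel (C). The added initial/final anchor events and the equating of coincidentally synchronous events are omitted for brevity.

    import numpy as np

    rng = np.random.default_rng(0)

    def make_streams(n_bins, p_event=1/3, lag_bins=0):
        # Auditory stream: an event occurs independently in each timebin
        # with probability 1/3 (the binned stochastic point process).
        aud = rng.random(n_bins) < p_event
        # Matching visual stream: a lagged copy; np.roll loops events that
        # fall off the end back to the start, preserving the event count.
        vis_match = np.roll(aud, lag_bins)
        return aud, vis_match

    def shuffle_isis(stream):
        # Non-matching stream: shuffle the inter-stimulus intervals, which
        # preserves the ISI histogram but destroys the shared pattern.
        onsets = np.flatnonzero(stream)
        isis = rng.permutation(np.diff(onsets))
        new_onsets = onsets[0] + np.concatenate(([0], np.cumsum(isis)))
        out = np.zeros_like(stream)
        out[new_onsets] = True
        return out

    def mutual_information_bits(x, y):
        # First-order, single-bin mutual information between two binary
        # streams (cf. panel C): nonzero only when events tend to coincide.
        mi = 0.0
        for xv in (False, True):
            for yv in (False, True):
                pxy = np.mean((x == xv) & (y == yv))
                if pxy > 0:
                    mi += pxy * np.log2(pxy / (np.mean(x == xv) * np.mean(y == yv)))
        return mi

    # Faster-rate trial: 50 ms bins over 2.4 s gives 48 bins; a 100 ms lag = 2 bins.
    aud, vis_match = make_streams(n_bins=48, lag_bins=2)
    vis_nonmatch = shuffle_isis(vis_match)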
Figure 3
Experiment 1: correspondence sensitivity based on shared temporal structure. (A) Matching performance (crossmodal correspondence sensitivity, indexed by 2AFC d-prime) was above chance for lags of up to 200 ms. Performance decreased linearly with crossmodal lag, with no special (supra-linear) benefit for synchronous streams at lag 0 compared with the other lags. Performance was higher for slower-rate (black) trials than for faster-rate (gray) trials, but the effect of time lag on performance was similar for the two trial types. For slower-rate trials, performance was similar when auditory streams preceded matching visual streams (AV subjects, left) and when auditory streams followed matching visual streams (VA subjects, right). (B) Quantification of the modality order effect for faster-rate trials: differences in matching performance between adjacent lag pairs were computed for each subject. All error bars give the standard error of the mean across subjects.
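As a point of reference for panel (A), sensitivity in a two-alternative forced-choice task is conventionally converted from proportion correct via d′ = √2 × z(p), where z is the inverse of the standard normal cumulative distribution function. This standard formula is an assumption here; the caption does not specify the authors' exact estimator or any correction for extreme proportions.

    from scipy.stats import norm

    def dprime_2afc(p_correct):
        # Standard 2AFC conversion: d' = sqrt(2) * z(proportion correct).
        # Chance performance (p_correct = 0.5) maps to d' = 0.
        return 2 ** 0.5 * norm.ppf(p_correct)

    dprime_2afc(0.76)  # ~1.0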
Figure 4
Experiment 2: pattern complexity increases sensitivity to audio-visual correspondence. (A) Rhythmic streams were created by first generating (as in Experiment 1) a random stream “bar” lasting five ISIs (highlighted in gray) and then repeating that bar to fill the trial duration. Rhythmic streams consequently had lower temporal pattern complexity than random streams, owing to the redundancy introduced by bar repetition. This difference is quantified by the first-order and second-order ISI entropies of the random and rhythmic streams' ISI distributions (see main text). Examples of two rhythmic streams generated on different trials are shown (truncated for brevity). (B) Matching performance was better for random streams (black), which had higher pattern complexity, than for rhythmic streams (gray). Performance for both stream types decreased in a similarly linear fashion with lag. Standard error of the difference between random and rhythmic conditions is shown for each lag.
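A minimal sketch of the bar-repetition idea, in the same spirit as the snippet above. The ISI alphabet used here is a hypothetical placeholder; the actual ISI distribution follows from the paper's stochastic generation procedure (see main text).

    import numpy as np

    rng = np.random.default_rng(0)

    def make_rhythmic_isis(n_isis, bar_len=5, isi_choices=(1, 2, 3, 4)):
        # Draw one random bar of five ISIs (in timebins; isi_choices is a
        # placeholder alphabet), then tile it to fill the trial. The
        # repetition introduces the redundancy described in the caption.
        bar = rng.choice(isi_choices, size=bar_len)
        reps = -(-n_isis // bar_len)  # ceiling division
        return np.tile(bar, reps)[:n_isis]

    def isi_entropy_bits(isis):
        # First-order entropy of the empirical ISI distribution.
        _, counts = np.unique(isis, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

A fully random stream of the same length samples its ISIs without the bar constraint, so its empirical ISI distribution is richer and isi_entropy_bits is higher on average, matching the complexity difference the caption describes.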
Figure 5
Experiment 3: audio-visual correspondence sensitivity depends on sequence complexity, not merely ISI variability. (A) Generation of pseudorandom streams. From top to bottom: first, a base stream was generated as for rhythmic trials. All ISIs in this stream (highlighted in gray with solid outline) were then shuffled to create the pseudorandom auditory stream (shuffled ISIs highlighted in gray with dashed outline). Accordingly, the distribution of ISIs was identical for pseudorandom and rhythmic streams. The matching visual stream (shown at lag 0 for simplicity here) was then divided into five-ISI bars (each highlighted in gray with solid outline). Each of these bars was then internally shuffled to generate the non-matching visual stream (shuffled bars highlighted in gray with dashed outline). Hence the relationship between the ISIs of the non-matching stream and the ISIs of the accompanying auditory and matching streams was the same for pseudorandom and rhythmic streams, even at the bar-length timescale. In an information-theoretic analysis, this stream generation procedure resulted in pseudorandom and rhythmic streams whose ISI distributions had identical first-order entropies but differed in second-order entropy, with more information in pseudorandom streams due to their more variable ISI transitional probabilities (see main text). (B) Matching performance was better for pseudorandom streams (black) than for rhythmic streams (gray); only sequence complexity differed between the two conditions, and higher complexity led to better performance. Standard error of the difference between pseudorandom and rhythmic conditions is shown for each lag.
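Continuing the sketch, the pseudorandom manipulation is simply a whole-stream shuffle of the rhythmic base's ISIs; a pairwise (second-order) entropy then separates the two conditions even though their ISI histograms are identical. The pair-based estimator below is one common choice and an assumption here, not necessarily the paper's exact measure (see main text).

    import numpy as np

    rng = np.random.default_rng(0)

    def make_pseudorandom_isis(rhythmic_isis):
        # Shuffle ALL ISIs of the rhythmic base: the ISI histogram (and so
        # first-order entropy) is unchanged, but the repeating bar structure,
        # and with it the transition regularity, is destroyed.
        return rng.permutation(np.asarray(rhythmic_isis))

    def pair_entropy_bits(isis):
        # Second-order (consecutive-pair) ISI entropy: higher when the
        # transitions between successive ISIs are more variable.
        pairs = np.stack([isis[:-1], isis[1:]], axis=1)
        _, counts = np.unique(pairs, axis=0, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

For a tiled rhythmic stream, most consecutive ISI pairs recur every bar, so pair_entropy_bits is low; after shuffling, many more distinct pairs occur and the value rises, consistent with panel (A).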

