PLoS One. 2012;7(8):e42503.
doi: 10.1371/journal.pone.0042503. Epub 2012 Aug 3.

A neural network model of ventriloquism effect and aftereffect

Elisa Magosso et al. PLoS One. 2012.

Abstract

Presenting simultaneous but spatially discrepant visual and auditory stimuli induces a perceptual translocation of the sound towards the visual input, the ventriloquism effect. The general explanation is that vision tends to dominate over audition because of its higher spatial reliability. The underlying neural mechanisms remain unclear. We address this question via a biologically inspired neural network. The model contains two layers of unimodal visual and auditory neurons, with visual neurons having higher spatial resolution than auditory ones. Neurons within each layer communicate via lateral intra-layer synapses; neurons across layers are connected via inter-layer connections. The network accounts for the ventriloquism effect, ascribing it to a positive feedback between the visual and auditory neurons, triggered by residual auditory activity at the position of the visual stimulus. The main results are: i) the less localized stimulus is strongly biased toward the more localized stimulus and not vice versa; ii) the amount of the ventriloquism effect changes with the visual-auditory spatial disparity; iii) ventriloquism is a robust behavior of the network with respect to changes in parameter values. Moreover, the model implements Hebbian rules for potentiation and depression of lateral synapses, to explain the ventriloquism aftereffect (that is, the enduring sound shift after exposure to spatially disparate audio-visual stimuli). By adaptively changing the weights of lateral synapses during cross-modal stimulation, the model produces post-adaptive shifts of auditory localization that agree with in-vivo observations. The model demonstrates that two reciprocally interconnected unimodal layers may explain the ventriloquism effect and aftereffect, even without any convergent multimodal area. The proposed study may advance understanding of the neural architecture and mechanisms underlying visual-auditory integration in the spatial realm.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Overview of network architecture.
(A) Schematic diagram of the neural network. Each red (blue) circle represents an auditory (visual) neuron. Each line represents a synaptic connection: lines ending with an arrow indicate excitatory connections; lines ending with a solid point indicate inhibitory connections. The Gaussian patterns mimic the external visual and auditory inputs; the Gaussian functions are centered at position pm (m = v visual, m = a auditory), which represents the location of stimulus application, and have standard deviation σm and strength Em0. The fundamental assumption is σa>σv. Neurons between layers are connected via excitatory inter-area synapses (strength W). Neurons within each layer are connected via lateral (excitatory and inhibitory) synapses. For simplicity, only lateral synapses emerging from one neuron are displayed. In basal conditions, each neuron receives and sends symmetrical lateral synapses. (B) Pattern of the lateral synapses targeting (or emerging from) an exemplary neuron in either layer, in pre-training condition. Lateral excitatory (Lex) and inhibitory (Lin) synapses have a Gaussian pattern with excitation stronger but narrower than inhibition. Auto-excitation and auto-inhibition are excluded. Net lateral synapses (L) are obtained as the difference between excitatory and inhibitory synapses and assume a "Mexican hat" profile.
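The "Mexican hat" profile described in panel B can be sketched as a difference of Gaussians. The function and parameter names below (lateral_weights, lex0, sigma_ex, lin0, sigma_in) are illustrative, not the paper's; the only constraints taken from the caption are that excitation is stronger but narrower than inhibition and that auto-connections are excluded.

```python
import numpy as np

def lateral_weights(n, lex0, sigma_ex, lin0, sigma_in):
    """Net lateral synapses L = Lex - Lin for n spatially arranged neurons.

    Excitation stronger but narrower than inhibition yields a
    "Mexican hat" profile; the diagonal (auto-excitation and
    auto-inhibition) is set to zero, as in the caption.
    """
    pos = np.arange(n)
    # distance between neuron positions (1 deg spacing assumed,
    # wrapped circularly for simplicity)
    d = np.abs(pos[:, None] - pos[None, :])
    d = np.minimum(d, n - d)
    lex = lex0 * np.exp(-d**2 / (2 * sigma_ex**2))
    lin = lin0 * np.exp(-d**2 / (2 * sigma_in**2))
    L = lex - lin
    np.fill_diagonal(L, 0.0)  # exclude auto-connections
    return L

# illustrative values: strong narrow excitation, weak broad inhibition
L = lateral_weights(180, lex0=5.0, sigma_ex=2.0, lin0=4.0, sigma_in=10.0)
```

With these values, nearby neurons excite each other (L[0, 1] > 0) while more distant neurons inhibit each other (L[0, 20] < 0), and the symmetry of the matrix reflects the symmetrical lateral synapses of the basal condition.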
Figure 2. Network response to unimodal stimulation and to cross-modal spatially coincident audio-visual stimulation.
(A) A unimodal stimulation was applied to the network and maintained constant throughout the entire simulation. Neural activity is shown in the new steady-state reached by the network. Left panels - Neuron activity in the auditory and visual areas in response to an auditory stimulus of amplitude Ea0 = 15 applied at position pa = 120°. No activity is elicited in the visual area. Right panels - Neuron activity in the auditory and visual areas in response to a visual stimulus of amplitude Ev0 = 15 applied at position pv = 120°. No significant activity is elicited in the auditory area. (B) An auditory stimulus and a visual stimulus are simultaneously applied at the same spatial position (pa = pv = 120°) and maintained constant throughout the simulation. Network response is shown in steady-state condition. Auditory and visual stimuli have the same strength (Ea0 = Ev0 = 15). Strong reinforcement and narrowing of auditory activation occurs (compare with Fig. 2A, left panels).
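The steady-state layer response described above can be sketched with first-order firing-rate dynamics driven by a Gaussian input. This is a minimal sketch of one unimodal layer only; the slope, threshold, and time-constant values are illustrative assumptions, not the paper's parameters, and cross-modal input would simply add a term W * y_other to the drive.

```python
import numpy as np

def sigmoid(z, s=0.3, theta=10.0):
    """Static sigmoidal activation (illustrative slope and threshold)."""
    return 1.0 / (1.0 + np.exp(-s * (z - theta)))

def simulate_layer(I_ext, L, tau=3.0, dt=0.1, steps=2000):
    """Settle one layer: tau * dz/dt = -z + I_ext + L @ y, with y = sigmoid(z).

    I_ext is the constant external Gaussian input, L the lateral-synapse
    matrix. Returns the steady-state activity y.
    """
    z = np.zeros_like(I_ext)
    for _ in range(steps):
        drive = I_ext + L @ sigmoid(z)
        z += (dt / tau) * (-z + drive)
    return sigmoid(z)

# unimodal input: amplitude 15, centered at 120 deg, width 4 deg (assumed)
pos = np.arange(180, dtype=float)
I = 15.0 * np.exp(-(pos - 120.0)**2 / (2 * 4.0**2))
y = simulate_layer(I, np.zeros((180, 180)))  # lateral synapses off here
```

With lateral synapses switched off, the steady-state activity is simply the sigmoid of the input and peaks at the stimulated position; adding a Mexican-hat L sharpens and reinforces the bump, as in the figure.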
Figure 3. Network response to audio-visual stimulation with spatially disparate stimuli.
An auditory stimulus and a visual stimulus are simultaneously applied at two different spatial positions (pa = 100°, pv = 120°) and maintained constant throughout the simulation. Auditory and visual stimuli have the same strength (Ea0 = Ev0 = 15). Dashed red line represents activity in the auditory area; continuous blue line represents activity in the visual area. (A) Network activity in the final steady-state reached by the network. (B–G) Different snapshots of network activity during the simulation. First snapshot (B) depicts network activity immediately after the stimuli presentation; last snapshot (G) corresponds to the final state reached by the network.
Figure 4. Visual bias of auditory location and auditory bias of visual location.
(A) Biases predicted by the model (computed as perceived stimulus location minus original stimulus location) are displayed as a function of the angular separation between the location of the visual stimulus and the location of the auditory stimulus. The biases were computed with the vector metric when the network was in the new steady-state condition reached following stimuli presentation. The visual stimulus was maintained fixed at position pv = 120°, while the position of the auditory stimulus ranged from 60° to 180° (visual-auditory angular separation ranging between −60° and +60°). In each simulation, stimuli have the same strength (Ea0 = Ev0 = 15). (B) Comparison between model predictions and in-vivo data. Biases predicted by the model (same results as (A)) are zoomed between 0° and 30° of visual-auditory angular separation for comparison with in-vivo data.
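The caption reads out perceived location with a vector metric. A minimal sketch, assuming an activity-weighted centroid over neuron positions as an approximation of that metric (the paper's exact readout may differ):

```python
import numpy as np

def perceived_location(activity, positions_deg):
    """Decode perceived stimulus location from population activity.

    Activity-weighted average of the neurons' preferred positions,
    used here as a stand-in for the vector metric. Negative activity
    (if any) is clipped before weighting.
    """
    w = np.clip(activity, 0.0, None)
    return float(np.sum(w * positions_deg) / np.sum(w))

# a symmetric activity bump centered at 110 deg decodes back to 110 deg;
# a bump skewed toward the visual position would decode to a shifted
# location, i.e. the ventriloquism bias
pos = np.arange(180, dtype=float)
act = np.exp(-(pos - 110.0)**2 / (2 * 5.0**2))
loc = perceived_location(act, pos)
```

The bias plotted in the figure is then just perceived_location minus the actual stimulus position.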
Figure 5. Results of sensitivity analysis.
Visual bias of sound location as a function of the visual-auditory angular separation (same simulation as Fig. 4), obtained using different values for the parameters characterizing synaptic connections (panels A, B, C, D) and external stimuli (panels E, F, G, H). One parameter at a time was changed, while maintaining the others at their basal value. (A) Selective elimination of synaptic mechanisms (elimination of inter-area synapses, elimination of lateral synapses). (B) Changes in the weight of inter-area connections (W). (C) Changes in the extension of lateral inhibitory synapses (σin). (D) Changes in the extension of lateral excitatory synapses (σex). It is worth noting that here the balance between lateral excitation and inhibition was varied by modifying the width of lateral synapses. Similar results can be obtained by acting on the strength of lateral synapses (parameters Lex0, Lin0). (E) Changes in the strength of the auditory stimulus (Ea0). (F) Changes in the strength of the visual stimulus (Ev0). (G) Changes in the width of the auditory stimulus (σa). (H) Changes in the width of the visual stimulus (σv).
Figure 6. Results of training paradigm 1.
(A) Case 1.a: training with spatially disparate stimuli in fixed position (pv = 120°, pa = 100°). Upper panel: Lateral synapses entering the auditory neuron in position 120° before and after training. Lower panel: Behavior of the trained network in response to auditory unimodal stimulation. The test auditory stimulus had strength Ea0 = 15, and was applied at different positions. For each position of the test stimulus, the shift in sound localization (perceived stimulus location minus original stimulus location) was computed in steady-state condition (after the transient response was exhausted) and reported as a function of the actual location of the test auditory stimulus (aftereffect). (B) Case 1.b: training with spatially coincident stimuli in fixed position (pv = 100°, pa = 100°). Upper panel: Lateral synapses entering the auditory neuron in position 100° before and after training. Lower panel: Behavior of the trained network in response to auditory unimodal stimulation. The same unimodal auditory test as panel A was performed to compute the aftereffect.
Figure 7. Results of training paradigm 2.
(A) Case 2.a: training with spatially disparate stimuli in variable position with fixed audio-visual spatial disparity (20°). The auditory stimulus could be located at one of nine positions (from 20° to 180° with 20° step), and the simultaneous visual stimulus was located in fixed spatial relationship (pv = pa+20°). The overall training procedure consists of ten trials; in each trial, the nine positions were trained once (for 200 ms each) in a random order. Upper panel: Lateral synapses entering an exemplary auditory neuron (neuron in position 80°, one of the trained positions) are shown before and at the end of the overall training procedure. Lower panel: Behavior of the trained network in response to auditory unimodal stimulation. The test auditory stimulus had strength Ea0 = 15, and was applied at different positions. The perceived sound location, computed in steady-state condition, is reported as a function of the original location of the test stimulus (values represented by circles). For comparison, the behavior of the untrained network is shown too (dashed line). The regression line for the post-training data (continuous line) has slope 1 and offset ∼7.5° (r2 = 0.9990, p<0.0001). (B) Case 2.b: training with spatially coincident stimuli in variable position. The auditory stimulus could be located at one of nine positions (from 20° to 180° with 20° step), and the simultaneous visual stimulus was located in the same spatial position (pv = pa). The overall training procedure was the same as panel A (but with spatially coincident stimuli). Upper panel: Lateral synapses entering an exemplary auditory neuron (neuron in position 80°, one of the trained positions) are shown before and after the training. Lower panel: Behavior of the trained network in response to auditory unimodal stimulation. The same auditory unimodal test as in panel A was performed.
In this case, the regression line for the post-training data is almost indistinguishable from the pre-training line.
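The adaptive change of lateral synapses during cross-modal exposure, described in the abstract as Hebbian potentiation and depression, can be sketched as a single weight update. The rule below is a minimal illustrative form, not the paper's equations: co-active presynaptic/postsynaptic pairs are potentiated toward an assumed upper bound, and synapses from silent presynaptic neurons onto active postsynaptic neurons are depressed; learning rates are arbitrary.

```python
import numpy as np

def hebbian_step(L, y, lr_pot=0.01, lr_dep=0.005, L_max=5.0):
    """One Hebbian update of a lateral-synapse matrix L given activity y.

    Potentiation: lr_pot * post * pre, bounded by (L_max - L).
    Depression:   lr_dep * post * (1 - pre), for silent presynaptic input.
    Auto-connections stay excluded, as in the architecture.
    """
    post = y[:, None]   # postsynaptic activity (rows)
    pre = y[None, :]    # presynaptic activity (columns)
    dL = lr_pot * post * pre * (L_max - L) - lr_dep * post * (1.0 - pre)
    L = L + dL
    np.fill_diagonal(L, 0.0)  # no auto-excitation/auto-inhibition
    return L

# toy example: neurons 0 and 1 co-active, neuron 2 silent
L0 = np.zeros((3, 3))
y = np.array([1.0, 1.0, 0.0])
L1 = hebbian_step(L0, y)
```

Repeated updates of this kind during exposure to disparate audio-visual stimuli skew the lateral synapses asymmetrically, which is what produces the post-adaptive shift of auditory localization (the aftereffect) in the trained network.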
Figure 8. Visual bias of sound location after training.
(A) Visual bias of sound location predicted by the model after training paradigm 1.a. The auditory stimulus was maintained fixed at position 100° (the position used during training), while the visual stimulus was located at different positions from 40° to 160° (visual-auditory angular separation ranging from −60° to 60°). The shift in sound location, computed in steady-state conditions, is displayed as a function of the visual-auditory angular separation. For the sake of comparison, results obtained before training are displayed too. (B) Visual bias of sound location predicted by the model after training paradigm 2.a. The same audio-visual stimulation as in panel A was performed, to compute the sound shift for different audio-visual disparities. The meaning of the symbols is the same as in panel A. Since training paradigm 2 involved the whole acoustic space, the results displayed in the figure remain substantially unaltered for any position of the auditory stimulus.
