PLoS One. 2012;7(8):e42503.
doi: 10.1371/journal.pone.0042503. Epub 2012 Aug 3.

A neural network model of ventriloquism effect and aftereffect

Elisa Magosso et al. PLoS One. 2012.

Abstract

Presenting simultaneous but spatially discrepant visual and auditory stimuli induces a perceptual translocation of the sound towards the visual input, the ventriloquism effect. The general explanation is that vision tends to dominate over audition because of its higher spatial reliability. The underlying neural mechanisms remain unclear. We address this question via a biologically inspired neural network. The model contains two layers of unimodal visual and auditory neurons, with visual neurons having higher spatial resolution than auditory ones. Neurons within each layer communicate via lateral intra-layer synapses; neurons across layers are connected via inter-layer connections. The network accounts for the ventriloquism effect, ascribing it to a positive feedback between the visual and auditory neurons, triggered by residual auditory activity at the position of the visual stimulus. The main results are: i) the less localized stimulus is strongly biased toward the more localized stimulus and not vice versa; ii) the amount of the ventriloquism effect changes with the visual-auditory spatial disparity; iii) ventriloquism is a robust behavior of the network with respect to changes in parameter values. Moreover, the model implements Hebbian rules for potentiation and depression of lateral synapses, to explain the ventriloquism aftereffect (that is, the enduring sound shift after exposure to spatially disparate audio-visual stimuli). By adaptively changing the weights of lateral synapses during cross-modal stimulation, the model produces post-adaptive shifts of auditory localization that agree with in-vivo observations. The model demonstrates that two reciprocally interconnected unimodal layers may explain the ventriloquism effect and aftereffect, even without any convergent multimodal area. The proposed study may advance understanding of the neural architecture and mechanisms underlying visual-auditory integration in the spatial realm.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Overview of network architecture.
(A) Schematic diagram of the neural network. Each red (blue) circle represents an auditory (visual) neuron. Each line represents a synaptic connection: lines ending with an arrow indicate excitatory connections; lines ending with a solid point indicate inhibitory connections. The Gaussian patterns mimic the external visual and auditory inputs; the Gaussian functions are centered at position pm (m = v visual, m = a auditory), which represents the location of stimulus application, and have standard deviation σm and strength Em0. The fundamental assumption is σa>σv. Neurons between layers are connected via excitatory inter-area synapses (strength W). Neurons within each layer are connected via lateral (excitatory and inhibitory) synapses. For simplicity, only lateral synapses emerging from one neuron are displayed. In basal conditions, each neuron receives and sends symmetrical lateral synapses. (B) Pattern of the lateral synapses targeting (or emerging from) an exemplary neuron in either layer, in pre-training condition. Lateral excitatory (Lex) and inhibitory (Lin) synapses have a Gaussian pattern with excitation stronger but narrower than inhibition. Auto-excitation and auto-inhibition are excluded. Net lateral synapses (L) are obtained as the difference between excitatory and inhibitory synapses and assume a "Mexican hat" profile.
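The "Mexican hat" profile described in panel B can be sketched as a difference of Gaussians. The function and parameter names below (lateral_weights, lex0, sigma_ex, lin0, sigma_in) are illustrative, not the paper's; the only constraints taken from the caption are that excitation is stronger but narrower than inhibition and that auto-connections are excluded.

```python
import numpy as np

def lateral_weights(n, lex0, sigma_ex, lin0, sigma_in):
    """Net lateral synapses L = Lex - Lin for n spatially arranged neurons.

    Excitation stronger but narrower than inhibition yields a
    "Mexican hat" profile; the diagonal (auto-excitation and
    auto-inhibition) is set to zero, as in the caption.
    """
    pos = np.arange(n)
    # distance between neuron positions (1 deg spacing assumed,
    # wrapped circularly for simplicity)
    d = np.abs(pos[:, None] - pos[None, :])
    d = np.minimum(d, n - d)
    lex = lex0 * np.exp(-d**2 / (2 * sigma_ex**2))
    lin = lin0 * np.exp(-d**2 / (2 * sigma_in**2))
    L = lex - lin
    np.fill_diagonal(L, 0.0)  # exclude auto-connections
    return L

# illustrative values: strong narrow excitation, weak broad inhibition
L = lateral_weights(180, lex0=5.0, sigma_ex=2.0, lin0=4.0, sigma_in=10.0)
```

With these values, nearby neurons excite each other (L[0, 1] > 0) while more distant neurons inhibit each other (L[0, 20] < 0), and the symmetry of the matrix reflects the symmetrical lateral synapses of the basal condition.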
Figure 2. Network response to unimodal stimulation and to cross-modal spatially coincident audio-visual stimulation.
(A) A unimodal stimulation was applied to the network and maintained constant throughout the entire simulation. Neural activity is shown in the new steady-state reached by the network. Left panels - Neuron activity in the auditory and visual areas in response to an auditory stimulus of amplitude Ea0 = 15 applied at position pa = 120°. No activity is elicited in the visual area. Right panels - Neuron activity in the auditory and visual areas in response to a visual stimulus of amplitude Ev0 = 15 applied at position pv = 120°. No significant activity is elicited in the auditory area. (B) An auditory stimulus and a visual stimulus are simultaneously applied at the same spatial position (pa = pv = 120°) and maintained constant throughout the simulation. Network response is shown in steady-state condition. Auditory and visual stimuli have the same strength (Ea0 = Ev0 = 15). Strong reinforcement and narrowing of auditory activation occurs (compare with Fig. 2A, left panels).
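The steady-state layer response described above can be sketched with first-order firing-rate dynamics driven by a Gaussian input. This is a minimal sketch of one unimodal layer only; the slope, threshold, and time-constant values are illustrative assumptions, not the paper's parameters, and cross-modal input would simply add a term W * y_other to the drive.

```python
import numpy as np

def sigmoid(z, s=0.3, theta=10.0):
    """Static sigmoidal activation (illustrative slope and threshold)."""
    return 1.0 / (1.0 + np.exp(-s * (z - theta)))

def simulate_layer(I_ext, L, tau=3.0, dt=0.1, steps=2000):
    """Settle one layer: tau * dz/dt = -z + I_ext + L @ y, with y = sigmoid(z).

    I_ext is the constant external Gaussian input, L the lateral-synapse
    matrix. Returns the steady-state activity y.
    """
    z = np.zeros_like(I_ext)
    for _ in range(steps):
        drive = I_ext + L @ sigmoid(z)
        z += (dt / tau) * (-z + drive)
    return sigmoid(z)

# unimodal input: amplitude 15, centered at 120 deg, width 4 deg (assumed)
pos = np.arange(180, dtype=float)
I = 15.0 * np.exp(-(pos - 120.0)**2 / (2 * 4.0**2))
y = simulate_layer(I, np.zeros((180, 180)))  # lateral synapses off here
```

With lateral synapses switched off, the steady-state activity is simply the sigmoid of the input and peaks at the stimulated position; adding a Mexican-hat L sharpens and reinforces the bump, as in the figure.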
Figure 3. Network response to audio-visual stimulation with spatially disparate stimuli.
An auditory stimulus and a visual stimulus are simultaneously applied at two different spatial positions (pa = 100°, pv = 120°) and maintained constant throughout the simulation. Auditory and visual stimuli have the same strength (Ea0 = Ev0 = 15). Dashed red line represents activity in the auditory area; continuous blue line represents activity in the visual area. (A) Network activity in the final steady-state reached by the network. (B–G) Different snapshots of network activity during the simulation. First snapshot (B) depicts network activity immediately after the stimuli presentation; last snapshot (G) corresponds to the final state reached by the network.
Figure 4. Visual bias of auditory location and auditory bias of visual location.
(A) Biases predicted by the model (computed as perceived stimulus location minus original stimulus location) are displayed as a function of the angular separation between the location of the visual stimulus and the location of the auditory stimulus. The biases were computed with the vector metric when the network was in the new steady-state condition reached following stimuli presentation. The visual stimulus was maintained fixed at position pv = 120°, while the position of the auditory stimulus ranged from 60° to 180° (visual-auditory angular separation ranging between −60° and +60°). In each simulation, stimuli have the same strength (Ea0 = Ev0 = 15). (B) Comparison between model predictions and in-vivo data. Biases predicted by the model (same results as (A)) are zoomed between 0° and 30° of visual-auditory angular separation for comparison with in-vivo data.
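The caption reads out perceived location with a vector metric. A minimal sketch, assuming an activity-weighted centroid over neuron positions as an approximation of that metric (the paper's exact readout may differ):

```python
import numpy as np

def perceived_location(activity, positions_deg):
    """Decode perceived stimulus location from population activity.

    Activity-weighted average of the neurons' preferred positions,
    used here as a stand-in for the vector metric. Negative activity
    (if any) is clipped before weighting.
    """
    w = np.clip(activity, 0.0, None)
    return float(np.sum(w * positions_deg) / np.sum(w))

# a symmetric activity bump centered at 110 deg decodes back to 110 deg;
# a bump skewed toward the visual position would decode to a shifted
# location, i.e. the ventriloquism bias
pos = np.arange(180, dtype=float)
act = np.exp(-(pos - 110.0)**2 / (2 * 5.0**2))
loc = perceived_location(act, pos)
```

The bias plotted in the figure is then just perceived_location minus the actual stimulus position.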
Figure 5. Results of sensitivity analysis.
Visual bias of sound location as a function of the visual-auditory angular separation (same simulation as Fig. 4), obtained using different values for the parameters characterizing synaptic connections (panels A, B, C, D) and external stimuli (panels E, F, G, H). One parameter at a time was changed, while maintaining the others at their basal value. (A) Selective elimination of synaptic mechanisms (elimination of inter-area synapses, elimination of lateral synapses). (B) Changes in the weight of inter-area connections (W). (C) Changes in the extension of lateral inhibitory synapses (σin). (D) Changes in the extension of lateral excitatory synapses (σex). It is worth noting that here the balance between lateral excitation and inhibition was varied by modifying the width of lateral synapses. Similar results can be obtained by acting on the strength of lateral synapses (parameters Lex0, Lin0). (E) Changes in the strength of the auditory stimulus (Ea0). (F) Changes in the strength of the visual stimulus (Ev0). (G) Changes in the width of the auditory stimulus (σa). (H) Changes in the width of the visual stimulus (σv).
Figure 6. Results of training paradigm 1.
(A) Case 1.a: training with spatially disparate stimuli in fixed position (pv = 120°, pa = 100°). Upper panel: Lateral synapses entering the auditory neuron in position 120° before and after training. Lower panel: Behavior of the trained network in response to auditory unimodal stimulation. The test auditory stimulus had strength Ea0 = 15, and was applied at different positions. For each position of the test stimulus, the shift in sound localization (perceived stimulus location minus original stimulus location) was computed in steady-state condition (after the transient response was exhausted) and reported as a function of the actual location of the test auditory stimulus (aftereffect). (B) Case 1.b: training with spatially coincident stimuli in fixed position (pv = 100°, pa = 100°). Upper panel: Lateral synapses entering the auditory neuron in position 100° before and after training. Lower panel: Behavior of the trained network in response to auditory unimodal stimulation. The same unimodal auditory test as panel A was performed to compute the aftereffect.
Figure 7. Results of training paradigm 2.
(A) Case 2.a: training with spatially disparate stimuli in variable position with fixed audio-visual spatial disparity (20°). The auditory stimulus could be located at one of nine positions (from 20° to 180° with 20° step), and the simultaneous visual stimulus was located in fixed spatial relationship (pv = pa+20°). The overall training procedure consists of ten trials; in each trial, the nine positions were trained once (for 200 ms each) in a random order. Upper panel: Lateral synapses entering an exemplary auditory neuron (neuron in position 80°, one of the trained positions) are shown before and at the end of the overall training procedure. Lower panel: Behavior of the trained network in response to auditory unimodal stimulation. The test auditory stimulus had strength Ea0 = 15, and was applied at different positions. The perceived sound location, computed in steady-state condition, is reported as a function of the original location of the test stimulus (values represented by circles). For comparison, the behavior of the untrained network is shown too (dashed line). The regression line for the post-training data (continuous line) has slope 1 and offset ∼7.5° (r2 = 0.9990, p<0.0001). (B) Case 2.b: training with spatially coincident stimuli in variable position. The auditory stimulus could be located at one of nine positions (from 20° to 180° with 20° step), and the simultaneous visual stimulus was located in the same spatial position (pv = pa). The overall training procedure was the same as panel A (but with spatially coincident stimuli). Upper panel: Lateral synapses entering an exemplary auditory neuron (neuron in position 80°, one of the trained positions) are shown before and after the training. Lower panel: Behavior of the trained network in response to auditory unimodal stimulation. The same auditory unimodal test as in panel A was performed.
In this case, the regression line for the post-training data is almost indistinguishable from the pre-training line.
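The adaptive change of lateral synapses during cross-modal exposure, described in the abstract as Hebbian potentiation and depression, can be sketched as a single weight update. The rule below is a minimal illustrative form, not the paper's equations: co-active presynaptic/postsynaptic pairs are potentiated toward an assumed upper bound, and synapses from silent presynaptic neurons onto active postsynaptic neurons are depressed; learning rates are arbitrary.

```python
import numpy as np

def hebbian_step(L, y, lr_pot=0.01, lr_dep=0.005, L_max=5.0):
    """One Hebbian update of a lateral-synapse matrix L given activity y.

    Potentiation: lr_pot * post * pre, bounded by (L_max - L).
    Depression:   lr_dep * post * (1 - pre), for silent presynaptic input.
    Auto-connections stay excluded, as in the architecture.
    """
    post = y[:, None]   # postsynaptic activity (rows)
    pre = y[None, :]    # presynaptic activity (columns)
    dL = lr_pot * post * pre * (L_max - L) - lr_dep * post * (1.0 - pre)
    L = L + dL
    np.fill_diagonal(L, 0.0)  # no auto-excitation/auto-inhibition
    return L

# toy example: neurons 0 and 1 co-active, neuron 2 silent
L0 = np.zeros((3, 3))
y = np.array([1.0, 1.0, 0.0])
L1 = hebbian_step(L0, y)
```

Repeated updates of this kind during exposure to disparate audio-visual stimuli skew the lateral synapses asymmetrically, which is what produces the post-adaptive shift of auditory localization (the aftereffect) in the trained network.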
Figure 8. Visual bias of sound location after training.
(A) Visual bias of sound location predicted by the model after training paradigm 1.a. The auditory stimulus was maintained fixed at position 100° (the position used during training), while the visual stimulus was located at different positions from 40° to 160° (visual-auditory angular separation ranging from −60° to 60°). The shift in sound location, computed in steady-state conditions, is displayed as a function of the visual-auditory angular separation. For the sake of comparison, results obtained before training are displayed too. (B) Visual bias of sound location predicted by the model after training paradigm 2.a. The same audio-visual stimulation as in panel A was performed, to compute the sound shift for different audio-visual disparities. The meaning of the symbols is the same as in panel A. Since training paradigm 2 involved the whole acoustic space, the results displayed in the figure remain substantially unaltered for any position of the auditory stimulus.
