Review
Neurosci Biobehav Rev. 2016 Feb;61:208-24.
doi: 10.1016/j.neubiorev.2015.11.002. Epub 2015 Nov 10.

The interactions of multisensory integration with endogenous and exogenous attention


Xiaoyu Tang et al. Neurosci Biobehav Rev. 2016 Feb.

Abstract

Stimuli from multiple sensory organs can be integrated into a coherent representation through multiple phases of multisensory processing; this phenomenon is called multisensory integration. Multisensory integration can interact with attention. Here, we propose a framework in which attention modulates multisensory processing in both endogenous (goal-driven) and exogenous (stimulus-driven) ways. Moreover, multisensory integration exerts not only bottom-up but also top-down control over attention. Specifically, we propose the following: (1) endogenous attentional selectivity acts on multiple levels of multisensory processing to determine the extent to which simultaneous stimuli from different modalities can be integrated; (2) integrated multisensory events exert top-down control on attentional capture via multisensory search templates that are stored in the brain; (3) integrated multisensory events can capture attention efficiently, even in quite complex circumstances, due to their increased salience compared to unimodal events and can thus improve search accuracy; and (4) within a multisensory object, endogenous attention can spread from one modality to another in an exogenous manner.

Keywords: Attention; Attentional selectivity; Cross-modal spread of attention; Endogenous attention; Exogenous attention; Multisensory integration; Multisensory processing; Multisensory search templates.


Figures

Figure 1
(a) Multisensory cortical regions (green) that are involved in multisensory integration. SPL = superior parietal lobule; IPS = intraparietal sulcus; STS = superior temporal sulcus; vlPFC = ventrolateral prefrontal cortex; PMC = premotor cortex. (b) Brain areas associated with endogenous and exogenous attention. SPL = superior parietal lobule; IPS = intraparietal sulcus; FEF = frontal eye field; TPJ = temporoparietal junction; VFC = ventral frontal cortex. Endogenous attention is associated with the dorsal attention network (blue), whereas exogenous attention is associated with the ventral attention network (red) (Fox et al., 2006). The dorsal attention network is bilateral. It is involved in voluntary (top-down) orienting and exhibits increased activity after the presentation of cues that indicate where, when, or to what subjects should direct their attention. The ventral attention network is right-lateralized. It is involved in involuntary (stimulus-driven) orienting and exhibits increased activity after the presentation of salient targets, particularly when they appear in unexpected locations (Chica et al., 2013; Fox et al., 2006). (c) A framework for the interactions of multisensory integration with endogenous and exogenous attention. External stimuli from the sensory organs can be integrated at multiple levels of multisensory processing (Giard and Peronnet, 1999; Talsma and Woldorff, 2005). Multisensory integration emerges as a consequence of these multiple phases of multisensory processing. Although these multisensory processes are thought to be automatic, attention influences not only unimodal processing but also multisensory processing in both endogenous and exogenous ways. Endogenous attention can modulate multisensory processing via endogenous attentional selectivity [(1) Attentional selectivity]. This modulatory effect determines the extent to which simultaneously presented stimuli from different modalities can be integrated (see Figure 2 and Table 1). Furthermore, the integrated multisensory stimuli can be represented in multisensory templates that are stored in the brain. These multisensory templates exert top-down control over contingent attentional capture [(2) Integrated templates]. Owing to their increased salience relative to unimodal cues, integrated multisensory cues can influence the exogenous orienting of spatial attention even under quite complex circumstances or can improve visual search efficiency by increasing target sensitivity [(3) Integrated cues]. Finally, endogenous attention can spread from one modality to another in an exogenous manner, such that stimuli in the unattended modality come to be “attended” [(4) Attentional spread].
Figure 2
Effects of endogenous attentional selectivity on multisensory processing. Endogenous attention can modulate multisensory performance improvements through spatial or modality selectivity, and these two types of attentional selectivity can interact. Here, we list four examples of attentional selectivity. Example 1, attend to one modality at one location: participants are asked to attend to stimuli of a specific modality at a specific location, e.g., to attend to the left visual stimuli while ignoring all of the auditory stimuli and the right visual stimuli. Consequently, multisensory integration at the attended location is stronger than that at the unattended location (Fairhall and Macaluso, 2009). Although all of the auditory stimuli are ignored, the attention directed to the visual stimuli at the attended location can spread to auditory stimuli that are presented simultaneously at the attended location and even at the central location (Busse et al., 2005). Example 2, attend to multiple modalities at one location: participants are asked to attend to stimuli from multiple modalities that are presented simultaneously at a specific location, e.g., to attend to the left visual and auditory stimuli while ignoring all of the stimuli presented at the unattended location. Multisensory integration at the attended location has been found to be stronger than that at the unattended location (Senkowski et al., 2005). Example 3, attend to one modality at multiple locations: participants are asked to attend to stimuli of a specific modality at multiple locations, e.g., to attend to visual stimuli while ignoring auditory stimuli regardless of the location of presentation. Consequently, responses to audiovisual stimuli are faster than those to visual stimuli, even though the participants are instructed to ignore the auditory stimuli (Santangelo et al., 2010). Example 4, attend to multiple modalities at multiple locations: participants are asked to attend to stimuli of multiple modalities at multiple locations, e.g., to attend to both visual and auditory stimuli regardless of the location of presentation. Consequently, responses to audiovisual stimuli are faster than those to visual or auditory stimuli (Wu et al., 2012b). Notes: The stimuli illustrated here are only examples and do not depict the actual stimuli used in the previous studies. The tasks and results of the studies described in each example are listed in Table 1.
Figure 3
Multisensory templates exert top-down control on contingent attentional capture. (a) The experimental design and trial sequence. The target and nontarget search displays of the two tasks are illustrated. One is the visual task, in which participants were asked to discriminate whether the red bar was vertical or horizontal. The other is the audiovisual task, in which participants were asked to discriminate between vertical and horizontal red bars only when they were accompanied by a high-pitched tone (illustrated here as a 2000 Hz tone). Thus, the blue bars and low-pitched tones (the nontarget search displays) were to be ignored. Each trial began with a cue array composed of six elements, each consisting of four closely aligned dots. One element was a color singleton that matched the target color (illustrated here as red). The red singleton was presented randomly and with equal probability at one of the four lateral locations but never at the top or bottom locations. The visual target (the red vertical or horizontal bar) and the visual nontarget (the blue vertical or horizontal bar) were presented in the same manner as the cue. In cued trials, the visual target or nontarget was presented on the same (ipsilateral) side as the singleton cue, whereas in uncued trials, it was presented on the opposite (contralateral) side. (b) Behavioral results. Spatial cueing effects, calculated by subtracting the reaction time for cued targets from that for uncued targets, were found in both the visual and audiovisual tasks. More interestingly, the magnitude of the spatial cueing effect was larger in the visual task than in the audiovisual task. (c) ERP results. Grand-average ERPs measured at the posterior electrodes PO7/PO8, contralateral and ipsilateral to the location of the target-color singleton cue. The difference waveforms obtained by subtracting the ipsilateral from the contralateral ERPs are illustrated separately for the visual (gray) and audiovisual (green) tasks. The N2pc, an enhanced contralateral negativity that emerges approximately 200 ms after the onset of the target-color singleton cue, is marked. The results revealed that the amplitude of the N2pc component was larger in the visual task than in the audiovisual task. Adapted with permission from the corresponding author (Matusz and Eimer, 2013). Copyright © 2013 Society for Psychophysiological Research.
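The spatial cueing effect in (b) is a simple reaction-time difference. Below is a minimal Python sketch of that arithmetic using made-up per-participant reaction times; all values and variable names are hypothetical and are not the data or analysis code of Matusz and Eimer (2013).

```python
import numpy as np

# Hypothetical per-participant mean reaction times in ms; these values
# are illustrative only, not the data of Matusz and Eimer (2013).
rt_cued_visual = np.array([452.0, 470.0, 463.0, 481.0])
rt_uncued_visual = np.array([498.0, 515.0, 502.0, 530.0])
rt_cued_audiovisual = np.array([460.0, 475.0, 468.0, 486.0])
rt_uncued_audiovisual = np.array([478.0, 494.0, 481.0, 503.0])

# Spatial cueing effect: RT for uncued targets minus RT for cued targets.
cueing_visual = rt_uncued_visual - rt_cued_visual
cueing_audiovisual = rt_uncued_audiovisual - rt_cued_audiovisual

print(f"visual task cueing effect:      {cueing_visual.mean():.1f} ms")
print(f"audiovisual task cueing effect: {cueing_audiovisual.mean():.1f} ms")
# The pattern reported in the caption corresponds to the visual-task
# effect being larger than the audiovisual-task effect.
```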
Figure 4
Effects of multisensory integration on exogenous attention. Multisensory integration acts on exogenous attention either indirectly or directly. The indirect manner (a) can be observed when the exogenous cueing paradigm is combined with an attentional/perceptual load. As illustrated here, a non-predictive peripheral cue appears; this cue consists of auditory stimuli presented from the two speakers located to the left or right of the monitor, visual stimuli presented within the dashed squares on the monitor, or audiovisual stimuli. The target is presented in one of the corners of the display (as indicated by the dashed circles). In the no-load condition (a: left panel), participants were asked to complete a target elevation discrimination task, i.e., to report whether the target appeared at the top or bottom of the screen. In the high-load condition (a: right panel), participants were asked to complete not only the target elevation discrimination task but also a central rapid serial visual presentation (RSVP) task in which they were required to detect a digit among distractor letters. (b) Consequently, in the no-load condition, all types of cues capture attention and elicit significant spatial cueing effects; in the high-load condition, however, only the audiovisual cue elicits a significant spatial cueing effect (Santangelo and Spence, 2007). The direct manner (c) can be observed when the visual search paradigm is applied. In this paradigm, visual search displays were presented within two dashed circles. The distractor lines changed orientation, and one of them changed into the target line, i.e., a vertical or horizontal line. Participants were asked to discriminate the orientation of the target, i.e., vertical or horizontal. The visual target orientation change was either accompanied (AV) or not accompanied (V) by an irrelevant auditory stimulus. (d) Consequently, responses in the AV condition were found to be more accurate than those in the V condition (Van der Burg et al., 2011), and the ERP amplitude elicited in the AV condition differed from the sum of the amplitudes elicited by the unimodal auditory and visual stimuli (A+V). Further, the value of [AV−(A+V)] calculated during the 50–60 ms post-stimulus epoch was significantly correlated (p < .05) with the improvement in behavioral accuracy (AV vs. V). Adapted with permission from the corresponding authors (Santangelo and Spence, 2007) [Copyright 2007 by the American Psychological Association] and (Van der Burg et al., 2011) [© 2010 Elsevier Inc. All rights reserved.]
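The additivity test in (d) compares the multisensory ERP with the sum of the unimodal ERPs. The following sketch, assuming synthetic subject-averaged waveforms and hypothetical variable names, illustrates how [AV−(A+V)] can be computed in the 50–60 ms post-stimulus window and correlated across subjects with the behavioral gain (AV vs. V); it illustrates the arithmetic only, not the actual pipeline of Van der Burg et al. (2011).

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_samples, srate = 12, 500, 1000  # 500 samples at 1000 Hz
times = np.arange(n_samples) / srate  # seconds from stimulus onset

# Synthetic subject-averaged ERPs (microvolts) per condition;
# purely illustrative, not real recordings.
erp_av = rng.normal(0.0, 1.0, (n_subjects, n_samples))
erp_a = rng.normal(0.0, 1.0, (n_subjects, n_samples))
erp_v = rng.normal(0.0, 1.0, (n_subjects, n_samples))

# Additivity test: multisensory response minus the sum of the
# unimodal responses, AV - (A + V).
nonadditive = erp_av - (erp_a + erp_v)

# Mean amplitude in the 50-60 ms post-stimulus window, per subject.
window = (times >= 0.050) & (times <= 0.060)
nonadditive_50_60 = nonadditive[:, window].mean(axis=1)

# Hypothetical behavioral gain: accuracy in AV minus accuracy in V.
accuracy_gain = rng.normal(0.05, 0.02, n_subjects)

# Across-subject correlation between the early non-additive ERP
# amplitude and the behavioral benefit.
r = np.corrcoef(nonadditive_50_60, accuracy_gain)[0, 1]
print(f"r = {r:.2f}")
```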
Figure 5
Temporal and spatial constraints on the cross-modal spread of attention. (a) Experimental design and types of stimuli. Visual stimuli were presented in the left or right peripheral square, whereas auditory stimuli were presented centrally. The visual target stimulus consisted of a checkerboard containing two dots. The attended location is marked with a blue circle; participants were asked to detect target stimuli presented at the attended location, illustrated here as the right side. There were four types of stimuli: a visual stimulus only (V), a visual stimulus presented simultaneously with an auditory tone (VA), a visual stimulus with a tone delayed by 100 ms (V_100_A), and a visual stimulus with a tone delayed by 300 ms (V_300_A). (b) The behavioral and ERP results for conditions corresponding to different temporal gaps between the visual and auditory stimuli are illustrated. The behavioral simultaneity-judgment task revealed that subjects were much more likely to judge the visual and auditory stimuli as occurring simultaneously when the two stimuli were presented simultaneously (VA) or with a temporal gap of 100 ms than with a temporal gap of 300 ms. Regarding the ERP results, the tone responses were extracted by subtracting the response to the visual-only stimulus from the response to the combination of the visual and auditory stimuli at either the attended or the unattended location. Differences in the extracted tone responses between the attended and unattended locations were found over fronto-central areas (in time windows of 200–700 ms or 300–800 ms) in the VA and V_100_A conditions but not in the V_300_A condition. Furthermore, the contralaterality of the spreading-of-attention effect was observed only in the VA condition. Specifically, the mean amplitude of the extracted tone response (VA−V) over the fronto-central area during the 200–250 ms time window exhibited an interaction between the attended side and the location of the visual stimulus. Adapted with permission from the corresponding author (Donohue et al., 2011) [Copyright © 2011 the authors 0270-6474/11/317982-09$15.00/0]
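The extracted tone response in (b) is likewise a subtraction-based measure. The sketch below, again with synthetic waveforms and hypothetical names rather than the actual analysis of Donohue et al. (2011), shows how the spread-of-attention effect could be quantified as the attended-minus-unattended difference of the extracted (VA−V) tone response averaged over a time window.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_samples, srate = 10, 1000, 1000
times = np.arange(n_samples) / srate

# Synthetic fronto-central ERPs (microvolts); illustrative only.
erp_va_attended = rng.normal(0.0, 1.0, (n_subjects, n_samples))
erp_v_attended = rng.normal(0.0, 1.0, (n_subjects, n_samples))
erp_va_unattended = rng.normal(0.0, 1.0, (n_subjects, n_samples))
erp_v_unattended = rng.normal(0.0, 1.0, (n_subjects, n_samples))

# Extracted tone response: (visual + auditory) minus visual-only.
tone_attended = erp_va_attended - erp_v_attended
tone_unattended = erp_va_unattended - erp_v_unattended

# Spread-of-attention effect: attended minus unattended tone response,
# averaged over the 200-700 ms window described for the VA condition.
window = (times >= 0.200) & (times <= 0.700)
spread_effect = (tone_attended - tone_unattended)[:, window].mean(axis=1)
print(f"mean spread-of-attention effect: {spread_effect.mean():.2f} uV")
```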
Figure 6
Processes of the cross-modal spread of attention within a multisensory object. As illustrated here, attention is focused on the visual modality on the right side through endogenous attentional selectivity. When the visual and auditory stimuli are presented simultaneously, they are processed in a multisensory manner. After low- and high-level multisensory processing, these stimuli are integrated into a coherent multisensory object. Within this multisensory object, attention spreads automatically from the attended visual stimuli to the ignored auditory stimuli, across modalities and locations. Moreover, the spread of attention across modalities and space involves dual mechanisms (Fiebelkorn et al., 2010). One is the stimulus-driven spread of attention, which is affected by the spatial and temporal links between the auditory and visual stimuli (Donohue et al., 2011). The other is the representation-driven spread of attention, which operates when the multisensory stimuli must be checked for matching or congruency (Zimmer et al., 2010a; Zimmer et al., 2010b). After these processing stages, the ignored auditory stimulus comes to receive attention from the attended visual stimuli. The entire process thus consists of endogenous attentional selectivity followed by the exogenous cross-modal spread of attention.
