Trends Cogn Sci. 2008 May;12(5):182-6.
doi: 10.1016/j.tics.2008.02.003. Epub 2008 Apr 7.

Object-based auditory and visual attention

Barbara G Shinn-Cunningham. Trends Cogn Sci. 2008 May.

Abstract

Theories of visual attention argue that attention operates on perceptual objects, and thus that interactions between object formation and selective attention determine how competing sources interfere with perception. In auditory perception, theories of attention are less mature and no comprehensive framework exists to explain how attention influences perceptual abilities. However, the same principles that govern visual perception can explain many seemingly disparate auditory phenomena. In particular, many recent studies of 'informational masking' can be explained by failures of either auditory object formation or auditory object selection. This similarity suggests that the same neural mechanisms control attention and influence perception across different sensory modalities.

Figures

Fig 1
Conceptual model relating auditory object formation and its interactions with bottom-up salience and top-down attention, where arrow width denotes the strength of a signal or a connection. 1) Short-term segments initially form based on local spectro-temporal grouping cues [12,13]. 2) Competition first arises between short-term segments. Some segments may be inherently more salient than others (e.g., because of their intensity or distinctiveness) [41,49], which biases the inter-segment competition. 3) Top-down attention and 4) streaming (across-time linkage based on bottom-up object continuity) help modulate the competition, biasing it to favor objects with desirable features and to maintain attention on the object already in the foreground [23,50]. 5) As a result, one object is emphasized at the expense of others in the scene [44].
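As a purely illustrative reading of the Fig 1 model, the competition among segments can be sketched numerically. The sketch below is not from the article: the additive combination of salience, top-down attention, and streaming continuity, the softmax normalization, the function name biased_competition, and all numeric values are assumptions chosen only to show how such biases could shift which segment ends up emphasized.

```python
import math


def biased_competition(salience, top_down, continuity):
    """Return normalized emphasis weights for competing segments.

    Hypothetical toy model: each segment's drive is the sum of its
    bottom-up salience, a top-down attention bias, and a continuity
    (streaming) bias. A softmax turns the drives into weights that
    sum to 1, so emphasizing one segment comes at the expense of the others.
    """
    drive = [s + t + c for s, t, c in zip(salience, top_down, continuity)]
    peak = max(drive)  # subtract the maximum for numerical stability
    weights = [math.exp(d - peak) for d in drive]
    total = sum(weights)
    return [w / total for w in weights]


# Three hypothetical segments; the second is inherently the most salient.
salience = [1.0, 2.0, 0.5]
no_bias = [0.0, 0.0, 0.0]

# With no top-down or continuity bias, salience alone decides the winner.
bottom_up_only = biased_competition(salience, no_bias, no_bias)

# Assumed top-down attention and continuity both favor the first segment.
with_attention = biased_competition(salience, [2.0, 0.0, 0.0], [0.5, 0.0, 0.0])

print(bottom_up_only.index(max(bottom_up_only)))  # index of the winning segment
print(with_attention.index(max(with_attention)))
```

In this toy setup the most salient segment wins when no bias is applied, and the attended segment wins once the assumed top-down and continuity biases are added, mirroring steps 2 through 5 of the caption only qualitatively.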
Fig 2
Visual analogies of failed object formation. Left: the general similarity of the features and elements of the image makes it difficult to segregate words, so viewers are likely to perceive the mixture as a connected mass that fails to represent any of the individual words. When this occurs, it takes extra time and cognitive effort to understand the words. Middle: when color is used to differentiate the letters, like-colored letters tend to group; however, if the letters making up the target word fail to group together and the target is not perceived as one unified object (direct attention to the middle of the image), analyzing the target word still requires extra effort. Right: understanding is clear when the letters making up each word group together and each word forms automatically, resulting in an enhanced ability to selectively attend to each in turn.
Fig 3
Illustration of failure of auditory streaming. Two brothers address their mother simultaneously. Although the local spectro-temporal structure of the speech signals supports formation of words (local objects), the words are not properly sorted into streams, and she does not properly perceive either message.
Fig 4
Visual analogy illustrating how object selection can be driven by bottom-up salience. In this example, objects form based primarily on the spatial proximity of the letters within, compared to across, words in the image. Thus, object formation is not at issue; letters form automatically into meaningful words. The phrase “bottom-up” pops out because it is different from and more salient than the other words: attention is automatically drawn to this phrase even in the absence of any top-down desire to attend to it. However, if a viewer is specifically told to look at the bottom left corner of the image, the phrase “top-down” becomes the focus of attention. In order for volitional attention to override bottom-up salience and select a desired target, the observer must be told some feature (here, spatial location) that differentiates the target from the competing objects.
Fig Box 1
Visual analogy illustrating glimpsing and phonemic restoration. A) Mixture of messages. Even though one message obstructs a portion of the other, the meaning of both messages is clear. Moreover, you undoubtedly perceive the full characters “the” to be in the visual scene, even though the actual stimulus is ambiguous and could contain only portions of letters consistent with that interpretation. Your experience and knowledge allow you to perceptually fill in the hidden pieces based on what is most likely, given the sensory evidence you perceive as well as your knowledge of letters, words, and meaning. B) Center portion of the perceived background message in the mixture. C) A visual object that is unlikely, but physically consistent with the center of the background message.

References

    1. Simons DJ, Rensink RA. Change blindness: past, present, and future. Trends Cogn Sci. 2005;9:16–20.
    2. Desimone R, Duncan J. Neural mechanisms of selective visual attention. Annu Rev Neurosci. 1995;18:193–222.
    3. Serences JT, et al. Preparatory activity in visual cortex indexes distractor suppression during covert spatial orienting. J Neurophysiol. 2004;92:3538–3545.
    4. Feldman J. What is a visual object? Trends Cogn Sci. 2003;7:252–256.
    5. Whalen DH, Liberman AM. Limits on phonetic integration in duplex perception. Percept Psychophys. 1996;58:857–870.
