J Vis. 2015;15(6):5.
doi: 10.1167/15.6.5.

Crowding, grouping, and object recognition: A matter of appearance

Michael H Herzog et al. J Vis. 2015.

Abstract

In crowding, the perception of a target strongly deteriorates when neighboring elements are presented. Crowding is usually assumed to have the following characteristics. (a) Crowding is determined only by nearby elements within a restricted region around the target (Bouma's law). (b) Increasing the number of flankers can only worsen performance. (c) Target-flanker interference is feature-specific. These characteristics are usually explained by pooling models, which are very much in the spirit of classic models of object recognition. In this review, we summarize recent findings showing that crowding is not determined by the above characteristics, thus challenging most models of crowding. We propose that the spatial configuration across the entire visual field determines crowding. Only when one understands how all elements of a visual scene group with each other can one determine crowding strength. We put forward the hypothesis that appearance (i.e., how stimuli look) is a good predictor of crowding because both crowding and appearance reflect the output of recurrent processing rather than interactions during the initial phase of visual processing.
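
Characteristic (a) can be made concrete with a minimal sketch. It assumes the common rule-of-thumb form of Bouma's law, in which the critical spacing is roughly half the target eccentricity; the factor 0.5, the function name, and the 9° target eccentricity are illustrative assumptions, not values taken from the abstract. The sketch is one-dimensional (positions along the horizontal meridian only).

```python
def within_bouma_window(target_ecc_deg, flanker_ecc_deg, bouma_factor=0.5):
    """Return True if a flanker lies inside the assumed critical spacing.

    Assumption: critical spacing = bouma_factor * target eccentricity,
    with bouma_factor = 0.5 as a rule of thumb (not a value from the source).
    """
    critical_spacing = bouma_factor * target_ecc_deg
    return abs(flanker_ecc_deg - target_ecc_deg) <= critical_spacing


# Hypothetical target at 9 deg eccentricity.
near_flanker = within_bouma_window(9.0, 11.0)   # 2 deg away, inside 4.5 deg
far_flanker = within_bouma_window(9.0, 17.5)    # 8.5 deg away, outside 4.5 deg
print(near_flanker, far_flanker)
```

Under characteristic (a), only the first flanker should matter; the review argues that elements outside this window can nevertheless change crowding strength.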

Figures

Figure 1
(A) Basic pooling model. Elements (e.g., letters A, V, and E) activate input units that subsequently feed into a pooling unit. Because of the larger receptive field of the pooling unit, the features of the letters are jumbled. (B) Neurophysiology. Neurons in V1 are sensitive to simple features such as edges and lines. In higher visual areas, neurons are sensitive to more and more complex features, such as simple shapes in V4 and objects in IT. Receptive field sizes increase from lower visual areas to higher visual areas. (C) Hierarchical models of object recognition formalize the neurophysiological findings (see, e.g., Riesenhuber & Poggio, 1999). Stimulus processing starts with the analysis of very simple features (edges and lines) and proceeds to more and more complex visual representations (shapes). A hypothetical “square neuron” receives input from neurons tuned to angles, which in turn receive inputs from basic line detectors. Receptive field sizes increase as they integrate more and more information across the visual field. At each step in the hierarchy, only signals from the previous areas are combined. Responses in higher areas are fully determined by the input from lower areas. Information lost at early stages is irretrievably lost.
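
The pooling idea in panel (A) can be sketched as a simple average over everything that falls inside one receptive field. This is a toy illustration only: the averaging rule, the one-dimensional positions, and all names and numbers below are assumptions, not the model of any cited study.

```python
from statistics import fmean


def pooled_response(positions, features, center, radius):
    """Average the feature values of all elements inside the pooling region.

    positions: element locations (arbitrary units, one dimension)
    features:  one feature value per element (e.g., a vernier offset)
    """
    inside = [f for p, f in zip(positions, features) if abs(p - center) <= radius]
    return fmean(inside) if inside else 0.0


# Hypothetical target with feature value 1.0 at position 0; flankers carry 0.0.
target_alone = pooled_response([0.0], [1.0], center=0.0, radius=2.0)
two_flankers = pooled_response([-1.0, 0.0, 1.0], [0.0, 1.0, 0.0], center=0.0, radius=2.0)
four_flankers = pooled_response(
    [-2.0, -1.0, 0.0, 1.0, 2.0], [0.0, 0.0, 1.0, 0.0, 0.0], center=0.0, radius=2.0
)
far_flankers = pooled_response([-5.0, 0.0, 5.0], [0.0, 1.0, 0.0], center=0.0, radius=2.0)
print(target_alone, two_flankers, four_flankers, far_flankers)
```

In this toy version the target signal is diluted by every flanker inside the pooling region, more flankers can only dilute it further, and flankers outside the region have no effect, which mirrors the assumptions that such models are usually taken to explain.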
Figure 2
(A) Observers were asked to discriminate whether a rectangle was wider along the horizontal (x) or vertical (y) axis. We determined the threshold width for which 75% correct responses were obtained. When the rectangle was flanked by three squares on each side, performance strongly deteriorated compared to when it was presented alone. This is a classic crowding effect. (B) Next, we asked observers to discriminate whether a vernier was offset to the left or right (a). We determined the offset size for which 75% correct responses occurred (left bar and dashed line). Performance deteriorated (i.e., thresholds increased) when the vernier was surrounded by a square (b). This is another classic crowding effect. Surprisingly, vernier discrimination improved when we combined the two conditions. Performance improved gradually as more squares were presented. Best performance occurred with 2 × 3 contextual squares. In this condition, the fixation dot is close to the leftmost square and the rightmost square is at 17.5° (i.e., well outside Bouma's window). (C) First, we repeated the basic conditions (a–c). Next, crowding was strong when we removed the horizontal lines of the flanking squares (d) or rotated the flanking squares by 90° (e). Data from (d) and (e) were collected in different experiments with different observers and are shown here together to ease presentation. In part (B), we adjusted square size individually to enhance effects. This explains the higher thresholds compared to (C). Modified from Manassi et al. (2013).
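
To illustrate what a 75% correct threshold means, the sketch below reads it off a set of accuracy measurements by linear interpolation. The caption does not say how thresholds were estimated, so interpolation is a stand-in chosen for brevity, and the offsets and accuracies are invented for illustration.

```python
def threshold_at(levels, p_correct, criterion=0.75):
    """Linearly interpolate the stimulus level at which accuracy crosses the criterion."""
    pairs = sorted(zip(levels, p_correct))
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if y0 <= criterion <= y1 and y1 > y0:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("criterion is not bracketed by the data")


# Hypothetical vernier offsets (arbitrary units) and proportions correct.
offsets = [50, 100, 200, 400]
unflanked = threshold_at(offsets, [0.60, 0.80, 0.95, 1.00])
flanked = threshold_at(offsets, [0.50, 0.55, 0.70, 0.90])

# Threshold elevation: how much larger the offset must be under crowding.
elevation = flanked / unflanked
print(unflanked, flanked, elevation)
```

A higher threshold means worse performance, so crowding shows up as a threshold elevation greater than 1 and uncrowding as a return toward the unflanked value.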
Figure 3
Crowding and uncrowding depend on many grouping cues (for demonstrations see Figure 5). (A) Good Gestalt. A vernier flanked by two lines of the same length yields high thresholds, that is, strong crowding (a). When the two lines are integrated in a rectangle, thresholds strongly decrease (b). Crossing the horizontal lines of the rectangle increases thresholds, similar to the single lines condition (c). Closing the rectangle by additional horizontal lines reduces crowding again (foveal vision: Sayim et al., ; peripheral vision: Manassi et al., 2012). The dashed line indicates performance for the unflanked vernier. (B) Pattern regularity. Thresholds for a red vernier flanked by single red lines (a) and 10 red lines (b) are high compared to the unflanked vernier condition (shown by the dashed line). When the flankers are green (c–d), thresholds are much lower. A grating with alternating red and green lines leads to high thresholds (e). The red (f) and green (g) parts of the alternating grating themselves crowd very little. Only when parts of the alternating grating are combined do they form a pattern that leads to strong crowding (adapted from Manassi et al., 2012). (C) Spacing regularity. Observers discriminated the orientation of a central letter T. Threshold elevation is high when the spacing between flanking letters is small and regular (tight condition). Increasing the spacing between the target and the innermost flankers decreased crowding (shifted condition). Crowding increased when we increased the spacing between the remaining flankers, creating a regular pattern (wide condition). Adding more flankers in the gaps between the flankers (added condition) decreased crowding again (modified from Saarela et al., 2010). (D) Contour integration. Gabor orientation discrimination is poor when the central Gabor is surrounded by radially arranged flankers. When the flankers make up a smooth contour, crowding is reduced (adapted from Livne & Sagi, 2007).
Figure 4
Electrophysiological correlates of crowding. (A) A vernier target was presented in the fovea and flanked by arrays of short, equal-length, or long lines. (B) Accuracy was highest for long, intermediate for short, and worst for equal-length flankers, in line with our grouping hypothesis. (C) Event-related potentials were recorded and global field power (GFP) computed, which reflects overall brain activity. The time axis is referenced to stimulus onset. The early visual response (the P1 component) reflects flanker length. P1 amplitudes are highest for long flankers, intermediate for equal-length, and lowest for short flankers. Crowding strength is (inversely) reflected in the N1 component around 180–200 ms, which is highest for long, intermediate for short, and lowest for equal-length flankers. Hence, it seems that it takes about 50–80 ms to transform the initial encoding into an object-based perceptual code. (D) Source localization in the N1 time window. The color scale reflects activation differences in the brain associated with crowding (difference between the long flanker and equal-length flanker conditions). In particular, sources in the lateral occipital and posterior temporal and parietal areas reflect crowding strength. Sources in V1 do not contribute significantly. Modified from Chicherov et al. (2014).
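
The caption describes GFP only as a measure of overall brain activity. A minimal sketch, assuming the conventional definition of GFP as the standard deviation of the potentials across all electrodes at each time sample, looks like this. The function name and the tiny three-electrode data set are invented for illustration.

```python
from statistics import pstdev


def global_field_power(erp):
    """Compute GFP per time sample.

    erp: list of per-electrode voltage traces, all of the same length.
    Assumed definition: population standard deviation across electrodes,
    which implicitly subtracts the average reference at each sample.
    """
    return [pstdev(sample) for sample in zip(*erp)]


# Hypothetical data: 3 electrodes, 3 time samples (arbitrary units).
erp = [
    [0.0, 1.0, 2.0],
    [0.0, -1.0, -2.0],
    [0.0, 0.0, 0.0],
]
gfp = global_field_power(erp)
print(gfp)
```

GFP is zero when all electrodes carry the same potential and grows as the scalp map becomes more strongly differentiated, which is why component amplitudes such as P1 and N1 can be compared across conditions on a single trace.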
Figure 5
For illustrative purposes, we have plotted various stimuli for the studies. Fixate the central cross and compare stimuli on the right to those on the left-hand side.
