A Biologically Plausible Transform for Visual Recognition that is Invariant to Translation, Scale, and Rotation

Pavel Sountsov et al. Front Comput Neurosci. 2011 Nov 22;5:53. doi: 10.3389/fncom.2011.00053. eCollection 2011.

Abstract

Visual object recognition occurs easily despite differences in position, size, and rotation of the object, but the neural mechanisms responsible for this invariance are not known. We have found a set of transforms that achieve invariance in a neurally plausible way. We find that a transform based on local spatial frequency analysis of oriented segments and on logarithmic mapping, when applied twice in an iterative fashion, produces an output image that is unique to the object and that remains constant as the input image is shifted, scaled, or rotated.

Keywords: biological classifier; cortico-striatal; hierarchical; hybrid model; reinforcement; unsupervised.

Figures

Figure 1
The two-stage transformation. (A,B) In the first step of the first stage, edge detection is performed, illustrated for an orientation of 45° in (B) (red, positive values; blue, negative). (C,D) The second step of the transform is a spatial interval detector that looks for pairs of edges separated by an interval I along the detector's own orientation. To achieve this, the image is shifted (C) and the pixel values are multiplied, with negative values set to zero (D). (E) The image in (D) is summed over all positions to yield a single point in the log interval vs. orientation map (and similarly for other orientations and intervals; orientation range 0–180°; interval range 100–700 pixels). The color code, shown at far right, is linear with dark blue as zero. (F) In the second stage, the same transform is applied again, yielding a map whose coordinates are log I′ and θ′ (defined relative to the axes of the stage 1 output; interval range 15–85 pixels). The color code, shown at right, is linear with dark blue as zero.
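The caption above effectively specifies one stage of the transform as an algorithm. A minimal Python sketch of that stage is given below, assuming a grayscale image with values in [0, 1]; the function name one_stage, the Gaussian-derivative edge detector, and the default orientation/interval ranges are illustrative assumptions, not the authors' exact implementation.

import numpy as np
from scipy import ndimage

def one_stage(image, n_orientations=36, intervals=np.geomspace(4, 64, 32)):
    """One stage of the transform: a (log interval x orientation) map."""
    out = np.zeros((len(intervals), n_orientations))
    # Oriented edge detection (Fig. 1A,B), approximated here by the
    # directional derivative of a Gaussian-smoothed image.
    smoothed = ndimage.gaussian_filter(np.asarray(image, dtype=float), sigma=1.0)
    gy, gx = np.gradient(smoothed)
    for j, theta in enumerate(np.linspace(0.0, np.pi, n_orientations, endpoint=False)):
        edges = gx * np.cos(theta) + gy * np.sin(theta)
        for i, interval in enumerate(intervals):
            # Interval detection (Fig. 1C,D): shift the edge map by I along
            # the same orientation, multiply with the unshifted map, and set
            # negative products to zero.
            shift = (interval * np.sin(theta), interval * np.cos(theta))
            shifted = ndimage.shift(edges, shift, order=1, mode="constant")
            product = np.maximum(edges * shifted, 0.0)
            # Summing over all positions yields one point of the map (Fig. 1E).
            out[i, j] = product.sum()
    # Rows are log-spaced intervals (via geomspace), so axis 0 is log I.
    return out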
Figure 2
The two-stage transform produces an output invariant to translation, scale, and rotation. First column: the letter W is translated (second row), scaled (third row), or rotated (fourth row). Second column: first-stage output. Third column: second-stage output. Axes and color code are as in Figure 1E,F respectively.
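Continuing the sketch above, the second stage is simply the same operation applied to the first stage's output, and the invariance illustrated here can be probed by transforming the input; two_stage and the comparison shown are hypothetical, and this toy version is only approximately invariant because of boundary effects and normalization.

def two_stage(image):
    # The second stage (Figure 1F) applies the same operation to the first
    # stage's log-interval x orientation map.
    return one_stage(one_stage(image))

# A translated, scaled, or rotated input should produce a nearly identical
# second-stage map, e.g.:
#   rotated = ndimage.rotate(img, angle=30, reshape=False)
#   np.linalg.norm(two_stage(img) - two_stage(rotated))  # expected to be small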
Figure 3
The two-stage transform provides a unique description of the object. The object in the left panel (line and dot) was varied by moving the dot exhaustively to all positions in the panel. For each position, the two-stage transform was applied. The difference in output from the pattern at left was computed as a Euclidean distance and color coded (red, most different; blue, most similar). The only similar output (blue region at bottom left) is produced by a rotation of the original image.
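In terms of the same sketch, the dissimilarity used here is a Euclidean distance between flattened second-stage maps; transform_distance is a hypothetical helper building on two_stage above.

def transform_distance(img_a, img_b):
    # Euclidean distance between flattened second-stage maps; this is the
    # quantity color coded in the figure (red, most different; blue, most
    # similar).
    return float(np.linalg.norm(two_stage(img_a).ravel() - two_stage(img_b).ravel()))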
Figure 4
Output of the two-stage transform is sufficient for object recognition. (A) First- and second-stage transforms for various letters. Axes and color code as in Figure 1E,F. (B) To test recognition, the transform of nine rotated and scaled versions of each letter was compared to the transform of the parent letter. The distances between the 26 parent letters (black letters) and 234 rotated and scaled versions (red dots) are approximately rendered in two dimensions by multidimensional scaling (see Materials and Methods).
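The 2-D rendering of pairwise distances in (B) can be reproduced in outline with metric multidimensional scaling; the sketch below uses scikit-learn's MDS as an assumed implementation, which may differ from the method in the paper's Materials and Methods.

import numpy as np
from sklearn.manifold import MDS

def embed_2d(maps):
    # maps: a list of second-stage outputs (26 parent letters plus their
    # rotated/scaled versions); returns an (n, 2) array of coordinates.
    x = np.stack([m.ravel() for m in maps])
    distances = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return MDS(n_components=2, dissimilarity="precomputed").fit_transform(distances)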
Figure 5
Letter identification is resistant to perturbations of the image. Different perturbations were gradually applied to the letter A until the resulting image was not correctly recognized by the linear classifier. Panels depict the maximum amount of perturbation before misidentification occurred. (A) Distortion of the letter shape. Insets illustrate the effect of the distortion on an outlined square. Top: horizontal shear. Bottom: foreshortening due to perspective. (B) Texture superposition. The images were generated by linearly mixing the source image and an image composed of horizontal (top) or randomly oriented (bottom) bars. (C) Whole image manipulation. Top: White noise was added to every pixel of the source image, followed by normalization of the image pixel intensity to span the range 0–1. The blending of the source image and the noise was varied. Bottom: The black background of the source image was replaced by an image of a natural scene with different levels of mean intensity.
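As one concrete instance, the noise perturbation in (C, top) amounts to a blend-then-normalize operation on the source image; blend_with_noise below is a hypothetical helper, with alpha the varied blend weight.

import numpy as np

def blend_with_noise(image, alpha, rng=None):
    # Linearly mix the source image with white noise (alpha = 0: no noise;
    # alpha = 1: pure noise)...
    rng = np.random.default_rng(0) if rng is None else rng
    noisy = (1.0 - alpha) * image + alpha * rng.random(image.shape)
    # ...then normalize pixel intensities to span the range 0-1, as in the
    # caption.
    noisy -= noisy.min()
    return noisy / noisy.max()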

