Proc Natl Acad Sci U S A. 2012 Nov 27;109(48):E3314-23.
doi: 10.1073/pnas.1214269109. Epub 2012 Nov 12.

Looking just below the eyes is optimal across face recognition tasks

Matthew F Peterson et al. Proc Natl Acad Sci U S A. 2012.

Abstract

When viewing a human face, people often look toward the eyes. Maintaining good eye contact carries significant social value and allows for the extraction of information about gaze direction. When identifying faces, humans also look toward the eyes, but it is unclear whether this behavior is solely a byproduct of the socially important eye movement behavior or whether it has functional importance in basic perceptual tasks. Here, we propose that gaze behavior while determining a person's identity, emotional state, or gender can be explained as an adaptive brain strategy to learn eye movement plans that optimize performance in these evolutionarily important perceptual tasks. We show that humans move their eyes to locations that maximize perceptual performance determining the identity, gender, and emotional state of a face. These optimal fixation points, which differ moderately across tasks, are predicted correctly by a Bayesian ideal observer that integrates information optimally across the face but is constrained by the decrease in resolution and sensitivity from the fovea toward the visual periphery (foveated ideal observer). Neither a model that disregards the foveated nature of the visual system and makes fixations on the local region with maximal information, nor a model that makes center-of-gravity fixations, correctly predicts human eye movements. Extension of the foveated ideal observer framework to a large database of real-world faces shows that the optimality of these strategies generalizes across the population. These results suggest that the human visual system optimizes face recognition performance through guidance of eye movements not only toward but, more precisely, just below the eyes.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Task time line. The free eye movement condition allowed observers to make a saccade from initial fixations surrounding the image into the centrally presented face image with time for one fixation. The forced fixation task was identical, except the possible initial fixations were situated along the vertical midline and eye movements were prohibited.
Fig. 2.
Eye movement behavior. (A) Representative fixations from 3 observers for the free eye movement condition. Each red dot indicates the landing point of a single saccade (500 fixations per observer), whereas the black dot represents the mean landing point across all saccades. (B) Each green dot indicates the mean landing point for 1 observer, whereas the white dot is the mean landing point across the 20 observers. (C) Eye movement behavior for observers identifying full-color, noise-free images mirrors the results from the main identification task.
Fig. 3.
Forced fixation performance and foveated ideal face discriminator performance. Black dots are the average performance in the forced fixation condition across observers (error bars represent 1 SEM). The blue rectangles represent the saccade distribution at the group level, centered at the mean landing point of the first saccade with a width of 1 SD. Humans fixated between the eyes and nose but closer to the eyes. The red line indicates the predictions of the foveated ideal observer (FIO) model.
Fig. 4.
ROI ideal observer and FIO methodology. (A) ROI ideal observer, a technique for localizing and quantifying information content, is an adaptation of classic white noise ideal observer theory. Small regions of the stimulus are extracted and embedded in white Gaussian noise. The likelihoods for the presence of each possible stimulus are computed in a Bayesian manner, and the maximum likelihood is taken as the decision. A single signal contrast is chosen and held constant across regions. Thus, the performance of the ideal observer for each region is a measurement of the total task-relevant information content. (B) Flow chart for the FIO simulations. For any given fixation (here, center of the image), the image is divided into spatial bins, each with its own contrast sensitivity function (CSF) depending on retinal eccentricity and direction from fixation. The image is filtered in the frequency domain and then reassembled in the spatial domain, resulting in a spatially variant filtered image. FFT, fast Fourier transform; IFFT, inverse FFT.
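The decision rule described in panel A can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `ideal_observer_choice` and `roi_proportion_correct`, the square region shape, the equal prior over stimuli, and the Monte Carlo estimate of performance are all hypothetical choices made for the example. For a known signal in white Gaussian noise, the log-likelihood of each candidate stimulus is proportional to the negative squared distance between the noisy observation and that stimulus, so the maximum-likelihood decision is the nearest template.

```python
import numpy as np

def ideal_observer_choice(observation, templates, noise_sd):
    """Return the index of the template with the highest likelihood.

    Assumes additive white Gaussian noise with standard deviation noise_sd
    and equal priors, so maximum likelihood equals maximum posterior.
    """
    diffs = templates - observation                      # shape (K, H, W)
    log_lik = -np.sum(diffs ** 2, axis=(1, 2)) / (2.0 * noise_sd ** 2)
    return int(np.argmax(log_lik))

def roi_proportion_correct(templates, top, left, size, noise_sd, n_trials, rng):
    """Monte Carlo estimate of ideal observer accuracy for one square region.

    templates: array of shape (K, H, W), one image per candidate stimulus.
    (top, left, size): hypothetical parameters defining the extracted region.
    """
    rois = templates[:, top:top + size, left:left + size]
    n_correct = 0
    for _ in range(n_trials):
        k = int(rng.integers(len(rois)))                 # stimulus shown on this trial
        noisy = rois[k] + rng.normal(0.0, noise_sd, rois[k].shape)
        n_correct += ideal_observer_choice(noisy, rois, noise_sd) == k
    return n_correct / n_trials

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    faces = rng.normal(size=(10, 32, 32))                # stand-in data, not real faces
    print(roi_proportion_correct(faces, 8, 8, 6, noise_sd=2.0, n_trials=200, rng=rng))
```

Sliding the region over the image with the noise level held fixed would give a map of accuracy per region, which corresponds to the kind of information map the caption describes.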
Fig. 5.
ROI and FIO predictions. (A) ROI ideal observer shows heavy concentrations of information in the eye region, with smaller peaks around the nose tip and mouth. Overlaid are the mean saccade landing points for each individual (in green) and the group (in white). Saccades were not directed toward the most informative regions. (B) FIO predictions show a peak in the center of the face just below the eyes, where information is optimally integrated across the visual field. The overlaid saccade distributions show a strong tendency for observers to target regions of maximal information gain.
Fig. 6.
NSA. (A) FIO results along the vertical midline for 100 groups of 10 faces each are shown, with dark gray representing the mean performance across groups plus or minus 1 SEM. Light gray represents the SD. (B and C) ROI and FIO results, respectively, show a strong correspondence to the results using images from the human study.
Fig. 7.
Happy vs. neutral behavioral and ideal observer results. (A) Human saccades shift downward toward the nose tip. Human saccade distribution means for the identification and emotion tasks are indicated by the red and black arrows, respectively. (B) ROI ideal observer shows a heavy concentration of information in the mouth, where the smile is the most informative cue. However, humans do not fixate this area. (C) Two-dimensional FIO results show a peak toward the nose tip, where the (still heavy) concentration of information in the eyes can be optimally combined with the higher visibility information from the mouth.
Fig. 8.
Evaluation of central bias strategies and summary of results. (A) Strategy that targets the geometric centers for either the visible face area (purple), the cropping black box (orange), or the uncropped entire head region (cyan) cannot account for human eye movement results (blue, identification; red, emotion; green, gender). (B) New condition, which moves the center drastically downward on the face (orange), yields nearly identical results (black) to the original Short condition (white) while providing even poorer eye movement predictions. (C) Compilation of eye movement results and corresponding model predictions for all conditions. Inspection shows that the FIO is the only model that correctly predicts human fixation locations and tracks the systematic modulation of behavior with task.
Fig. P1.
ROI and FIO performance predictions. (A) For identification, emotion recognition, and gender discrimination, the ROI ideal observer shows heavy concentrations of information in the eye region, with smaller peaks around the nose tip and mouth. The happy vs. neutral discrimination condition yields images with information most heavily concentrated in the mouth region, even though the eyes are still useful. The mean saccade landing points for each individual (green) and the group (white) are overlaid. (B) FIO predictions show a performance peak in the center of the face just below the eyes for the first three tasks, whereas the happy vs. neutral condition peaks significantly lower. The overlaid saccade distributions show a strong tendency for observers to fixate regions between the peaks of local information concentration, consistent with an optimal strategy that maximizes information gain for a system that integrates information across the visual field.
