Review
Annu Rev Vis Sci. 2021 Sep 15:7:543-570.
doi: 10.1146/annurev-vision-093019-111701. Epub 2021 Aug 4.

Face Recognition by Humans and Machines: Three Fundamental Advances from Deep Learning

Alice J O'Toole et al. Annu Rev Vis Sci.

Abstract

Deep learning models currently achieve human levels of performance on real-world face recognition tasks. We review scientific progress in understanding human face processing using computational approaches based on deep learning. This review is organized around three fundamental advances. First, deep networks trained for face identification generate a representation that retains structured information about the face (e.g., identity, demographics, appearance, social traits, expression) and the input image (e.g., viewpoint, illumination). This forces us to rethink the universe of possible solutions to the problem of inverse optics in vision. Second, deep learning models indicate that high-level visual representations of faces cannot be understood in terms of interpretable features. This has implications for understanding neural tuning and population coding in the high-level visual cortex. Third, learning in deep networks is a multistep process that forces theoretical consideration of diverse categories of learning that can overlap, accumulate over time, and interact. Diverse learning types are needed to model the development of human face processing skills, cross-race effects, and familiarity with individual faces.

Keywords: cross-race effects; deep convolutional networks; face recognition; face space; facial features; human learning; machine learning.

Figures

Figure 1
The progress of computer-based face recognition systems can be tracked by their ability to recognize faces with increasing levels of image and appearance variability. In 2006, highly controlled, cropped face images with moderate variability, such as the images of the same person shown, were challenging (images adapted with permission from Sim et al. 2002). In 2012, algorithms could tackle moderate image and appearance variability (the top four images are extreme examples adapted with permission from Huang et al. 2012; the bottom two images are adapted with permission from Phillips et al. 2011). By 2018, deep convolutional neural networks (DCNNs) began to tackle wide variation in image and appearance (images adapted with permission from the database in Maze et al. 2018). In the 2012 and 2018 images, all side-by-side images show the same person except the bottom pair of 2018 panels.
Figure 2
Visualization of the top-level deep convolutional neural network (DCNN) similarity space for all images from Hill et al. (2019). (a–f) Points are colored according to different variables. Grey polygonal borders are for illustration purposes only and show the convex hull of all images of each identity. These convex hulls are expanded by a margin for visibility. The network separates identities accurately. In panels a and d, the space is divided into male and female sections. In panels b and e, illumination conditions subdivide within identity groupings. In panels c and f, the viewpoint varies sequentially within illumination clusters. Dotted-line boxes in panels a–c show areas enlarged in panels d–g. Figure adapted with permission from Hill et al. (2019).
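The convex-hull borders described in this caption can be illustrated with a minimal sketch. This is not the procedure used by Hill et al. (2019): the 2D coordinates, the identity labels, and the centroid-based `expand_hull` step are all assumptions made for illustration, since the caption does not say how the margin was applied.

```python
import math

def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]

def expand_hull(hull, margin):
    """Push each vertex away from the hull's vertex centroid by `margin` (assumed rule)."""
    cx = sum(x for x, _ in hull) / len(hull)
    cy = sum(y for _, y in hull) / len(hull)
    expanded = []
    for x, y in hull:
        dx, dy = x - cx, y - cy
        dist = math.hypot(dx, dy)
        scale = (dist + margin) / dist if dist > 0 else 1.0
        expanded.append((cx + dx * scale, cy + dy * scale))
    return expanded

# Hypothetical 2D projections of image embeddings, grouped by identity.
embeddings_2d = {
    "identity_A": [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)],
    "identity_B": [(5, 5), (6, 5), (5.5, 6), (5.5, 5.3)],
}

borders = {
    name: expand_hull(convex_hull(pts), margin=0.5)
    for name, pts in embeddings_2d.items()
}
```

Each entry of `borders` is a polygon that could be drawn around the points of one identity; interior points such as `(1, 1)` do not affect the outline.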
Figure 3
Illustration of the separation of the task-relevant information into subspaces for an identity-trained deep convolutional neural network (DCNN). Each plot shows the similarity (cosine) between principal components (PCs) of the face space and directional vectors in the space that are diagnostic of identity (top), gender (middle), and viewpoint (bottom). Figure adapted with permission from Parde et al. (2021).
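The kind of comparison plotted in this figure, the cosine between each principal component and a direction diagnostic of some attribute, can be sketched as follows. The synthetic embeddings, the binary attribute, and the use of a class-mean difference as the diagnostic direction are assumptions for illustration; they are not taken from Parde et al. (2021), whose method for deriving directional vectors may differ.

```python
import numpy as np

def principal_components(X):
    """Return unit-length principal axes as rows, ordered by explained variance."""
    X_centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return vt

def cosine_with_pcs(pcs, direction):
    """Absolute cosine between every principal axis and one direction vector."""
    unit = direction / np.linalg.norm(direction)
    return np.abs(pcs @ unit)

rng = np.random.default_rng(0)
n_images, n_dims = 200, 8

# Hypothetical binary attribute (e.g., a gender label) for each image.
labels = rng.integers(0, 2, size=n_images)

# Synthetic "top-layer" embeddings: axis 0 has large variance unrelated to
# the attribute, while axis 3 carries the attribute signal.
X = rng.normal(size=(n_images, n_dims))
X[:, 0] *= 5.0
X[:, 3] += 3.0 * labels

pcs = principal_components(X)

# Assumed diagnostic direction: difference between the two class means.
direction = X[labels == 1].mean(axis=0) - X[labels == 0].mean(axis=0)

profile = cosine_with_pcs(pcs, direction)
for i, value in enumerate(profile, start=1):
    print(f"PC{i}: |cos| = {value:.3f}")
```

Because the principal axes here form a complete orthonormal basis, the squared cosines in `profile` sum to one, so the profile shows how a single attribute direction is distributed across PCs.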
Figure 4
(a) A model with too few parameters fails to fit the data. (b) The ideal-fit model fits with a small number of parameters and has generative power that supports interpolation and extrapolation. (c) An overfit function can model noise in the training data. (d) An overparameterized model generalizes well to new stimuli within the scope of the training samples. Figure adapted with permission from Hasson et al. (2020).
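Panels a through c can be illustrated with a small polynomial-fitting sketch. The target function, noise level, sample sizes, and polynomial degrees are assumptions chosen for illustration and do not come from Hasson et al. (2020). Panel d (the overparameterized regime) is not covered, because a low-dimensional polynomial fit is not a faithful stand-in for it.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

def true_fn(x):
    """Assumed ground-truth function: a cubic."""
    return x ** 3 - x

# Noisy training samples and noise-free held-out points inside the training range.
x_train = np.linspace(-1.0, 1.0, 15)
y_train = true_fn(x_train) + rng.normal(scale=0.05, size=x_train.size)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = true_fn(x_test)

def mse(model, x, y):
    """Mean squared error of a fitted model on the points (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

# Degree 1: too few parameters (panel a). Degree 3: matches the target (panel b).
# Degree 12: enough freedom to chase noise in the training data (panel c).
results = {}
for degree in (1, 3, 12):
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))

for degree, (train_err, test_err) in results.items():
    print(f"degree {degree:2d}: train MSE {train_err:.5f}, held-out MSE {test_err:.5f}")
```

Training error cannot increase as the degree grows, because each lower-degree model is a special case of the higher-degree one. The held-out error is therefore the quantity to inspect when separating a good fit from an overfit one.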
