Review

Annu Rev Vis Sci. 2021 Sep 15:7:543-570.

doi: 10.1146/annurev-vision-093019-111701. Epub 2021 Aug 4.

Face Recognition by Humans and Machines: Three Fundamental Advances from Deep Learning

Alice J O'Toole¹, Carlos D Castillo²

Affiliations

¹ School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, Texas 75080, USA; email: otoole@utdallas.edu.
² Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA; email: carlosdc@jhu.edu.

PMID: 34348035
PMCID: PMC8721510
DOI: 10.1146/annurev-vision-093019-111701

Abstract

Deep learning models currently achieve human levels of performance on real-world face recognition tasks. We review scientific progress in understanding human face processing using computational approaches based on deep learning. This review is organized around three fundamental advances. First, deep networks trained for face identification generate a representation that retains structured information about the face (e.g., identity, demographics, appearance, social traits, expression) and the input image (e.g., viewpoint, illumination). This forces us to rethink the universe of possible solutions to the problem of inverse optics in vision. Second, deep learning models indicate that high-level visual representations of faces cannot be understood in terms of interpretable features. This has implications for understanding neural tuning and population coding in the high-level visual cortex. Third, learning in deep networks is a multistep process that forces theoretical consideration of diverse categories of learning that can overlap, accumulate over time, and interact. Diverse learning types are needed to model the development of human face processing skills, cross-race effects, and familiarity with individual faces.

Keywords: cross-race effects; deep convolutional networks; face recognition; face space; facial features; human learning; machine learning.

Figures

**Figure 1**
The progress of computer-based face recognition systems can be tracked by their ability to recognize faces with increasing levels of image and appearance variability. In 2006, highly controlled, cropped face images with moderate variability, such as the images of the same person shown, were challenging (images adapted with permission from Sim et al. 2002). In 2012, algorithms could tackle moderate image and appearance variability (the top four images are extreme examples adapted with permission from Huang et al. 2012; the bottom two images are adapted with permission from Phillips et al. 2011). By 2018, deep convolutional neural networks (DCNNs) began to tackle wide variation in image and appearance (images adapted with permission from the database in Maze et al. 2018). In the 2012 and 2018 images, all side-by-side images show the same person except the bottom pair of 2018 panels.

**Figure 2**
Visualization of the top-level deep convolutional neural network (DCNN) similarity space for all images from Hill et al. (2019). (a–f) Points are colored according to different variables. Grey polygonal borders are for illustration purposes only and show the convex hull of all images of each identity. These convex hulls are expanded by a margin for visibility. The network separates identities accurately. In panels a and d, the space is divided into male and female sections. In panels b and e, illumination conditions subdivide within identity groupings. In panels c and f, the viewpoint varies sequentially within illumination clusters. Dotted-line boxes in panels a–c show areas enlarged in panels d–f. Figure adapted with permission from Hill et al. (2019).

**Figure 3**
Illustration of the separation of the task-relevant information into subspaces for an identity-trained deep convolutional neural network (DCNN). Each plot shows the similarity (cosine) between principal components (PCs) of the face space and directional vectors in the space that are diagnostic of identity (*top*), gender (*middle*), and viewpoint (*bottom*). Figure adapted with permission from Parde et al. (2021).
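The kind of comparison described in the Figure 3 caption can be sketched on toy data. The following is only a minimal illustration, not the analysis of Parde et al. (2021): it assumes a two-dimensional stand-in for the face space, made-up points, and a "diagnostic direction" defined simply as the difference between two group means. The function names (`principal_axes_2d`, `mean_difference`) and the group labels are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def principal_axes_2d(points):
    """Closed-form principal components of 2-D data.

    For a symmetric 2x2 covariance matrix [[sxx, sxy], [sxy, syy]],
    the major axis lies at angle 0.5 * atan2(2 * sxy, sxx - syy).
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))   # largest-variance direction
    pc2 = (-math.sin(theta), math.cos(theta))  # orthogonal direction
    return pc1, pc2

def mean_difference(group_a, group_b):
    """Assumed diagnostic direction: vector from mean of A to mean of B."""
    def mean(g):
        return (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
    ma, mb = mean(group_a), mean(group_b)
    return (mb[0] - ma[0], mb[1] - ma[1])

# Made-up 2-D "embeddings" for two hypothetical attribute groups.
group_a = [(-2.0, 0.1), (-1.8, -0.2), (-2.2, 0.1), (-2.0, 0.0)]
group_b = [(2.0, -0.1), (1.8, 0.2), (2.2, -0.1), (2.0, 0.0)]

pc1, pc2 = principal_axes_2d(group_a + group_b)
direction = mean_difference(group_a, group_b)

# Absolute cosine, because the sign of a principal component is arbitrary.
sim_pc1 = abs(cosine(pc1, direction))
sim_pc2 = abs(cosine(pc2, direction))
print(f"|cos(PC1, direction)| = {sim_pc1:.3f}")
print(f"|cos(PC2, direction)| = {sim_pc2:.3f}")
```

In this toy case the attribute direction aligns almost entirely with the first component. In a real DCNN face space the same comparison would be made over many components and several attribute directions.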

**Figure 4**
(a) A model with too few parameters fails to fit the data. (b) The ideal-fit model fits with a small number of parameters and has generative power that supports interpolation and extrapolation. (c) An overfit function can model noise in the training data. (d) An overparameterized model generalizes well to new stimuli within the scope of the training samples. Figure adapted with permission from Hasson et al. (2020).
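Panels a–c of Figure 4 can be made concrete with a small numerical sketch. The example below is not taken from Hasson et al. (2020): it assumes a linear ground truth, fixed made-up noise values, and three stand-in models (a constant for the underfit case, a least-squares line for the ideal fit, and an interpolating polynomial for the overfit case). Panel d, the overparameterized model, is not covered here.

```python
def fit_constant(xs, ys):
    """Underfit stand-in: predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Ideal-fit stand-in: ordinary least-squares line (two parameters)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_interpolating_polynomial(xs, ys):
    """Overfit stand-in: Lagrange polynomial through every training point."""
    def predict(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            basis = 1.0
            for m, xm in enumerate(xs):
                if m != j:
                    basis *= (x - xm) / (xj - xm)
            total += yj * basis
        return total
    return predict

def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def truth(x):
    """Assumed noise-free generating function."""
    return 2.0 * x + 1.0

# Training data: the assumed truth plus fixed, made-up noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.3, -0.4, 0.2, -0.3, 0.4, -0.2]
train_y = [truth(x) + e for x, e in zip(train_x, noise)]

# Held-out points: three inside the training range, one just outside it.
test_x = [0.5, 2.5, 4.5, 5.5]
test_y = [truth(x) for x in test_x]

models = {
    "underfit": fit_constant(train_x, train_y),
    "line": fit_line(train_x, train_y),
    "overfit": fit_interpolating_polynomial(train_x, train_y),
}
train_mse = {name: mse(m, train_x, train_y) for name, m in models.items()}
test_mse = {name: mse(m, test_x, test_y) for name, m in models.items()}

for name in models:
    print(f"{name:9s} train MSE = {train_mse[name]:.4f}  test MSE = {test_mse[name]:.4f}")
```

The interpolating polynomial reproduces the training points exactly, noise included, yet does worse than the line on the held-out points. That is the contrast the caption draws between panels b and c.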

Grants and funding

R01 EY029692/EY/NEI NIH HHS/United States
