Cognition. 2021 Mar;208:104341. doi: 10.1016/j.cognition.2020.104341. Epub 2020 Jun 23.

Computational insights into human perceptual expertise for familiar and unfamiliar face recognition

Nicholas M Blauch et al. Cognition. 2021 Mar.

Abstract

Humans are generally thought to be experts at face recognition, and yet identity perception for unfamiliar faces is surprisingly poor compared to that for familiar faces. Prior theoretical work has argued that unfamiliar face identity perception suffers because the majority of identity-invariant visual variability is idiosyncratic to each identity, and thus, each face identity must be learned essentially from scratch. Using a high-performing deep convolutional neural network, we evaluate this claim by examining the effects of visual experience in untrained, object-expert and face-expert networks. We found that only face training led to substantial generalization in an identity verification task of novel unfamiliar identities. Moreover, generalization increased with the number of previously learned identities, highlighting the generality of identity-invariant information in face images. To better understand how familiarity builds upon generic face representations, we simulated familiarization with face identities by fine-tuning the network on images of the previously unfamiliar identities. Familiarization produced a sharp boost in verification, but only approached ceiling performance in the networks that were highly trained on faces. Moreover, in these face-expert networks, the sharp familiarity benefit was seen only at the identity-based output probability layer, and did not depend on changes to perceptual representations; rather, familiarity effects required learning only at the level of identity readout from a fixed expert representation. Our results thus reconcile the existence of a large familiar face advantage with claims that both familiar and unfamiliar face identity processing depend on shared expert perceptual representations.
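
To make the readout claim concrete, here is a minimal sketch of the central idea, not the authors' exact procedure: familiarization modeled as fitting a new identity readout on top of a frozen, face-expert perceptual representation (fc7-like features). The library, feature dimensions, and data below are placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder "fc7" features for images of previously unfamiliar identities.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 4096))      # 500 images x 4096 perceptual units
identities = rng.integers(0, 10, size=500)   # labels for 10 novel identities

# Familiarization: fit only the identity readout; the perceptual features stay fixed.
readout = LogisticRegression(max_iter=1000).fit(features, identities)
posterior = readout.predict_proba(features)  # analogous to the softmax "prob" layer
```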

Keywords: Deep convolutional neural network; Expertise; Face recognition; Familiarity; Invariance.

Figures

Fig. 1.
Verifying the identity of images of unfamiliar faces can be much harder than doing so for familiar faces. Most American readers will be familiar with the American celebrities on the right, but not with the Australian celebrities on the left. The face verification task requires the participant to determine whether pairs of images are of the same or a different identity. The top row shows difficult identity matches, and the bottom row shows difficult identity non-matches.
Fig. 2.
Architecture of the VGG-16 deep convolutional neural network (DCNN) (Simonyan & Zisserman, 2015) (schematic produced using code at https://doi.org/10.5281/zenodo.2526396). The DCNN takes a 224 × 224 × 3 input image and transforms it in a hierarchical fashion into a set of output class probabilities. Convolutional blocks (conv1, conv2, …, conv5) contain 2 or 3 convolutional layers that do not downsample the spatial resolution of their input (i.e., stride of 1), followed by pooling. The convolutional blocks are followed by three fully-connected layers, the last of which contains 1 unit per known identity. The activations in the last layer, fc8, are transformed with the softmax function into a probability distribution, represented in layer prob. Operations are colored as follows: convolution in light yellow, pooling in dark orange, linear transformation in light purple, rectification in dark yellow following convolution or purple following linear transformation, and finally softmax in dark purple. Arrows indicate the flow of information. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
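
As a concrete handle on this architecture, a minimal sketch (assuming PyTorch/torchvision; the number of identities is a placeholder) of instantiating VGG-16 with one output unit per known identity and a softmax "prob" layer:

```python
import torch
import torch.nn as nn
from torchvision import models

N_IDENTITIES = 1000                          # hypothetical number of known identities

model = models.vgg16(weights=None)           # conv1-conv5 blocks + fc6/fc7/fc8
in_features = model.classifier[6].in_features
model.classifier[6] = nn.Linear(in_features, N_IDENTITIES)  # "fc8": 1 unit per identity

x = torch.randn(1, 3, 224, 224)              # a 224 x 224 x 3 input image
logits = model(x)                            # fc8 activations
probs = torch.softmax(logits, dim=1)         # "prob" layer: distribution over identities
```
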
Fig. 3.
The model of Kramer et al. (2018) underestimates human-level unfamiliar face recognition and is outperformed by a face-trained, but not an object-trained, DCNN. In A., we estimated d′ from their distance measurements. In B., we converted their reported hit and false alarm rates to d′; notably, these rates were reported not for the PCA model but only for a PCA + LDA model fit on a separate set of identities from those tested. In C., we constructed an Active Appearance Model (AAM) similar to that used by Kramer et al. (2018) but with fully automated landmark labeling, and compared its performance on face verification of deep-funneled images from Labeled Faces in the Wild with that of a deep convolutional neural network trained on faces (face-DCNN) or objects (object-DCNN), before and after familiarization.
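
For reference, converting hit and false-alarm rates to d′ uses the inverse normal CDF; a minimal sketch (the rates below are placeholders, not the reported values):

```python
from scipy.stats import norm

def d_prime(hit_rate, false_alarm_rate):
    """d' = z(H) - z(F), using the inverse normal CDF."""
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

print(d_prime(0.85, 0.15))  # example values -> roughly 2.07
```
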
Fig. 4.
Familiarizing three DCNNs with a novel set of identities. Networks pre-trained on faces, objects, or nothing (randomly initialized) were fine-tuned on novel identities from Labeled Faces in the Wild. In A., we plot each network's performance throughout training on both training and held-out test images, collapsed across all new identities. In B., we plot accuracy for each new identity separately vs. the number of unique training examples for that identity, shown for a representative sample of epochs over the course of familiarization.
Fig. 5.
Familiar and unfamiliar face verification by DCNNs with different training distributions matched in total number of images. Cosine distance matrices were computed over images for each layer separately, before and after familiarization. Unfamiliar representations were computed immediately following pretraining, and familiar representations were computed for the same images after 50 epochs of fine-tuning on a separate training set of images for the novel identities. d′ was estimated with an ROC-based analysis (see Methods).
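
One standard way to implement such an analysis, sketched here with placeholder arrays (the paper's exact ROC procedure is described in its Methods), is to compute pairwise cosine distances over a layer's activations, score same/different-identity pairs by distance, and convert the resulting ROC-AUC to d′ under an equal-variance Gaussian assumption:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import norm
from sklearn.metrics import roc_auc_score

feats = np.random.randn(200, 4096)                    # placeholder fc7 activations, one row per image
same_identity = np.random.rand(200 * 199 // 2) > 0.9  # placeholder same/different labels per pair

dist = pdist(feats, metric="cosine")                  # condensed pairwise cosine distance matrix
auc = roc_auc_score(same_identity, -dist)             # smaller distance should indicate "same"
d_prime = np.sqrt(2) * norm.ppf(auc)                  # ROC-AUC -> d' (equal-variance assumption)
```
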
Fig. 6.
Distance matrices of perceptual and identity representations in a face-trained DCNN before and after familiarization. Cosine distances were computed over images, with images sorted by identity (10 images per identity). The top row shows distances for the highest-level perceptual representations (fc7), and the bottom row shows distances for the softmax-probability identity representations (prob). The left-most column shows unfamiliar distance matrices, the middle column shows familiarized distance matrices, and the right column takes the difference (familiar – unfamiliar).
Fig. 7.
Unfamiliar and familiar face verification measured in networks varying in the extent of face experience prior to familiarization. A fraction of 0.01, 0.1, or 1.0 of the total identities was used, and the corresponding results for unfamiliar and familiarized face recognition are plotted as a function of layer (A) and fraction of identities (B) for high-level perceptual and identity representations. In B, a log10 X-scale is plotted against a linear Y-scale.
Fig. 8.
The effect of experience with familiarized identities on familiar face verification, depending on the point in the network where fine-tuning begins: conv1, where the entire network is adapted; fc6, the type of fine-tuning used in the LFW experiment; and fc8, where only the final classifier layer is adapted. Fine-tuning on 10 identities is shown in A., and on 100 identities in B. Within both A. and B., columns vary the domain of pretraining (faces, objects), and rows vary the layer from which verification is computed (prob or fc7).
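
In PyTorch-style code, a sketch (assuming a torchvision VGG-16; the helper name is ours), choosing the fine-tuning start point amounts to freezing all parameters below that layer:

```python
import torch.nn as nn
from torchvision import models

def set_finetune_start(model: nn.Module, start: str) -> None:
    """Freeze everything below the chosen layer; leave the rest trainable."""
    if start == "conv1":                       # adapt the entire network
        trainable = list(model.parameters())
    elif start == "fc6":                       # freeze conv blocks, adapt fc6/fc7/fc8
        for p in model.features.parameters():
            p.requires_grad = False
        trainable = list(model.classifier.parameters())
    elif start == "fc8":                       # freeze all but the final classifier layer
        for p in model.parameters():
            p.requires_grad = False
        trainable = list(model.classifier[6].parameters())
    else:
        raise ValueError(start)
    for p in trainable:
        p.requires_grad = True

model = models.vgg16(weights=None)
set_finetune_start(model, "fc8")  # only the identity readout will be updated
```
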
Fig. 9.
Comparing human and DCNN unfamiliar verification performance on a challenging set of face image pairs from a dataset of Australian local celebrities. Unfamiliar verification performance of the VGG-16 DCNN pretrained on objects or faces is shown on the left. Humans performed the same verification task four times; performance is plotted for the first, fully unfamiliar session and for each of the three repeat sessions.
Fig. 10.
Verification using a cognitive rule that flexibly determines whether to use perceptual or identity representations. Results are shown for the face-pretrained network before and after familiarization, evaluated on the same images as in the behavioral experiment. We plot area under the ROC curve (ROC-AUC) here instead of d′, as d′ is infinite for the familiarized network on this small set of image pairs.
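
Purely as an illustration of what such a rule could look like (the function name, thresholds, and decision logic below are our assumptions, not the paper's definition), one might fall back from identity probabilities to perceptual distances whenever either image is not confidently recognized as a familiar identity:

```python
import numpy as np

def verify(prob_a, prob_b, fc7_a, fc7_b,
           familiarity_thresh=0.5, perceptual_thresh=0.4):
    """prob_*: softmax outputs over known identities; fc7_*: perceptual activations."""
    if prob_a.max() > familiarity_thresh and prob_b.max() > familiarity_thresh:
        return prob_a.argmax() == prob_b.argmax()          # identity-based decision
    cos = fc7_a @ fc7_b / (np.linalg.norm(fc7_a) * np.linalg.norm(fc7_b))
    return (1.0 - cos) < perceptual_thresh                 # perceptual-distance decision
```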
