Facial expression is retained in deep networks trained for face identification

Y Ivette Colón et al.

J Vis. 2021 Apr 1;21(4):4. doi: 10.1167/jov.21.4.4

Abstract

Facial expressions distort visual cues for identification in two-dimensional images. Face processing systems in the brain must decouple image-based information from multiple sources to operate in the social world. Deep convolutional neural networks (DCNNs) trained for face identification retain identity-irrelevant, image-based information (e.g., viewpoint). We asked whether a DCNN trained for identity also retains expression information that generalizes over viewpoint change. DCNN representations were generated for a controlled dataset containing images of 70 actors posing 7 facial expressions (happy, sad, angry, surprised, fearful, disgusted, neutral) from 5 viewpoints (frontal, 90° and 45° left and right profiles). Two-dimensional visualizations of the DCNN representations revealed hierarchical groupings by identity, followed by viewpoint, and then by facial expression. Linear discriminant analysis of the full-dimensional representations predicted expressions accurately: mean accuracy was highest for happiness (76.8% correct), followed by surprise, disgust, anger, neutral, sadness, and fear (42.0% correct); chance ≈ 14.3%. Expression classification was stable across viewpoints. Representational similarity heatmaps indicated that image similarities within identities varied more by viewpoint than by expression. We conclude that an identity-trained deep network retains shape-deformable information about expression and viewpoint, along with identity, in a unified form, consistent with a recent hypothesis for ventral visual stream processing.
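As a concrete illustration of the classification analysis described above, here is a minimal sketch of a 7-way expression classifier over DCNN face descriptors, where chance is 1/7 ≈ 14.3%. The file names, the 512-dimensional descriptor size, and the 5-fold cross-validation are illustrative assumptions, not the authors' pipeline.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Hypothetical inputs: one descriptor per KDEF image, plus integer
# expression codes 0..6 (happy, sad, angry, surprised, fearful,
# disgusted, neutral).
features = np.load("kdef_dcnn_descriptors.npy")   # shape: (n_images, 512)
labels = np.load("kdef_expression_labels.npy")    # shape: (n_images,)

# Linear discriminant analysis on the full-dimensional representations.
clf = LinearDiscriminantAnalysis()
scores = cross_val_score(clf, features, labels, cv=5)
print(f"mean accuracy: {scores.mean():.3f} (chance = 1/7 ≈ 0.143)")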

Figures

Figure 1.
An example of image variation for one identity in the KDEF dataset. Image IDs, from left to right: F02ANFL, F02DIHL, F02HAS, F02SUHR, F02SAFR.
Figure 2.
A visualization of the two-dimensional t-Distributed Stochastic Neighbor Embedding (t-SNE) projections of image representations for the KDEF dataset (color-coded by identity) shows that identities are well-separated by the network. Note: because there were more identities than colors, some colors were used for two identities.
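A projection like this can be produced with scikit-learn's t-SNE; a sketch follows, assuming the same hypothetical descriptor files as above. The perplexity value is a placeholder, not the paper's setting.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

features = np.load("kdef_dcnn_descriptors.npy")   # hypothetical descriptors
identities = np.load("kdef_identity_labels.npy")  # integer identity codes

# Project the high-dimensional DCNN representations to 2-D and color by
# identity. With only 20 colors in the map, some colors repeat across
# identities, as in the figure's note.
xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
plt.scatter(xy[:, 0], xy[:, 1], c=identities, cmap="tab20", s=8)
plt.title("t-SNE of DCNN face representations by identity")
plt.show()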
Figure 3.
Two example identities in the t-SNE projection. Each panel (A and B) shows one identity. A hand-drawn blue line shows that the identity's near-frontal images can be separated from its profile images in the face space. Circles illustrate an example of expression clustering within viewpoint groups.
Figure 4.
Representational similarity maps comparing representations of the 70 images of each of 4 randomly selected identities. Heatmaps were organized first by expression, then by viewpoint within each expression. The pattern of similarity indicates that, for all identities and all expressions, images in near-frontal viewpoints are represented more similarly to one another than are full-profile images.
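A heatmap of this kind can be sketched as the cosine similarity between all pairs of one identity's image representations, sorted by expression and then viewpoint. The row-selection and sort-order files below are hypothetical stand-ins for the dataset's metadata.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import cosine_similarity

features = np.load("kdef_dcnn_descriptors.npy")   # hypothetical descriptors
rows = np.load("identity_F02_rows.npy")           # hypothetical: indices of one actor's images
order = np.load("expr_then_view_order.npy")       # hypothetical: sort by expression, then viewpoint

sub = features[rows][order]                       # one identity's images, reordered
sim = cosine_similarity(sub)                      # pairwise similarity matrix
plt.imshow(sim, cmap="viridis")
plt.colorbar(label="cosine similarity")
plt.show()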
Figure 5.
Expression classification results for the KDEF dataset using deep features, shown by viewpoint and expression. All expressions are classified above chance; chance performance, indicated by the dashed line in the figure, is approximately 14.3%.
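To check the stability claim, the cross-validated predictions from the classifier sketched after the abstract can be broken down by viewpoint; again the label files are hypothetical.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict

features = np.load("kdef_dcnn_descriptors.npy")    # hypothetical descriptors
labels = np.load("kdef_expression_labels.npy")     # expression codes 0..6
viewpoints = np.load("kdef_viewpoint_labels.npy")  # hypothetical: viewpoint codes 0..4

# Out-of-fold predictions, then accuracy within each viewpoint group.
preds = cross_val_predict(LinearDiscriminantAnalysis(), features, labels, cv=5)
for v in np.unique(viewpoints):
    m = viewpoints == v
    print(f"viewpoint {v}: accuracy = {(preds[m] == labels[m]).mean():.3f}")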
