J Vis. 2021 Aug 2;21(8):15. doi: 10.1167/jov.21.8.15.

Closing the gap between single-unit and neural population codes: Insights from deep learning in face recognition


Connor J Parde et al. J Vis.

Abstract

Single-unit responses and population codes differ in the "read-out" information they provide about high-level visual representations. Diverging local and global read-outs can be difficult to reconcile with in vivo methods. To bridge this gap, we studied the relationship between single-unit and ensemble codes for identity, gender, and viewpoint, using a deep convolutional neural network (DCNN) trained for face recognition. Analogous to the primate visual system, DCNNs develop representations that generalize over image variation, while retaining subject (e.g., gender) and image (e.g., viewpoint) information. At the unit level, we measured the number of single units needed to predict attributes (identity, gender, viewpoint) and the predictive value of individual units for each attribute. Identification was remarkably accurate using random samples of only 3% of the network's output units, and all units had substantial identity-predicting power. Cross-unit responses were minimally correlated, indicating that single units code non-redundant identity cues. Gender and viewpoint classification required large-scale pooling of units; individual units had weak predictive power. At the ensemble level, principal component analysis of face representations showed that identity, gender, and viewpoint separated into high-dimensional subspaces, ordered by explained variance. Unit-based directions in the representational space were compared with the directions associated with the attributes. Identity, gender, and viewpoint contributed to all individual unit responses, undercutting a neural tuning analogy. Instead, single-unit responses carry superimposed, distributed codes for face identity, gender, and viewpoint. This undermines confidence in the interpretation of neural representations from unit response profiles for both DCNNs and, by analogy, high-level vision.
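The unit-subsampling analysis described in the abstract can be sketched as follows. Everything here is a hypothetical stand-in: the descriptors are simulated Gaussian vectors rather than the paper's 512-dimensional DCNN outputs, and rank-1 identification replaces the paper's AUC measure.

```python
# Minimal sketch of identification from random subsets of network units.
# All data are simulated; only the dimensionality (512) follows the paper.
import numpy as np

rng = np.random.default_rng(0)
n_ids, n_units = 20, 512

# Hypothetical: one "gallery" and one "probe" descriptor per identity,
# modeled as an identity mean plus small image-level noise.
identity_means = rng.normal(size=(n_ids, n_units))
probe = identity_means + 0.1 * rng.normal(size=(n_ids, n_units))
gallery = identity_means + 0.1 * rng.normal(size=(n_ids, n_units))

def identification_accuracy(sample_size):
    """Rank-1 identification accuracy using a random sample of units."""
    units = rng.choice(n_units, size=sample_size, replace=False)
    p = probe[:, units]
    g = gallery[:, units]
    # Cosine similarity between every probe and every gallery descriptor.
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    sims = p @ g.T
    return float(np.mean(np.argmax(sims, axis=1) == np.arange(n_ids)))

for k in (512, 16, 2):
    print(k, identification_accuracy(k))
```

With low noise, accuracy stays high even for small unit samples, mirroring the qualitative pattern the paper reports for its descriptors.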


Figures

Figure 1.
(A) Identification accuracy, measured as area under the ROC curve (AUC), is plotted as a function of subspace dimensionality. Performance is nearly perfect (AUC ≈ 1.0) with the full 512-dimensional descriptor and declines negligibly until subspace dimensionality reaches 16 units. Performance with as few as two units remains above chance. (B) Correlation histogram for unit responses across images indicates that units capture non-redundant information for identification.
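The cross-unit correlation analysis summarized in panel B can be sketched as a pairwise correlation of unit responses across images. The response matrix below is simulated, not the paper's actual DCNN activations.

```python
# Sketch of the Figure 1B analysis: correlate every pair of units'
# responses across a set of images (hypothetical data).
import numpy as np

rng = np.random.default_rng(2)
n_images, n_units = 1000, 64
responses = rng.normal(size=(n_images, n_units))  # images x units

# Columns are variables (units), so rowvar=False.
corr = np.corrcoef(responses, rowvar=False)

# Off-diagonal entries are the pairwise unit-response correlations;
# near-zero values indicate non-redundant unit codes.
off_diag = corr[~np.eye(n_units, dtype=bool)]
print(off_diag.mean())
```

A histogram of `off_diag` concentrated near zero corresponds to the pattern the caption describes.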
Figure 2.
Effect sizes for units (A) and principal components (B) for identity, gender, and viewpoint. For both units and principal components, top panels illustrate the dominance of identity over gender and viewpoint. Lower panels show an approximately uniform distribution of effect sizes for units (A) and differentiated effect sizes for principal components (B) in all three attributes.
Figure 3.
Gender and viewpoint prediction with variable numbers of randomly sampled units. Gender classification declines gradually (A) and viewpoint prediction declines rapidly (B) as sample size decreases. Mean performance across samples (n=50) is shown with a diamond, colored by sample size. Because these performance measures are qualitatively different, they should not be compared in absolute terms (for comparison between gender, viewpoint, and identity, see effect sizes; Figure 2).
Figure 4.
(A) Sliding windows of PCs used to predict identity (purple), gender (teal), and yaw (yellow) across the PC subspaces. Identification accuracy is highest when using early PCs. Gender and viewpoint classification are best when using subspaces with the highest effect sizes for gender and viewpoint separation, respectively. (B) Similarity between PCs and directions diagnostic for identity (purple), gender (teal), and yaw (yellow). Identity direction is the average similarity between identity templates and PCs. Gender direction is the linear discriminant line from the LDA for gender classification. Viewpoint direction is the weight vector from the linear regression for viewpoint prediction.
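The PC-versus-direction comparison in panel B can be sketched as the absolute cosine similarity between each principal component and a candidate attribute direction (e.g., an LDA discriminant for gender). The data, dimensionality, and direction below are hypothetical stand-ins.

```python
# Sketch of the Figure 4B similarity analysis on simulated descriptors.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 32))   # hypothetical 32-D face descriptors

# PCA via SVD of the centered data; rows of Vt are unit-norm PCs.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

# Hypothetical attribute direction (in the paper: identity template
# average, LDA discriminant for gender, or regression weights for yaw).
direction = rng.normal(size=32)
direction /= np.linalg.norm(direction)

# Absolute cosine similarity between the direction and each PC.
sims = np.abs(Vt @ direction)
print(sims.shape)
```

Because the PCs form an orthonormal basis, the squared similarities sum to one, so `sims` shows how a given attribute direction distributes across the PC subspaces.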
Figure 5.
(Top) For a single example unit, absolute value of similarities between unit direction and each PC shows confounding of unit response with identity, gender, and viewpoint. (Bottom) Density plot of similarities between the example unit and PCs associated with identity (purple), gender (blue), and viewpoint (yellow). The distributions overlap almost completely, indicating that each type of information contributes to the unit's activation. This finding was consistent across all unit basis vectors.
