Which deep learning model can best explain object representations of within-category exemplars?

Dongha Lee. J Vis. 2021 Sep 1;21(10):12. doi: 10.1167/jov.21.10.12.

Abstract

Deep neural network (DNN) models achieve human-level performance in tasks such as object recognition. Recent developments in the field have made it possible to test the hierarchical similarity of object representations between the human brain and DNNs. However, the representational geometry of object exemplars within a single category in DNNs remains unclear. In this study, we investigated which DNN model best explains invariant within-category object representations by computing the similarity between the representational geometries of visual features extracted at the high-level layers of different DNN models. We also tested the invariance of these models' within-category object representations through an object identification task. Our results show that transfer learning models based on ResNet50 best explained both within-category object representation and object identification. These results suggest that the invariance of object representations in deep learning depends not on deepening the neural network but on building a better transfer learning model.
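The core comparison described in the abstract — representational geometries built from correlation distances between visual features — can be sketched as follows. This is a minimal NumPy illustration, not the paper's code: the array shapes, the random stand-in features, and the use of Spearman rank correlation to compare two models' geometries are assumptions.

```python
import numpy as np

def rdm(features):
    """Representational dissimilarity matrix: correlation distance
    (1 - Pearson r) between the feature vectors of all exemplar pairs.
    `features` is (n_exemplars, n_features)."""
    return 1.0 - np.corrcoef(features)

def spearman(x, y):
    """Spearman rank correlation (no-ties case) between two vectors,
    computed as the Pearson correlation of the ranks."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

def rdm_similarity(feat_a, feat_b):
    """Similarity of two models' representational geometries:
    rank-correlate the upper triangles of their RDMs."""
    a, b = rdm(feat_a), rdm(feat_b)
    iu = np.triu_indices_from(a, k=1)
    return spearman(a[iu], b[iu])

# Hypothetical features for 80 within-category exemplars from two models.
rng = np.random.default_rng(0)
feats_model_a = rng.normal(size=(80, 2048))
feats_model_b = feats_model_a + rng.normal(scale=0.5, size=(80, 2048))
print(round(rdm_similarity(feats_model_a, feats_model_b), 3))
```

Because the second model's features are a noisy copy of the first's, the two geometries correlate strongly; unrelated features would give a similarity near zero.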


Figures

Figure 1.
Overview of experimental design. (A) Higher-level visual features for 80 tools were extracted from the last fully connected layers through transfer learning. Multidimensional scaling was used to visualize DNN representations of within-category object exemplars. (B) The identification accuracy was computed by labeling an object in the target images as one of the 80 objects in the testing images. (C) Representational similarity between DNN representations of object exemplars was calculated using the correlation distance between visual features.
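The identification step in panel (B) amounts to nearest-neighbor matching on feature correlations: each target image is labeled as the test exemplar whose features it correlates with most strongly. A minimal sketch, with random vectors standing in for DNN features and the one-target-per-object pairing assumed from the caption:

```python
import numpy as np

def identification_accuracy(target_feats, test_feats):
    """Label each target as the most-correlated test exemplar;
    accuracy is the fraction of targets whose predicted index
    matches their true (same-row) object."""
    n = len(target_feats)
    # Cross-correlation block between targets (rows) and test exemplars (cols).
    c = np.corrcoef(target_feats, test_feats)[:n, n:]
    predicted = c.argmax(axis=1)
    return (predicted == np.arange(n)).mean()

# Hypothetical setup: 80 test exemplars, targets are noisy views of them.
rng = np.random.default_rng(1)
test_feats = rng.normal(size=(80, 512))
target_feats = test_feats + rng.normal(scale=0.3, size=(80, 512))
print(identification_accuracy(target_feats, test_feats))
```

With low noise the matching is near perfect; chance level for 80 objects would be 1/80.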
Figure 2.
Object identification accuracy for the nine DNN models. The identification accuracies of ResNet50, ResNet101, and VGG19 were significantly higher than those of the other DNN models. The lines on the bars indicate the standard error of the mean. The horizontal lines indicate that the aforementioned models performed significantly differently from the other models.
Figure 3.
Correlation analysis between object identification accuracy and properties of the DNN models. Identification accuracy showed a strong positive correlation with validation accuracy, whereas no significant correlation was observed between identification accuracy and the number of DNN layers.
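The analysis in Figure 3 reduces to correlating per-model scalars across the nine networks. A toy sketch; every number below is invented for illustration, not taken from the paper:

```python
import numpy as np

# Hypothetical per-model values for nine DNNs (invented for illustration).
identification_acc = np.array([0.61, 0.64, 0.58, 0.72, 0.70, 0.66, 0.55, 0.68, 0.63])
validation_acc     = np.array([0.70, 0.71, 0.67, 0.76, 0.75, 0.72, 0.64, 0.74, 0.69])
n_layers           = np.array([8, 16, 19, 50, 101, 22, 11, 48, 25])

# Pearson correlations across models.
r_validation = np.corrcoef(identification_acc, validation_acc)[0, 1]
r_depth = np.corrcoef(identification_acc, n_layers)[0, 1]
print(f"r(identification, validation) = {r_validation:.2f}")
print(f"r(identification, depth)      = {r_depth:.2f}")
```

With these invented values, identification accuracy tracks validation accuracy closely while network depth is only weakly related, mirroring the pattern the figure reports.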
Figure 4.
Object representation similarity for the nine DNN models. The representation similarity of ResNet50 was significantly higher than that of the other DNN models. The lines on the bars indicate the standard error of the mean. The horizontal lines indicate that the performance of ResNet50 was significantly different from that of the other DNN models.
Figure 5.
(A) Schematic of transfer learning using the ResNet50 architecture. (B) Object representation similarity using the visual features with low identification accuracy. (C) Object representation similarity using visual features with high identification accuracy.
Figure 6.
Comparisons of DNN representations in tool-preferring regions. The horizontal lines indicate significant differences in DNN–brain representation similarity between the DNN models.

