Which deep learning model can best explain object representations of within-category exemplars?

Dongha Lee. J Vis. 2021 Sep 1;21(10):12. doi: 10.1167/jov.21.10.12.

Abstract

Deep neural network (DNN) models achieve human-level performance in tasks such as object recognition. Recent developments in the field have made it possible to test the hierarchical similarity of object representations between the human brain and DNNs. However, the representational geometry of object exemplars within a single category in DNNs remains unclear. In this study, we investigated which DNN model best explains invariant within-category object representations by computing the similarity between the representational geometries of visual features extracted at the high-level layers of different DNN models. We also tested the invariance of these models' within-category object representations through an object identification task. Our results show that transfer learning models based on ResNet50 best explained both within-category object representation and object identification. These results suggest that the invariance of object representations in deep learning depends not on deepening the neural network but on building a better transfer learning model.
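The core comparison described in the abstract — representational geometries built from correlation distances between visual features — can be sketched as follows. This is a minimal NumPy illustration, not the paper's code: the array shapes, the random stand-in features, and the use of Spearman rank correlation to compare two models' geometries are assumptions.

```python
import numpy as np

def rdm(features):
    """Representational dissimilarity matrix: correlation distance
    (1 - Pearson r) between the feature vectors of all exemplar pairs.
    `features` is (n_exemplars, n_features)."""
    return 1.0 - np.corrcoef(features)

def spearman(x, y):
    """Spearman rank correlation (no-ties case) between two vectors,
    computed as the Pearson correlation of the ranks."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

def rdm_similarity(feat_a, feat_b):
    """Similarity of two models' representational geometries:
    rank-correlate the upper triangles of their RDMs."""
    a, b = rdm(feat_a), rdm(feat_b)
    iu = np.triu_indices_from(a, k=1)
    return spearman(a[iu], b[iu])

# Hypothetical features for 80 within-category exemplars from two models.
rng = np.random.default_rng(0)
feats_model_a = rng.normal(size=(80, 2048))
feats_model_b = feats_model_a + rng.normal(scale=0.5, size=(80, 2048))
print(round(rdm_similarity(feats_model_a, feats_model_b), 3))
```

Because the second model's features are a noisy copy of the first's, the two geometries correlate strongly; unrelated features would give a similarity near zero.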


Figures

Figure 1.
Overview of experimental design. (A) Higher-level visual features for 80 tools were extracted from the last fully connected layers through transfer learning. Multidimensional scaling was used to visualize DNN representations of within-category object exemplars. (B) The identification accuracy was computed by labeling an object in the target images as one of the 80 objects in the testing images. (C) Representational similarity between DNN representations of object exemplars was calculated using the correlation distance between visual features.
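The identification step in panel (B) amounts to nearest-neighbor matching on feature correlations: each target image is labeled as the test exemplar whose features it correlates with most strongly. A minimal sketch, with random vectors standing in for DNN features and the one-target-per-object pairing assumed from the caption:

```python
import numpy as np

def identification_accuracy(target_feats, test_feats):
    """Label each target as the most-correlated test exemplar;
    accuracy is the fraction of targets whose predicted index
    matches their true (same-row) object."""
    n = len(target_feats)
    # Cross-correlation block between targets (rows) and test exemplars (cols).
    c = np.corrcoef(target_feats, test_feats)[:n, n:]
    predicted = c.argmax(axis=1)
    return (predicted == np.arange(n)).mean()

# Hypothetical setup: 80 test exemplars, targets are noisy views of them.
rng = np.random.default_rng(1)
test_feats = rng.normal(size=(80, 512))
target_feats = test_feats + rng.normal(scale=0.3, size=(80, 512))
print(identification_accuracy(target_feats, test_feats))
```

With low noise the matching is near perfect; chance level for 80 objects would be 1/80.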
Figure 2.
Object identification accuracy for the nine DNN models. The identification accuracies of ResNet50, ResNet101, and VGG19 were significantly higher than those of the other DNN models. The lines on the bars indicate the standard error of the mean. The horizontal lines indicate that the aforementioned models performed significantly differently from the other models.
Figure 3.
Correlation analysis between object identification accuracy and properties of the DNN models. Identification accuracy showed a strong positive correlation with validation accuracy, whereas no significant correlation was observed between identification accuracy and the number of DNN layers.
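The analysis in Figure 3 reduces to correlating per-model scalars across the nine networks. A toy sketch; every number below is invented for illustration, not taken from the paper:

```python
import numpy as np

# Hypothetical per-model values for nine DNNs (invented for illustration).
identification_acc = np.array([0.61, 0.64, 0.58, 0.72, 0.70, 0.66, 0.55, 0.68, 0.63])
validation_acc     = np.array([0.70, 0.71, 0.67, 0.76, 0.75, 0.72, 0.64, 0.74, 0.69])
n_layers           = np.array([8, 16, 19, 50, 101, 22, 11, 48, 25])

# Pearson correlations across models.
r_validation = np.corrcoef(identification_acc, validation_acc)[0, 1]
r_depth = np.corrcoef(identification_acc, n_layers)[0, 1]
print(f"r(identification, validation) = {r_validation:.2f}")
print(f"r(identification, depth)      = {r_depth:.2f}")
```

With these invented values, identification accuracy tracks validation accuracy closely while network depth is only weakly related, mirroring the pattern the figure reports.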
Figure 4.
Object representation similarity for the nine DNN models. The representation similarity of ResNet50 was significantly higher than that of the other DNN models. The lines on the bars indicate the standard error of the mean. The horizontal lines indicate that the performance of ResNet50 was significantly different from that of the other DNN models.
Figure 5.
(A) Schematic of transfer learning using the ResNet50 architecture. (B) Object representation similarity using the visual features with low identification accuracy. (C) Object representation similarity using visual features with high identification accuracy.
Figure 6.
Comparisons of DNN representations in tool-preferring regions. The horizontal lines indicate significant differences in DNN–brain representation similarity between the DNN models.

