Diverse Deep Neural Networks All Predict Human Inferior Temporal Cortex Well, After Training and Fitting

Katherine R Storrs¹,², Tim C Kietzmann³,⁴, Alexander Walther⁴, Johannes Mehrer⁴, Nikolaus Kriegeskorte⁵

Affiliations

¹ Justus Liebig University Giessen, Germany.
² Centre for Mind, Brain and Behaviour (CMBB), Research Campus Central Hessen.
³ Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands.
⁴ MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom.
⁵ Columbia University.

PMID: 34272948
DOI: 10.1162/jocn_a_01755

Katherine R Storrs et al. J Cogn Neurosci. 2021 Sep 1;33(10):2044-2064. doi: 10.1162/jocn_a_01755.

Abstract

Deep neural networks (DNNs) trained on object recognition provide the best current models of high-level visual cortex. What remains unclear is how strongly experimental choices, such as network architecture, training, and fitting to brain data, contribute to the observed similarities. Here, we compare a diverse set of nine DNN architectures on their ability to explain the representational geometry of 62 object images in human inferior temporal cortex (hIT), as measured with fMRI. We compare untrained networks to their task-trained counterparts and assess the effect of cross-validated fitting to hIT, by taking a weighted combination of the principal components of features within each layer and, subsequently, a weighted combination of layers. For each combination of training and fitting, we test all models for their correlation with the hIT representational dissimilarity matrix, using independent images and subjects. Trained models outperform untrained models (accounting for 57% more of the explainable variance), suggesting that structured visual features are important for explaining hIT. Model fitting further improves the alignment of DNN and hIT representations (by 124%), suggesting that the relative prevalence of different features in hIT does not readily emerge from the ImageNet object-recognition task used to train the networks. The same models can also explain the disparate representations in primary visual cortex (V1), where stronger weights are given to earlier layers. In each region, all architectures achieved equivalently high performance once trained and fitted. The models' shared properties (deep feedforward hierarchies of spatially restricted nonlinear filters) seem more important than their differences when modeling human visual representations.
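
The comparison described in the abstract, correlating a model's representational dissimilarity matrix (RDM) with the hIT RDM, can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state the dissimilarity measure or the type of correlation, so correlation distance and Pearson correlation are assumed here, the data are synthetic placeholders, and the function names (`rdm`, `upper_triangle`, `rdm_correlation`) are hypothetical. The cross-validated fitting of principal components and layers is omitted.

```python
import numpy as np


def rdm(features):
    """Correlation-distance RDM: 1 - Pearson r between each pair of image patterns.

    `features` has shape (n_images, n_units). The distance measure is an
    assumption; the abstract does not specify one.
    """
    return 1.0 - np.corrcoef(features)


def upper_triangle(matrix):
    """Return the off-diagonal upper triangle of a square matrix as a vector."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def rdm_correlation(rdm_a, rdm_b):
    """Pearson correlation between the upper triangles of two RDMs (assumed metric)."""
    return float(np.corrcoef(upper_triangle(rdm_a), upper_triangle(rdm_b))[0, 1])


# Synthetic placeholders, not real fMRI data or DNN activations.
rng = np.random.default_rng(0)
n_images = 62  # number of object images reported in the abstract
latent = rng.normal(size=(n_images, 20))  # shared structure, purely illustrative
brain_patterns = latent @ rng.normal(size=(20, 100)) + 0.5 * rng.normal(size=(n_images, 100))
model_features = latent @ rng.normal(size=(20, 300))

brain_rdm = rdm(brain_patterns)
model_rdm = rdm(model_features)
score = rdm_correlation(model_rdm, brain_rdm)
print(f"model-to-brain RDM correlation: {score:.3f}")
```

Each RDM is symmetric with a zero diagonal, so only the 62 × 61 / 2 = 1891 off-diagonal pairs enter the correlation.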
