eLife. 2018 Mar 7;7:e32962. doi: 10.7554/eLife.32962.

Distinct contributions of functional and deep neural network features to representational similarity of scenes in human brain and behavior


Iris IA Groen et al. eLife. 2018.

Abstract

Inherent correlations between visual and semantic features in real-world scenes make it difficult to determine how different scene properties contribute to neural representations. Here, we assessed the contributions of multiple properties to scene representation by partitioning the variance explained in human behavioral and brain measurements by three feature models whose inter-correlations were minimized a priori through stimulus preselection. Behavioral assessments of scene similarity reflected unique contributions from a functional feature model indicating potential actions in scenes as well as high-level visual features from a deep neural network (DNN). In contrast, similarity of cortical responses in scene-selective areas was uniquely explained by mid- and high-level DNN features only, while an object label model did not contribute uniquely to either domain. The striking dissociation between functional and DNN features in their contribution to behavioral and brain representations of scenes indicates that scene-selective cortex represents only a subset of behaviorally relevant scene information.
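The core quantity throughout is the representational dissimilarity matrix (RDM). Below is a minimal sketch of the approach, assuming random stand-in feature matrices in place of the paper's action-label, object-label and DNN fc7 features: build one RDM per model and rank-correlate it with a measured RDM.

```python
# Minimal RSA sketch; all variable names and data are hypothetical stand-ins.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_categories = 30  # the paper used 30 scene categories

# Stand-in feature matrices: rows = scene categories, columns = features
# (e.g. action-label frequencies, object-label frequencies, DNN fc7 units).
functions = rng.random((n_categories, 200))
objects = rng.random((n_categories, 200))
dnn_fc7 = rng.random((n_categories, 4096))

def rdm(features, metric="correlation"):
    """Representational dissimilarity: condensed pairwise distance vector."""
    return pdist(features, metric=metric)

# Compare a measured RDM (behavior or fMRI patterns) against each model RDM
# with a rank correlation, as is standard in RSA.
measured = rdm(rng.random((n_categories, 100)))
for name, model in [("functions", functions), ("objects", objects), ("DNN", dnn_fc7)]:
    rho, p = spearmanr(measured, rdm(model))
    print(f"{name}: rho = {rho:.3f}, p = {p:.3g}")
```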

Trial registration: ClinicalTrials.gov NCT00001360.

Keywords: behavioral categorization; computational model; deep neural network; fMRI; human; neuroscience; scene perception; variance partitioning.


Conflict of interest statement

IG, MG, CB, LF, DB, CB: No competing interests declared.

Figures

Figure 1.
Figure 1.. Models and predicted stimulus dissimilarity.
(A) Stimuli were characterized in three different ways: functions (derived from human-generated action labels), objects (derived from human-generated object labels) and DNN features (derived from layer 7 of a convolutional neural network trained on 1000 object classes). (B) RDMs showing predicted representational dissimilarity in terms of functions, objects and DNN features for the 30 scene categories sampled from Greene et al. (2016). Scenes were sampled to minimize between-RDM correlations. The category order in the RDMs was determined by k-means clustering on the functional RDM; clustering was performed by requesting eight clusters, which explained 80% of the variance in that RDM. RDMs were rank-ordered for visualization purposes only. (C) Multi-dimensional scaling plots of the model RDMs, color-coded by the functional clusters depicted in (B). Functional model clusters reflected functions such as ‘sports’ and ‘transportation’; note, however, that these semantic labels were derived post hoc after clustering and did not affect stimulus selection. Critically, representational dissimilarity based on the two other models (objects and DNN features) predicted different cluster patterns. All stimuli and model RDMs, along with the behavioral and fMRI measurements, are provided in Figure 1—source data 1.
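The clustering step described above can be sketched as follows. The feature matrix here is a random stand-in for the functional model, and the 80% figure in the caption is the paper's result, not something this toy example reproduces.

```python
# k-means on the rows of the functional RDM, requesting eight clusters,
# then checking how much variance the clustering explains. Hypothetical data.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
functional_features = rng.random((30, 200))  # stand-in action-label features
functional_full = squareform(pdist(functional_features, metric="correlation"))

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(functional_full)

# Variance explained by the clustering: 1 - within-cluster SS / total SS.
total_ss = ((functional_full - functional_full.mean(axis=0)) ** 2).sum()
print(f"{1 - km.inertia_ / total_ss:.0%} of RDM variance explained by 8 clusters")
```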
Figure 2.
Figure 2.. Behavioral multi-arrangement paradigm and results.
(A) Participants organized the scenes inside a large white circle according to their perceived similarity, as determined by their own judgment; they received no instructions as to what information to use to determine scene similarity. (B) RDM displaying the average dissimilarity between categories in the behavioral arrangements, ordered as in Figure 1B (rank-ordered for visualization only). (C) Average (bar) and individual participant (gray dots) correlations between the behavioral RDM and the model RDMs for objects (red), DNN features (yellow) and functions (blue). Stars (*) indicate p<0.05 for model-specific one-sided signed-rank tests against zero; horizontal bars indicate p<0.05 for two-sided pairwise signed-rank tests between models. p-values were FDR-corrected across both types of comparisons. The light-blue shaded rectangular region reflects the upper and lower bounds of the noise ceiling, indicating RDM similarity between individual participants and the group average (see Materials and methods). Error bars reflect SEM across participants. (D) Count of participants whose behavioral RDM correlated most strongly with objects, DNN features or functions. (E) Partial correlations for each model RDM; statistical significance was determined as in (C). (F) Euler diagram depicting the results of a variance partitioning analysis on the behavioral RDM for objects (red circle), DNN features (yellow circle) and functions (blue circle). Unique (non-overlapping diagram portions) and shared (overlapping diagram portions) variances are expressed as percentages of the total variance explained by all models combined.
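A minimal sketch of the variance partitioning behind the Euler diagram in (F), assuming condensed RDM vectors (random stand-ins below): each model's unique variance is the drop in regression R² when that model is left out of the full three-model regression.

```python
# Variance partitioning over RDM vectors; all data are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 30 * 29 // 2  # condensed RDM length for 30 categories
obj_rdm, dnn_rdm, fun_rdm, behavior_rdm = rng.random((4, n_pairs))

def r_squared(y, predictors):
    """R^2 of an OLS regression of y on the given predictor vectors."""
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - ((y - X @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()

models = {"objects": obj_rdm, "DNN": dnn_rdm, "functions": fun_rdm}
full_r2 = r_squared(behavior_rdm, models.values())
for name in models:
    others = [v for k, v in models.items() if k != name]
    unique = full_r2 - r_squared(behavior_rdm, others)
    print(f"unique({name}): {unique / full_r2:.1%} of explained variance")
```

Shared portions (the overlapping Euler regions) follow from the same subset R² values by inclusion-exclusion.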
Figure 3.
Figure 3.. RDMs and model comparisons for fMRI Experiment 1 (n = 20).
(A) RDMs displaying average dissimilarity between categories in multi-voxel patterns in PPA, OPA and MPA, ordered as in Figure 1B (rank-ordered for visualization only). (B) Average (bar) and individual participant (gray dots) correlations between the ROIs in (A) and the model RDMs for objects (red), DNN features (yellow) and functions (blue) (FDR-corrected). See legend of Figure 2B for explanation of the statistical indicators and noise ceiling. (C) Partial correlations for each model RDM; statistics as in (B). (D) Euler diagram depicting results of variance partitioning the average dissimilarity in each ROI between models, expressed as percentages of unique and shared variance of the variance explained by all three models together.
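The partial correlations in (C) can be sketched by residualizing both the ROI RDM and each model RDM on the two remaining models before correlating. All variable names below are hypothetical stand-ins.

```python
# Partial correlation of an ROI RDM with each model RDM, controlling for
# the other two model RDMs. Hypothetical stand-in data throughout.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n_pairs = 30 * 29 // 2
obj_rdm, dnn_rdm, fun_rdm, roi_rdm = rng.random((4, n_pairs))

def residualize(y, covariates):
    """Remove the linear contribution of the covariate vectors from y."""
    X = np.column_stack([np.ones_like(y), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

models = {"objects": obj_rdm, "DNN": dnn_rdm, "functions": fun_rdm}
for name, model in models.items():
    covs = [v for k, v in models.items() if k != name]
    r, p = pearsonr(residualize(roi_rdm, covs), residualize(model, covs))
    print(f"partial r({name}) = {r:.3f} (p = {p:.3g})")
```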
Figure 4.
Figure 4.. Correlations and variance partitioning of behavioral measurements of scene categorization and similarity of fMRI responses.
(A) Correlations of three measures of behavioral categorization (see Results section for details) with fMRI response patterns in PPA, OPA and MPA. See legend of Figure 2B for explanation of the statistical indicators and noise ceiling. (B) Euler diagram depicting the results of variance partitioning the fMRI responses in PPA, OPA and MPA for DNN features (yellow), functions (blue) and average sorting behavior (green), indicating that the majority of the variance in the fMRI signal that is explained by categorization behavior is shared with the DNN features.
Figure 5.
Figure 5.. RDMs and model comparisons for Experiment 2 (n = 8, covert naming task).
(A) Average dissimilarity between categories in multi-voxel patterns measured in PPA, OPA and MPA (rank-ordered as in Figure 1B). (B) Correlations between the ROIs in (A) and the model RDMs for objects (red), DNN features (yellow) and functions (blue) (FDR-corrected). See legend of Figure 2B for explanation of the statistical indicators and noise ceiling. Note how in PPA the DNN model correlation approaches the noise ceiling, suggesting that this model adequately captures the information reflected in this ROI. (C) Euler diagram depicting the results of variance partitioning the average dissimilarity in each ROI. (D) Average (bars) and individual (dots/lines) within-participant (n = 4) comparison of fMRI-model correlations across the different task manipulations in Experiments 1 and 2 (participants were presented with a different set of scenes in each task; see Materials and methods). Note how covert naming mainly enhances the correlation with DNN features.
Figure 6.
Figure 6.. Medial (left) and lateral (right) views of group-level searchlights for (A) the DNN and (B) function model, overlaid on surface reconstructions of both hemispheres of one participant.
Each map was created by submitting the partial correlation maps for each model and hemisphere to one-sample tests against a mean of zero, cluster-corrected for multiple comparisons using Threshold-Free Cluster Enhancement (thresholded at z = 1.64, corresponding to one-sided p<0.05). Unthresholded versions of the average partial correlation maps are inset above. Group-level ROIs PPA, OPA and MPA are highlighted in solid white lines. Consistent with the ROI analyses, the DNN feature model contributed uniquely to representation in PPA and OPA. The function model uniquely correlated with a bilateral ventral region, as well as a left-lateralized region overlapping with the middle temporal and occipital gyri.
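A schematic of how such a partial-correlation searchlight could be computed (not the authors' implementation; the names, the voxel radius and the data layout are assumptions):

```python
# Schematic partial-correlation searchlight over a volumetric mask.
# data: (x, y, z, n_categories) response patterns; mask: boolean brain mask.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

def residualize(y, covariates):
    X = np.column_stack([np.ones_like(y), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def searchlight_map(data, mask, model_rdm, covariate_rdms, radius=3):
    out = np.full(mask.shape, np.nan)
    coords = np.argwhere(mask)
    for center in coords:
        # voxels within a sphere of the given radius around the center
        dist = np.linalg.norm(coords - center, axis=1)
        sphere = coords[dist <= radius]
        if len(sphere) < 2:
            continue  # correlation distance needs more than one voxel
        # local RDM: pairwise dissimilarity of category patterns in the sphere
        patterns = data[sphere[:, 0], sphere[:, 1], sphere[:, 2], :].T
        local_rdm = pdist(patterns, metric="correlation")
        r, _ = pearsonr(residualize(local_rdm, covariate_rdms),
                        residualize(model_rdm, covariate_rdms))
        out[tuple(center)] = r
    # per-participant maps would then enter group-level one-sample tests
    # with TFCE-based cluster correction, as described in the caption
    return out
```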
Figure 7.
Figure 7.. Multi-arrangement behavior searchlights and post-hoc analysis of functional clusters.
(A) Searchlight result for behavioral scene categorization. Maps reflect the correlation (Pearson's r) between local searchlight RDMs and the group-average behavior in the multi-arrangement task from the participants of Experiment 1. Scene-selective ROIs are outlined in solid white lines; the searchlight clusters showing a significant contribution of the functional model are outlined in dashed white lines for reference. See Figure 6 for further explanation of the searchlight display. (B) RDM and MDS plots based on the multi-voxel patterns in the function model searchlight clusters. RDM rows are ordered as in Figure 1B, and category color coding in the MDS plots is as in Figure 1C. (C) Illustrative exemplars of the four categories that were most dissimilar from other categories within the searchlight-derived clusters depicted in (B).
Figure 8.
Figure 8.. DNN layer and DNN training comparisons in terms of correlation with fMRI responses in scene-selective cortex.
Panels show convolutional and fully connected (FC) layer-by-layer RDM correlations between (A) an object-trained (ReferenceNet) and a scene-trained (Places) DNN; (B) both DNNs and the a priori selected feature models; (C) the object-trained DNN and scene-selective ROIs; and (D) the scene-trained DNN and scene-selective ROIs (all comparisons FDR-corrected within ROI; see legend of Figure 2B for explanation of the statistical indicators and noise ceiling). While the decreasing correlation between DNNs indicates stronger task-specificity of higher DNN layers, the original fc7 DNN feature model correlated most strongly with high-level layers of both DNNs. The object-trained and the scene-trained DNN correlated similarly with PPA and OPA, with both showing remarkably good performance for mid-level layers. The RDMs for each individual DNN layer are provided in Figure 1—source data 1. Searchlight maps for each layer of the object- and scene-trained DNN are provided in Figure 8—video 1 and Figure 8—video 2, respectively.
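The layer-by-layer comparison can be sketched with forward hooks. Torchvision's ImageNet-trained AlexNet serves here as a stand-in for the Caffe ReferenceNet and Places networks used in the paper, and the images and ROI RDM are random placeholders.

```python
# Layer-wise RDMs from DNN activations, each rank-correlated with an ROI RDM.
import numpy as np
import torch
from torchvision.models import alexnet, AlexNet_Weights
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# Stand-ins: 30 preprocessed scene images and a measured ROI RDM.
images = torch.rand(30, 3, 224, 224)
roi_rdm = np.random.default_rng(0).random(30 * 29 // 2)

model = alexnet(weights=AlexNet_Weights.IMAGENET1K_V1).eval()
activations = {}

def save_to(name):
    def hook(module, inputs, output):
        # flatten each image's activation into a feature vector
        activations[name] = output.detach().flatten(start_dim=1)
    return hook

# one hook per convolutional-block layer and per fully connected (FC) layer
for i, layer in enumerate(model.features):
    layer.register_forward_hook(save_to(f"features.{i}"))
for i, layer in enumerate(model.classifier):
    layer.register_forward_hook(save_to(f"classifier.{i}"))

with torch.no_grad():
    model(images)

for name, act in activations.items():
    layer_rdm = pdist(act.numpy(), metric="correlation")
    rho, _ = spearmanr(layer_rdm, roi_rdm)
    print(f"{name}: rho = {rho:.3f}")
```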

References

    1. Aguirre GK, Zarahn E, D'Esposito M. An area within human ventral cortex sensitive to "building" stimuli: evidence and implications. Neuron. 1998;21:373–383.
    2. Baldassano C, Esteva A, Fei-Fei L, Beck DM. Two distinct scene-processing networks connecting vision and memory. eNeuro. 2016;3:1–14. doi: 10.1523/ENEURO.0178-16.2016.
    3. Bar M, Aminoff E. Cortical analysis of visual context. Neuron. 2003;38:347–358. doi: 10.1016/S0896-6273(03)00167-3.
    4. Bau D, Zhou B, Khosla A, Oliva A, Torralba A. Network dissection: quantifying interpretability of deep visual representations. arXiv. 2017. https://arxiv.org/abs/1704.05796
    5. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B. 1995;57:289–300.
