Sci Rep. 2018 Feb 28;8(1):3752. doi: 10.1038/s41598-018-22160-9.

Deep Residual Network Predicts Cortical Representation and Organization of Visual Features for Rapid Categorization


Haiguang Wen et al. Sci Rep. 2018.

Abstract

The brain represents visual objects with topographic cortical patterns. To address how distributed visual representations enable object categorization, we established predictive encoding models based on a deep residual network and trained them to predict cortical responses to natural movies. Using this predictive model, we mapped human cortical representations of 64,000 visual objects from 80 categories with high throughput and accuracy. These representations covered both the ventral and dorsal pathways, reflected multiple levels of object features, and preserved semantic relationships between categories. Across the entire visual cortex, object representations were organized into three clusters of categories: biological objects, non-biological objects, and background scenes. At a finer scale specific to each cluster, object representations revealed sub-clusters for further categorization. This hierarchical clustering of category representations was driven mostly by cortical representations of middle- to high-level object features. In summary, this study demonstrates a useful computational strategy for characterizing the cortical organization and representation of visual features for rapid categorization.
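As a rough illustration of the approach, the following is a minimal sketch of a voxel-wise encoding model in Python, assuming precomputed ResNet feature time series and fMRI voxel responses. The ridge regularization, array shapes, and variable names are illustrative stand-ins, not the authors' exact pipeline (which, as in typical fMRI encoding studies, would also handle hemodynamic delay and feature dimensionality).

```python
# Minimal voxel-wise encoding model sketch; all data are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_train, n_feat, n_vox = 2400, 1024, 500          # time points, features, voxels (illustrative)
X_train = rng.standard_normal((n_train, n_feat))  # ResNet layer features per movie time point (stand-in)
Y_train = rng.standard_normal((n_train, n_vox))   # observed fMRI response per voxel (stand-in)

# One linear model per voxel; Ridge fits all voxels jointly as multi-output regression.
model = Ridge(alpha=1.0)
model.fit(X_train, Y_train)

# Predict cortical responses to a novel testing movie.
X_test = rng.standard_normal((240, n_feat))
Y_pred = model.predict(X_test)                    # shape (240, n_vox)
```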


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
DNN-based voxel-wise encoding models. (a) Performance of ResNet-based encoding models in predicting the cortical responses to novel testing movies for three subjects. Accuracy is measured by the average Pearson's correlation coefficient (r) between the predicted and observed fMRI responses across five testing movies (q < 0.01 after correction for multiple comparisons using the false discovery rate (FDR) method, and with threshold r > 0.2). The prediction accuracy is displayed on both flat (top) and inflated (bottom left) cortical surfaces for Subject 1. (b) Variance of the cortical response to the testing movie explained by the layer-specific visual features in ResNet. The right panel shows the index of the ResNet layer that best explains the cortical response at every voxel. (c) Comparison between the ResNet-based and AlexNet-based encoding models. Each bar represents the mean ± SE of the prediction accuracy (normalized by the noise ceiling, i.e. dividing the prediction accuracy (r) by the noise ceiling at every voxel) within an ROI, across voxels and subjects; * indicates significance (p < 0.001) in a paired t-test.
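A sketch of how the accuracy measure described here could be computed, with synthetic stand-in arrays: per-voxel Pearson's r between predicted and observed responses, Benjamini-Hochberg FDR correction at q < 0.01, the r > 0.2 threshold, and normalization by a (placeholder) noise ceiling.

```python
# Prediction-accuracy evaluation sketch; all data are synthetic stand-ins.
import numpy as np
from scipy.stats import pearsonr
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
n_time, n_vox = 240, 500
Y_pred = rng.standard_normal((n_time, n_vox))          # model predictions (stand-in)
Y_obs = Y_pred + rng.standard_normal((n_time, n_vox))  # observed = signal + noise (stand-in)

# Per-voxel Pearson's correlation between predicted and observed responses.
r = np.empty(n_vox)
p = np.empty(n_vox)
for v in range(n_vox):
    r[v], p[v] = pearsonr(Y_pred[:, v], Y_obs[:, v])

# FDR correction (Benjamini-Hochberg) at q < 0.01, combined with the r > 0.2 threshold.
significant, _, _, _ = multipletests(p, alpha=0.01, method="fdr_bh")
selected = significant & (r > 0.2)
print(f"{selected.sum()} of {n_vox} voxels pass q < 0.01 and r > 0.2")

# Normalize accuracy by a per-voxel noise ceiling (placeholder values here).
noise_ceiling = np.full(n_vox, 0.8)
r_normalized = r / noise_ceiling
```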
Figure 2
Human-face representations from encoding models and a functional localizer. (a) Model-simulated representation of a human face from the ResNet-based encoding models, displayed on both inflated (top) and flat (bottom) cortical surfaces. (b) The face vs. non-face contrast map obtained in a face-localizer experiment shows regions selective for human faces, including the occipital face area (OFA), the fusiform face area (FFA), and the posterior superior temporal sulcus (pSTS).
Figure 3
Cortical representations of 80 object categories. Each panel shows the representation map of one object category on the flat cortical surface of Subject 1, with the category label at the top left. The color bar indicates the cortical response. Each map covers the same extent of cortex as shown in Fig. 2a, bottom.
Figure 4
Category selectivity at individual cortical locations. (a) Category selectivity across the cortical surface. (b) Category-selectivity profiles of example cortical locations. For each location, the 10 categories with the highest responses are shown in descending order. (c) Category selectivity within ROIs (mean ± SE) in the early visual areas (red), ventral stream areas (green), and dorsal stream areas (blue).
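The selectivity profile in panel (b) amounts to ranking the 80 category responses at a single cortical location; a minimal sketch with a synthetic response matrix (all names and shapes are illustrative):

```python
# Per-voxel category-selectivity profile sketch; data are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(2)
categories = [f"category_{i}" for i in range(80)]  # hypothetical labels
responses = rng.standard_normal((80, 5000))        # (categories, voxels) response matrix (stand-in)

voxel = 1234                                       # an example cortical location
order = np.argsort(responses[:, voxel])[::-1]      # categories in descending response order
for i in order[:10]:                               # top 10, as in panel (b)
    print(f"{categories[i]}: {responses[i, voxel]:.2f}")
```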
Figure 5
Categorical similarity and clustering in cortical representation at the scale of the entire visual cortex. (a) The left panel shows the inter-category similarity matrix (Pearson's correlation r) of cortical representation; each element is the cortical similarity between a pair of categories averaged across subjects (see individual results in Supplementary Fig. S2). The matrix separates cleanly into three clusters with modularity Q = 0.35. The middle panel shows the inter-category similarity matrix of semantic meaning (measured by LCH). The right panel shows the Pearson's correlation between the inter-category cortical similarity and the inter-category semantic similarity under three measures (LCH, word2vec, and GloVe). (b) The three clusters of cortical representation correspond to three superordinate-level categories: non-biological objects, biological objects, and background scenes. The average cortical representations across categories within each cluster are shown on both inflated and flattened cortical surfaces.
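A sketch of the similarity analysis in panel (a): correlate category representation maps to obtain the inter-category cortical similarity matrix, then correlate its off-diagonal entries with a semantic similarity matrix. Inputs are synthetic stand-ins; the paper's semantic measures (LCH, word2vec, GloVe) are replaced by a placeholder.

```python
# Inter-category similarity analysis sketch; data are synthetic stand-ins.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
n_cat, n_vox = 80, 5000
maps = rng.standard_normal((n_cat, n_vox))  # one cortical representation map per category (stand-in)

# Inter-category cortical similarity: Pearson's r between category maps.
cortical_sim = np.corrcoef(maps)

# Placeholder semantic similarity between category labels (the paper used
# LCH, word2vec, and GloVe measures; here just random embeddings).
semantic_sim = np.corrcoef(rng.standard_normal((n_cat, 300)))

# Correlate the two similarity structures over the off-diagonal entries.
iu = np.triu_indices(n_cat, k=1)
r, p = pearsonr(cortical_sim[iu], semantic_sim[iu])
print(f"cortical-semantic correlation: r = {r:.2f}, p = {p:.3g}")
```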
Figure 6
Contributions of different levels of visual features to the similarity and modularity in cortical representation. (a) The left panel shows the inter-category similarity of cortical representations contributed by layer-wise category information, from the lowest (layer 1) to the highest (layer 50) layer. The order of categories is the same as in Fig. 5a. The right plot shows the corresponding modularity index for the visual features in each layer of ResNet; features at the middle layers give rise to the highest modularity. (b) 18 example visual features at the 31st layer, visualized in pixel space. Each feature is illustrated with 4 exemplars that maximize its representation. (c) The correlation between the inter-category cortical similarity and the inter-category semantic similarity (under the LCH, word2vec, and GloVe measures) is shown for each layer in ResNet.
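The modularity index can be illustrated by treating the similarity matrix as a weighted graph and scoring a community partition. The graph construction below (positive off-diagonal similarities as edge weights) and the greedy community detection are assumptions for illustration, not necessarily the paper's exact computation.

```python
# Modularity-index sketch over a stand-in similarity matrix.
import numpy as np
import networkx as nx
from networkx.algorithms import community

rng = np.random.default_rng(4)
n_cat = 80
sim = np.corrcoef(rng.standard_normal((n_cat, 300)))  # stand-in inter-category similarity matrix

# Build a weighted graph from positive off-diagonal similarities (an assumption).
G = nx.Graph()
G.add_nodes_from(range(n_cat))
for i in range(n_cat):
    for j in range(i + 1, n_cat):
        if sim[i, j] > 0:
            G.add_edge(i, j, weight=sim[i, j])

# Detect communities and score the partition with modularity Q.
clusters = community.greedy_modularity_communities(G, weight="weight")
Q = community.modularity(G, clusters, weight="weight")
print(f"modularity Q = {Q:.2f}")
```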
Figure 7
Categorical similarity and clustering in cortical representation within superordinate-level categories. (a) Fine-scale cortical areas specific to each superordinate-level category: biological objects (red), background scenes (green), and non-biological objects (blue). (b) The cortical similarity between categories in fine-scale cortical representation. The categories in each sub-cluster are displayed on the right. See individual results in Supplementary Fig. S2.
Figure 8
Contribution of layer-wise visual features to the similarity and modularity in cortical representations within superordinate-level categories. The left panel shows the similarity between categories in fine-scale cortical representations, as contributed by category information from individual layers. The order of categories is the same as in Fig. 7. The right plot shows the modularity index across all layers; the highest-layer visual features show the highest modularity for biological objects.
