[Preprint]. 2024 Oct 16:2024.10.15.617743.
doi: 10.1101/2024.10.15.617743.

Perceptual Expertise and Attention: An Exploration using Deep Neural Networks

Soukhin Das et al. bioRxiv. 2024.

Abstract

Perceptual expertise and attention are two important factors that enable superior object recognition and task performance. While expertise enhances knowledge and provides a holistic understanding of the environment, attention allows us to selectively focus on task-relevant information and suppress distraction. It has been suggested that attention operates differently in experts and in novices, but much remains unknown. This study investigates the relationship between perceptual expertise and attention using convolutional neural networks (CNNs), which have been shown to be good models of the primate visual pathways. Two CNN models were trained to become experts in either face or scene recognition, and the effect of attention on performance was evaluated in tasks involving complex stimuli, such as images containing superimposed faces and scenes. The goal was to explore how feature-based attention (FBA) influences recognition within and outside the models' domain of expertise. We found that each model performed better in its area of expertise, and that FBA further enhanced task performance, but only within the domain of expertise, increasing performance by up to 35% in scene recognition and 15% in face recognition. However, attention had reduced or negative effects when applied outside the models' domain of expertise. Unit-level analysis revealed that expertise led to stronger tuning toward category-specific features and sharper tuning curves, reflected in greater representational dissimilarity between targets and distractors, which, in line with the biased competition model of attention, enhances performance by reducing competition. These findings highlight the critical role of neural tuning, at the single-unit as well as the network level, in distinguishing the effects of attention in experts and in novices, and demonstrate that CNNs can be used fruitfully as computational models for addressing neuroscience questions not practical to study with empirical methods.

Figures

Figure 1.
Model and study design. (A) The VGG16 convolutional neural network model (feature units and dimensions are labelled across the respective layers). One model is pretrained on the ImageNet database (Scene-expert) and the other on the VGGFace database (Face-expert). (B) The final layer of each network was replaced with a series of binary classifiers (logistic regression, one per category), which were trained on the datasets used in this study. (C) Regular and superimposed images from each category, sized at 224×224 pixels (the input dimensions of VGG16). Superimposed images (bottom) were composed by transparently overlaying two images from either the same or different categories. The regular images were used to train the binary classifiers, which were then tested on both regular and superimposed images to identify the presence or absence of a certain category. (D) 5-fold cross-validation performance of the two models, by image category, for the Face-expert (right) and Scene-expert (left) models. Images were taken from publicly available datasets (–41).
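
For illustration, a minimal sketch of the classifier-head setup described in panels (B) and (D), assuming scikit-learn and precomputed activations from the network's penultimate layer (the function and variable names here are illustrative, not the authors'):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def fit_category_classifier(features: np.ndarray, labels: np.ndarray):
        """One binary classifier per category: predicts whether that
        category is present in an image from penultimate-layer features.

        features: (n_images, n_dims) activations; labels: 0/1 per image.
        """
        clf = LogisticRegression(max_iter=1000)
        # 5-fold cross-validation, as reported in panel (D)
        scores = cross_val_score(clf, features, labels, cv=5)
        clf.fit(features, labels)
        return clf, scores.mean()
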
Figure 2.
Example tuning curves of units in each layer (two randomly chosen units are shown per layer). From these tuning curves, each unit's preference for faces or scenes is determined.
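
One simple way to reduce such a preference to a scalar tuning value is a normalized contrast of the unit's mean responses to the two categories (a sketch under our own assumptions; the paper's exact tuning statistic may differ):

    import numpy as np

    def tuning_value(face_acts: np.ndarray, scene_acts: np.ndarray) -> float:
        """Contrast of a unit's mean activation to face vs. scene images.

        Returns a value in [-1, 1]: positive = face-preferring,
        negative = scene-preferring, near 0 = no clear preference.
        """
        f, s = face_acts.mean(), scene_acts.mean()
        return float((f - s) / (f + s + 1e-8))
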
Figure 3.
Schematic of the FBA implementation in the model. The slope of the rectified linear unit (ReLU) activation function is modulated based on the tuning values of the unit. If a unit in a layer prefers the attended object category, the slope of its ReLU function is tuned up (green arrow), whereas if a unit does not prefer the attended category, its slope is tuned down (red arrow). See the Methods section for more information about how FBA was applied.
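
A minimal PyTorch sketch of this slope-modulation idea, assuming one tuning value per channel and a linear scaling rule with a strength parameter beta (both are our illustrative assumptions, not the authors' exact formula):

    import torch
    import torch.nn as nn

    class FBAReLU(nn.Module):
        """ReLU whose slope is scaled per channel by a tuning value:
        units tuned toward the attended category (tuning > 0) get a
        steeper slope; units tuned away (tuning < 0) a shallower one.
        """
        def __init__(self, tuning: torch.Tensor, beta: float = 0.5):
            super().__init__()
            # one slope per feature map, clamped so it never goes negative
            self.register_buffer("slope", (1.0 + beta * tuning).clamp(min=0.0))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # broadcast the per-channel slope over the spatial dimensions
            return torch.relu(x) * self.slope.view(1, -1, 1, 1)
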
Figure 4.
Outcomes of applying FBA to VGG16 pretrained on ImageNet (A) and VGGFace (B), across categories. The category specificity of the performance increases can be observed. (A) For the Scene-expert model, FBA increased the performance of detecting the presence versus absence of scenes more than that of detecting the presence versus absence of faces. (B) For the Face-expert model, FBA was effective at enhancing the performance of detecting the presence versus absence of faces; for detecting the presence versus absence of scenes, FBA was less helpful and could even be negative (i.e., it decreased the model's performance).
Figure 5.
Tuning quality across layers in the Scene-expert (A) and Face-expert (B) networks, divided into face-selective (red) and scene-selective (yellow) units, during the baseline condition when attention was not applied. Bars indicate the tuning quality distribution of units across layers 1 through 13. The tuning quality of units that prefer scenes is higher than that of units that prefer faces in the Scene-expert model (A), and vice versa in the Face-expert model (B). (C-F) Tuning quality distributions when FBA is applied to scene- and face-selective units at different layers (layers 3, 5, 9, and 11 shown row-wise) in the Scene-expert model (C-D) and the Face-expert model (E-F).
Figure 6.
Representational similarity analysis (RSA) across different categories of images and models. (A) Representational dissimilarity matrices (RDMs). For each layer within a model, separate RDMs were constructed over all scene and face images by taking one minus the Pearson correlation (1 − r) between each pair of image-evoked multivariate activation patterns. RDMs are shown for layers 2, 5, 9, and 12 of each model. (B) Theoretical RDM representing the ideal degree of separation between scene (manmade and natural) and face (male and female) images. (C) RSA, performed by computing the Spearman rank correlation between the off-diagonal triangular values of the theoretical RDM and each layer's RDM. (D) Layer-wise RSA for each model during the baseline condition, when attention was not applied to any units, in the Scene-expert (light blue) and Face-expert (deep blue) models. (E, F) Representational similarity (Spearman rho) when attention was applied to face-selective units (E) and scene-selective units (F) at layers 2, 4, 7, and 9 (left to right, highlighted with red layer labels on the x-axis). Error bars indicate ±1 SEM obtained by bootstrapping with 100 samples. * = p < 0.05, one-tailed paired t-tests, FDR-corrected for multiple comparisons across layers.
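
A compact sketch of the RDM and RSA computations described in panels (A) and (C), assuming NumPy/SciPy and one activation vector per image (the function names are ours; SciPy's "correlation" distance is exactly 1 − Pearson r):

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.stats import spearmanr

    def layer_rdm(acts: np.ndarray) -> np.ndarray:
        """RDM for one layer: 1 - Pearson r between every pair of rows.

        acts: (n_images, n_units) activations, one row per image.
        """
        return squareform(pdist(acts, metric="correlation"))

    def rsa_score(acts: np.ndarray, theoretical_rdm: np.ndarray) -> float:
        """Spearman rank correlation between the off-diagonal
        (lower-triangular) values of the layer RDM and theoretical RDM."""
        rdm = layer_rdm(acts)
        tri = np.tril_indices_from(rdm, k=-1)
        rho, _ = spearmanr(rdm[tri], theoretical_rdm[tri])
        return float(rho)
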

References

    1. Lindsay GW, Miller KD. How biological attention mechanisms improve task performance in a large-scale visual system model. eLife. 2018. p. 1–29.
    2. Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhutdinov R, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In: Bach F, Blei D, editors. Proceedings of the 32nd International Conference on Machine Learning (PMLR); 2015. p. 2048–57.
    3. Cao C, Liu X, Yang Y, Yu Y, Wang J, Wang Z, et al. Look and Think Twice: Capturing Top-Down Visual Attention with Feedback Convolutional Neural Networks. 2015 IEEE International Conference on Computer Vision (ICCV); 2015. p. 2956–64.
    4. Yang X, Yan J, Wang W, Li S, Hu B, Lin J. Brain-inspired models for visual object recognition: an overview. Artificial Intelligence Review. 2022;55(7):5263–311.
    5. Kanwisher N, Gupta P, Dobs K. CNNs reveal the computational implausibility of the expertise hypothesis. iScience. 2023;26(2).
