PLoS Comput Biol. 2020 Oct 2;16(10):e1008215. doi: 10.1371/journal.pcbi.1008215. eCollection 2020 Oct.

Recurrent neural networks can explain flexible trading of speed and accuracy in biological vision

Courtney J Spoerer et al. PLoS Comput Biol. 2020.

Abstract

Deep feedforward neural network models of vision dominate in both computational neuroscience and engineering. The primate visual system, by contrast, contains abundant recurrent connections. Recurrent signal flow enables recycling of limited computational resources over time, and so might boost the performance of a physically finite brain or model. Here we show: (1) Recurrent convolutional neural network models outperform feedforward convolutional models matched in their number of parameters in large-scale visual recognition tasks on natural images. (2) Setting a confidence threshold, at which recurrent computations terminate and a decision is made, enables flexible trading of speed for accuracy. At a given confidence threshold, the model expends more time and energy on images that are harder to recognise, without requiring additional parameters for deeper computations. (3) The recurrent model's reaction time for an image predicts the human reaction time for the same image better than several parameter-matched and state-of-the-art feedforward models. (4) Across confidence thresholds, the recurrent model emulates the behaviour of feedforward control models in that it achieves the same accuracy at approximately the same computational cost (mean number of floating-point operations). However, the recurrent model can be run longer (higher confidence threshold) and then outperforms parameter-matched feedforward comparison models. These results suggest that recurrent connectivity, a hallmark of biological visual systems, may be essential for understanding the accuracy, flexibility, and dynamics of human visual recognition.
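
The mechanism behind result (2) is simple to state: after each recurrent cycle, read out class probabilities and stop as soon as the entropy of the softmax output falls below a confidence threshold. Below is a minimal sketch of that termination rule, assuming a generic `step` function that returns class logits at each cycle; the function name, cycle count, and threshold value are illustrative, not the authors' code.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum()

def recognise(step, n_cycles=8, threshold=0.5):
    """Run recurrent cycles until the softmax entropy drops below
    the confidence threshold, then return the decision and the
    number of cycles spent (a proxy for reaction time / FLOPs)."""
    for t in range(1, n_cycles + 1):
        probs = softmax(step(t))  # one recurrent sweep -> class logits
        if entropy(probs) <= threshold:
            break
    return probs.argmax(), t
```

Lowering the threshold demands more confidence before stopping, so the model spends more cycles (and floating-point operations) on harder images; raising it trades accuracy for speed.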


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Schematic representation of the parameter-matched networks.
White boxes represent convolutional layers, with the width representing the spatial dimensions of the convolutional layers and the height representing the number of feature maps. Models were matched in the number of parameters by increasing (1) the size of the convolutional kernels (B-K), (2) the number of feature maps (B-F), and (3) the depth of the network (B-D). Example units (black dots) are linked to coloured regions representing their input kernels (which differ in width in B-K). The extents are illustrative and not drawn to scale.
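
Parameter matching in the control models comes down to counting convolutional weights. A rough sketch of the bookkeeping, using the standard formula for 2D convolution parameters (the matching in the paper is approximate, and the exact layer sizes are not reproduced here):

```python
def conv_params(k, f_in, f_out, bias=True):
    """Number of parameters in a 2D convolutional layer with
    k x k kernels, f_in input maps, and f_out output maps."""
    return k * k * f_in * f_out + (f_out if bias else 0)

# A recurrent layer adds a lateral kernel on top of the bottom-up one,
# roughly doubling its parameters. A feedforward control can therefore
# be matched by widening kernels (B-K), adding feature maps (B-F),
# or adding layers (B-D).
```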
Fig 2
Fig 2. ImageNet and ecoset task performance for rCNN and parameter-matched controls.
Our rCNN model (red) achieves higher validation accuracy than parameter-matched control models (shades of blue). (A) Top-1 training and validation accuracies across training epochs for all networks. (B) Performance of the fully trained networks on held-out data. All pairwise differences in model performance were significant (p ≤ 0.05, McNemar test, Bonferroni-corrected for all pairwise comparisons).
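
The significance test in (B) compares two classifiers on the same images, so only the discordant pairs matter. A sketch of an exact McNemar test with Bonferroni correction, as assumed from the caption (variable names are illustrative):

```python
import numpy as np
from scipy.stats import binomtest

def mcnemar_exact(correct_a, correct_b):
    """Exact McNemar test on two models' per-image correctness
    (boolean arrays over the same held-out images)."""
    a = np.asarray(correct_a, dtype=bool)
    b = np.asarray(correct_b, dtype=bool)
    n01 = int(np.sum(a & ~b))  # only model A correct
    n10 = int(np.sum(~a & b))  # only model B correct
    if n01 + n10 == 0:
        return 1.0             # no discordant pairs
    return binomtest(min(n01, n10), n01 + n10, 0.5).pvalue

# Bonferroni correction across m pairwise comparisons:
# p_corrected = min(1.0, p * m)
```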
Fig 3
Fig 3. Validation accuracy as a function of computational cost for feedforward and recurrent models.
Each feedforward model (squares in shades of blue) requires a fixed number of floating-point operations for a single sweep of computation. The top row shows that feedforward models requiring more computation (horizontal axes) had higher top-1 validation accuracy (vertical axes). The recurrent models (yellow-to-red line) could be set to terminate at different levels of confidence, specified as the entropy of the softmax output. For each entropy threshold (colour bar), the computational cost (mean number of floating-point operations) and the top-1 validation accuracy (proportion correct) were computed across the test set. The recurrent models could flexibly trade speed for accuracy (lines in top panels). They achieved the same accuracy as each feedforward control model when given a matched computational budget, and greater accuracy than any of the feedforward models when run longer. The bottom panels replot the data shown in the top panels and additionally show, for a single entropy threshold of the recurrent models, how computational cost varies across images (horizontal domain of the black lines) and what accuracy is achieved at each computational cost. The black line shows the accuracy as a function of computational cost for the selected entropy threshold. The area of each gray circle is proportional to the percentage of images for which the model reaches the entropy threshold at a given computational cost. The open black circle is the average of the points on the black line, weighted by the percentage of images for each computational cost. We see that, at the selected entropy threshold, the model responds rapidly for about half of the images and achieves high performance on these “easy” images. It computes longer for “hard” images, balancing the cost of lower accuracy against the cost of greater expenditure of energy and time.
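
The top-row curves can be reconstructed from per-cycle model outputs alone. A sketch of that computation, assuming precomputed arrays of per-cycle entropies and correctness for each test image (array names and shapes are assumptions, not the authors' code):

```python
import numpy as np

def speed_accuracy_curve(entropies, correct, flops_per_cycle, thresholds):
    """For each entropy threshold, find the first cycle at which each
    image crosses the threshold (else run all cycles), then report the
    mean computational cost and the accuracy at termination.

    entropies, correct: arrays of shape (n_cycles, n_images).
    """
    n_cycles, n_images = entropies.shape
    curve = []
    for th in thresholds:
        below = entropies <= th                  # (n_cycles, n_images)
        t_stop = np.where(below.any(axis=0),
                          below.argmax(axis=0),  # first crossing
                          n_cycles - 1)          # never crossed: run full
        acc = correct[t_stop, np.arange(n_images)].mean()
        cost = ((t_stop + 1) * flops_per_cycle).mean()
        curve.append((th, cost, acc))
    return curve
```

The gray circles in the bottom panels correspond to the distribution of `t_stop` values at one threshold: mass on early cycles for easy images, late cycles for hard ones.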
Fig 4
Fig 4. Human behavioural experiment.
(A) Human subjects were presented with images of isolated objects of different categories and classified the images as animate or inanimate by pressing one of two buttons on each trial. (B) Group-average reaction time for each image. Error bars show the standard error of the mean.
Fig 5
Fig 5. Reaction times from recurrent networks explain human reaction times better than feedforward networks.
Small grey dots represent the Pearson correlation between network and single-subject reaction times. Large dots represent the mean correlation across subjects. Human consistency (black circle) provides a lower bound on the noise ceiling and is computed by correlating each subject's reaction times with the average reaction times of all other subjects. For each network, multiple sigmoid animacy readouts were placed at even intervals throughout the network. Animacy readouts were trained to maximise accuracy using a separate set of images not used in the human behavioural experiments. For each model, an entropy threshold was fitted, using independent subjects and images, so that model reaction times best predicted human reaction times (cross-validation).
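
The noise-ceiling lower bound in this figure is a leave-one-subject-out correlation. A minimal sketch, assuming a reaction-time matrix of shape (subjects, images):

```python
import numpy as np

def human_consistency(rts):
    """Lower bound on the noise ceiling: correlate each subject's
    per-image reaction times with the mean over all other subjects.

    rts: array of shape (n_subjects, n_images)."""
    n_subjects = rts.shape[0]
    r = []
    for s in range(n_subjects):
        others = np.delete(rts, s, axis=0).mean(axis=0)
        r.append(np.corrcoef(rts[s], others)[0, 1])
    return np.mean(r)
```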
Fig 6
Fig 6. Relationship between validation accuracy, number of parameters, and computational cost across models.
The validation accuracy (vertical axis) is the proportion of top-1 correct classifications of the trained models on ImageNet. For each model (coloured disc), the validation accuracy is plotted against the number of parameters (horizontal axis). The area of each coloured disc is proportional to the computational cost, measured as the number of floating-point operations required to run the model. The red circles correspond to different numbers of recurrent computational cycles of the BL recurrent convolutional network. For model abbreviations (B, B-K, B-F, B-D), see Fig 1. B-U is the unrolled control model, with a computational graph matched to BL but no parameter sharing across cycles of computation.
Fig 7
Fig 7. Lateral-weight components for layer 1 of an rCNN trained on ImageNet.
Every unit receives lateral input from other units within and across feature maps via a local lateral-weight pattern. We used principal component analysis to summarise the lateral-weight patterns. The top five lateral-weight principal components are shown in both their positive (centre right) and negative forms (centre left). Blue shading corresponds to negative values and red to positive. The proportion of variance explained is given beneath each lateral-weight component. Bottom-up feature maps connected by lateral weights with the strongest positive (right) and negative loadings (left) on the weight component are shown alongside. Arrows between bottom-up features indicate the direction of the connection and the loading is given underneath each pair of bottom-up features.
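
The decomposition described here is an ordinary PCA over the spatial lateral-kernel patterns, one pattern per pair of feature maps. A sketch under that reading, assuming the lateral weights are stored as a (k, k, features_in, features_out) array (the layout is an assumption; the paper's storage format may differ):

```python
import numpy as np
from sklearn.decomposition import PCA

def lateral_pca(lateral_weights, n_components=5):
    """Summarise lateral-weight patterns with PCA.

    lateral_weights: array of shape (k, k, f_in, f_out) holding the
    k x k lateral kernel connecting each pair of feature maps."""
    k, _, f_in, f_out = lateral_weights.shape
    # One row per feature-map pair, one column per kernel pixel.
    patterns = lateral_weights.reshape(k * k, f_in * f_out).T
    pca = PCA(n_components=n_components)
    loadings = pca.fit_transform(patterns)  # (pairs, components)
    components = pca.components_.reshape(n_components, k, k)
    return components, loadings, pca.explained_variance_ratio_
```

The returned components correspond to the positive/negative weight patterns shown in the figure, the loadings to the per-pair scores, and the variance ratios to the numbers beneath each component.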
Fig 8
Fig 8. Network unrolling through time.
Unrolling is shown for engineering time (left) and biological time (right). Each box represents a layer and the shading corresponds to its label in engineering time. Connections with the same colour represent shared parameters.
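
In code, unrolling with parameter sharing means calling the same convolution modules on every cycle. A minimal PyTorch sketch of one bottom-up-plus-lateral layer in the style of BL (layer sizes and cycle count illustrative); an unrolled control like B-U would instead allocate fresh weights for each cycle:

```python
import torch
import torch.nn as nn

class RecurrentConvLayer(nn.Module):
    """One bottom-up plus lateral convolutional layer, unrolled in time.
    The same bottom-up and lateral weights are reused on every cycle
    (parameter sharing)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.bottom_up = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.lateral = nn.Conv2d(out_ch, out_ch, k, padding=k // 2)

    def forward(self, x, n_cycles=4):
        h = torch.relu(self.bottom_up(x))      # cycle 0: feedforward sweep
        outputs = [h]
        for _ in range(n_cycles - 1):          # later cycles add lateral input
            h = torch.relu(self.bottom_up(x) + self.lateral(h))
            outputs.append(h)
        return outputs                         # one activation map per cycle
```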
Fig 9
Fig 9. Task performance using varied definitions of predictions for recurrent models.
Accuracies are given for models trained on (A) ImageNet and (B) ecoset using both time-based (left) and threshold-based (right) methods. Accuracies obtained from instantaneous readouts are shown with solid lines and results from cumulative readouts are shown with dashed lines. Shaded areas represent 95% confidence intervals obtained through bootstrap resampling.
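
The two readout definitions differ only in whether evidence is carried across cycles. A sketch, assuming per-cycle logits for a single image and that the cumulative readout sums outputs over cycles before taking the argmax (the paper's exact accumulation rule may differ):

```python
import numpy as np

def readouts(logits):
    """Two ways to read out a recurrent model's decision at cycle t.

    logits: array of shape (n_cycles, n_classes) for one image."""
    instantaneous = logits.argmax(axis=1)  # output of cycle t alone
    cumulative = np.cumsum(logits, axis=0).argmax(axis=1)  # cycles 0..t summed
    return instantaneous, cumulative
```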
