An image-computable model of speeded decision-making

Paul I Jaffe et al.

eLife. 2025 Feb 28;13:RP98351. doi: 10.7554/eLife.98351

Abstract

Evidence accumulation models (EAMs) are the dominant framework for modeling response time (RT) data from speeded decision-making tasks. While providing a good quantitative description of RT data in terms of abstract perceptual representations, EAMs do not explain how the visual system extracts these representations in the first place. To address this limitation, we introduce the visual accumulator model (VAM), in which convolutional neural network models of visual processing and traditional EAMs are jointly fitted to trial-level RTs and raw (pixel-space) visual stimuli from individual subjects in a unified Bayesian framework. Models fitted to large-scale cognitive training data from a stylized flanker task captured individual differences in congruency effects, RTs, and accuracy. We find evidence that the selection of task-relevant information occurs through the orthogonalization of relevant and irrelevant representations, demonstrating how our framework can be used to relate visual representations to behavioral outputs. Together, our work provides a probabilistic framework for both constraining neural network models of vision with behavioral data and studying how the visual system extracts representations that guide decisions.
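As a concrete illustration of the modeling approach described in the abstract, the sketch below evaluates a linear ballistic accumulator (LBA) likelihood in which the per-trial drift rates are supplied by an external front end, standing in for the CNN readout of the VAM. It is a minimal NumPy/SciPy sketch with illustrative parameter values (b, A, s, t0) and stand-in drift rates; it is not the authors' fitting code.

```python
# Minimal LBA likelihood sketch (illustrative; not the authors' implementation).
# In the VAM, the per-trial drift rates would be produced by a CNN from the raw
# stimulus image; here they are stand-in values.
import numpy as np
from scipy.stats import norm


def lba_pdf(t, v, b, A, s):
    """Density of a single LBA accumulator finishing at decision time t."""
    z1 = (b - A - t * v) / (t * s)
    z2 = (b - t * v) / (t * s)
    return (1.0 / A) * (-v * norm.cdf(z1) + s * norm.pdf(z1)
                        + v * norm.cdf(z2) - s * norm.pdf(z2))


def lba_cdf(t, v, b, A, s):
    """CDF of a single LBA accumulator finishing by decision time t."""
    z1 = (b - A - t * v) / (t * s)
    z2 = (b - t * v) / (t * s)
    return (1.0 + ((b - A - t * v) / A) * norm.cdf(z1)
            - ((b - t * v) / A) * norm.cdf(z2)
            + (t * s / A) * norm.pdf(z1) - (t * s / A) * norm.pdf(z2))


def lba_trial_loglik(rt, choice, drifts, b=1.0, A=0.5, s=0.25, t0=0.25):
    """Log-likelihood of one (RT, choice) pair given per-accumulator drift rates."""
    t = rt - t0                      # decision time after non-decision time
    if t <= 0:
        return -np.inf
    winner = lba_pdf(t, drifts[choice], b, A, s)
    losers = [1.0 - lba_cdf(t, v, b, A, s)
              for i, v in enumerate(drifts) if i != choice]
    return np.log(max(winner * np.prod(losers), 1e-300))


# Example: a trial on which the "CNN" assigns a high drift rate to the correct
# (target-direction) accumulator among four response options.
drifts = np.array([2.2, 0.6, 0.5, 0.4])   # hypothetical CNN outputs
print(lba_trial_loglik(rt=0.55, choice=0, drifts=drifts))
```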

Keywords: decision-making; neural network modelling; neuroscience; visual processing.


Conflict of interest statement

PJ, GS, RS, PB, RP: No competing interests declared.

Figures

Figure 1. Task and model.
(A) Top, Lost in Migration task. Bottom, the seven stimulus layouts (random target/flanker directions). (B) Visual accumulator model (VAM) schematic. The numbers after the convolutional neural network (CNN) layer names correspond to the number of channels used in that layer. See Methods for additional details.
Figure 2. Comparison of model/participant behavior.
For panels B–E, each point is one model/participant (n=75), black line: unity, red line: linear best fit. (A) Example model/participant response time (RT) distributions. (B) Mean RT (Pearson’s r = 0.99, bootstrap 95% CI = (0.99, 0.99), best fit slope = 1.07). (C) Accuracy (r=0.91, 95% CI = (0.87, 0.94), slope = 1.15). (D) RT congruency effect (r=0.77, 95% CI = (0.67, 0.86), slope = 1.01). (E) Accuracy congruency effect (r=0.92, 95% CI = (0.88, 0.94), slope = 1.20). (F) Drift rates averaged across all trials and models. (G) Mean RT vs. age averaged across models. (H) Example model/participant mean RT vs. stimulus layout (Pearson’s r = 0.67). (I) Example model/participant mean RT vs. horizontal stimulus position (negative values: left of center; Pearson’s r = 0.79). (J) Empirical CDF of Pearson’s r between model/participant mean RTs across stimulus feature bins (only participants with significant RT modulation are shown; layout: n = 60 models/participants, x-position: n = 72, y-position: n = 69). Error bars in panels (F-I) are bootstrap 95% confidence intervals.
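The bootstrap 95% confidence intervals on Pearson's r reported in panels B–E can be obtained by resampling model/participant pairs with replacement. The following is a minimal percentile-bootstrap sketch that uses simulated per-participant summaries in place of the actual data.

```python
# Percentile-bootstrap CI for Pearson's r across model/participant pairs
# (simulated data; illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n = 75                                         # one point per model/participant
participant_rt = rng.normal(650, 80, size=n)   # stand-in mean RTs (ms)
model_rt = participant_rt + rng.normal(0, 25, size=n)


def pearson_r(x, y):
    return np.corrcoef(x, y)[0, 1]


boot = np.empty(10_000)
for i in range(boot.size):
    idx = rng.integers(0, n, size=n)           # resample pairs with replacement
    boot[i] = pearson_r(participant_rt[idx], model_rt[idx])

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"r = {pearson_r(participant_rt, model_rt):.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```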
Figure 2—figure supplement 1. Example model/participant response time (RT) distributions and dependence of RTs on stimulus features.
(A) Example model/participant RT distributions (all trials). (B) Examples of model/participant mean RT vs. stimulus layout. (C) Examples of model/participant mean RT vs. horizontal stimulus position (negative values: left of center). (D) Examples of model/participant mean RT vs. vertical stimulus position (negative values: above center). For all panels, error bars correspond to bootstrap 95% confidence intervals.
Figure 2—figure supplement 2. Age dependence of linear ballistic accumulator (LBA) parameters.
For all panels, we tested age-dependence with a one-way ANOVA and report Bonferroni-adjusted p-values, corrected for four comparisons (n = 75 models). We also report adjusted p-values from a post-hoc comparison of the 20–29 vs. 70–89 age groups conducted with Tukey’s HSD. Error bars correspond to bootstrap 95% confidence intervals. (A) Non-decision time parameter t0 (F(5, 69) = 13.3, p < 1e-7). Tukey’s HSD for 20–29 vs. 70–89 age groups: p < 1e-8. (B) Response caution (b − A); F(5, 69) = 0.49, p = 1.0. (C) Mean target drift rate (F(5, 69) = 3.4, p = 0.026). Tukey’s HSD for 20–29 vs. 70–89 age groups: p = 0.002. (D) Mean flanker drift rate (F(5, 69) = 0.72, p = 1.0).
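For reference, the statistics reported in this supplement (a one-way ANOVA across age groups with Bonferroni correction over the four LBA parameters, followed by Tukey's HSD post-hoc comparisons) can be computed as in the sketch below; the simulated non-decision-time values are illustrative and do not reproduce the fitted parameters.

```python
# One-way ANOVA across age groups with a Tukey HSD post-hoc test
# (simulated non-decision-time values; illustrative only).
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
age_groups = ["20-29", "30-39", "40-49", "50-59", "60-69", "70-89"]
# Simulated t0 values that increase with age, 12 models per group.
t0 = {g: rng.normal(0.25 + 0.01 * i, 0.03, size=12)
      for i, g in enumerate(age_groups)}

# Omnibus test across the six groups; Bonferroni-correct across the
# four LBA parameters tested (t0, b - A, target drift, flanker drift).
F, p = f_oneway(*t0.values())
p_bonferroni = min(p * 4, 1.0)
df1 = len(age_groups) - 1
df2 = sum(len(v) for v in t0.values()) - len(age_groups)
print(f"F({df1}, {df2}) = {F:.2f}, adjusted p = {p_bonferroni:.2g}")

# Post-hoc pairwise comparisons (includes 20-29 vs. 70-89).
values = np.concatenate(list(t0.values()))
labels = np.repeat(age_groups, [len(v) for v in t0.values()])
print(pairwise_tukeyhsd(values, labels))
```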
Figure 2—figure supplement 3. Dependence of response times (RTs) on stimulus layout and position.
For each participant/model, we calculated the mean RT in each stimulus feature bin, then subtracted the average of these mean RTs from each bin. The panels show the average of these centered RTs across all participants with significant modulation of RT for that particular stimulus feature. For all panels, we conducted a one-way ANOVA for both models/participants and report Bonferroni-adjusted p-values, corrected for three comparisons (n = 75 models). We also report results from post-hoc comparisons between select feature bins conducted with Tukey’s HSD. Error bars correspond to bootstrap 95% confidence intervals. (A) RT vs. stimulus layout (models: F(6, 53) = 7.43, p < 1e-6; RTs for the vertical line layout were significantly faster (Tukey’s HSD adjusted p-value < 0.05) than RTs from all other layouts except ‘>’; participants: F(6, 53) = 23.1, p < 1e-22; RTs for the vertical line layout were significantly faster than RTs from all other layouts). (B) RT vs. horizontal stimulus position (negative values: left of center; models: F(7, 64) = 16.8, p < 1e-18; RTs for the leftmost and rightmost position bins were significantly slower than RTs from all intermediate position bins; participants: F(7, 64) = 72.6, p < 1e-73; RTs for the leftmost and rightmost position bins were significantly slower than RTs from all intermediate position bins). (C) RT vs. vertical stimulus position (negative values: above center; models: F(5, 66) = 17.2, p < 1e-14; RTs for the topmost and bottommost position bins were significantly slower than RTs from the two centermost position bins; participants: F(5, 66) = 113.3, p < 1e-74; RTs for the topmost and bottommost position bins were significantly slower than RTs from the two centermost position bins).
Figure 2—figure supplement 4. Response time (RT) delta plots and conditional accuracy functions.
(A) RT delta plots for participants and visual accumulator models (VAMs) (n = 75 models/participants). (B) Conditional accuracy functions for participants and VAMs. For all panels, error bars correspond to bootstrap 95% confidence intervals.
Figure 3. Neural representations of target direction.
(A) Schematic of the convolutional neural network (CNN) activations extracted from each network layer. Each layer yields an N × K_l activation matrix, where N is the number of stimuli and K_l is the number of active units (i.e., feature dimensions) in layer l. (B) Decoding accuracy of stimulus target direction. (C) Normalized mutual information for target direction conveyed by single units, averaged across units. Mutual information was normalized by the entropy of the target direction distribution (possible range = [0, 1]). (D) Dimensionality of target representations as measured by the participation ratio of the target-centered activation covariance matrix. (E) Proportion of units exhibiting selectivity for target direction. Panels B–E show the average across n = 75 models; error bars correspond to bootstrap 95% confidence intervals.
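The participation ratio in panel D is a standard dimensionality measure: given the eigenvalues λ_i of an activation covariance matrix, PR = (Σ_i λ_i)² / Σ_i λ_i². The sketch below computes it for random low-dimensional activations; for simplicity it uses plain mean-centering rather than the target-centered covariance used in the figure.

```python
# Participation ratio of an activation covariance matrix
# (random activations; illustrative only).
import numpy as np


def participation_ratio(X):
    """PR = (sum of eigenvalues)^2 / sum of squared eigenvalues of cov(X).

    X has shape (n_stimuli, n_units); eigenvalues are taken from the
    unit-by-unit covariance of the mean-centered activations.
    """
    Xc = X - X.mean(axis=0, keepdims=True)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    eig = np.linalg.eigvalsh(cov)
    return eig.sum() ** 2 / (eig ** 2).sum()


rng = np.random.default_rng(0)
# Low-dimensional activations: 1000 stimuli embedded in 50 units via 5 factors.
X = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 50))
print(participation_ratio(X))   # at most 5, the number of underlying factors
```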
Figure 3—figure supplement 1. Activity of all selective (+) units for one example model.
Each row shows the activity of one unit for 100 randomly selected stimuli, sorted by target direction. The activity of each unit was centered and normalized by the activity of the stimulus with the largest-magnitude activation. The few selective (+) units in layer Conv1 are not shown.
Figure 4. Suppression of task-irrelevant information and tolerance in task-relevant representations.
(A) Decoding accuracy of stimulus target direction in a new distracter context (generalization performance). Context was defined by the values of a given stimulus feature (flanker direction, layout, horizontal/vertical position). (B) Decoding accuracy of irrelevant stimulus features. (C) Normalized mutual information for irrelevant stimulus features conveyed by single units, averaged across units. For each stimulus feature, the mutual information was normalized by the entropy of the stimulus feature distribution (possible range = [0, 1]). All panels show the average across n = 75 models; error bars correspond to bootstrap 95% confidence intervals.
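The normalized single-unit mutual information in panel C (and in Figure 3C) divides the estimated mutual information by the entropy of the stimulus-feature distribution, bounding it between 0 and 1. The sketch below estimates it for one simulated unit after discretizing its activity; the equal-width binning scheme is an assumption, not necessarily the paper's.

```python
# Normalized mutual information between one unit's activity and a discrete
# stimulus feature (simulated data; binning scheme is an assumption).
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
n_trials = 5000
flanker_dir = rng.integers(0, 4, size=n_trials)          # 4 possible directions
# Unit weakly tuned to one flanker direction, plus noise.
activity = 0.3 * (flanker_dir == 2) + rng.normal(0, 1, size=n_trials)

# Discretize activity into equal-width bins before estimating MI (in nats).
activity_bins = np.digitize(activity, np.histogram_bin_edges(activity, bins=10))
mi = mutual_info_score(flanker_dir, activity_bins)

# Normalize by the entropy of the stimulus feature distribution.
p = np.bincount(flanker_dir) / n_trials
h_feature = -(p * np.log(p)).sum()
print(mi / h_feature)                                     # in [0, 1]
```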
Figure 5. Orthogonality of target/flanker subspaces predicts accuracy congruency effects.
(A) Target/flanker subspace alignment averaged across models. (B) Pearson’s correlation coefficient between target/flanker subspace alignment and accuracy congruency effect calculated across models. (C) Target/flanker subspace alignment vs. accuracy congruency effect for layers Conv4–FC1. Each point corresponds to one model; the red line is the linear best fit. For all panels, n = 75 models. Error bars in panels A–B correspond to bootstrap 95% confidence intervals. Asterisks in panel B indicate a significant Pearson’s r (adjusted p-value < 0.05, permutation test with n = 1000 shuffles, Bonferroni correction for seven comparisons).
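The caption does not spell out how subspace alignment is computed, so the sketch below uses one common definition as an assumption: the mean squared cosine of the principal angles between the top-k principal subspaces of target-related and flanker-related activations, which equals 1 for identical subspaces and 0 for orthogonal ones. The paper's exact metric may differ.

```python
# One plausible subspace-alignment metric: overlap of the top-k principal
# subspaces of target-related and flanker-related activations. This specific
# definition is an assumption, not necessarily the authors' exact metric.
import numpy as np


def top_k_pcs(X, k):
    """Top-k principal directions (columns) of mean-centered activations X."""
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T                          # shape (n_units, k)


def subspace_alignment(X_target, X_flanker, k=3):
    """Alignment in [0, 1]: 1 = identical subspaces, 0 = orthogonal."""
    U = top_k_pcs(X_target, k)
    W = top_k_pcs(X_flanker, k)
    # Singular values of U^T W are the cosines of the principal angles.
    s = np.linalg.svd(U.T @ W, compute_uv=False)
    return float(np.mean(s ** 2))


rng = np.random.default_rng(0)
n_units = 100
shared = rng.normal(size=(n_units, 3))                        # shared directions
X_t = rng.normal(size=(500, 3)) @ shared.T + rng.normal(0, 0.1, (500, n_units))
X_f = rng.normal(size=(500, 3)) @ shared.T + rng.normal(0, 0.1, (500, n_units))
print(subspace_alignment(X_t, X_f))     # near 1 when the subspaces overlap
```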
Figure 5—figure supplement 1. Absence of correlation between flanker suppression metrics and congruency effects.
All panels show the Pearson’s correlation coefficient between the specified suppression and behavior metrics, calculated across models. (A) Flanker direction decoding accuracy vs. accuracy congruency effect. (B) Mutual information for flanker direction conveyed by single units vs. accuracy congruency effect. (C) Flanker direction decoding accuracy vs. response time (RT) congruency effect. (D) Mutual information for flanker direction conveyed by single units vs. RT congruency effect. For all panels, n = 75 models, error bars correspond to bootstrap 95% confidence intervals. Asterisks indicate a significant Pearson’s r (adjusted p-value < 0.05, permutation test with n = 1000 shuffles, Bonferroni correction for seven comparisons).
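The permutation test used for the correlations in this figure (and in Figure 5B) can be sketched as follows: shuffle one variable, recompute Pearson's r over n = 1000 shuffles, take the two-sided tail probability, and Bonferroni-correct across the seven layers. The data below are simulated.

```python
# Permutation test for the significance of Pearson's r, with Bonferroni
# correction across layers (simulated values; illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n_models, n_layers, n_shuffles = 75, 7, 1000

alignment = rng.normal(size=(n_layers, n_models))             # stand-in metric
congruency = 0.4 * alignment[5] + rng.normal(size=n_models)   # tied to one layer


def pearson_r(x, y):
    return np.corrcoef(x, y)[0, 1]


p_values = []
for layer in range(n_layers):
    r_obs = pearson_r(alignment[layer], congruency)
    null = np.array([pearson_r(rng.permutation(alignment[layer]), congruency)
                     for _ in range(n_shuffles)])
    p = (np.sum(np.abs(null) >= np.abs(r_obs)) + 1) / (n_shuffles + 1)
    p_values.append(min(p * n_layers, 1.0))                   # Bonferroni correction

print(np.round(p_values, 3))   # only the layer with a true relationship should survive
```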
Figure 5—figure supplement 2. Absence of correlation between target/flanker subspace alignment and response time (RT) congruency effect.
Pearson’s correlation coefficient between target/flanker subspace alignment and RT congruency effect across models (n = 75 models, error bars correspond to bootstrap 95% confidence intervals). The correlation was not significant for any layer (adjusted p-value > 0.05, permutation test with n = 1000 shuffles, Bonferroni correction for seven comparisons).
Figure 6. Comparison of visual accumulator models (VAMs) and task-optimized models.
(A) Accuracy congruency effect. (B) Target/flanker subspace alignment. (C) Dimensionality of target representations, as measured by the participation ratio of the target-centered activation covariance matrix. (D) Normalized mutual information for target/flanker direction conveyed by single units, averaged across units. Mutual information was normalized by the entropy of the target/flanker direction distribution (possible range = [0, 1]). (E) Decoding accuracy of target/flanker direction. (F) Proportion of units exhibiting selectivity for target direction in layers Conv5–Conv6. All panels show the average across n = 75 task-optimized models and n = 75 VAMs; error bars correspond to bootstrap 95% confidence intervals. The VAM data shown in panels A–F are the same as those shown in Figure 2E, Figure 5A, Figure 3B–E, and Figure 4B and C, respectively.
Figure 6—figure supplement 1. Additional analysis of visual accumulator models (VAMs) and task-optimized models.
(A) Normalized mutual information for stimulus layout and horizontal/vertical stimulus position conveyed by single units, averaged across units. Mutual information was normalized by the entropy of the corresponding stimulus feature distribution. (B) Decoding accuracy of stimulus layout and horizontal/vertical stimulus position. All panels show the average across n = 75 task-optimized models and n = 75 VAMs; error bars correspond to bootstrap 95% confidence intervals. The VAM data shown in panels A and B are the same as those shown in Figure 4C and B, respectively.

Update of

  • doi: 10.48550/arXiv.2403.16382
  • doi: 10.7554/eLife.98351.1
  • doi: 10.7554/eLife.98351.2
