Nat Neurosci. 2023 Nov;26(11):2017–2034. doi: 10.1038/s41593-023-01442-0. Epub 2023 Oct 16.

Model metamers reveal divergent invariances between biological and artificial neural networks


Jenelle Feather et al. Nat Neurosci. 2023 Nov.

Abstract

Deep neural network models of sensory systems are often proposed to learn representational transformations with invariances like those in the brain. To reveal these invariances, we generated 'model metamers', stimuli whose activations within a model stage are matched to those of a natural stimulus. Metamers for state-of-the-art supervised and unsupervised neural network models of vision and audition were often completely unrecognizable to humans when generated from late model stages, suggesting differences between model and human invariances. Targeted model changes improved human recognizability of model metamers but did not eliminate the overall human-model discrepancy. The human recognizability of a model's metamers was well predicted by their recognizability by other models, suggesting that models contain idiosyncratic invariances in addition to those required by the task. Metamer recognizability dissociated from both traditional brain-based benchmarks and adversarial vulnerability, revealing a distinct failure mode of existing sensory models and providing a complementary benchmark for model assessment.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Overview of model metamers methodology.
a, Model metamer generation. Metamers are synthesized by performing gradient descent on a noise signal to minimize the difference (normalized Euclidean distance) between its activations at a model stage and those of a natural signal. The architecture shown is the CochCNN9 auditory model. b, Each reference stimulus has an associated set of stimuli that are categorized as the same class by humans (blue) or by models (orange, if models have a classification decision). Metamers for humans and metamers for models are also sets of stimuli in the space of all possible stimuli (subsets of the set of same-class stimuli). Here, model metamers are derived for a specific model stage, taking advantage of access to the internal representations of the model at each stage. c, General experimental setup. Because we do not have high-resolution access to the internal brain representations of humans, we test for shared invariances behaviorally, asking humans to make classification judgments on natural stimuli or model metamers. See text for justification of the use of a classification task. d, Possible scenarios for how model metamers could relate to human classification decisions. Each square depicts sets of stimuli in the input space. Model 1 represents a model that passes our proposed behavioral test. The set of metamers for a reference stimulus grows over the course of the model, but even at the last stage, all model metamers are classified as the reference category by humans. Model 2 represents a model whose invariances diverge from those of humans. By the late stages of the model, many model metamers are no longer recognizable by humans as the reference stimulus class. The metamer test results thus reveal the model stage at which model invariances diverge from those of humans. e, Example distributions of activation similarity for pairs of metamers (a natural reference stimulus and its corresponding metamer) along with random pairs of natural stimuli from the training set. The latter provides a null distribution that we used to verify the success of the model metamer generation. Distributions were generated from the first and last stage of the CochCNN9 auditory model.
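As a concrete illustration of the synthesis procedure in a, the following PyTorch sketch optimizes a white-noise input so that its activations at a chosen model stage match those of a natural signal. The optimizer, learning rate and step count are illustrative assumptions, not the paper's exact optimization settings or convergence criteria.

```python
import torch

def generate_metamer(model, stage, reference, n_steps=2000, lr=0.01):
    """Synthesize a model metamer: optimize a noise signal so that its
    activations at `stage` match those of `reference`, using the
    normalized Euclidean distance described in Fig. 1a."""
    for p in model.parameters():  # only the input is optimized
        p.requires_grad_(False)

    # Capture the activations of the chosen stage with a forward hook.
    acts = {}
    hook = stage.register_forward_hook(
        lambda module, inputs, output: acts.update(x=output))

    with torch.no_grad():  # target activations of the natural signal
        model(reference)
    target = acts['x']

    # Start from white noise with the same shape as the reference.
    metamer = torch.randn_like(reference).requires_grad_(True)
    optimizer = torch.optim.Adam([metamer], lr=lr)

    for _ in range(n_steps):
        optimizer.zero_grad()
        model(metamer)
        # Normalized Euclidean distance between activation vectors.
        loss = (acts['x'] - target).norm() / target.norm()
        loss.backward()
        optimizer.step()

    hook.remove()
    return metamer.detach()
```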
Fig. 2
Fig. 2. Metamers of standard-trained visual and auditory deep neural networks are often unrecognizable to human observers.
a, Model metamers are generated from different stages of the model. Here and elsewhere, in models with residual connections, we only generated metamers from stages where all branches converge, which ensured that all subsequent model stages, and the model decision, remained matched. b, Experimental task used to assess human recognition of visual model metamers. Humans were presented with an image (a natural image or a model metamer of a natural image) followed by a noise mask. They were then presented with 16 icons representing 16 object categories and classified each image as belonging to one of the categories by clicking on the icon. c, Human recognition of visual model metamers (N = 22). At the time of the experiments, the five models tested here placed 11th, 1st, 2nd, 4th and 59th (left to right) on a neural prediction benchmark. For all tested models, human recognition of model metamers declined for late model stages, while model recognition remained high (as expected). Error bars plot s.e.m. across participants (or participant-matched stimulus subsets for model curves). d, Human recognition of metamers from visual models trained on larger datasets (N = 21). Error bars plot s.e.m. across participants (or participant-matched stimulus subsets for model curves). e, Example metamers from standard-trained and semi-weakly-supervised-learning (SWSL)-trained ResNet50 visual models. f, Experimental task used to assess human recognition of auditory model metamers. Humans classified the word that was present at the midpoint of a 2-s sound clip. Participants selected from 793 possible words by typing any part of the word into a response box and seeing matching dictionary entries from which to complete their response. A response could only be submitted if it matched an entry in the dictionary. g, Human recognition of auditory model metamers (N = 20). For both tested models, human recognition of model metamers decreased at late model stages, while model recognition remained high, as expected. When plotted, chance performance (1/793) is indistinguishable from the x axis. Error bars plot s.e.m. across participants (or participant-matched stimulus subsets for model curves). h, Cochleagram visualizations of example auditory model metamers from CochCNN9 and CochResNet50 architectures. Color intensity denotes instantaneous sound amplitude in a frequency channel (arbitrary units).
Fig. 3
Fig. 3. Model metamers are unrecognizable to humans even with alternative training procedures.
a, Overview of self-supervised learning, inspired by Chen et al. Each input was passed through a learnable convolutional neural network (CNN) backbone and a multi-layer perceptron (MLP) to generate an embedding vector. Models were trained to map multiple views of the same image to nearby points in the embedding space. Three of the self-supervised models (SimCLR, MoCo_V2 and BYOL) used a ResNet50 backbone. The other self-supervised model (IPCL) had an AlexNet architecture modified to use group normalization. In both cases, we tested comparison supervised models with the same architecture. The SimCLR, MoCo_V2 and IPCL models also had an additional training objective that explicitly pushed apart embeddings from different images. b, Example metamers from select stages of ResNet50 supervised and self-supervised models. In all models, late-stage metamers were mostly unrecognizable. c, Human recognition of metamers from supervised and self-supervised models (left; N = 21), along with classification performance of a linear readout trained on the ImageNet1K task at each stage of the models (right). Readout classifiers were trained without changing any of the model weights. For self-supervised models, model metamers from the ‘final’ stage were generated from a linear classifier at the avgpool stage. Model recognition curves of model metamers were close to ceiling, as in Fig. 2, and are omitted here and in later figures for brevity. Here and in d, error bars plot s.e.m. across participants (left) or across three random seeds of model evaluations (right). d, Same as c but for the IPCL self-supervised model and a supervised comparison with the same dataset augmentations (N = 23). e, Examples of natural and stylized images using the Stylized ImageNet augmentation. Training models on Stylized ImageNet was previously shown to reduce a model’s dependence on texture cues for classification. f, Human recognition of model metamers for ResNet50 and AlexNet architectures trained with Stylized ImageNet (N = 21). Removing the texture bias of models by training on Stylized ImageNet does not result in more recognizable model metamers than those of the standard model. Error bars plot s.e.m. across participants.
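For readers unfamiliar with the objective sketched in a, here is a compact version of the SimCLR-style contrastive loss: embeddings of two views of the same image are pulled together while embeddings of other images in the batch are pushed apart. This corresponds to the SimCLR/MoCo_V2/IPCL family described above (BYOL has no explicit negatives); the temperature value is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss. z1 and z2 hold embeddings of two
    augmented views of the same batch of images (row i of z1 and row i
    of z2 are views of the same image)."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # 2N x D, unit norm
    sim = z @ z.t() / temperature                       # cosine similarities
    n = z1.shape[0]
    # An embedding is never its own positive: mask self-similarities.
    sim.fill_diagonal_(float('-inf'))
    # The positive for row i is the other view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```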
Fig. 4
Fig. 4. Adversarial training increases human recognizability of visual model metamers.
a, Adversarial examples are derived at each training step by finding an additive perturbation to the input that moves the classification label away from the training label class (top). These derived adversarial examples are provided to the model as training examples and used to update the model parameters (bottom). The resulting model is subsequently more robust to adversarial perturbations than if standard training had been used. As a control experiment, we also trained models with random perturbations to the input. b, Human recognition of metamers from ResNet50 models (N = 20) trained with and without adversarial or random perturbations. Here and in c and e, error bars plot s.e.m. across participants. c, Same as b but for AlexNet models (N = 20). In both architectures, adversarial training led to more recognizable metamers at deep model stages (repeated measures analysis of variance (ANOVA) tests comparing human recognition of standard and each adversarial model; significant main effects in each case, F(1,19) > 104.61, P < 0.0001, ηp² > 0.85), although in both cases, the metamers remain less than fully recognizable. Random perturbations did not produce the same effect (repeated measures ANOVAs comparing random to adversarial; significant main effect of random versus adversarial for each perturbation of the same type and size, F(1,19) > 121.38, P < 0.0001, ηp² > 0.86). d, Example visual metamers for models trained with and without adversarial or random perturbations. e, Recognizability of model metamers from standard-trained models with and without regularization compared to that for an adversarially trained model (N = 20). Two regularization terms were included in the optimization: a total variation regularizer to promote smoothness and a constraint to stay within the image range. Two optimization step sizes were evaluated. Smoothness priors increased recognizability for the standard model (repeated measures ANOVAs comparing human recognition of metamers with and without regularization; significant main effects for each step size, F(1,19) > 131.8246, P < 0.0001, ηp² > 0.87). However, regularized metamers remained less recognizable than those from the adversarially trained model (repeated measures ANOVAs comparing standard model metamers with regularization to metamers from adversarially trained models; significant main effects for each step size, F(1,19) > 80.8186, P < 0.0001, ηp² > 0.81). f, Example metamers for adversarially trained and standard models with and without regularization.
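The training loop in a can be sketched as follows, here with an L2-constrained projected gradient descent (PGD) inner loop. The values of eps, alpha and the number of attack steps are placeholders, not the paper's training hyperparameters.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y,
                              eps=3.0, alpha=0.5, attack_steps=7):
    """One training step on L2 PGD adversarial examples (a sketch;
    eps, alpha and attack_steps are illustrative values)."""
    delta = torch.zeros_like(x, requires_grad=True)

    # Inner maximization: perturb the input to increase the loss
    # (move the classification away from the training label).
    for _ in range(attack_steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        g_norm = grad.flatten(1).norm(dim=1).clamp(min=1e-12).view(-1, 1, 1, 1)
        delta = delta + alpha * grad / g_norm          # gradient ascent step
        d_norm = delta.flatten(1).norm(dim=1).clamp(min=1e-12).view(-1, 1, 1, 1)
        delta = delta * (eps / d_norm).clamp(max=1.0)  # project onto L2 ball
        delta = delta.detach().requires_grad_(True)

    # Outer minimization: update model parameters on the adversarial batch.
    optimizer.zero_grad()
    F.cross_entropy(model(x + delta.detach()), y).backward()
    optimizer.step()
```

Training with random perturbations (the control condition in a) would replace the inner loop with a draw of delta from the same norm ball, leaving the outer update unchanged.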
Fig. 5
Fig. 5. Adversarial training increases human recognition of auditory model metamers.
a, Schematic of auditory CNNs with adversarial perturbations applied to the waveform input. b,c, Human recognition of auditory model metamers from CochResNet50 (b; N = 20) and CochCNN9 (c; N = 20) models trained with adversarial perturbations generated in the waveform space (models trained with random perturbations are also included for comparison). When plotted here and in f and g, chance performance (1/793) is indistinguishable from the x axis, and error bars plot s.e.m. across participants. L2 (ε = 0.5) waveform adversaries were only included in the CochResNet50 experiment. ANOVAs comparing standard and each adversarial model showed significant main effects in four of five cases (F(1,19) > 9.26, P < 0.0075, ηp² > 0.33) and no significant main effect for CochResNet50 with L2 (ε = 1) perturbations (F(1,19) = 0.29, P = 0.59, ηp² = 0.015). Models trained with random perturbations did not show the same benefit (ANOVAs comparing each random and adversarial perturbation model with the same ε type and size; significant main effect in each case, F(1,19) > 4.76, P < 0.0444, ηp² > 0.20). d, Cochleagrams of example model metamers from CochResNet50 models trained with waveform and cochleagram adversarial perturbations. e, Schematic of auditory CNNs with adversarial perturbations applied to the cochleagram stage. f,g, Auditory model metamers from CochResNet50 (f) and CochCNN9 (g) models trained with cochleagram adversarial perturbations are more recognizable to humans than those from models trained with waveform perturbations. ANOVAs comparing each model trained with cochleagram perturbations versus the same architecture trained with waveform perturbations showed significant main effects in each case (F(1,19) > 4.6, P < 0.04, ηp² > 0.19). ANOVAs comparing each model trained with cochleagram perturbations to the standard model showed a significant main effect in each case (F(1,19) > 102.25, P < 0.0001, ηp² > 0.84). The effect on metamer recognition was again specific to adversarial perturbations (ANOVAs comparing the effect of training with adversarial versus random perturbations of the same ε type and size; F(1,19) > 145.07, P < 0.0001, ηp² > 0.88).
Fig. 6
Fig. 6. Human recognition of model metamers dissociates from adversarial robustness.
a, Adversarial robustness of visual models versus human recognizability of final-stage model metamers (N = 26 models). Robustness was quantified as average robustness to L2 (ε = 3) and L∞ (ε = 4/255) adversarial examples, normalized by performance on natural images. Symbols follow those in Fig. 7a. Here and in b and c, error bars for the abscissa represent s.e.m. across 5 random samples of 1,024 test examples, and error bars for the ordinate represent s.e.m. across participants. b, Same as a but for the final convolutional stage (CochCNN9) or block (CochResNet50) of auditory models (N = 17 models). Robustness was quantified as average robustness to L2 (ε = 10^−0.5) and L∞ (ε = 10^−2.5) adversarial perturbations of the waveform, normalized by performance on natural audio. Symbols follow those in Fig. 7c. c, Adversarial robustness of a set of adversarially trained visual models versus human recognizability of final-stage model metamers (N = 25 models). d, Operations included in the AlexNet architecture to more closely obey the sampling theorem (the resulting model is referred to as ‘LowpassAlexNet’). e, Schematic of VOneAlexNet. f, Adversarial vulnerability as assessed via accuracy on a 1,000-way ImageNet classification task with adversarial perturbations of different types and sizes. LowpassAlexNet and VOneAlexNet were equally robust to adversarial perturbations (F(1,8) < 4.5, P > 0.1 and ηp² < 0.36 for all perturbation types), and both exhibited greater robustness than the standard model (F(1,8) > 137.4, P < 0.031 and ηp² > 0.94 for all adversarial perturbation types for both architectures). Error bars plot s.e.m. across 5 random samples of 1,024 test images. g, Human recognition of model metamers from LowpassAlexNet, VOneAlexNet and standard AlexNet models. LowpassAlexNet had more recognizable metamers than VOneAlexNet (main effect of architecture: F(1,19) = 71.7, P < 0.0001, ηp² > 0.79; interaction of architecture and model stage: F(8,152) = 21.8, P < 0.0001, ηp² > 0.53). Error bars plot s.e.m. across participants (N = 20). h, Example model metamers from the experiment in g. i, Schematic depiction of how adversarial vulnerability could dissociate from human recognizability of metamers. j, Example augmentations applied to images in tests of out-of-distribution robustness. k, Scatter plot of out-of-distribution robustness versus human recognizability of final-stage model metamers (N = 26 models). Models with large-scale training are denoted with ★ symbols. Other symbols follow those in Fig. 7a; the abscissa value is a single number, and error bars for the ordinate represent s.e.m. across participants.
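For concreteness, the normalized robustness measure on the abscissa of a and b can be computed as below; the accuracy values in the usage line are made-up placeholders, not results from the paper.

```python
def normalized_robustness(adv_accuracies, clean_accuracy):
    """Average adversarial accuracy across attack types, normalized by
    accuracy on natural (clean) inputs, as on the abscissa of a and b."""
    return sum(acc / clean_accuracy for acc in adv_accuracies) / len(adv_accuracies)

# Placeholder accuracies under L2 (eps = 3) and Linf (eps = 4/255) attacks:
score = normalized_robustness([0.31, 0.22], clean_accuracy=0.76)
```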
Fig. 7
Fig. 7. Human recognition of model metamers dissociates from model predictions of brain responses.
a, Procedure for neural benchmarks; ANN, artificial neural network. b, Human recognizability of a model’s metamers versus model–brain similarity for four areas of the ventral stream assessed by a commonly used set of benchmarks. The benchmarks mostly consisted of the neurophysiological variance explained by model features via regression. A single model stage was chosen for each model and brain region that produced the highest similarity in a separate dataset; graphs plot results for this stage (N = 26 models). Error bars on each data point plot s.e.m. across participant metamer recognition; benchmark results are a single number. None of the correlations were significant after Bonferroni correction. Given the split-half reliability of the metamer recognizability scores and the model–brain similarity scores, the noise ceiling of the correlation was ρ = 0.92 for IT. c, Procedure for auditory brain predictions. Time-averaged unit responses in each model stage were used to predict each voxel’s response to natural sounds using a linear mapping fit to the responses to a subset of the sounds with ridge regression. Model predictions were evaluated on held-out sounds. d, Average voxel response variance explained by the best-predicting stage of each auditory model from Figs. 2 and 5, plotted against metamer recognizability for that model stage obtained from the associated experiment. We performed this analysis across all voxels in the auditory cortex (left) and within four auditory functional ROIs (right). Variance explained (R²) was measured for the best-predicting stage of the models (N = 17 models), chosen individually for each participant and ROI (N = 8 participants). For each participant, the other participants’ data were used to choose the best-predicting stage. Error bars on each data point plot s.e.m. of metamer recognition and variance explained across participants. No correlations were significant after Bonferroni correction, and they were again below the noise ceiling (the presumptive noise ceiling ranged from ρ = 0.78 to ρ = 0.87, depending on the ROI).
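A minimal scikit-learn sketch of the mapping procedure in c: a ridge regression from time-averaged model features to voxel responses, fit on a subset of sounds and evaluated on held-out sounds. The array shapes, regularization grid and split are illustrative assumptions (random data stand in for real features and voxel responses).

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Illustrative shapes: sounds x time-averaged model units, sounds x voxels.
features = np.random.randn(165, 512)
voxels = np.random.randn(165, 1000)

X_tr, X_te, y_tr, y_te = train_test_split(features, voxels, test_size=0.2)
ridge = RidgeCV(alphas=np.logspace(-2, 5, 8)).fit(X_tr, y_tr)

pred = ridge.predict(X_te)
# Variance explained (R^2) per voxel on held-out sounds.
r2 = 1 - ((y_te - pred) ** 2).sum(0) / ((y_te - y_te.mean(0)) ** 2).sum(0)
```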
Fig. 8
Fig. 8. Human recognition of a model’s metamers is correlated with their recognition by other models.
a, Model metamers were generated for each stage of a ‘generation’ model (one of the models from Figs. 2c,d, 3, 4 and 6g for visual models and from Figs. 2f and 5 for auditory models). These metamers were presented to ‘recognition’ models (all other models from the listed figures). We measured recognition of the generating model’s metamers by each recognition model, averaging accuracy over all recognition models (excluding the generation model), as shown here for a standard-trained ResNet50 image model. Error bars represent s.e.m. over N = 28 recognition models. b, Average model recognition of metamers from the standard ResNet50, the three self-supervised ResNet50 models and the three adversarially trained ResNet50 models. To obtain self-supervised and adversarially trained results, we averaged each recognition model’s accuracy curve across all generating models and averaged these curves across recognition models. Error bars represent s.e.m. over N = 28 recognition models for standard models and N = 29 recognition models for adversarially trained and self-supervised models. c, Same as b but for Standard AlexNet, LowpassAlexNet and VOneAlexNet models from Fig. 6d–h. Error bars are over N = 28 recognition models. d, Same as b but for auditory models, with metamers generated from the standard CochResNet50, the three CochResNet50 models with waveform adversarial perturbations and the two CochResNet50 models with cochleagram adversarial perturbations. Chance performance is 1/794 for models because they had a ‘null’ class in addition to 793 word labels. Error bars represent s.e.m. over N = 16 recognition models for the standard model and N = 17 recognition models for adversarially trained models. e,f, Correlation between human and model recognition of another model’s metamers for visual (e; N = 219 model stages) and auditory (f; N = 144 model stages) models. Abscissa plots average human recognition accuracy of metamers generated from one stage of a model, and error bars represent s.e.m. across participants. Ordinate plots average recognition by other models of those metamers, and error bars represent s.e.m. across recognition models. Human recognition of a model’s metamers is highly correlated with other models’ recognition of those same model metamers.
Extended Data Fig. 1
Extended Data Fig. 1. Model metamers generated from different noise initializations.
a,b, Model metamers generated from four different white noise initializations for the Standard ResNet50 (a) and an adversarially trained ResNet50 (b).
Extended Data Fig. 2
Extended Data Fig. 2. Analysis of consistency of human recognition errors for model metamers.
a, 16-way confusion matrix for natural images. Here and in b and c, results incorporate human responses from all experiments that contained the AlexNet Standard architecture or the ResNet50 Standard architecture (N = 104 participants). Statistical test for confusion matrix described in b. b, Confusion matrices for human recognition judgments of model metamers from each stage of the AlexNet and ResNet50 models (using data from all experiments that contained the AlexNet Standard architecture or the ResNet50 Standard architecture). We performed a split-half reliability analysis of the confusion matrices to determine whether the confusions were reliable across participants. We measured the correlation between the confusion matrices for splits of human participants, and assessed whether this correlation was significantly greater than 0 (one-sided test). P-values from this analysis are given above each confusion matrix. For the later stages of each model, the confusion matrices are no more consistent than would be expected by chance, consistent with the metamers being completely unrecognizable (that is, containing no information about the visual category of the natural image they are matched to). c, Human recognizability of model metamers from different stages of AlexNet (N = 63 participants) and ResNet50 models (N = 84 participants). Error bars are s.e.m. across participants. Stages whose confusions were not consistent across splits of human observers are noted by the shaded region. The stages for which recognition is near chance show inconsistent confusion patterns, ruling out the possibility that the chance levels of recognition are driven by systematic errors (for example consistently recognizing metamers for cats as dogs).
Extended Data Fig. 3
Extended Data Fig. 3. Optimization fidelity vs. human recognition of model metamers.
a,b, Optimization fidelity for visual model metamers at the metamer generation stage (a) and at the final model stage corresponding to a categorization decision (b; N = 219 model stages). Visual models are those in Figs. 2–4 and 6g. Note that most data points are very close to 1 for the final-stage correlation metrics (for example, 209/219 stages exceed an average Spearman ρ of 0.99). Each point corresponds to a single stage of a single model. c,d, Optimization fidelity for auditory model metamers at the metamer generation stage (c) and at the final model stage corresponding to a categorization decision (d). Auditory models are those in Figs. 2 and 5 (N = 127 model stages). Note that most data points are again very close to 1 for the final-stage correlation metrics (all 127 stages exceed an average Spearman ρ of 0.99). In all cases, optimization fidelity is high for both the metamer generation stage and the final stage, and human recognition is either not predicted by optimization fidelity or only weakly correlated with it (accounting for a very small fraction of the variance). Error bars on each data point are the standard deviation across generated model metamers that passed the optimization criteria to be included in the psychophysical experiment. e,f, Final-stage optimization fidelity plotted vs. metamer generation stage for the CochCNN9 (e) and CochResNet50 (f) auditory models. Note the y axis limits, which differ across plots to show the small variations near 1 for the correlation measures. For any given model, some stages are somewhat less well optimized than others, but these variations do not account for the recognizability differences found in our experiments (compare these plots to the recognition plots in Figs. 2 and 5).
Extended Data Fig. 4
Extended Data Fig. 4. Metamers from a classic vision model.
a, Schematic of the HMAX vision model. HMAX is a biologically motivated architecture with cascaded filtering and pooling operations inspired by simple and complex cells in the primate visual system, and was intended to capture aspects of biological object recognition. We generated model metamers by matching all units at the S1, C1, S2, or C2 stage of the model. b, Example HMAX model metamers. c, Although HMAX is substantially shallower than the “deep” neural network models investigated in the rest of this paper, it is evident that by the C2 model stage its model metamers are comparably unrecognizable to humans (significant main effect of model stage, F(4,76) = 351.9, p < 0.0001, ηp² = 0.95). This classical model thus also has invariances that differ from those of the human object recognition system. Error bars plot s.e.m. across participants (N = 20). HMAX metamers were black and white, while metamers from all other models were in color.
Extended Data Fig. 5
Extended Data Fig. 5. Metamers from a classic auditory model.
a, Schematic of the spectrotemporal auditory filterbank model (Spectemp), a classical model of auditory cortex consisting of a set of spectrotemporal filters applied to a cochleagram representation. We used a version of the model in which the convolutional responses are summarized with the mean power in each filter. b, Cochleagrams of example Spectemp model metamers. c, Human recognition of Spectemp model metamers. Metamers from the first two stages were fully recognizable, and subjectively resembled the original audio, indicating that these stages instantiate few invariances, as expected for overcomplete filter-bank decompositions. By contrast, metamers from the temporal average representation were unrecognizable (significant main effect of model stage, F(4,76) = 515.3, p < 0.0001, ηp² = 0.96), indicating that this model stage produces invariances that humans do not share (plausibly because the human speech recognition system retains information that is lost in the averaging stage). Error bars plot s.e.m. across participants (N = 20). Overall, these results and those in Extended Data Fig. 4 show how metamers can reveal the invariances present in classical models as well as state-of-the-art deep neural networks, and demonstrate that both types of models fail to fully capture the invariances of biological sensory systems.
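A schematic sketch of the summary stage described in a: 2-D filters applied convolutionally to a cochleagram, with each filter's response reduced to its mean power. Random filters and array shapes stand in for the model's actual spectrotemporal modulation filters, so this is only a structural illustration.

```python
import numpy as np
from scipy.signal import fftconvolve

# Illustrative cochleagram (frequency channels x time) and filter bank.
cochleagram = np.random.rand(171, 200)
filters = [np.random.randn(9, 9) for _ in range(64)]  # stand-in filters

# Convolve and summarize each filter's response with its mean power;
# this temporal averaging is the stage whose metamers are unrecognizable.
responses = [fftconvolve(cochleagram, f, mode='valid') for f in filters]
summary = np.array([np.mean(r ** 2) for r in responses])
```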
Extended Data Fig. 6
Extended Data Fig. 6. Consistency of human recognition of individual metamers from models trained with random or adversarial perturbations.
a, Consistency of recognizability of individual stimuli across splits of participants. The graph plots the proportion correct for individual stimuli for one random split. Circle size represents the number of stimuli at that particular value. Correlation values were determined by averaging the Spearman ρ over 1,000 random splits of participants (p-values were computed non-parametrically by shuffling participant responses for each condition and counting the number of times the true Spearman ρ averaged across splits was lower than the shuffled correlation value; one-sided test). We only included images that had at least 4 trials in each split of participants, and we only included 4 trials in the average, to avoid having some images exert more influence on the result than others (resulting in quantized values for proportion correct). Recognizability of individual stimuli was reliable for natural images, for relu2 of AlexNet trained with random perturbations and for the final stage of adversarially trained AlexNet (p < 0.001 in each case). By contrast, recognizability of individual metamers from the final stage of AlexNet trained with random perturbations showed no consistency across participants (p = 0.485).
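The split-half consistency analysis can be sketched as follows; the response-matrix layout is an illustrative assumption (the analysis above additionally equates trial counts per image and derives a permutation null by shuffling responses within each condition).

```python
import numpy as np
from scipy.stats import spearmanr

def split_half_reliability(responses, n_splits=1000, seed=None):
    """Average Spearman correlation of per-stimulus accuracy between
    random halves of participants. `responses` is a participants x
    stimuli array of 0/1 correctness."""
    rng = np.random.default_rng(seed)
    n = responses.shape[0]
    rhos = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        a, b = perm[: n // 2], perm[n // 2:]
        # Per-stimulus proportion correct in each half of participants.
        rho, _ = spearmanr(responses[a].mean(0), responses[b].mean(0))
        rhos.append(rho)
    return np.mean(rhos)
```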
Extended Data Fig. 7
Extended Data Fig. 7. Analysis of most and least recognizable visual model metamers.
a, Using half of the N = 40 participants, we selected the 50 images with the highest recognizability and the 50 images with the lowest recognizability (“train” split). We then measured the recognizability of these most and least recognizable images in the other half of participants (“test” split). We analyzed 1,000 random participant splits; p-values were computed by measuring the number of times the “most” recognizable images had higher recognizability than the “least” recognizable images (one-sided test). The graph shows violin plots of test-split results for the 1,000 splits. Images were only included in the analysis if they had responses from at least 4 participants in each split. The difference between the most and least recognizable metamers replicated across splits for the model stages with above-chance recognizability (p < 0.001), indicating that human observers agree on which metamers are recognizable (but not for the final stage of AlexNet trained with random perturbations, p = 0.165). Box plots within violins are defined with a center dot as the median value, the bounds of the box as the 25th–75th percentile, and the whiskers as 1.5× the interquartile range. b,c, Example model metamers from the 50 “most” and “least” recognizable metamers for the final stage of adversarially trained AlexNet (b) and for the relu2 stage of AlexNet trained with random perturbations (evaluated with data from all participants; c). All images shown had at least 8 responses across participants for both the natural image and model metamer conditions, had 100% correct responses for the natural image condition, and had 100% correct (for “most” recognizable images) or 0% correct (for “least” recognizable images) responses for the model metamer condition.
Extended Data Fig. 8
Extended Data Fig. 8. Examples of model metamers generated with regularization.
Metamers were generated with terms for smoothness and image range included in the loss function. Three coefficients for the smoothness regularizer were used. Red outlines are present on conditions that were used in the human classification experiment (chosen to maximize the recognizability, and to match the choices in the original paper that introduced this type of regularization).
Extended Data Fig. 9
Extended Data Fig. 9. Adversarial robustness of VOneAlexNet and LowPassAlexNet to different types of adversarial examples.
a, Adversarial robustness to “fooling images”. Fooling images are constructed from random noise initializations (the same noise type used for initialization during model metamer generation) by making small Lp-constrained perturbations to cause the model to classify the noise as a particular target class. LowpassAlexNet and VOneAlexNet are more robust than the standard AlexNet for all perturbation types (ANOVA comparing VOneAlexNet or LowPassAlexNet to the standard architecture; main effect of architecture; F(1,8) > 6787.0, p < 0.031, ηp² > 0.999, for all adversarial perturbation types in both cases), and although there was a significant robustness difference between LowPassAlexNet and VOneAlexNet, it was in the opposite direction to the difference in metamer recognizability: VOneAlexNet was more robust (ANOVA comparing VOneAlexNet to LowPassAlexNet; main effect of architecture; F(1,8) > 98.6, p < 0.031, ηp² > 0.924 for all perturbation types). Error bars plot s.e.m. across five sets of target labels. b, Adversarial robustness to feature adversaries. Feature adversaries are constructed by perturbing a natural “source” image so that it yields model activations (at a particular stage) that are close to those evoked by a different natural “target” image, while constraining the perturbed image to remain within a small distance of the original natural image in pixel space. The robustness measure plotted here is averaged across adversaries generated for all stages of a model. LowpassAlexNet and VOneAlexNet were more robust than the standard AlexNet for all perturbation types (ANOVA comparing VOneAlexNet or LowPassAlexNet to the standard architecture; main effect of architecture, F(1,8) > 90.8, p < 0.031, ηp² > 0.919, for all adversarial perturbation types), and although there was a significant robustness difference between LowPassAlexNet and VOneAlexNet, it was again in the opposite direction to the difference in metamer recognizability: VOneAlexNet was more robust (ANOVA comparing VOneAlexNet to LowPassAlexNet; main effect of architecture; F(1,8) > 69.0, p < 0.031, ηp² > 0.895 for all perturbation types). Here and in c, error bars plot s.e.m. across five samples of target and source images. c, Performance on feature adversaries for each model stage used to obtain the average curve in b. LowpassAlexNet is not more robust than VOneAlexNet to any type of adversarial example, even though it has more recognizable model metamers (Fig. 6g), illustrating that metamers reveal a different type of model discrepancy than that revealed by typical metrics of adversarial robustness.
Extended Data Fig. 10
Extended Data Fig. 10. Representational Similarity Analysis of auditory fMRI data.
The median Spearman ρ between the RDM from fMRI activations to natural sounds (N = 8 participants) and the RDM from model activations at the best model stage (as determined with held-out data), compared with human metamer recognition at this chosen model stage. The dashed black line shows the upper bound on the RDM similarity that could be measured given the data reliability, estimated by comparing a participant’s RDM with the average of the RDMs from each of the other participants. Error bars are s.e.m. across participants. The correlation between metamer recognizability and the human–model RDM similarity was not statistically significant for any of the ROIs following Bonferroni correction (all: ρ = 0.02, p = 1.0; tonotopic: ρ = 0.60, p = 0.06; pitch: ρ = 0.06, p = 1.0; music: ρ = 0.10, p = 1.0; speech: ρ = 0.12, p = 1.0), and was again well below the presumptive noise ceiling (which ranged from ρ = 0.79 to ρ = 0.89, depending on the ROI). We also note that the variation in metamer recognizability across models is substantially greater than the variation in RDM similarity, indicating that metamers better differentiate this set of models than does RDM similarity with this fMRI dataset.
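For reference, a minimal sketch of the RDM comparison described here: build a representational dissimilarity matrix for model activations and for fMRI responses to the same stimuli, then rank-correlate them. The array shapes and the dissimilarity metric (1 − Pearson r) are illustrative assumptions, with random data standing in for real activations.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(activations):
    """Representational dissimilarity matrix in condensed form:
    one dissimilarity (1 - Pearson r) per pair of stimuli (rows)."""
    return pdist(activations, metric='correlation')

# Illustrative shapes: responses of model units and fMRI voxels
# to the same set of natural sounds.
model_acts = np.random.randn(165, 512)
fmri_acts = np.random.randn(165, 1000)

# RSA score: Spearman correlation between the two RDMs.
rho, _ = spearmanr(rdm(model_acts), rdm(fmri_acts))
```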

References

    1. Felleman DJ, Van Essen DC. Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex. 1991;1:1–47. doi: 10.1093/cercor/1.1.1.
    2. Fukushima K. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 1980;36:193–202. doi: 10.1007/BF00344251.
    3. Serre T, Oliva A, Poggio T. A feedforward architecture accounts for rapid categorization. Proc. Natl Acad. Sci. USA. 2007;104:6424–6429. doi: 10.1073/pnas.0700622104.
    4. Kell AJE, Yamins DLK, Shook EN, Norman-Haignere SV, McDermott JH. A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron. 2018;98:630–644. doi: 10.1016/j.neuron.2018.03.044.
    5. Kriegeskorte N. Deep neural networks: a new framework for modeling biological vision and brain information processing. Annu. Rev. Vis. Sci. 2015;1:417–446. doi: 10.1146/annurev-vision-082114-035447.