Neuroimage. 2022 Jul 1;254:119121.
doi: 10.1016/j.neuroimage.2022.119121. Epub 2022 Mar 24.

Self-supervised Natural Image Reconstruction and Large-scale Semantic Classification from Brain Activity

Guy Gaziv et al. Neuroimage. 2022.

Abstract

Reconstructing natural images and decoding their semantic category from fMRI brain recordings is challenging. Acquiring sufficient pairs of images and their corresponding fMRI responses, spanning the huge space of natural images, is prohibitive. We present a novel self-supervised approach that goes well beyond the scarce paired data to achieve both: (i) state-of-the-art fMRI-to-image reconstruction, and (ii) first-ever large-scale semantic classification from fMRI responses. By imposing cycle consistency between a pair of deep neural networks (one image-to-fMRI, one fMRI-to-image), we train our image reconstruction network on a large number of "unpaired" natural images (images without fMRI recordings) from many novel semantic categories. This adapts our reconstruction network to a very rich semantic coverage without requiring any explicit semantic supervision. Specifically, we find that combining our self-supervised training with high-level perceptual losses gives rise to new reconstruction and classification capabilities. In particular, this perceptual training enables accurate classification of fMRIs of never-before-seen semantic classes, without requiring any class labels during training. This yields: (i) unprecedented image reconstruction from fMRI of never-before-seen images (evaluated by image metrics and human testing), and (ii) large-scale semantic classification of categories that were never seen during network training. Such large-scale (1000-way) semantic classification from fMRI recordings has never been demonstrated before. Finally, we provide evidence for the biological consistency of our learned model.
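To make the cycle-consistency idea concrete, here is a minimal PyTorch-style sketch of the decoder objective (the function and argument names are hypothetical, and the losses are simplified to pixel-wise MSE; the paper's actual losses and architectures are detailed in its Methods):

```python
import torch
import torch.nn.functional as F

def decoder_loss(E, D, paired_img, paired_fmri, unpaired_img, w_unsup=1.0):
    """E: image-to-fMRI encoder (pretrained, frozen); D: fMRI-to-image decoder."""
    # Supervised term: decode a recorded fMRI, compare to its source image.
    sup = F.mse_loss(D(paired_fmri), paired_img)
    # Self-supervised cycle term on an unpaired image: the frozen encoder
    # simulates an fMRI response, and the decoder must map it back to the
    # original image -- no real fMRI recording is needed.
    with torch.no_grad():
        simulated_fmri = E(unpaired_img)
    unsup = F.mse_loss(D(simulated_fmri), unpaired_img)
    return sup + w_unsup * unsup
```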

Keywords: Self-supervised learning; Decoding; Encoding; fMRI; Image reconstruction; Classification; Vision.


Figures

Fig. 1
Our self-supervised approach. (a) The task: reconstructing images and classifying their semantic category from evoked brain activity, recorded via fMRI. (b,c) Supervised training for decoding (b) and encoding (c) using limited training pairs. This gives rise to poor generalization. (d) Illustration of our added self-supervision, which enables training on “unpaired images” (any natural image with no fMRI recording). This self-supervision allows adapting the decoder to the statistics of natural images and many rich semantic classes.
Fig. 2
Adding unsupervised training on “unpaired images” together with perceptual criteria improves reconstruction. (Left to right): the images presented to the human subjects; reconstruction using the training pairs only (Fig. 1b); reconstruction when adding self-supervised training on unpaired natural images (Fig. 1d), together with high-level perceptual criteria on the decoder and other important improvements; our preliminary results (Beliy et al., 2019), without the perceptual criteria and other improvements presented here. Example results are shown for two fMRI datasets: ‘fMRI on ImageNet’ (Horikawa and Kamitani, 2017a) and ‘vim-1’ (Kay et al., 2008).
Fig. 3
Training phases. (a) First training phase: supervised training of the Encoder with {Image, fMRI} pairs. (b) Second phase: training the Decoder with two types of data simultaneously: {fMRI, Image} pairs (supervised examples) and unpaired natural images (self-supervision). The pretrained Encoder from the first phase is kept fixed in the second phase; a sketch of this schedule follows.
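The two-phase schedule could be organized roughly as below, assuming PyTorch-style `encoder`/`decoder` modules and data loaders (all names are placeholders; the real training details are in the paper's Methods):

```python
import torch
import torch.nn.functional as F

def train_two_phase(encoder, decoder, paired_loader, unpaired_loader):
    # Phase 1: supervised Encoder training on {Image, fMRI} pairs.
    enc_opt = torch.optim.Adam(encoder.parameters())
    for img, fmri in paired_loader:
        enc_opt.zero_grad()
        F.mse_loss(encoder(img), fmri).backward()
        enc_opt.step()

    # Phase 2: Decoder training; the pretrained Encoder stays frozen.
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad_(False)
    dec_opt = torch.optim.Adam(decoder.parameters())
    for (img, fmri), unpaired in zip(paired_loader, unpaired_loader):
        dec_opt.zero_grad()
        sup = F.mse_loss(decoder(fmri), img)  # supervised pairs
        unsup = F.mse_loss(decoder(encoder(unpaired)), unpaired)  # self-supervision
        (sup + unsup).backward()
        dec_opt.step()
```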
Fig. 4
Adding high-level perceptual criteria improves reconstruction accuracy and enables large-scale semantic classification. (a) Imposing Perceptual Similarity on the reconstructed image at the Decoder’s output, as applied when training on unpaired natural images (without fMRI and without any class labels) from many novel semantic classes. This adapts the Decoder to a significantly broader semantic space despite the lack of explicit semantic supervision. (b) To classify a reconstructed image into its novel semantic class, we extract Deep Features using a pretrained classification network and follow a nearest-neighbor class-centroid approach against a large-scale gallery of 1000+ ImageNet classes. (c) We define class representatives as the mean embedding of many same-class images (Horikawa and Kamitani, 2017a).
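The nearest-neighbor class-centroid step in panel (b) could look roughly like this (illustrative NumPy sketch; the Euclidean distance is an assumption, and the deep features come from any pretrained classification network):

```python
import numpy as np

def build_centroids(features_by_class):
    """Class representative = mean deep-feature embedding of many
    same-class images (panel (c)); features_by_class maps a class name
    to an array of shape (n_images, feature_dim)."""
    return {c: f.mean(axis=0) for c, f in features_by_class.items()}

def top_k_classes(recon_features, centroids, k=5):
    """Rank all gallery classes (1000+ here) by distance between the
    reconstructed image's deep features and each class centroid."""
    names = list(centroids)
    dists = [np.linalg.norm(recon_features - centroids[c]) for c in names]
    return [names[i] for i in np.argsort(dists)[:k]]
```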
Fig. 5
Self-supervision enables classification into rich and novel semantic categories. (a) Visual classification results showing the Top-5 predictions out of 1030 classes for reconstructed test images. We show examples where the ground-truth class (marked in red) is ranked among the Top-5 (correct classification) or excluded from it (incorrect classification). For visualization purposes only, each class is represented by the nearest-neighbor image among the 100 randomly sampled images of that class. Note that “incorrect” predicted classes are often reasonable (e.g., “Leopard” wrongly predicted as “Lion”; “Duck” wrongly predicted as “Ostrich”). (b) Top-1 classification accuracy in an n-way classification task. Adding unsupervised training on unpaired data (Fig. 1d,e) dramatically outperforms the supervised baseline (Fig. 1b). (c) Ablation study of classification accuracy as a function of the Perceptual Similarity criterion used for decoder training: applying “partial” perceptual similarity using only the outputs of the first VGG16 block (low “semantic” layers), and up to all of its 5 blocks (high “semantic” layers). Applying full Perceptual Similarity on higher-level VGG features substantially improves classification performance. Panels (a), (b) show results for Subject 3. 95% confidence intervals are shown on the charts.
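For reference, a perceptual-similarity criterion truncated at the first n VGG16 blocks, as varied in the panel (c) ablation, might be sketched as follows (MSE over block outputs is an assumption; the paper's exact perceptual loss may differ, and inputs are assumed ImageNet-normalized):

```python
import torch.nn.functional as F
from torchvision.models import vgg16

# In torchvision's VGG16, the five conv blocks end at these indices
# of model.features (the max-pool layers).
BLOCK_ENDS = (4, 9, 16, 23, 30)

def perceptual_loss(recon, target, n_blocks=5):
    """Compare images in the feature spaces of the first n_blocks VGG16
    blocks: n_blocks=1 uses only low 'semantic' layers; n_blocks=5 is
    the full perceptual similarity."""
    feats = vgg16(weights="DEFAULT").features.eval()
    loss, x, y = 0.0, recon, target.detach()
    for i, layer in enumerate(feats):
        x, y = layer(x), layer(y)
        if i in BLOCK_ENDS[:n_blocks]:
            loss = loss + F.mse_loss(x, y)  # accumulate at each block output
        if i == BLOCK_ENDS[n_blocks - 1]:
            return loss
```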
Fig. 6
Encoder & Decoder architectures. BN, GN, US, and ReLU stand for batch normalization, group normalization, up-sampling, and rectified linear unit, respectively. We designed a custom space-feature locally-connected layer (see text).
Fig. 7
Reconstructions for all five subjects in ‘fMRI on ImageNet’ (Horikawa and Kamitani, 2017a). Reconstructed images using the full method, which includes training on unpaired data (Fig. 1d,e). Reconstruction quality varies across subjects, depending on the noise ceiling/SNR of each subject’s data (voxel median noise ceiling for Subjects 1–5: 0.56, 0.57, 0.73, 0.68, 0.58). Subject 3 (framed in red) is the subject of focus in the remainder of this paper unless noted otherwise.
Fig. 8
Comparison of image reconstruction with state-of-the-art methods. (a), (b) Visual comparison with Shen et al. (2019b) and St-Yves and Naselaris (2019), each compared on its relevant dataset. Our method reconstructs shapes, details, and global layout better than the leading methods. (c), (d) Quantitative comparison of identification accuracy (per method) in an n-way identification task according to the Perceptual Similarity metric (see text for details). (e), (f) n-way identification responses of human raters via Mechanical Turk. Our self-supervised approach significantly outperforms all baseline methods on both datasets and across n-way difficulty levels in both types of experiments, image-metric-based and behavioral human-based (Wilcoxon, N = 50, 120 for panels (c), (d); Mann-Whitney, N = 45 for panels (e), (f)). 95% confidence intervals by bootstrap are shown on the charts.
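The image-metric-based identification task in panels (c), (d) amounts to the following per-trial logic (illustrative sketch; the plain feature distance here stands in for the Perceptual Similarity metric):

```python
import numpy as np

def n_way_trial(recon, true_img, distractors):
    """One n-way identification trial (n = 1 + len(distractors)):
    correct if the reconstruction is closer to its true source image
    than to every distractor candidate, in feature/metric space."""
    d_true = np.linalg.norm(recon - true_img)
    return all(d_true < np.linalg.norm(recon - d) for d in distractors)

def identification_accuracy(trials):
    """Fraction of correct trials, e.g. over random candidate draws."""
    return float(np.mean([n_way_trial(*t) for t in trials]))
```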
Fig. 9
Decoding quality is dominated by early visual areas. Columns show reconstructions using our method with fMRI data from various ROIs in the visual cortex: Primary Visual Cortex (V1); Lower Visual Cortex (V1–V3); Higher Visual Cortex (Fusiform Face Area (FFA), Parahippocampal Place Area (PPA), Lateral Occipital Cortex (LOC)); and Full Visual Cortex (LVC + V4 + HVC, in red frame).
Fig. 10
Our models capture biologically consistent voxel tuning properties. (a) Receptive fields of five selected high-SNR voxels from early visual cortex, indicating their spatial locality in the image. Panels (b)-(e) show single-subject data on the corresponding subject-specific cortical surface. (b) Polar angle. (c) Eccentricity tuning, measured in degrees of visual angle (DVA). (d) Noise-corrected prediction accuracy. (e) Prediction accuracy (non-scaled Pearson correlation). For simplicity, we show the data on either the left or right hemisphere. Voxel noise ceiling is coded by transparency level (alpha channel) in all cortical maps.

References

    1. Beliy R., Gaziv G., Hoogi A., Strappini F., Golan T., Irani M. From voxels to pixels and back: self-supervision in natural-image reconstruction from fMRI. Advances in Neural Information Processing Systems. 2019. https://papers.nips.cc/paper/8879-from-voxels-to-pixels-and-back-self-supervision-in-natural-image-reconstruction-from-fmri ; http://www.wisdom.weizmann.ac.il/~vision/ssfmri2im/

    2. Bonhoeffer T., Grinvald A. Iso-orientation domains in cat visual cortex are arranged in pinwheel-like patterns. Nature. 1991;353(6343):429–431. doi: 10.1038/353429a0. http://www.nature.com/articles/353429a0

    3. Cichy R.M., Heinzle J., Haynes J.D. Imagery and perception share cortical representations of content and location. Cereb. Cortex. 2012;22(2):372–380. doi: 10.1093/cercor/bhr106. http://www.ncbi.nlm.nih.gov/pubmed/21666128

    4. Cox D.D., Savoy R.L. Functional magnetic resonance imaging (fMRI) “brain reading”: detecting and classifying distributed patterns of fMRI activity in human visual cortex. Neuroimage. 2003;19(2 Pt 1):261–270. http://www.ncbi.nlm.nih.gov/pubmed/12814577

    5. Deng J., Dong W., Socher R., Li L.-J., Li K., Fei-Fei L. ImageNet: a large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2009. pp. 248–255. http://ieeexplore.ieee.org/document/5206848/
