Neuroimage. 2022 Jul 1;254:119121.
doi: 10.1016/j.neuroimage.2022.119121. Epub 2022 Mar 24.

Self-supervised Natural Image Reconstruction and Large-scale Semantic Classification from Brain Activity

Guy Gaziv et al. Neuroimage. 2022.

Abstract

Reconstructing natural images and decoding their semantic category from fMRI brain recordings is challenging. Acquiring sufficient pairs of images and their corresponding fMRI responses, spanning the huge space of natural images, is prohibitive. We present a novel self-supervised approach that goes well beyond the scarce paired data to achieve both: (i) state-of-the-art fMRI-to-image reconstruction, and (ii) first-ever large-scale semantic classification from fMRI responses. By imposing cycle consistency between a pair of deep neural networks (one image-to-fMRI, one fMRI-to-image), we train our image reconstruction network on a large number of "unpaired" natural images (images without fMRI recordings) from many novel semantic categories. This adapts our reconstruction network to a very rich semantic coverage without requiring any explicit semantic supervision. Specifically, we find that combining our self-supervised training with high-level perceptual losses gives rise to new reconstruction and classification capabilities. In particular, this perceptual training enables accurate classification of fMRIs of never-before-seen semantic classes, without requiring any class labels during training. This yields: (i) unprecedented image reconstruction from fMRI of never-before-seen images (evaluated by image metrics and human testing), and (ii) large-scale semantic classification of categories that were never seen during network training. Such large-scale (1000-way) semantic classification from fMRI recordings has never been demonstrated before. Finally, we provide evidence for the biological consistency of our learned model.
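To make the cycle-consistency idea concrete, here is a minimal PyTorch-style sketch of the decoder objective (the function and argument names are hypothetical, and the losses are simplified to pixel-wise MSE; the paper's actual losses and architectures are detailed in its Methods):

```python
import torch
import torch.nn.functional as F

def decoder_loss(E, D, paired_img, paired_fmri, unpaired_img, w_unsup=1.0):
    """E: image-to-fMRI encoder (pretrained, frozen); D: fMRI-to-image decoder."""
    # Supervised term: decode a recorded fMRI, compare to its source image.
    sup = F.mse_loss(D(paired_fmri), paired_img)
    # Self-supervised cycle term on an unpaired image: the frozen encoder
    # simulates an fMRI response, and the decoder must map it back to the
    # original image -- no real fMRI recording is needed.
    with torch.no_grad():
        simulated_fmri = E(unpaired_img)
    unsup = F.mse_loss(D(simulated_fmri), unpaired_img)
    return sup + w_unsup * unsup
```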

Keywords: Self-supervised learning; Decoding; Encoding; fMRI; Image reconstruction; Classification; Vision.


Figures

Fig. 1
Our self-supervised approach. (a) The task: reconstructing images and classifying their semantic category from evoked brain activity, recorded via fMRI. (b,c) Supervised training for decoding (b) and encoding (c) using limited training pairs. This gives rise to poor generalization. (d) Illustration of our added self-supervision, which enables training on “unpaired images” (any natural image with no fMRI recording). This self-supervision allows adapting the decoder to the statistics of natural images and many rich semantic classes.
Fig. 2
Adding unsupervised training on “unpaired images” together with perceptual criteria improves reconstruction. (Left to right): the images presented to the human subjects; reconstruction using the training pairs only (Fig. 1b); reconstruction when adding self-supervised training on unpaired natural images (Fig. 1d), together with high-level perceptual criteria on the decoder and other important improvements; our preliminary results (Beliy et al., 2019), without the perceptual criteria and other improvements presented here. Example results are shown for two fMRI datasets: ‘fMRI on ImageNet’ (Horikawa and Kamitani, 2017a) and ‘vim-1’ (Kay et al., 2008).
Fig. 3
Training phases. (a) First training phase: supervised training of the Encoder with {Image, fMRI} pairs. (b) Second phase: training the Decoder with two types of data simultaneously: {fMRI, Image} pairs (supervised examples) and unpaired natural images (self-supervision). The pretrained Encoder from the first phase is kept fixed in the second phase; a sketch of this schedule follows.
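The two-phase schedule could be organized roughly as below, assuming PyTorch-style `encoder`/`decoder` modules and data loaders (all names are placeholders; the real training details are in the paper's Methods):

```python
import torch
import torch.nn.functional as F

def train_two_phase(encoder, decoder, paired_loader, unpaired_loader):
    # Phase 1: supervised Encoder training on {Image, fMRI} pairs.
    enc_opt = torch.optim.Adam(encoder.parameters())
    for img, fmri in paired_loader:
        enc_opt.zero_grad()
        F.mse_loss(encoder(img), fmri).backward()
        enc_opt.step()

    # Phase 2: Decoder training; the pretrained Encoder stays frozen.
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad_(False)
    dec_opt = torch.optim.Adam(decoder.parameters())
    for (img, fmri), unpaired in zip(paired_loader, unpaired_loader):
        dec_opt.zero_grad()
        sup = F.mse_loss(decoder(fmri), img)  # supervised pairs
        unsup = F.mse_loss(decoder(encoder(unpaired)), unpaired)  # self-supervision
        (sup + unsup).backward()
        dec_opt.step()
```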
Fig. 4
Adding high-level perceptual criteria improves reconstruction accuracy and enables large-scale semantic classification. (a) Imposing Perceptual Similarity on the reconstructed image at the Decoder’s output, as applied when training on unpaired natural images (without fMRI and without any class labels) from many novel semantic classes. This adapts the Decoder to a significantly broader semantic space despite the lack of explicit semantic supervision. (b) To classify a reconstructed image into its novel semantic class, we extract Deep Features using a pretrained classification network and follow a nearest-neighbor class-centroid approach against a large-scale gallery of 1000+ ImageNet classes. (c) We define class representatives as the mean embedding of many same-class images (Horikawa and Kamitani, 2017a).
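The nearest-neighbor class-centroid step in panel (b) could look roughly like this (illustrative NumPy sketch; the Euclidean distance is an assumption, and the deep features come from any pretrained classification network):

```python
import numpy as np

def build_centroids(features_by_class):
    """Class representative = mean deep-feature embedding of many
    same-class images (panel (c)); features_by_class maps a class name
    to an array of shape (n_images, feature_dim)."""
    return {c: f.mean(axis=0) for c, f in features_by_class.items()}

def top_k_classes(recon_features, centroids, k=5):
    """Rank all gallery classes (1000+ here) by distance between the
    reconstructed image's deep features and each class centroid."""
    names = list(centroids)
    dists = [np.linalg.norm(recon_features - centroids[c]) for c in names]
    return [names[i] for i in np.argsort(dists)[:k]]
```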
Fig. 5
Self-supervision enables classification into rich and novel semantic categories. (a) Visual classification results showing the Top-5 predictions out of 1030 classes for reconstructed test images. We show examples where the ground-truth class (marked in red) is ranked among the Top-5 (correct classification) or excluded from it (incorrect classification). For visualization purposes only, each class is represented by the nearest-neighbor image among the 100 randomly sampled images of that class. Note that “incorrect” predicted classes are often reasonable (e.g., “Leopard” wrongly predicted as “Lion”; “Duck” wrongly predicted as “Ostrich”). (b) Top-1 classification accuracy in an n-way classification task. Adding unsupervised training on unpaired data (Fig. 1d,e) dramatically outperforms the supervised baseline (Fig. 1b). (c) Ablation study of classification accuracy as a function of the Perceptual Similarity criterion used for decoder training: applying “partial” perceptual similarity using only the outputs of the first VGG16 block (low “semantic” layers), and up to all of its 5 blocks (high “semantic” layers). Applying full Perceptual Similarity on higher-level VGG features substantially improves classification performance. Panels (a), (b) show results for Subject 3. 95% confidence intervals are shown on the charts.
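For reference, a perceptual-similarity criterion truncated at the first n VGG16 blocks, as varied in the panel (c) ablation, might be sketched as follows (MSE over block outputs is an assumption; the paper's exact perceptual loss may differ, and inputs are assumed ImageNet-normalized):

```python
import torch.nn.functional as F
from torchvision.models import vgg16

# In torchvision's VGG16, the five conv blocks end at these indices
# of model.features (the max-pool layers).
BLOCK_ENDS = (4, 9, 16, 23, 30)

def perceptual_loss(recon, target, n_blocks=5):
    """Compare images in the feature spaces of the first n_blocks VGG16
    blocks: n_blocks=1 uses only low 'semantic' layers; n_blocks=5 is
    the full perceptual similarity."""
    feats = vgg16(weights="DEFAULT").features.eval()
    loss, x, y = 0.0, recon, target.detach()
    for i, layer in enumerate(feats):
        x, y = layer(x), layer(y)
        if i in BLOCK_ENDS[:n_blocks]:
            loss = loss + F.mse_loss(x, y)  # accumulate at each block output
        if i == BLOCK_ENDS[n_blocks - 1]:
            return loss
```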
Fig. 6
Encoder & Decoder architectures. BN, GN, US, and ReLU stand for batch normalization, group normalization, up-sampling, and rectified linear unit, respectively. We designed a custom space-feature locally-connected layer (see text).
Fig. 7
Reconstructions for all five subjects in ‘fMRI on ImageNet’ (Horikawa and Kamitani, 2017a). Reconstructed images using the full method, which includes training on unpaired data (Fig. 1d,e). Reconstruction quality varies across subjects, depending on the noise ceiling/SNR of each subject’s data (voxel median noise ceiling for Subjects 1–5: 0.56, 0.57, 0.73, 0.68, 0.58). Subject 3 (framed in red) is the subject of focus in the remainder of this paper unless noted otherwise.
Fig. 8
Comparison of image reconstruction with state-of-the-art methods. (a), (b) Visual comparison with Shen et al. (2019b) and St-Yves and Naselaris (2019), each compared on its relevant dataset. Our method reconstructs shapes, details, and global layout better than the leading methods. (c), (d) Quantitative comparison of identification accuracy (per method) in an n-way identification task according to the Perceptual Similarity metric (see text for details). (e), (f) n-way identification responses of human raters via Mechanical Turk. Our self-supervised approach significantly outperforms all baseline methods on both datasets and across n-way difficulty levels in both types of experiments, image-metric-based and behavioral human-based (Wilcoxon, N = 50, 120 for panels (c), (d); Mann-Whitney, N = 45 for panels (e), (f)). 95% confidence intervals by bootstrap are shown on the charts.
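The image-metric-based identification task in panels (c), (d) amounts to the following per-trial logic (illustrative sketch; the plain feature distance here stands in for the Perceptual Similarity metric):

```python
import numpy as np

def n_way_trial(recon, true_img, distractors):
    """One n-way identification trial (n = 1 + len(distractors)):
    correct if the reconstruction is closer to its true source image
    than to every distractor candidate, in feature/metric space."""
    d_true = np.linalg.norm(recon - true_img)
    return all(d_true < np.linalg.norm(recon - d) for d in distractors)

def identification_accuracy(trials):
    """Fraction of correct trials, e.g. over random candidate draws."""
    return float(np.mean([n_way_trial(*t) for t in trials]))
```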
Fig. 9
Decoding quality is dominated by early visual areas. Columns show reconstructions using our method with fMRI data from various ROIs in the visual cortex: Primary Visual Cortex (V1); Lower Visual Cortex (V1–V3); Higher Visual Cortex (Fusiform Face Area (FFA), Parahippocampal Place Area (PPA), Lateral Occipital Cortex (LOC)); and Full Visual Cortex (LVC + V4 + HVC, in red frame).
Fig. 10
Our models capture biologically consistent voxel tuning properties. (a) Receptive fields of five selected high-SNR voxels from early visual cortex, indicating their spatial locality in the image. Panels (b)-(e) show single-subject data on the corresponding subject-specific cortical surface. (b) Polar angle. (c) Eccentricity tuning, measured in degrees of visual angle (DVA). (d) Noise-corrected prediction accuracy. (e) Prediction accuracy (non-scaled Pearson correlation). For simplicity, we show the data on either the left or right hemisphere. Voxel noise ceiling is coded by transparency level (alpha channel) in all cortical maps.

References

    1. Beliy R., Gaziv G., Hoogi A., Strappini F., Golan T., Irani M. From voxels to pixels and back: self-supervision in natural-image reconstruction from fMRI. Advances in Neural Information Processing Systems. 2019. https://papers.nips.cc/paper/8879-from-voxels-to-pixels-and-back-self-supervision-in-natural-image-reconstruction-from-fmri ; http://www.wisdom.weizmann.ac.il/~vision/ssfmri2im/

    2. Bonhoeffer T., Grinvald A. Iso-orientation domains in cat visual cortex are arranged in pinwheel-like patterns. Nature. 1991;353(6343):429–431. doi: 10.1038/353429a0. http://www.nature.com/articles/353429a0

    3. Cichy R.M., Heinzle J., Haynes J.D. Imagery and perception share cortical representations of content and location. Cereb. Cortex. 2012;22(2):372–380. doi: 10.1093/cercor/bhr106. http://www.ncbi.nlm.nih.gov/pubmed/21666128

    4. Cox D.D., Savoy R.L. Functional magnetic resonance imaging (fMRI) “brain reading”: detecting and classifying distributed patterns of fMRI activity in human visual cortex. Neuroimage. 2003;19(2 Pt 1):261–270. http://www.ncbi.nlm.nih.gov/pubmed/12814577

    5. Deng J., Dong W., Socher R., Li L.-J., Li K., Fei-Fei L. ImageNet: a large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2009. pp. 248–255. http://ieeexplore.ieee.org/document/5206848/
