Neuron. 2021 Sep 1;109(17):2755-2766.e6.

doi: 10.1016/j.neuron.2021.06.018. Epub 2021 Jul 14.

When the ventral visual stream is not enough: A deep learning account of medial temporal lobe involvement in perception

Tyler Bonnen¹, Daniel L K Yamins², Anthony D Wagner³

Affiliations

¹ Department of Psychology, Stanford University, Stanford, CA, USA. Electronic address: bonnen@stanford.edu.
² Department of Psychology, Stanford University, Stanford, CA, USA; Department of Computer Science, Stanford University, Stanford, CA, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA.
³ Department of Psychology, Stanford University, Stanford, CA, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA.

PMID: 34265252
PMCID: PMC10870832
DOI: 10.1016/j.neuron.2021.06.018

Tyler Bonnen et al. Neuron. 2021.

Abstract

The medial temporal lobe (MTL) supports a constellation of memory-related behaviors. Its involvement in perceptual processing, however, has been subject to enduring debate. This debate centers on perirhinal cortex (PRC), an MTL structure at the apex of the ventral visual stream (VVS). Here we leverage a deep learning framework that approximates visual behaviors supported by the VVS (i.e., lacking PRC). We first apply this approach retroactively, modeling 30 published visual discrimination experiments: excluding non-diagnostic stimulus sets, there is a striking correspondence between VVS-modeled and PRC-lesioned behavior, while each is outperformed by PRC-intact participants. We corroborate and extend these results with a novel experiment, directly comparing PRC-intact human performance to electrophysiological recordings from the macaque VVS: PRC-intact participants outperform a linear readout of high-level visual cortex. By situating lesion, electrophysiological, and behavioral results within a shared computational framework, this work resolves decades of seemingly inconsistent findings surrounding PRC involvement in perception.

Keywords: biologically plausible computational models; convolutional neural network; electrophysiological; lesion; medial temporal lobe; memory; perception; perirhinal cortex; ventral visual stream.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

**Figure 1. Resolving seemingly inconsistent experimental findings by situating human behavior in relationship to VVS-supported performance**
(A) Perirhinal cortex (PRC) is a neuroanatomical structure within the medial temporal lobe (MTL) situated at the apex of the ventral visual system (VVS), down-stream of “high-level” visual structures such as inferior temporal (IT) cortex. (B) A perceptual-mnemonic hypothesis posits that PRC enables perceptual behaviors not supported by canonical sensory cortices (e.g., IT), in addition to its mnemonic functions. Critically, PRC-related perceptual impairments are only expected on so-called “complex” perceptual stimuli. (C) Our trial-level protocol formalizes perceptual demands in “oddity” visual discrimination tasks by simulating visual discrimination behaviors in the *absence* of PRC. We segment each stimulus screen containing n objects into n independent images, pass them to a computational proxy for the VVS, and extract n feature vectors from an “IT-like” layer. After generating an item-by-item covariance matrix for each trial, the item with the least off-diagonal covariance is marked as the oddity. Critically, this is a lossless decision-making protocol that is agnostic to extra-perceptual task demands (i.e., memory, attention, motivation). (D) Example stimuli used to evaluate the perceptual-mnemonic hypothesis that span the range of stimulus complexity, taken from trials used in Barense et al. (2007). (E) Schematic of seemingly inconsistent experimental findings for (left) and against (right) PRC involvement in perception. These experiments are classified as either complex or control trials—a binary distinction—as is often the case in the literature. Deciding whether a stimulus set is complex depends on experimenter discretion, not objective measures. (F) We propose that these apparent inconsistencies can be resolved by situating human behavior in relationship to a linear readout of the VVS: PRC-lesioned behavior is predicted by a linear readout of high-level VVS (purple), while PRC-intact behavior outperforms a linear readout of the VVS (gray). 
We consider experiments described as complex but for which the model performs at ceiling (i.e., x = 1) to be non-diagnostic. In this formulation of the perceptual-mnemonic hypothesis, stimulus complexity is continuous and inversely related to VVS-supported performance.
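
The trial-level decision rule in panel (C) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `predict_oddity` is hypothetical, the feature vectors are assumed to have already been extracted from an "IT-like" model layer, and "least off-diagonal covariance" is interpreted here as the smallest summed covariance with the other items in the trial.

```python
import numpy as np


def predict_oddity(features: np.ndarray) -> int:
    """Return the index of the predicted oddity for one trial.

    features: array of shape (n_items, n_features), one row per object
    on the stimulus screen (assumed to come from an IT-like layer).
    """
    features = np.asarray(features, dtype=float)
    if features.ndim != 2 or features.shape[0] < 3:
        raise ValueError("expected an (n_items >= 3, n_features) array")

    # Item-by-item covariance matrix: np.cov treats each row as one item.
    item_cov = np.cov(features)

    # Assumption: summarize each item by its summed covariance with the
    # other items (row sum minus the diagonal variance term).
    off_diagonal = item_cov.sum(axis=1) - np.diag(item_cov)

    # The item that covaries least with the others is marked as the oddity.
    return int(np.argmin(off_diagonal))
```

Because the rule depends only on the feature vectors of the items shown in a trial, it involves no memory, attention, or motivation component, which is the sense in which the caption describes the protocol as agnostic to extra-perceptual demands.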

**Figure 2. A computational proxy of the VVS predicts PRC-lesioned performance, while both are outperformed by PRC-intact participants**
We collect previously published oddity experiments administered to PRC-lesioned and PRC-intact human participants. We present experimental stimuli to a computational proxy for the VVS, then use a linear decoder off an IT-like layer to predict the oddity in each trial. We then average performance across all trials in each experiment. This single value (“model performance”) corresponds to the experimental accuracy expected from a linear readout of IT cortex under a lossless decision-making protocol. Experiments in which model performance is at ceiling (x = 1, open dots) are not relevant for evaluating the role of PRC in perception: as VVS responses should support perfect discrimination between stimuli, any below-ceiling performance in the human is attributed to extra-perceptual task demands (e.g., memory). (A) This computational proxy for the VVS predicts the behavior of PRC-lesioned participants, while PRC-intact participants outperform both model and PRC-lesioned participants. (B) HPC-lesioned and intact participants all outperform this computational model on relevant stimuli — both for participants with an entirely intact MTL, which includes PRC, as well as for participants with selective damage to the HPC that spares PRC. Together, these results suggest that PRC-lesioned behavior reflects a linear readout of the VVS, neurotypical behaviors on these tasks outperform the VVS, and this behavior is dependent on PRC.
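
The experiment-level summary described above (averaging trial outcomes into a single "model performance" value, then setting aside experiments at ceiling) might look like the following sketch. The function names and the exact ceiling comparison are assumptions made for illustration; the caption states only that experiments with model performance at x = 1 are treated as not relevant.

```python
import numpy as np


def model_performance(predicted_oddity, true_oddity) -> float:
    """Proportion of trials in one experiment where the model picks the oddity."""
    predicted = np.asarray(predicted_oddity)
    true = np.asarray(true_oddity)
    if predicted.shape != true.shape or predicted.size == 0:
        raise ValueError("need one prediction per trial and at least one trial")
    return float(np.mean(predicted == true))


def is_diagnostic(performance: float, ceiling: float = 1.0) -> bool:
    """An experiment is informative about PRC only if the model is below ceiling."""
    return performance < ceiling
```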

**Figure 3. PRC-lesioned oddity performance appears to rely on high-level visual cortex**
The perceptual-mnemonic hypothesis claims that PRC-lesioned behavior relies on the VVS (Murray and Bussey, 1999), and some evidence suggests that PRC-lesioned subjects rely on IT cortex (Buffalo et al., 1998a). To evaluate this possibility, we leverage our computational proxy for the VVS to estimate the relative contributions that V4- and IT-supported behaviors have on the retrospective dataset. (A) Using the model’s cross-validated fit to electrophysiological recordings, we determine each layer’s fit to IT (solid) and V4 (dashed) cortex. This enables us to identify the most IT-like and V4-like layer within the model (vertical lines, labeled). We also determine the degree to which each layer better predicts IT, relative to V4, by estimating the differential fit to IT cortex, computing the difference between IT and V4 neural fits (Δ_IT–V4: hollow). (B) To evaluate whether IT-like layers uniquely explain PRC-lesioned behaviors, we determine model performance on the retrospective dataset across all layers and evaluate each layer’s fit to human performance. Model performance from all layers predicts PRC-lesioned performance, including performance from V4- and IT-like layers. Additionally (as in previous analysis; Figure 2), we observe a significant interaction between PRC-lesioned and PRC-intact subjects when predicted by most model layers (top). There is not an interaction between HPC-lesioned and HPC-intact subjects for any model layers (bottom). These results suggest that both V4-like (second column) and IT-like (third column) model layers can serve as a computational proxy for PRC-lesioned performance. (C) Nonetheless, we evaluate whether IT-like layers increase their fit to PRC-lesioned behaviors: We compare each layer’s differential fit to IT cortex (Δ_IT–V4, x axis) to that layer’s relative fit to lesion behavior (Δ_lesion, y axis). 
For each layer, we determine the model’s relative fit to lesioned behavior (Δ_lesion), using the mean squared prediction error (MSPE). We first compute the MSPE between the model and each group, then determine the difference between lesioned and intact participants for both PRC and HPC groups (Δ_prc and Δ_hpc, respectively). While all model layers predict PRC-lesioned subject performance (as evident in B), model layers that better fit IT cortex exhibit quantitatively better fits to PRC-relevant behavior (Δ_prc, top). There is no relation with HPC-lesioned behavior (Δ_hpc, bottom). These results suggest that PRC-lesioned performance reflects a linear readout of high-level visual cortex but are inconclusive, as there is not a clear separation between model performance from IT-like and V4-like layers. These data highlight a limitation of the stimuli used to evaluate PRC-lesioned behavior in the retrospective dataset.
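
As a rough illustration of the Δ_lesion computation in panel (C), the sketch below computes the MSPE between one layer's model performance and each participant group, then takes the difference. The function names are hypothetical, and the sign convention (intact minus lesioned, so that larger values mean the layer fits lesioned behavior relatively better) is an assumption; the caption does not state the direction of the subtraction.

```python
import numpy as np


def mspe(model_perf, group_perf) -> float:
    """Mean squared prediction error across experiments."""
    model_perf = np.asarray(model_perf, dtype=float)
    group_perf = np.asarray(group_perf, dtype=float)
    if model_perf.shape != group_perf.shape:
        raise ValueError("model and group need one value per experiment")
    return float(np.mean((model_perf - group_perf) ** 2))


def delta_lesion(model_perf, lesioned_perf, intact_perf) -> float:
    """Relative fit of one model layer to lesioned versus intact behavior.

    Assumed sign convention: MSPE(intact) - MSPE(lesioned), so a positive
    value means the layer predicts lesioned behavior better than intact.
    """
    return mspe(model_perf, intact_perf) - mspe(model_perf, lesioned_perf)
```

Applying `delta_lesion` once with the PRC groups and once with the HPC groups would give per-layer values corresponding to Δ_prc and Δ_hpc.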

**Figure 4. Evaluating the perceptual-mnemonic hypothesis with electrophysiological recordings from the VVS**
To address limitations in the retrospective analysis, we design a novel experiment that enables item-level performance estimates, continuously samples the space of stimulus complexity, and clearly disentangles multiple stages of processing (i.e., IT versus V4) across the VVS. These experiments minimize off-hypothesis experimental variance, using the minimum configuration of objects in each trial (N = 3). Critically, this novel experiment enables us to compare PRC-intact human behavior directly to the performance supported by electrophysiological recordings from high-level visual cortex. (A) Example trials from the four categories in this novel oddity experiment. We generate stimuli using a computational proxy for the VVS, selecting trials that uniformly sample the space of VVS-supported performance (i.e., chance to ceiling model performance). (B) Our protocol enables us to evaluate PRC-intact human behavior alongside electrophysiological recordings collected from the macaque. To estimate V4- and IT-supported behavior (top), we use a modified leave-one-out cross-validated approach, averaged across multiple iterations. We use the same protocol to estimate model performance (middle). Finally, we collect human accuracy and reaction time data in a pool of online subjects (N = 297; bottom). We report the item-level estimates (i.e., averaging across all oddities, for a given item) across all measures. (C) We use model performance to situate the electrophysiological and behavioral data within a common framework/axis: evidence consistent with the perceptual-mnemonic hypothesis (left) predicts PRC-intact subjects will outperform high-level visual cortex, while a strictly mnemonic interpretation of PRC function (right) predicts no divergence between PRC-intact and PRC-lesioned behavior.

**Figure 5. PRC-intact participants outperform a direct readout of high-level visual cortex**
We directly compare PRC-intact human behaviors to electrophysiological recordings from the primate VVS using a novel stimulus set. This enables us to determine whether PRC-intact behaviors are able to outperform a direct readout of high-level visual cortex, while also addressing limitations within the retrospective dataset. In all plots, error bars indicate the standard deviation from the mean. (A) Unlike stimulus sets used within the retrospective analysis (Figure 3B), here we can clearly separate IT- from V4-supported performance: a weighted, linear readout of IT cortex (i.e., IT-supported performance) outperforms a weighted, linear readout of V4 (gray points below the diagonal). (B) Our computational proxy for IT cortex is able to predict IT-supported performance, further validating the model approach used in previous analyses. (C) Critically, PRC-intact human participants outperform IT-supported behaviors. This is the first direct comparison of PRC-intact performance on oddity tasks in relation to electrophysiological recordings from high-level visual cortex, confirming a central tenet of the perceptual-mnemonic hypothesis. (D) Interestingly, we find that the difference between PRC-intact and IT-supported performance scales linearly with reaction time; on each item, subjects require more time in order to outperform a linear readout of IT. These data confirm predictions central to the perceptual-mnemonic hypothesis with unprecedented resolution, validate the computational results on the retrospective dataset, and characterize the temporal dynamics of putatively PRC-dependent visual behaviors.

Comment in

Perception and memory in the medial temporal lobe: Deep learning offers a new lens on an old debate.
Barense MD, Lee ACH. Neuron. 2021 Sep 1;109(17):2643-2645. doi: 10.1016/j.neuron.2021.08.018. PMID: 34473951

References

1. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, and Kudlur M (2016). TensorFlow: A system for large-scale machine learning. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283.
2. Aggleton JP, and Brown MW (2006). Interleaving brain systems for episodic and recognition memory. Trends Cogn. Sci. 10, 455–463.
3. Barense MD, Gaffan D, and Graham KS (2007). The human medial temporal lobe processes online representations of complex objects. Neuropsychologia 45, 2963–2974.
4. Bashivan P, Kar K, and DiCarlo JJ (2019). Neural population control via deep image synthesis. Science 364, eaav9436.
5. Brown TI, Staresina BP, and Wagner AD (2015). Noninvasive functional and anatomical imaging of the human medial temporal lobe. Cold Spring Harb. Perspect. Biol. 7, a021840.
