bioRxiv [Preprint]. 2024 Oct 7. doi: 10.1101/2023.05.18.541327.

A gradual transition toward categorical representations along the visual hierarchy during working memory, but not perception


Chaipat Chunharas et al. bioRxiv.

Abstract

The ability to stably maintain visual information over brief delays is central to healthy cognitive functioning, as is the ability to differentiate such internal representations from external inputs. One possible way to achieve both is via multiple concurrent mnemonic representations along the visual hierarchy that differ systematically from the representations of perceptual inputs. To test this possibility, we examine orientation representations along the visual hierarchy during perception and working memory. Human participants directly viewed, or held in mind, oriented grating patterns, and the similarity between fMRI activation patterns for different orientations was calculated throughout retinotopic cortex. During direct viewing of grating stimuli, similarity was relatively evenly distributed amongst all orientations, while during working memory the similarity was higher around oblique orientations. We modeled these differences in representational geometry based on the known distribution of orientation information in the natural world: The "veridical" model uses an efficient coding framework to capture hypothesized representations during visual perception. The "categorical" model assumes that different "psychological distances" between orientations result in orientation categorization relative to cardinal axes. During direct perception, the veridical model explained the data well. During working memory, the categorical model gradually gained explanatory power over the veridical model for increasingly anterior retinotopic regions. Thus, directly viewed images are represented veridically, but once visual information is no longer tethered to the sensory world there is a gradual progression to more categorical mnemonic formats along the visual hierarchy.

Keywords: RSA; categorization; efficient coding; parietal cortex; representational geometry; representational similarity; sensory recruitment; visual cortex; visual perception; visual working memory.


Conflict of interest statement

Conflict of interest: The authors declare no conflict of interest.

Figures

Figure 1: Task and main analysis
(A) For the sensory task (left), participants viewed a randomly oriented grating for 9 seconds per trial (contrast phase-reversing at 5 Hz) and reported instances of contrast dimming. For the working memory task (right), participants remembered a briefly presented (500 ms) randomly oriented grating for 13 seconds, until a 3-second recall epoch (not depicted). (B) For each region of interest (ROI), we employed a split-half randomization procedure to create a representational similarity matrix (RSM) for each participant. On each randomization fold, voxel patterns from all trials (300–340 for sensory, 324 for memory) were randomly split in half. For each half of the trials, we averaged the voxel patterns for every degree in orientation space within a ±10° window. This resulted in 180 vectors, each with a length equal to the number of voxels, for each split of the data. We then calculated the similarity between each vector (or degree) in one half of the data and all vectors (or degrees) in the second half, using a Spearman correlation coefficient. This resulted in a 180×180 similarity matrix on each fold. The randomization procedure was repeated 1,000 times to generate the final RSM for each ROI and each participant. Across all folds, RSMs are near-symmetrical around the diagonal, barring some cross-validation noise. (C) Representational geometry of orientation during the sensory (top row) and working memory (bottom row) tasks, for retinotopically defined ROIs (columns) across all participants. During the sensory task, the clear diagonal pattern in early visual areas V1–V3 indicates that orientations adjacent in orientation space are represented more similarly than orientations further apart. During the memory task, similarity clusters strongly around oblique orientations (45° and 135°), contrasting starkly with the similarity patterns during perception. Note that the diagonal represents an inherent noise ceiling, due to the cross-validation procedure used. This noise ceiling shows inhomogeneities across orientation space, demonstrating that certain orientations may be encoded with more noise than others. RSMs are scaled to the range of correlations within each subplot to ease visual comparison of representational structure between the sensory and memory tasks for all ROIs (exact ranges are shown in Supplementary Figure 3). For early ROIs (V1–V4), only visually responsive voxels are included in the analysis. Throughout, 0° (and 180°) denotes vertical, and 90° denotes horizontal.
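The split-half procedure in (B) can be made concrete with a minimal sketch. This is an illustration of the described steps, not the authors' code: the function name, array shapes, and default parameters are assumptions.

```python
# Minimal sketch of the split-half RSM procedure described in (B).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

def split_half_rsm(patterns, orientations, n_folds=1000, window=10):
    """patterns: (n_trials, n_voxels) voxel patterns; orientations: (n_trials,) in [0, 180)."""
    n_trials = patterns.shape[0]
    rsm = np.zeros((180, 180))
    for _ in range(n_folds):
        order = rng.permutation(n_trials)
        halves = (order[: n_trials // 2], order[n_trials // 2:])
        means = []
        for half in halves:
            m = np.empty((180, patterns.shape[1]))
            for deg in range(180):
                # circular distance in 180-degree orientation space
                d = np.abs((orientations[half] - deg + 90) % 180 - 90)
                m[deg] = patterns[half][d <= window].mean(axis=0)  # average within +/- window
            means.append(m)
        # Spearman correlation between every degree in one half and every degree in the other
        rho, _ = spearmanr(means[0].T, means[1].T)
        rsm += rho[:180, 180:]  # cross-half block of the joint correlation matrix
    return rsm / n_folds
```

With orientation-tuned voxel patterns, the resulting matrix shows the diagonal structure seen in early visual areas in (C), and averaging over folds makes it near-symmetrical.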
Figure 2: Modeling the representational similarity of perceived and remembered orientations
(A) The distribution of visual orientations in the natural world is inhomogeneous, with a higher prevalence of orientations close to cardinal (90° and 180°) than to oblique (45° and 135°). The function shown here approximates these input statistics and is used to constrain both the veridical (in B) and categorical (in C) models. (B) The veridical model is based on the principle of efficient coding: the idea that neural resources are adapted to the statistics of the environment. We model this via 180 idealized orientation tuning functions with amplitudes scaled by the theoretical input-statistics function (the top panel shows a subset of tuning functions for illustrative purposes). A vector of neural responses is simulated by computing the activity of all 180 orientation-tuned neurons to a given stimulus orientation. Representational similarity is calculated by correlating the simulated neural responses to all possible orientations, resulting in the veridical model RSM (bottom panel). Note that while we chose to modulate tuning curve amplitude, there are multiple ways to warp the stimulus space (e.g., by applying non-uniform changes to gain, tuning width, tuning preference, etc.). (C) In the categorical model, categorization is based on people's subjective experience of the relative similarity between orientations in different parts of orientation space: if orientations in a part of the space appear quite similar, they are lumped together into the same category, while the most distinctive-looking orientations serve as category boundaries. This is quantified via the "psychological distance", the sum of derivatives along the input-statistics function between any pair of orientations (see top panel). The inset shows an example of orientation pairs near cardinal (in blue) and near oblique (in red) that have the same physical distance but different psychological distances. The psychological distance between each possible pair of orientations yields the categorical model's RSM (bottom panel). (D) Fits of the veridical (grey) and categorical (teal) models for the sensory (top) and memory (bottom) tasks. During the sensory task, the veridical model explains the data better than the categorical model in almost all visual ROIs (except IPS1–3), indicating a representational scheme largely in line with modeled early sensory responses. During the memory task, the categorical model gains increasingly more explanatory power over the veridical model along the visual hierarchy, and explains the data significantly better in V3, V3AB, V4, and IPS0. The Fisher-transformed semi-partial correlations (y-axis) represent the unique contribution of each model after removing the variance explained by the other model. Dots represent individual participants, and error bars represent ±1 within-participant SEM. Asterisks indicate the significance level of post-hoc two-sided paired-sample t-tests (*p ≤ 0.05; **p ≤ 0.01; ***p ≤ 0.001) comparing the two models in each ROI.
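The two model constructions in (B) and (C) can be sketched as follows. The exact input-statistics function, tuning width, and similarity scaling are illustrative assumptions; the caption does not specify the authors' parameters.

```python
# Illustrative sketch of the veridical and categorical model RSMs (panels B and C).
import numpy as np

deg = np.arange(180)
# Assumed input statistics: sharply peaked at the cardinals (0/180 and 90 degrees)
p = 0.2 + np.abs(np.cos(np.deg2rad(2 * deg))) ** 8

# Veridical model: 180 idealized tuning curves with amplitudes scaled by the input statistics
prefs = deg[:, None]
tuning = np.exp(np.cos(np.deg2rad(2 * (deg[None, :] - prefs))) / 0.3)  # von-Mises-like curves
tuning *= p[:, None]                      # neurons preferring cardinals get higher amplitude
veridical_rsm = np.corrcoef(tuning.T)     # correlate population responses to all orientations

# Categorical model: psychological distance = accumulated |slope| of p between two orientations
cum = np.cumsum(np.abs(np.gradient(p)))
dist = np.abs(cum[:, None] - cum[None, :])   # non-circular simplification
categorical_rsm = 1 - dist / dist.max()      # similarity falls off with psychological distance
```

Because the assumed input statistics are flat near the obliques and steep around the cardinals, orientation pairs straddling an oblique have a small psychological distance (high similarity), while pairs straddling a cardinal are pushed apart, consistent with the oblique clustering in the memory RSMs.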
Figure 3: Generating and fitting the veridical and categorical models based on independent behavioral data
(A) During an independent psychophysical experiment, a new set of participants (N = 17) reported the orientation of briefly presented (200 ms) and remembered (2 s) single gratings by rotating a response dial with a computer mouse (i.e., via method of adjustment). For each possible stimulus orientation in the experiment (±1°), we calculated the mean absolute response error across all participants and smoothed the resulting function (Gaussian kernel over 10°). The inverse of the absolute error (y-axis) is plotted against the stimulus orientation shown to participants. From this psychophysical input function, the veridical and categorical models were generated as previously described (see Figure 2B and 2C). (B) Veridical and categorical models generated from the psychophysical input function (in A). (C) Fits of the veridical (grey) and categorical (teal) models based on the independent psychophysical data. During the sensory task (top), the veridical model explains the data better than the categorical model in all visual ROIs except IPS1–3. During the memory task (bottom), the categorical model explains the data better than the veridical model in V3, V3AB, and V4 (and marginally better in IPS0, p = 0.053). The Fisher-transformed semi-partial correlations (y-axis) represent the unique contribution of each model after removing the variance explained by the other model. Dots represent individual participants, and error bars represent ±1 within-participant SEM. Asterisks indicate the significance level of post-hoc two-sided paired-sample t-tests (*p ≤ 0.05; **p ≤ 0.01; ***p ≤ 0.001) comparing the two models in each ROI.
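The semi-partial correlation fit reported on the y-axes of Figures 2 and 3 can be sketched as below. The function name and the choice to use only off-diagonal cells (motivated by the noise-ceiling diagonal described in Figure 1) are assumptions, not the authors' exact pipeline.

```python
# Sketch: unique variance of one model RSM after removing the other (Fisher-z semi-partial r).
import numpy as np

def unique_model_fit(data_rsm, model_a, model_b):
    """Fisher-transformed semi-partial correlation of data_rsm with model_a, controlling model_b."""
    mask = ~np.eye(data_rsm.shape[0], dtype=bool)   # exclude the noise-ceiling diagonal
    y, a, b = data_rsm[mask], model_a[mask], model_b[mask]
    a, b = a - a.mean(), b - b.mean()
    # Semi-partial: residualize only the predictor (model_a) with respect to model_b
    a_res = a - (a @ b / (b @ b)) * b
    r = np.corrcoef(a_res, y)[0, 1]
    return np.arctanh(r)                             # Fisher z-transform
```

Comparing `unique_model_fit(data, veridical, categorical)` against `unique_model_fit(data, categorical, veridical)` then quantifies which model carries more unique explanatory power in a given ROI.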
Figure 4: Ability to cross-decode using RSA
(A) Using across-task representational similarity analysis, we directly compare orientation response patterns recorded during the sensory task (y-axis) to those measured during the memory task (x-axis). Here we show V1 (left subplot) and IPS0 (right subplot) as example ROIs. The across-task RSM in V1 shows a clear diagonal component, indicating similar response patterns for specific orientations in the sensory and memory tasks. In IPS0, such pattern similarity for matching orientations across the two tasks is less evident. (B) We want to quantify the extent to which orientations held in working memory evoke response patterns that overlap with the response patterns evoked by those same orientations when viewed directly, and how this similarity drops at larger distances in orientation space. First, we center our across-task RSMs on the remembered orientation (notice the x-axis), and then take the sum of correlations relative to the remembered orientation (plotted on top of the across-task RSMs in grey). We call this the "correlation profile" of the remembered orientation. In V1, correlations are highest between response patterns from matching perceived and remembered orientations (0° on the x-axis), explaining the ability to cross-decode between sensory and memory tasks demonstrated in previous work. By contrast, IPS0 shows a much flatter correlation profile. (C) Correlation profiles for all retinotopic ROIs in our study, obtained by performing across-task RSA (left panel). Most ROIs show a peaked correlation profile, indicative of shared pattern similarity between the same orientations when directly viewed and when remembered. The different offsets along the y-axis for different ROIs reflect overall differences in pattern similarity across areas of the brain, with pattern similarity being highest in area V1. Shaded areas indicate ±1 SEM. (D) To validate the ability to cross-decode using RSA, we directly compare this new approach (x-axis) to the multivariate analysis performed by Rademaker et al. in 2019 (y-axis). The latter used an inverted encoding model (IEM) that was trained on the sensory task and tested on the delay period of the memory task. Both the correlation profiles from RSA and the channel response functions from the IEM yield more or less peaked functions over orientation space (relative to the remembered orientation) that can be quantified using the same fidelity metric (i.e., convolving with a cosine). Here, we show a high degree of consistency between the fidelity metrics derived with both approaches, and successful cross-generalization from the sensory to the memory task (as indexed by fidelities > 0) in many ROIs. Each color represents a different ROI, and for each ROI we plot each of the six participants as an individual dot.
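The fidelity metric mentioned in (D), convolving a centered profile with a cosine, amounts to a simple projection. The function below is an assumed implementation consistent with that description; the authors' exact scaling may differ.

```python
# Sketch of the fidelity metric: project a centered correlation profile onto a cosine.
import numpy as np

def fidelity(profile_centered):
    """profile_centered: values at orientation offsets -90..89 deg (0 = remembered orientation)."""
    offsets = np.arange(-90, 90)
    # Positive fidelity indicates the profile peaks at the remembered orientation
    return np.mean(profile_centered * np.cos(np.deg2rad(2 * offsets)))
```

A peaked profile yields a positive fidelity, while a flat profile (no orientation information) yields a value near zero, which is why fidelities above zero index successful cross-generalization.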
Figure 5: Second-level RSA
(A) To compare how orientation is represented across different regions of visual cortex, RSMs from fine-grained individual ROIs (Supplementary Figure 1) were correlated in a second-level similarity analysis. For the sensory task (top panel), representational similarity is high among early visual areas, high among the various IPS regions, and high among LO regions, while similarity between these three clusters is relatively low. For the memory task (bottom panel), there is a slight shift in similarity compared to the sensory task, with V1 becoming less similar, and IPS0 more similar, to areas V2–V4. Furthermore, the distinction between areas is generally less pronounced. (B) Representational similarity can also be used as an indicator of connectivity between ROIs based on shared representational geometry: when the geometry is similar, the "connection" is stronger (indicated here by the width of the grey lines connecting different ROIs). The summed strength of these connections for a given ROI (i.e., its degree centrality) indicates the extent to which its local representational geometry resembles that of other ROIs. Degree centrality is highest in early visual cortex and lowest in IPS regions, indicating greater conservation of geometry across early visual cortical regions.
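The second-level analysis in (A) and the degree centrality in (B) can be sketched as below. The ROI names, function names, and use of off-diagonal RSM cells only are illustrative assumptions.

```python
# Sketch of second-level RSA across ROIs and the resulting degree centrality.
import numpy as np

def second_level_rsm(rsms):
    """rsms: dict mapping ROI name -> first-level RSM; returns (roi_names, ROI-by-ROI similarity)."""
    rois = list(rsms)
    n = next(iter(rsms.values())).shape[0]
    mask = ~np.eye(n, dtype=bool)                    # compare off-diagonal geometry only
    vecs = np.array([rsms[r][mask] for r in rois])
    return rois, np.corrcoef(vecs)                   # correlate vectorized RSMs between ROIs

def degree_centrality(second_level):
    """Sum of each ROI's similarity to all other ROIs (off-diagonal row sums)."""
    return second_level.sum(axis=1) - np.diag(second_level)
```

An ROI whose representational geometry resembles many others (as in early visual cortex) receives a high centrality value; an ROI with an idiosyncratic geometry (as in IPS) receives a low one.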

References

    1. Harrison S.A. & Tong F. (2009). Decoding reveals the contents of visual working memory in early visual areas. Nature, 458, 632–635.
    2. Serences J.T., Ester E.F., Vogel E.K. & Awh E. (2009). Stimulus-specific delay activity in human primary visual cortex. Psych. Sci., 20, 207–214.
    3. Christophel T.B., Hebart M.N. & Haynes J.D. (2012). Decoding the contents of visual short-term memory from human visual and parietal cortex. J. Neurosci., 32, 12983–12989.
    4. Riggall A.C. & Postle B.R. (2012). The relationship between working memory storage and elevated activity as measured with functional magnetic resonance imaging. J. Neurosci., 32, 12990–12998.
    5. Ester E.F., Sprague T.C. & Serences J.T. (2015). Parietal and frontal cortex encode stimulus-specific mnemonic representations during visual working memory. Neuron, 87(4), 893–905.
