This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 May 18:2023.05.18.541361.

doi: 10.1101/2023.05.18.541361.

A Unifying Principle for the Functional Organization of Visual Cortex

Eshed Margalit¹, Hyodong Lee², Dawn Finzi³,⁴, James J DiCarlo²,⁵,⁶, Kalanit Grill-Spector³,⁷, Daniel L K Yamins³,⁴,⁷

Affiliations

¹ Neurosciences Graduate Program, Stanford University, Stanford, CA 94305.
² Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139.
³ Department of Psychology, Stanford University, Stanford, CA 94305.
⁴ Department of Computer Science, Stanford University, Stanford, CA 94305.
⁵ McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139.
⁶ Center for Brains Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139.
⁷ Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA 94305.

PMID: 37292946
PMCID: PMC10245753
DOI: 10.1101/2023.05.18.541361

Eshed Margalit et al. bioRxiv. 2023.

Update in

A unifying framework for functional organization in early and higher ventral visual cortex.
Margalit E, Lee H, Finzi D, DiCarlo JJ, Grill-Spector K, Yamins DLK. Neuron. 2024 Jul 17;112(14):2435-2451.e7. doi: 10.1016/j.neuron.2024.04.018. Epub 2024 May 10. PMID: 38733985. Free PMC article.

Abstract

A key feature of many cortical systems is functional organization: the arrangement of neurons with specific functional properties in characteristic spatial patterns across the cortical surface. However, the principles underlying the emergence and utility of functional organization are poorly understood. Here we develop the Topographic Deep Artificial Neural Network (TDANN), the first unified model to accurately predict the functional organization of multiple cortical areas in the primate visual system. We analyze the key factors responsible for the TDANN's success and find that it strikes a balance between two specific objectives: achieving a task-general sensory representation that is self-supervised, and maximizing the smoothness of responses across the cortical sheet according to a metric that scales relative to cortical surface area. In turn, the representations learned by the TDANN are lower dimensional and more brain-like than those in models that lack a spatial smoothness constraint. Finally, we provide evidence that the TDANN's functional organization balances performance with inter-area connection length, and use the resulting models for a proof-of-principle optimization of cortical prosthetic design. Our results thus offer a unified principle for understanding functional organization and a novel view of the functional role of the visual system in particular.

Figures

**Figure 1. Constructing a unified model of the functional and spatial constraints of ventral visual cortex.**
**(a)** TDANNs are a family of deep artificial neural networks whose units are assigned positions in a two-dimensional simulated cortical sheet in each layer. Position assignments are retinotopic, such that location in the cortical sheet corresponds to position in the visual field. Each individual dot is a single model unit. The degree of overlap between a unit’s spatial receptive field (RF) and the purple square marked on the input image is indicated by the shade of purple; RFs from gray units do not overlap the marked region at all. The TDANN is trained to minimize the sum of a task loss and a spatial loss (SL). $α$ is a free parameter controlling the relative weight of the SL. **(b)** The SL encourages nearby units to develop strong response correlations. Plotted: pairwise similarity of unit responses as a function of pairwise cortical distance in the final layer of a TDANN model; each dot represents one pair of units. **(c)** The TDANN is evaluated on a battery of quantitative benchmarks that measure its correspondence to topographic features throughout the ventral visual stream. Left: orientation preference map in the V1-like TDANN layer (see Figure 2 for details). Right: category selectivity map in the VTC-like layer (see Figure 3 for details).
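
As a compact restatement of the objective described in panel (a), the total training loss can be written as below. The symbols $\mathcal{L}_{\text{total}}$ and $\mathcal{L}_{\text{task}}$ are notation introduced here for illustration; the caption states only that the two terms are summed and that $α$ weights the SL, and it does not say how per-layer spatial losses are combined.

$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{task}} + \alpha \cdot \mathrm{SL}$$

With $α = 0$ the spatial term drops out, which corresponds to the Task Only control described in Figure 2.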

**Figure 2. The TDANN reproduces V1-like topography.**
**(a)** Example sine grating stimuli used to assess tuning for orientation, spatial frequency, and color. **(b)** Orientation tuning curves (top) and spatial frequency tuning curves (bottom) for four example units in the V1-like layer. **(c)** Smoothed orientation preference map (OPM) in the V1-like layer of the TDANN. Box corresponds to inset at right, where individual model units are labeled by their preferred orientation. Results for additional model seeds shown in Supplementary Figure S10. **(d)** OPMs for macaque V1 (data from Nauhaus et al. [43]), TDANN, and an Unoptimized control model. **(e)** Left: Pairwise difference in preferred orientations as a function of pairwise cortical distance, normalized to the chance level expected by random sampling of pairs. Right: Map smoothness for OPMs in macaque V1 (dashed green line, data from Nauhaus et al. [43]) and four candidate models: the TDANN (purple), the Hand-Crafted self-organizing map (SOM, squares), deep neural network SOM (DNN-SOM, plus signs), and Task Only (diamonds) trained without the spatial term of the loss function. Error bar: 95% CI across random model seeds and sampling of cortical neighborhoods. **(f)** Spatial frequency preference, shown for the same region of the TDANN V1-like layer and macaque V1 as in panel (d). **(g)** Change in preferred spatial frequency as a function of cortical distance, normalized to chance, for macaque V1 and each model type. **(h)** Preference for chromatic stimuli for the same region of the TDANN V1-like layer. Dark-colored dots: stronger responses to chromatic than achromatic gratings. Macaque data: reconstruction of cytochrome oxidase staining data from Livingstone and Hubel [38]. **(i)** Fraction of units differing in their chromatic preference as a function of cortical distance, normalized to chance. **(j)** Similarity of models to the distribution of orientation tuning strengths in macaque V1 (data from Ringach et al. [34]) on the x-axis, and similarity to the smoothness of macaque OPMs (data from Nauhaus et al. [43]) on the y-axis. Multiple markers of the same type indicate different random initial seeds for each model. A value of 1.0 (dashed green) indicates perfect correspondence. **(k)** Density of pinwheels detected in TDANNs, Hand-Crafted SOMs, Task Only models, and Unoptimized models. Error bars: CI across random model seeds. Green: putative macaque V1 pinwheel density.

**Figure 3. The TDANN predicts the functional organization of higher visual cortex.**
**(a)** Representational similarity matrices (RSMs) for the TDANN and human VTC, computed across selectivity maps of the five object categories. Diagonal is blank to indicate trivially perfect correlation. **(b)** Functional similarity between the TDANN, human VTC, and alternative models, measured as the similarity of RSMs. Green: mean of pairwise human-to-human similarity values. **(c)** Selectivity ($t$-value) for each category, plotted on the simulated cortical sheet of the VTC-like layer in an example TDANN. Black star: unit whose responses to images in each of the five categories are plotted directly below (individual dots: single images, bar height: mean across images). Scale bar: 1 cm. **(d)** Difference in pairwise selectivity as a function of pairwise cortical distance for units in each of five candidate model types: the TDANN (purple), deep neural network self-organizing map (DNN-SOM; plus markers), interactive topographic network (“ITN”, Blauch et al. [20]; circles), Unoptimized (“x” markers), and Task Only (diamond markers). Curves are normalized to the chance level obtained by random sampling of unit pairs. Green: Human data averaged over the eight subjects in the NSD data. Shaded regions: 95% confidence interval across different subsets of units from models trained with different random initial seeds. **(e)** Smoothness of selectivity maps for each category and each candidate model. Dashed green: mean of human data. **(f)** Category-selective patches for an example hemisphere in human ventral temporal cortex (VTC), TDANN, a Task Only model (no patches detected), a DNN-SOM, and a reproduction of the “ITN” simulated cortical sheet from [20]. Object categories are indexed by color as in (a) and (c). Examples from different initial random seeds are shown in Supplementary Figure S10. **(g)** Number of category-selective patches (averaged across categories) for the TDANN, DNN-SOM, and ITN. Dashed green: average of human data. ANOVA for difference in patch count: $F(5,179) = 32.7$, $p < 10^{-22}$. Post-hoc Tukey’s tests: significant difference between VTC and ITN ($p = 1.2 \times 10^{-5}$). **(h)** Average surface area of category-selective patches. Same plotting conventions as in (f). ANOVA for difference in patch area: $F(5,187) = 15.4$, $p < 10^{-11}$. Post-hoc Tukey’s tests: significant difference between VTC and DNN-SOM ($p < 10^{-10}$). **(i)** Each human subject and model instance compared to the mean patch area (y-axis) and patch number (x-axis) in the human data. **(j)** Overlap between face-selectivity and body-selectivity vs. overlap between face-selectivity and place-selectivity, for each human hemisphere (green dots), each TDANN instance (purple dots), the ITN (gray dot), each DNN-SOM (gray plus signs), and Task Only models (gray diamonds).

**Figure 4. Convergence of multiple benchmarks indicates a balancing between functional and spatial constraints.**
**(a)** Topographic maps in the V1-like (top row) and VTC-like layer (bottom row) of TDANN models trained at different levels of the spatial weight $α$. Top: Orientation map structure and pinwheels become apparent at $α > 0.1$ and persist until $α = 1.25$. Dots: estimated pinwheel locations; black: clockwise, white: counterclockwise. Bottom: Category selectivity maps, with selective units ($t > 12$) colored according to their preferred category. **(b)** Functional correspondence to neural data as a function of $α$. Top: Fraction of units strongly orientation selective (circular variance ≤ 0.6) in the V1-like layer. Dashed green: value measured in macaque V1 (from Ringach et al. [34]). Dashed gray: mean value for Unoptimized models. Shaded regions: 95% CI across multiple initial random seeds. Bottom: Representational similarity between the VTC-like layer and human VTC (as in Figure 3). Error region indicates 95% CI across model seeds and human hemispheres. In both plots, the vertical line at $α = 0.25$ marks the default value used in prior figures. **(c)** Topographic map smoothness as a function of $α$. Top: OPM smoothness in the V1-like layer. Dashed green: value in macaque V1. Dashed gray: smoothness in an Unoptimized model. Bottom: Category selectivity map smoothness in the VTC-like layer. Dashed lines indicate means across human subjects and hemispheres from the NSD data; one line per category. **(d)** Density of topographic phenomena of interest as a function of $α$. Top: Pinwheel density in OPMs from the V1-like layer. Bottom: Number of category-selective patches for each category in the VTC-like layer. Human data in dashed lines.
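
The orientation-selectivity criterion in panel (b) (circular variance ≤ 0.6) can be illustrated with a minimal sketch. It assumes the standard definition of circular variance for orientation tuning, in which angles are doubled because orientation is periodic over 180°. The function names and the example tuning curves are hypothetical and are not taken from the paper's code.

```python
import cmath
import math


def circular_variance(orientations_deg, responses):
    """Circular variance of an orientation tuning curve.

    Returns 0 for a unit that responds to a single orientation and 1 for a
    unit that responds equally to evenly spaced orientations.
    """
    total = sum(responses)
    if total <= 0:
        raise ValueError("responses must have a positive sum")
    # Orientation is periodic over 180 degrees, so each angle is doubled
    # before being placed on the unit circle.
    resultant = sum(
        r * cmath.exp(2j * math.radians(theta))
        for theta, r in zip(orientations_deg, responses)
    )
    return 1.0 - abs(resultant) / total


def fraction_selective(tuning_curves, orientations_deg, threshold=0.6):
    """Fraction of units whose circular variance is at or below the threshold."""
    cvs = [circular_variance(orientations_deg, curve) for curve in tuning_curves]
    return sum(cv <= threshold for cv in cvs) / len(cvs)


# Hypothetical example: one sharply tuned unit and one untuned unit.
ORIENTATIONS = [0, 45, 90, 135]
SHARP_UNIT = [1.0, 0.0, 0.0, 0.0]
FLAT_UNIT = [1.0, 1.0, 1.0, 1.0]
```

Here `fraction_selective([SHARP_UNIT, FLAT_UNIT], ORIENTATIONS)` counts only the sharply tuned unit as selective.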

**Figure 5. Self-supervision and scalable spatial constraints underlie the emergence of functional organization.**
In each panel, TDANN shown in purple, Categorization-trained in gold, Absolute SL in red, and ventral stream measurements in green. **(a)** Left: comparison of task objectives. The TDANN uses contrastive self-supervision (top), which encourages similarity between representations of different views of the same image while increasing distance between representations of views of other images. Categorization (bottom) compares predicted class probabilities to the human-labeled correct class. Right: comparison of spatial objectives. $S_{ij}$: response similarity of units $i$ and $j$. $d_{ij}$: cortical distance between units $i$ and $j$. TDANN uses the Relative SL (top), which correlates the population of response similarities and pairwise inverse distances. Prior work [78] used the Absolute SL (bottom), which directly subtracts inverse cortical distance from response similarity magnitude. **(b)** Smoothed orientation preference maps (OPMs) in the V1-like layer of the TDANN (left), a Categorization-trained model (middle), and a model trained with the Absolute SL (right). Dots: detected pinwheels. $α = 0.25$ for models shown in each panel. **(c)** Category-selective units in the VTC-like layer of the TDANN (left), a Categorization-trained model (middle), and a model trained with the Absolute SL (right). **(d)** Right: Smoothness of OPMs in the V1-like layer of each model type. Green line: value computed in macaque V1. **(e)** Density of detected pinwheels. Green: estimated value in macaque V1. **(f)** Right: Smoothness of face selectivity maps in the VTC-like layer of each model type. Green line: value from human VTC. **(g)** Average number of category-selective patches in the VTC-like layer in each model. Green: average value in human VTC.
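
The two spatial objectives compared in panel (a) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: response similarity $S_{ij}$ is taken to be the Pearson correlation of unit responses, inverse distance is taken to be $1 / (d_{ij} + 1)$, the Relative SL is taken to be one minus the correlation between the two populations of values, and the Absolute SL is taken to be their mean absolute difference. The caption does not specify these exact forms, and all function and variable names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pairwise_terms(responses, positions):
    """Response similarity and inverse distance for every pair of units.

    responses: one list of responses (across stimuli) per unit.
    positions: one (x, y) cortical position per unit.
    """
    sims, inv_dists = [], []
    for i in range(len(responses)):
        for j in range(i + 1, len(responses)):
            sims.append(pearson(responses[i], responses[j]))
            d = math.dist(positions[i], positions[j])
            inv_dists.append(1.0 / (d + 1.0))  # assumed inverse-distance form
    return sims, inv_dists


def relative_sl(responses, positions):
    """Assumed Relative SL: low when similarity tracks inverse distance."""
    sims, inv_dists = pairwise_terms(responses, positions)
    return 1.0 - pearson(sims, inv_dists)


def absolute_sl(responses, positions):
    """Assumed Absolute SL: mean gap between similarity and inverse distance."""
    sims, inv_dists = pairwise_terms(responses, positions)
    return sum(abs(s - d) for s, d in zip(sims, inv_dists)) / len(sims)


# Hypothetical example: units 0 and 1 respond alike, as do units 2 and 3.
RESPONSES = [[1, 2, 3, 4], [1, 2, 3, 5], [4, 3, 2, 1], [5, 3, 2, 1]]
SMOOTH_LAYOUT = [(0, 0), (1, 0), (5, 0), (6, 0)]     # similar units are neighbors
SCRAMBLED_LAYOUT = [(0, 0), (5, 0), (1, 0), (6, 0)]  # similar units are far apart
```

Under these assumptions both losses are lower for `SMOOTH_LAYOUT` than for `SCRAMBLED_LAYOUT`. The relative form depends only on how well the two sets of values co-vary, whereas the absolute form also depends on the magnitude of the inverse-distance values.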

**Figure 6. Spatial constraints make learned representations more brain-like and reduce intrinsic dimensionality.**
**(a)** Variance explained under a linear regression mapping between model units and macaque IT neurons, as a function of the spatial loss weight $α$ and the training objective. **(b)** Mean correlation between model units and VTC voxels under a one-to-one mapping as a function of $α$. Green: mean human-to-human correlation under the same one-to-one mapping. **(c)** Estimated effective dimensionality (cf. Elmoznino and Bonner [83], Del Giudice [84]) of the population response in the VTC-like layer of models trained at different levels of $α$ and with different objectives. Green: mean value in human VTC from the NSD dataset. **(d)** Effective dimensionality in the TDANN across all layers and levels of $α$. In all panels, shaded vertical bar indicates value of $α$ demonstrated in prior analyses to best match topographic phenomena.
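
For intuition about the effective dimensionality plotted in panels (c) and (d), the sketch below assumes the participation-ratio definition, $(\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$, where $\lambda_i$ are the eigenvalues of the response covariance matrix. The caption cites references for the measure but does not state the formula, so this choice is an assumption, and the function names are hypothetical. The code avoids an explicit eigendecomposition by using the identities $\sum_i \lambda_i = \mathrm{tr}(C)$ and $\sum_i \lambda_i^2 = \mathrm{tr}(C^2)$.

```python
def covariance_matrix(data):
    """Sample covariance of unit responses.

    data: one row per stimulus, each row a list of unit responses.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[k] for row in data) / n for k in range(p)]
    centered = [[row[k] - means[k] for k in range(p)] for row in data]
    return [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]


def effective_dimensionality(data):
    """Participation ratio of the covariance eigenvalue spectrum."""
    cov = covariance_matrix(data)
    # The trace of C equals the sum of its eigenvalues.
    trace = sum(cov[k][k] for k in range(len(cov)))
    # For symmetric C, the sum of squared entries equals the sum of
    # squared eigenvalues, i.e. tr(C^2).
    sum_sq = sum(v * v for row in cov for v in row)
    return trace ** 2 / sum_sq


# Hypothetical examples: two independent units versus two redundant units.
INDEPENDENT = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
REDUNDANT = [[1, 1], [2, 2], [3, 3]]
```

With this definition, two uncorrelated units of equal variance give a value of 2, while two perfectly correlated units give a value of 1.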

**Figure 7. Minimization of inter-layer (feedforward) wiring length in models with brain-like functional organization.**
**(a)** Example wiring length computation between adjacent layers. Units in brown are the top 5% most active units in the Source layer for an arbitrarily selected natural image, while units in green are the top 5% most active in the Target layer. Black dots show the origination and termination points of fibers that would be required to connect populations of active units across layers. **(b)** Wiring length between layers 4 and 5 (“V1”; left), and layers 8 and 9 (“VTC”; right) as a function of $α$. Shaded regions: 95% CI of measurements from different cortical neighborhoods, model seeds, and input images. **(c)** Accuracy on object categorization vs. total wiring length, for models trained at different levels of $α$. **(d)** Wiring length in both early and later model layers for models trained with different task and spatial objectives ($α = 0.25$ for all). Error bar: 95% CI over different image presentations and model seeds.

**Figure 8. Using TDANNs to simulate spatial stimulation devices.**
**(a)** Stimulation of a local population of units in the second-to-last convolutional layer drives spatially localized responses in the final convolutional layer. Responses are functionally aligned, such that stimulating face-selective units (Site 1) drives activity in face-selective units in the following layer. Right: Results for a second stimulation site, at the intersection of place-, body-, and character-selective patches. **(b)** Similarity in tuning of stimulated units in the source layer and responding units in the target layer for 100 evenly spaced stimulation sites. Each dot compares tuning similarity for the true distribution of activated units (x-axis) and a randomly shuffled selection of units (y-axis). Dot color: distance of the stimulation site from the center of the cortical tissue. **(c-d)** Conceptual framework for applying the TDANN to the prototyping of visual cortical prostheses. **(c)** Stimulation Simulator: the TDANN is used to generate predicted activity patterns from a given visual input (top row). Patterns are then degraded according to the limitations on a hypothetical stimulation device: reduced spatial precision results in blurring of the target activity pattern (bottom row), and limits to regional access restrict the set of layers that participate. Here, Layer 8 is faded out to show that this particular hypothetical device cannot reach that cortical area. **(d)** Given a device-achievable stimulation pattern produced by the Stimulation Simulator in (c), we synthesize the image that could evoke that pattern: the predicted percept. To build intuition for the fidelity of predicted percepts, we use an example input image of the first four lines of a Snellen eye chart. **(e)** Predicted percepts for 25 theoretical cortical stimulation devices with different capabilities. Devices vary in the precision with which they are able to produce desired activity patterns (full-width at half-maximum (FWHM) of the spread of activity on cortex increases with rows) and the number of cortical areas that can be simultaneously simulated (columns).
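
To make the FWHM-based degradation in panels (c) and (e) concrete, the sketch below blurs a one-dimensional activity pattern with a Gaussian kernel of a given FWHM. The Gaussian shape, the one-dimensional simplification, the zero padding at the edges, and all names are assumptions made for illustration; the caption states only that reduced spatial precision blurs the target activity pattern. The conversion used is the standard relation $\sigma = \mathrm{FWHM} / (2\sqrt{2 \ln 2})$.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian's full-width at half-maximum to its standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_kernel(fwhm, radius):
    """Normalized 1D Gaussian kernel sampled at integer offsets in [-radius, radius]."""
    sigma = fwhm_to_sigma(fwhm)
    weights = [
        math.exp(-(x * x) / (2.0 * sigma * sigma))
        for x in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(pattern, fwhm, radius=None):
    """Blur a 1D activity pattern, treating values outside the sheet as zero."""
    if radius is None:
        radius = max(1, math.ceil(3 * fwhm_to_sigma(fwhm)))
    kernel = gaussian_kernel(fwhm, radius)
    n = len(pattern)
    blurred = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:  # zero padding beyond the edges
                acc += w * pattern[j]
        blurred.append(acc)
    return blurred


# Hypothetical example: a single active site in the middle of 21 units.
IMPULSE = [0.0] * 10 + [1.0] + [0.0] * 10
BLURRED = blur_1d(IMPULSE, fwhm=4.0)
```

In this example the blurred response two units away from the stimulated site is half the peak response, which is what an FWHM of 4 units means.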

References

1. Hubel D H and Wiesel T N. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol., 160:106–154, January 1962.
2. Humphries Colin, Liebenthal Einat, and Binder Jeffrey R. Tonotopic organization of human auditory cortex. Neuroimage, 50(3):1202–1211, April 2010.
3. Harvey B M, Klein B P, Petridou N, and Dumoulin S O. Topographic representation of numerosity in the human parietal cortex. Science, 341(6150):1123–1126, September 2013.
4. Wong Y C, Kwan H C, MacKay W A, and Murphy J T. Spatial organization of precentral cortex in awake primates. I. Somatosensory inputs. J. Neurophysiol., 41(5):1107–1119, September 1978.
5. Obenhaus Horst A, Zong Weijian, Jacobsen R Irene, Rose Tobias, Donato Flavio, Chen Liangyi, Cheng Heping, Bonhoeffer Tobias, Moser May-Britt, and Moser Edvard I. Functional network topography of the medial entorhinal cortex. Proc. Natl. Acad. Sci. U. S. A., 119(7), February 2022.
