The geometry of representational drift in natural and artificial neural networks

Kyle Aitken et al. PLoS Comput Biol. 2022 Nov 28;18(11):e1010716. doi: 10.1371/journal.pcbi.1010716
Abstract

Neurons in sensory areas encode/represent stimuli. Surprisingly, recent studies have suggested that, even during persistent performance, these representations are not stable and change over the course of days and weeks. We examine stimulus representations from fluorescence recordings across hundreds of neurons in the visual cortex using in vivo two-photon calcium imaging, and we corroborate previous findings that such representations change as experimental trials are repeated across days. This phenomenon has been termed "representational drift". In this study we geometrically characterize the properties of representational drift in the primary visual cortex of mice in two open datasets from the Allen Institute and propose a potential mechanism behind such drift. We observe representational drift both for passively presented stimuli and for behaviorally relevant stimuli. Across experiments, the drift differs from in-session variance and most often occurs along directions that have the most in-class variance, leading to a significant turnover in the neurons used for a given representation. Interestingly, despite this significant change due to drift, linear classifiers trained to distinguish neuronal representations show little to no degradation in performance across days. The features we observe in the neural data are similar to properties of artificial neural networks whose representations are updated by continual learning in the presence of dropout, i.e. a random masking of nodes/weights, but not other types of noise. We therefore conclude that representational drift in biological networks may be driven by an underlying dropout-like noise during continual learning, and that such a mechanism may be computationally advantageous for the brain in the same way it is for artificial neural networks, e.g. by preventing overfitting.
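The core geometric question can be pictured with a short sketch. Below is a minimal, illustrative version in Python: given response matrices for one stimulus group in two sessions (here synthetic; `X1`, `X2` and all other names are hypothetical, not the paper's code), the drift vector between session means is projected onto the earlier session's principal components to ask whether drift aligns with the directions of most in-class variance.

```python
# Minimal sketch: does drift align with high-variance directions?
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X1 = rng.poisson(2.0, size=(60, 300)).astype(float)  # earlier session (trials x neurons)
X2 = X1 + rng.normal(0.0, 0.5, size=X1.shape)        # later, drifted session (synthetic)

d = X2.mean(axis=0) - X1.mean(axis=0)   # drift vector between session means

pca = PCA().fit(X1)                     # earlier session's variational space
proj = pca.components_ @ d              # drift component along each PC direction
frac = proj**2 / np.sum(d**2)           # fraction of squared drift magnitude per PC

# If drift preferentially lies along high-variance directions, `frac` should
# be largest for the leading PCs (highest variance explained).
print(frac[:5])
print(pca.explained_variance_ratio_[:5])
```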


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Setup of passive data and visualization of feature space representations.
(a) Summary of the passive data experiment. (b) Summary of response vector extraction across three imaging sessions. [c-f] Visualization of feature space representations. (c) Drift of response vectors belonging to two separate stimulus groups between two sessions. For example, stimulus group 1 might correspond to the response vectors of the 0 to 1 second time-block of Natural Movie One and stimulus group 2 to the 1 to 2 second time-block. (d) For each stimulus group in each session, we perform PCA to characterize the group’s variation. An important quantity is also the group’s mean response vector. (e) Moving to the respective stimulus group’s PC basis, there is a strong correlation between the variance and mean value along a given direction, familiar from Poisson-like distributions. (f) The aforementioned feature space characterization is used to quantify drift. We define the drift vector, d, of a given stimulus group as pointing from the mean response vector of an earlier session (e.g. Session 1) to the mean response vector of a later session (e.g. Session 2). Δt is the time difference between the sessions.
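A sketch of the feature-space characterization in panels (d)-(f), using synthetic Poisson-like responses (all names illustrative): PCA per stimulus group, the group's mean response vector, and the variance-versus-mean correlation along PC directions.

```python
# Sketch of panels (d)-(f): per-group PCA, mean response vector, and the
# variance-vs-mean relation along PC directions (synthetic Poisson responses).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
rates = rng.gamma(2.0, 1.0, size=200)                  # per-neuron mean rates
X = rng.poisson(rates, size=(80, 200)).astype(float)   # trials x neurons

pca = PCA().fit(X)                 # panel (d): the group's variation
mean_vec = X.mean(axis=0)          # panel (d): the group's mean response vector

# Panel (e): in the PC basis, variance along a direction correlates with the
# mean response along it, as expected for Poisson-like variability.
var_along_pc = pca.explained_variance_
mean_along_pc = np.abs(pca.components_ @ mean_vec)
r = np.corrcoef(np.log(mean_along_pc + 1e-9), np.log(var_along_pc + 1e-9))[0, 1]
print(f"log-log correlation of mean vs variance along PCs: {r:.2f}")

# Panel (f): the drift vector d = (session 2 mean) - (session 1 mean).
```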
Fig 2. Passive data: Feature space, drift, and drift’s dependence on time.
[a-d] Data from an exemplar mouse. (a) Response vectors (dots), mean response vectors (X’s), and first two PC dimensions (lines, scaled by variance explained) for two stimulus groups in a given session, plotted in stimulus group 1’s PC space. (b) Same as the previous subplot, but with response vectors of stimulus group 1 across two different sessions, plotted in session 1’s PC space. (c) Pairwise angle between the response vectors of the 30 1-second time-blocks across the first five movie repeats of a single session. (d) Pairwise angle between mean response vectors across the three different sessions, same color scale as (c) (Methods). [e-h] Various metrics as a function of the time between the earlier and later session, Δt, for all mice. All metrics are computed for each individual stimulus group, then averaged across all 30 groups. Colored curves are linear regression fits and shaded regions represent all fits within the 95% confidence intervals of slope and intercept. (e) Average angle between mean response vectors. (f) Average (L2) magnitude of drift relative to the magnitude of the earlier session’s mean response vector. (g) Average change in variational space dimension, D, from the later session to the earlier session. (h) Average drift magnitude within the earlier session’s variational space, as a ratio of the full drift magnitude; see Eq (9). In yellow, the same metric if drift were randomly oriented in neural state space (mean ± s.e., across mice).
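The panel (h) metric can be sketched as follows, assuming the in-space drift fraction is the norm of the drift's projection onto the top-D PC subspace divided by the full drift norm (a simplification of the paper's Eq (9); all names are illustrative). The random-orientation baseline follows from the expected squared projection of a random vector onto a D-dimensional subspace.

```python
# Sketch of panel (h): fraction of the drift vector lying in the earlier
# session's top-D PC subspace, vs the random-orientation baseline.
import numpy as np
from sklearn.decomposition import PCA

def drift_in_variational_space(X_early, d, D):
    """Norm of drift's projection onto the top-D PC subspace / full drift norm."""
    pca = PCA(n_components=D).fit(X_early)
    d_in = pca.components_ @ d              # coordinates of d in the subspace
    return np.linalg.norm(d_in) / np.linalg.norm(d)

rng = np.random.default_rng(2)
n_neurons, D = 300, 10
X_early = rng.normal(size=(60, n_neurons))
d = rng.normal(size=n_neurons)              # a randomly oriented drift vector

print("random drift ratio:", drift_in_variational_space(X_early, d, D))
# A random vector puts ~D/n of its squared magnitude in a D-dim subspace,
# so the expected ratio is ~sqrt(D / n_neurons):
print("expected for random:", np.sqrt(D / n_neurons))
```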
Fig 3. Passive data: Drift geometry and classifier persistence.
[a-c] How various drift metrics depend on the PC dimension of the earlier session’s variational space. Metrics are plotted as a function of each PCi’s ratio of variance explained, vi, across all stimulus groups. Colored curves are linear regression fits and shaded regions (often too small to see) are all fits within the 95% confidence intervals of slope and intercept. (a) Magnitude of drift along the PCi direction relative to the full (L2) magnitude of drift. (b) Angle of drift with respect to the PCi direction. (c) Post-drift variance explained along the PCi direction; the black dotted line is equality. Linear regression fit to log(var. exp.). [d-f] Various metrics and how they change between sessions. The darker dots/lines always show mean values with error bars of ± s.e. The lighter dots show data from individual mice. (d) The variational space overlap between earlier and later stimulus groups, 0 ≤ Γ ≤ 1. The “–” marker indicates the average value of Γ for randomly oriented variational spaces of the same dimensions. (e) Angle between linear support vector classifiers (normal vectors) trained on distinct sessions. The purple dotted line is the average angle between different sessions. (f) Cross classification accuracy as a function of the trained data session (Class.) and tested data session (Data). The “–” marker shows the average classification accuracy when SVCs are randomly rotated by the same angle that separates the respective sessions’ classifiers. (g) The relative cross accuracy, see Eq (14), as a function of the angle of a random SVC rotation. The purple dotted line is again the average angle found between drift sessions, also shown in (e).
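The classifier-persistence analysis in panels (e)-(g) can be sketched with scikit-learn's LinearSVC on synthetic two-group data (all data and names illustrative; the paper's exact training details are in its Methods): train one linear classifier per session, measure the angle between their normal vectors, and cross-test each classifier on the other session's data.

```python
# Sketch of panels (e)-(g): per-session linear SVCs, the angle between their
# normal vectors, and cross-session classification accuracy.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(3)

def make_session(shift):
    # Two stimulus groups, 60 trials each, 100 neurons; `shift` mimics drift.
    g1 = rng.normal(0.0, 1.0, size=(60, 100)) + shift
    g2 = rng.normal(1.0, 1.0, size=(60, 100)) + shift
    return np.vstack([g1, g2]), np.array([0] * 60 + [1] * 60)

(X1, y1), (X2, y2) = make_session(0.0), make_session(0.3)

clf1 = LinearSVC(dual=False).fit(X1, y1)
clf2 = LinearSVC(dual=False).fit(X2, y2)

# Panel (e): angle between the two sessions' classifier normal vectors.
w1, w2 = clf1.coef_.ravel(), clf2.coef_.ravel()
cos = np.clip(w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2)), -1.0, 1.0)
print(f"angle between session classifiers: {np.degrees(np.arccos(cos)):.1f} deg")

# Panel (f): classifier trained on session 1, tested on session 2's data.
print("cross accuracy:", clf1.score(X2, y2))
```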
Fig 4. Behavioral data: Experimental setup and drift geometry.
(a) Summary of experimental setup. (b) Summary of session ordering, trial types, and extraction of response vectors from dF/F values. The bottom plot shows dF/F values over time, with colored columns representing image flashes, where different colors are different images. [c-e] Various drift metrics of Hit trials and their dependence on the PCi direction of the earlier session’s variational space. Dark colors correspond to drift between familiar sessions, while lighter colors are those between novel sessions. Metrics are plotted as a function of each PCi’s ratio of variance explained, vi. Colored curves are again linear regression fits. (c) Magnitude of drift along a given PCi direction, relative to the full magnitude of drift. (d) Angle of drift with respect to the PCi direction. (e) Post-drift variance explained along the PCi direction (dotted line is equality). Linear regression fit to log(var. exp.). [f-h] Various metrics as a function of session(s). Dark solid dots/lines show mean values with ± s.e. Light colored dots/lines show raw mice data. (f) Mean performance metric over engaged trials, d′ (Methods). (g) Angle between SVC normal vectors. (h) Cross classification accuracy, as a function of the trained data session (Class.) and tested data session (Data). The “–” marker again shows the average classification accuracy when SVCs are randomly rotated by the same angle that separates the respective sessions’ classifiers.
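The d′ metric in panel (f) is the standard signal-detection sensitivity, z(hit rate) minus z(false-alarm rate). A sketch, simplifying the paper's engaged-trial selection (which is specified in its Methods):

```python
# Sketch of the d' sensitivity metric: z(hit rate) - z(false-alarm rate).
import numpy as np
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejects):
    # Clip rates away from 0 and 1 so the inverse-normal transform stays finite.
    hr = np.clip(hits / (hits + misses), 0.01, 0.99)
    far = np.clip(false_alarms / (false_alarms + correct_rejects), 0.01, 0.99)
    return norm.ppf(hr) - norm.ppf(far)

print(d_prime(hits=45, misses=5, false_alarms=8, correct_rejects=42))
```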
Fig 5. Artificial neural networks: Hyperparameter fits and drift geometry as a function of Δt and variance explained.
(a) Measure of fit to experimental data, Ztotal, see Eq (19), as a function of noise hyperparameters, p (top labels) or σ (bottom labels). Dots are best fits, for which additional data is plotted here and in the supplemental figures (S5 Fig, Methods). [b-c] Various metrics as a function of the time between the earlier and later session, Δt. Colored curves are linear regression fits. All data is averaged over 10 initializations. (b) Average magnitude of drift relative to the magnitude of the mean response vector. (c) Average percent of the drift vector that lies in the variational space of the initial session. [d-i] Various drift metrics and their dependence on the PC dimension of the earlier session’s variational space. Metrics are plotted as a function of each PCi’s ratio of variance explained, vi, of the corresponding stimulus group. Colored curves are linear regression fits. Grey curves are behavioral data fits from the novel sessions shown in Fig 4c, 4d and 4e. The middle row is for networks with additive Gaussian noise (σ = 0.1) and the bottom row is with node dropout (p = 0.5). All data is averaged over 10 initializations. (d, g) Magnitude of drift along the PCi direction, relative to the full magnitude of drift. (e, h) Angle of drift with respect to the PCi direction. (f, i) Post-drift variance explained along the PCi direction (dotted line is equality). Linear regression fit to log(var. exp.).
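A sketch of the two noise conditions compared here: a hidden layer subject to either node dropout (probability p) or additive Gaussian noise (standard deviation σ), trained continually past convergence. The architecture and hyperparameters below are illustrative, not the paper's exact configuration.

```python
# Sketch: a small feedforward network whose hidden layer is perturbed by
# either node dropout (p) or additive Gaussian noise (sigma) while training
# continues past convergence, so representational drift can be tracked.
import torch
import torch.nn as nn

class NoisyNet(nn.Module):
    def __init__(self, noise="dropout", p=0.5, sigma=0.1):
        super().__init__()
        self.fc1 = nn.Linear(100, 64)
        self.fc2 = nn.Linear(64, 10)
        self.noise, self.p, self.sigma = noise, p, sigma

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        if self.noise == "dropout":       # random masking of hidden nodes
            h = nn.functional.dropout(h, p=self.p, training=self.training)
        elif self.noise == "gaussian":    # additive noise instead of masking
            h = h + self.sigma * torch.randn_like(h)
        return self.fc2(h)

net = NoisyNet(noise="dropout", p=0.5)
opt = torch.optim.SGD(net.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(128, 100)                # stand-in stimuli and labels
y = torch.randint(0, 10, (128,))
for step in range(1000):                 # continual training: keep stepping past
    opt.zero_grad()                      # convergence; snapshot hidden activations
    loss_fn(net(x), y).backward()        # ("response vectors") at intervals to
    opt.step()                           # measure drift as in the neural data
```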
Fig 6. Artificial neural networks: Additional properties of drift geometry.
[a-d] Various quantities as a function of relative/absolute training time (in epochs). Means are shown as dark lines, with 95% confidence intervals shaded behind. Raw data is scattered behind. (a) Angle between SVC classifiers (normal vectors) as a function of the time difference. The grey dashed line is the average for the novel Hit data shown in Fig 4g. (b) Cross classification accuracy as a function of time difference between classifier training time (Class.) and testing time (Data). (c) Difference in angle of a stimulus group’s readout as a function of the time difference. Note the different vertical scale from (a). (d) Deviation of the angle between a stimulus group’s drift and the respective readout from perpendicular (i.e. 90 degrees). The dashed green line is the average across time. The dotted black line is the angle between two randomly drawn vectors in a feature space of the same dimension. (e) Fits of variance explained versus angle of drift with respect to PC direction for regular node dropout (purple), targeted maximum variance node dropout (pink), and targeted minimum variance node dropout (yellow). The inset shows the r-values of the respective fits. (f) Difference in response vector angle as a function of Δt. The dashed vertical line indicates the time scale on which the node dropouts are updated (1/epoch).
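The targeted node dropout variants compared in panel (e) can be sketched as masking the fraction p of hidden units with the highest (or lowest) response variance, rather than a uniformly random subset. This is a hypothetical implementation for illustration; per panel (f), the paper updates the dropped units on a timescale of once per epoch.

```python
# Sketch of targeted node dropout: mask the fraction p of hidden units with
# the highest (target="max") or lowest (target="min") response variance.
import torch

def targeted_dropout(h, p=0.5, target="max"):
    """h: (batch, units) hidden activations; returns masked, rescaled h."""
    n_units = h.shape[1]
    n_drop = int(p * n_units)
    var = h.var(dim=0)                                   # per-unit variance
    order = torch.argsort(var, descending=(target == "max"))
    mask = torch.ones(n_units)
    mask[order[:n_drop]] = 0.0                           # drop targeted units
    return h * mask / (1.0 - p)   # rescale survivors, as standard dropout does

h = torch.randn(128, 64)
h_masked = targeted_dropout(h, p=0.5, target="max")
```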
