Representational drift as a result of implicit regularization

Aviv Ratzon et al. eLife. 2024 May 2;12:RP90069. doi: 10.7554/eLife.90069

Abstract

Recent studies show that, even in constant environments, the tuning of single neurons changes over time in a variety of brain regions. This representational drift has been suggested to be a consequence of continuous learning under noise, but its properties are still not fully understood. To investigate the underlying mechanism, we trained an artificial network on a simplified navigational task. The network quickly reached a state of high performance, and many units exhibited spatial tuning. We then continued training the network and noticed that the activity became sparser with time. Initial learning was orders of magnitude faster than ensuing sparsification. This sparsification is consistent with recent results in machine learning, in which networks slowly move within their solution space until they reach a flat area of the loss function. We analyzed four datasets from different labs, all demonstrating that CA1 neurons become sparser and more spatially informative with exposure to the same environment. We conclude that learning is divided into three overlapping phases: (i) Fast familiarity with the environment; (ii) slow implicit regularization; and (iii) a steady state of null drift. The variability in drift dynamics opens the possibility of inferring learning algorithms from observations of drift statistics.
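
The core observation in the abstract can be illustrated with a minimal sketch (not the authors' code or task): a small ReLU network keeps receiving noisy gradient updates long after its loss has converged, and the fraction of hidden units that are active for any input is tracked over time. The toy regression target, layer sizes, learning rate, and noise scale below are illustrative choices, and how quickly (or whether) sparsification appears in this toy depends on them.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid, n_steps, lr, noise_std = 10, 100, 50_000, 1e-2, 1e-2

    X = rng.normal(size=(256, n_in))             # toy "sensory" inputs
    y = np.sin(X[:, 0])                          # toy target (stand-in for position)
    W1 = rng.normal(scale=0.3, size=(n_in, n_hid))
    b1 = np.zeros(n_hid)
    w2 = rng.normal(scale=0.3, size=n_hid)

    for t in range(n_steps):
        h = np.maximum(X @ W1 + b1, 0.0)         # ReLU hidden layer
        pred = h @ w2
        err = pred - y                           # squared-error loss gradient
        g_w2 = h.T @ err / len(X)
        g_h = np.outer(err, w2) * (h > 0)
        g_W1 = X.T @ g_h / len(X)
        g_b1 = g_h.mean(axis=0)
        # "update noise": noisy gradient steps continue after the loss has converged
        W1 -= lr * (g_W1 + noise_std * rng.normal(size=W1.shape))
        b1 -= lr * (g_b1 + noise_std * rng.normal(size=b1.shape))
        w2 -= lr * (g_w2 + noise_std * rng.normal(size=w2.shape))
        if t % 5_000 == 0:
            active = (h > 0).any(axis=0).mean()  # fraction of units active for any input
            print(f"step {t:>6d}  loss {np.mean(err**2):.4f}  active fraction {active:.2f}")
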

Keywords: CA1; artificial neural network; mouse; neuroscience; noise; regularization; representational drift; theoretical neuroscience.

Conflict of interest statement

AR, DD, OB: No competing interests declared.

Figures

Figure 1. Two types of possible movements within the solution space.
(A) Two options for how drift may look in the solution space: a random walk within the space of equally good solutions that is either undirected (left) or directed (right). (B) The qualitative consequences of the two movement types. For an undirected random walk, all properties of the solution remain roughly constant (left). For directed movement, some property of the solution gradually increases or decreases (right).
Figure 2. Continuous noisy learning leads to drift and spontaneous sparsification.
(A) Illustration of an agent in a corridor receiving high-dimensional visual input from the walls. (B) Loss as a function of training steps (log scale). Zero loss corresponds to a mean estimator. Note the rapid drop in loss at the beginning, after which it remains roughly constant. (C) Mean spatial information (SI, blue) and fraction of units with non-zero activation for at least one input (red) as a function of training steps. (D) Rate maps sampled at four different time points (columns). Maps in each row are sorted according to a different time point. Sorting is done based on the peak tuning value to the latent variable. (E) Correlation of rate maps between different time points along training. Only active units are used.
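
As a rough guide to the two metrics in panel (C), the sketch below shows how they are commonly computed from rate maps; the paper's exact definitions may differ (see its Methods). The SI function here is the standard Skaggs-style spatial information in bits per spike, and the fraction of active units counts units with non-zero activation in at least one spatial bin.

    import numpy as np

    def spatial_information(rate_map, occupancy):
        """rate_map: mean activity per spatial bin; occupancy: time spent per bin."""
        p = occupancy / occupancy.sum()           # occupancy probability per bin
        r = np.asarray(rate_map, dtype=float)
        r_mean = np.sum(p * r)
        if r_mean == 0:
            return 0.0
        ratio = r / r_mean
        nz = ratio > 0                            # treat 0 * log(0) as 0
        return float(np.sum(p[nz] * ratio[nz] * np.log2(ratio[nz])))

    def fraction_active(rate_maps, eps=0.0):
        """Fraction of units with non-zero activation for at least one bin.
        rate_maps: (n_units, n_bins) array of tuning curves along the corridor."""
        return float((np.abs(rate_maps) > eps).any(axis=1).mean())
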
Figure 3. Experimental data consistent with simulations.
Data from four different labs show sparsification of the CA1 spatial code, along with an increase in the information of active cells. Values are normalized to the first recording session in each experiment. Error bars show the standard error of the mean. (A) Fraction of place cells (slope=-0.0003, p<.001) and mean spatial information (SI) (slope=0.002, p<.001) per animal over 200 min (Khatib et al., 2023). (B) Number of cells per animal (slope=-0.052, p=.004) and mean SI (slope=0.094, p<.001) over all cells pooled together, over 10 days. Note that we calculated the number of active cells rather than the fraction of place cells because of the nature of the available data (Jercog et al., 2019b). (C) Fraction of place cells (slope=-0.048, p=.011) and mean SI per animal (slope=0.054, p<.001) over 11 days (Karlsson and Frank, 2008). (D) Fraction of place cells (slope=-0.026, p<.001) and mean SI (slope=0.068, p<.001) per animal over 8 days (Sheintuch et al., 2023).
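
The analysis style described above (normalize each quantity to the first session, then fit a linear slope across sessions) can be sketched as follows. This is only an illustration of the procedure as stated in the caption; the authors' exact statistical test may differ.

    import numpy as np
    from scipy.stats import linregress

    def normalized_slope(values_per_session):
        """values_per_session: e.g. fraction of place cells or mean SI per session."""
        v = np.asarray(values_per_session, dtype=float)
        v = v / v[0]                              # normalize to the first session
        t = np.arange(len(v))                     # session (or time) index
        fit = linregress(t, v)
        return fit.slope, fit.pvalue
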
Figure 4. Generality of the results.
Summary of 616 simulations with various parameters, excluding stochastic gradient descent (SGD) with label noise (see Table 2). (A) Fraction of active units, normalized to the first time step, for all simulations. The red line is the mean. Note that all simulations exhibit a stochastic decrease in the fraction of active units. See Figure 4—figure supplement 1 for a further breakdown. (B) Dependence of sparseness (top) and sparsification time scale (bottom) on noise amplitude. Each point is one of 178 simulations with the same parameters except for the noise variance. (C) Learning a similarity matching task with Hebbian and anti-Hebbian learning, using published code from Qin et al., 2023. Performance of the network (blue) and fraction of active units (red) as a function of training steps. Note that the loss axis does not start at zero and the dynamic range is small. The background colors indicate which phase is dominant throughout learning (1, red; 2, yellow; 3, green).
Figure 4—figure supplement 1. Noisy learning leads to spontaneous sparsification.
Summary of 516 simulations with three different learning algorithms: stochastic error descent (SED; Cauwenberghs, 1992), SGD, and Adam. All values are normalized to the first time step of each simulation. The red lines indicate the mean over all simulations. (A) Fraction of active units: the number of units with any response. (B) Active fraction: overall activity across all units (see Methods).
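
For readers unfamiliar with SED, the gradient-free rule cited above (Cauwenberghs, 1992) perturbs all weights at once and updates them in proportion to the measured change in error times the perturbation. The step below is a rough sketch under assumed step and perturbation sizes, not the settings used in the paper.

    import numpy as np

    def sed_step(w, loss_fn, lr=1e-2, sigma=1e-2, rng=np.random.default_rng()):
        pi = sigma * rng.choice([-1.0, 1.0], size=w.shape)   # random parallel perturbation
        delta_e = loss_fn(w + pi) - loss_fn(w)               # measured error change
        return w - lr * delta_e * pi                         # descend along informative perturbations
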
Figure 5. Noisy learning leads to a flat landscape.
(A) Gradient descent dynamics over a two-dimensional loss function with a one-dimensional zero-loss manifold (colors from blue to yellow denote loss). Note that the loss is identically zero along the horizontal axis, but the left area is flatter. The orange trajectory begins at the red dot; note its asymmetric extension into the flatter left area. (B) The fraction of active units is highly correlated with the number of non-zero eigenvalues of the Hessian. (C) Update noise reduces small eigenvalues. Log of non-zero eigenvalues at two consecutive time points for learning with update noise. Note that eigenvalues do not correspond to one another when calculated at two different time points; this plot demonstrates the change in their distribution rather than changes in eigenvalues corresponding to specific directions. The distribution of larger eigenvalues hardly changes, while the distribution of smaller eigenvalues is pushed to smaller values. (D) Label noise reduces the sum over eigenvalues. Same as (C), but for the actual values instead of the log.
Figure 5—figure supplement 1. Label and update noise impose different regularization over the Hessian, with distinct signatures in activity statistics.
Summary of 362 simulations with either label or update noise added to the stochastic gradient descent (SGD) learning algorithm. All values are normalized to the first time step of each simulation. Lines indicate the mean over simulations, and shaded regions indicate one standard deviation. Loss convergence varies between simulations and is achieved within no more than 10^5 time steps. (A) Active fraction as a function of training time. Note that this metric decreases significantly for both types of noise. (B) Fraction of active units as a function of training time. For label noise, the change is much smaller. (C) Sum of the loss Hessian's eigenvalues as a function of training time. Here the difference is apparent: label noise imposes slow implicit regularization over this metric, while update noise does not. (D) Fraction of non-zero eigenvalues in the loss Hessian as a function of training time. As explained in the main text, update noise imposes implicit regularization over the sum of log-eigenvalues, which manifests as a zeroing of eigenvalues over time and thus a reduction in the fraction of active units.
Figure 6. Illustration of sparsity metrics.
Author response image 1. (A) PV correlation between training time points, averaged over 362 simulations. (B) Mean SI of units normalized to the first time step, averaged over 362 simulations. The red line shows the average time point of loss convergence; the shaded area represents one standard deviation.
