Front Comput Neurosci. 2024 Nov 6;18:1466364.
doi: 10.3389/fncom.2024.1466364. eCollection 2024.

Simulated synapse loss induces depression-like behaviors in deep reinforcement learning

Eric Chalmers et al. Front Comput Neurosci.

Abstract

Deep Reinforcement Learning is a branch of artificial intelligence that uses artificial neural networks to model reward-based learning as it occurs in biological agents. Here we modify a Deep Reinforcement Learning approach by imposing a suppressive effect on the connections between neurons in the artificial network, simulating the effect of dendritic spine loss as observed in major depressive disorder (MDD). Surprisingly, this simulated spine loss is sufficient to induce a variety of MDD-like behaviors in the artificially intelligent agent, including anhedonia, increased temporal discounting, avoidance, and an altered exploration/exploitation balance. Furthermore, simulating alternative and longstanding reward-processing-centric conceptions of MDD (dysfunction of the dopamine system, altered reward discounting, context-dependent learning rates, increased exploration) does not produce the same range of MDD-like behaviors. These results support a conceptual model of MDD as a reduction of brain connectivity (and thus information-processing capacity) rather than an imbalance in monoamines, though the computational model suggests a possible explanation for the dysfunction of dopamine systems in MDD. Reversing the spine-loss effect in our computational MDD model can lead to rescue of rewarding behavior under some conditions. This supports the search for treatments that increase plasticity and synaptogenesis, and the model suggests some implications for their effective administration.

Keywords: major depressive disorder; monoamine hypothesis; neuroplasticity; psychedelics; reinforcement learning; reward prediction error.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Illustration of the kind of deep reinforcement learning models used in this work. A weight decay factor applied to connections between artificial neurons is used to simulate the effect of dendritic spine loss seen in depression.
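The caption describes a weight decay factor applied to the connections between artificial neurons. A minimal sketch of that idea follows, assuming the suppression is a multiplicative shrink applied after each gradient step; the excerpt does not give the paper's exact formulation, and the names `apply_spine_loss`, `sgd_step_with_decay`, `lr`, and `decay` are hypothetical.

```python
def apply_spine_loss(weights, decay):
    """Shrink every connection weight toward zero by the factor (1 - decay).

    Assumption: 'decay' is a value in [0, 1); larger values stand in for
    more severe simulated spine loss.
    """
    return [[w * (1.0 - decay) for w in row] for row in weights]


def sgd_step_with_decay(weights, grads, lr, decay):
    """One plain gradient-descent step, followed by the suppressive shrink."""
    stepped = [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grads)
    ]
    return apply_spine_loss(stepped, decay)


# Illustrative values only: a 1x2 weight matrix with zero gradient,
# so the only change comes from the decay itself.
weights = [[1.0, -2.0]]
grads = [[0.0, 0.0]]
decayed = sgd_step_with_decay(weights, grads, lr=0.1, decay=0.5)
```

With a zero gradient the step leaves only the shrink, so each weight is halved in magnitude while keeping its sign.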
Figure 2
Comparing behaviors of simulated healthy and simulated spine loss agents. (a) The simulated “world.” The agent (red triangle) must learn to navigate the room in search of the green goal. Blue boxes are optional bonus rewards that can be collected en route to the goal, and red boxes are hazards that bring a negative reward (punishment). (b) The simulated spine loss agent still reaches the goal in each episode but collects fewer bonus rewards. (c) The simulated spine loss agent learns a less-rewarding strategy than the healthy agent but learns it more quickly, as seen by the time to arrive at asymptotic slope. (d) Contrived situation in which the agent may bypass the optional reward or take an extra step to collect it en route to the goal (reward optimal). (e) Agents’ perceived values for the contrived situation in (d): the healthy agent’s perceived value of detouring through the optional reward is very high. The spine loss agent has much weaker preferences and a slight preference for bypassing the optional reward (an anhedonia-like effect). Error bars and shaded regions show the 95% confidence interval of the mean over 20 repetitions.
Figure 3
(a,b) Agents’ effective discounting rates can be inferred by placing them progressively closer to the goal and measuring their perceived values. The spine loss agent operates with a lower effective discount factor (more discounting). (c,d) Contrived situation in which the agent is moved toward a hazard. The healthy agent’s perceived value of moving forward increases through positions 1–3 (moving forward from these positions brings the goal closer). Only in position 4 does the healthy agent’s perceived value of moving forward drop. For the spine loss agent this drop is generalized inappropriately to positions 2 and 3. Error bars and shaded regions show the 95% confidence interval of the mean over 20 repetitions.
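One way to picture the inference in panels (a,b): if an agent's perceived value at distance d from the goal behaves roughly like V(d) = R·γ^d, then log V is linear in d with slope log γ, and γ can be read off a least-squares fit. The sketch below rests on that geometric-decay assumption and on all values being positive; the excerpt does not state the authors' actual fitting procedure, and `effective_discount` is a hypothetical name.

```python
import math


def effective_discount(distances, values):
    """Estimate an effective discount factor from perceived values.

    Assumes values[i] ~ R * gamma ** distances[i] with values[i] > 0,
    so the slope of log(value) against distance equals log(gamma).
    """
    logs = [math.log(v) for v in values]
    n = len(distances)
    mean_d = sum(distances) / n
    mean_l = sum(logs) / n
    cov = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    var = sum((d - mean_d) ** 2 for d in distances)
    return math.exp(cov / var)


# Synthetic check: values generated with gamma = 0.9 and goal reward 10.
distances = [1, 2, 3, 4]
values = [10.0 * 0.9 ** d for d in distances]
gamma_hat = effective_discount(distances, values)
```

On noise-free synthetic values the fit recovers the generating discount factor up to floating-point error; on real value estimates a lower fitted γ would correspond to the "more discounting" described in the caption.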
Figure 4
Applying simulated spine loss to a healthy agent causes it to revert to the simpler, low-reward depression-like behavior. Relieving the spine loss (restoring the spines) allows a return to the original behavior—after a short readjustment period. This may support the idea that spine density modulates depressed cognition and behavior. Shaded regions show the 95% confidence interval of the mean over 20 repetitions.
Figure 5
(a) Weight changes induced in the networks per unit loss. This measures simulated response to the reward prediction error signal (i.e., delivered by dopamine). The spine loss network experiences dramatic alterations early in learning, allowing it to converge on a basic strategy quickly. The healthy network exhibits greater and sustained plasticity throughout learning. This effect may hint at an explanation for dopamine system dysfunction in depression. (b) Most neurons in the network with simulated spine loss have activations highly correlated with proximity to the goal, indicating that almost the entire network has been used to store information related to the basic goal-seeking strategy. Neurons in the healthy agent’s network may store a greater variety of information. Shaded regions and boxplots show the 95% confidence interval of the mean over 20 repetitions.
Figure 6
The “depressed” (simulated spine loss) agent is impaired relative to the “healthy” agent in the full-complexity task used throughout this paper, but in a simplified version of the task the impairment vanishes. Thus it seems that spine loss has a greater effect on complex behaviors that require higher orders of processing. Error bars show the 95% confidence interval of the mean over 20 repetitions.
Figure 7
The relationship between weight decay and performance in our simulations. A small amount of weight decay (representing the normal, ongoing turnover of spines and synapses) is beneficial. More extreme weight decay causes impairment. Shaded region shows the 95% confidence interval of the mean over 20 repetitions.
Figure 8
Comparing behaviors across alternative models of depression. (b) The simulated spine loss agent is the only one that exhibits the strong anhedonia-like loss of preference in the contrived situation in (a), although faster learning from negative experiences and increased discounting both cause a reduction in overall perceived values. (c) The spine loss agent shows the most dramatic reduction in optional-reward-seeking behavior. The high-exploration agent shows a smaller drop in optional rewards due to the increased randomness in its actions. (d) The agent with a high discounting parameter setting exhibits, as expected, higher discounting than other models. But the spine loss agent also shows increased effective discounting, despite having the same discount factor parameter setting as the healthy agent. (f) The simulated spine loss agent is the only one exhibiting generalized-fear-like effects in the contrived situation in (e). (g) Kullback–Leibler divergence between probability distributions over actions, assuming a softmax action selection policy. All bars show the divergence from the probability distribution of healthy agents (“healthy” shows divergence between different healthy agents). Error bars and shaded regions show the 95% confidence interval of the mean over 20 repetitions.
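The comparison in panel (g) can be sketched as follows: convert each agent's action values to a softmax distribution, then compute the Kullback–Leibler divergence between the two distributions. This is an illustrative sketch only. The temperature, the example action values, and the direction of the divergence (healthy as the reference distribution p) are assumptions not specified in the caption.

```python
import math


def softmax(action_values, temperature=1.0):
    """Softmax action-selection probabilities (shifted by the max for stability)."""
    top = max(action_values)
    exps = [math.exp((q - top) / temperature) for q in action_values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats; assumes every entry of q is nonzero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical action values for one state (three actions).
healthy_values = [2.0, 0.5, 0.1]    # strong preference for the first action
spine_loss_values = [0.6, 0.5, 0.4]  # much weaker preferences

p_healthy = softmax(healthy_values)
p_spine_loss = softmax(spine_loss_values)
divergence = kl_divergence(p_healthy, p_spine_loss)
self_divergence = kl_divergence(p_healthy, p_healthy)
```

The divergence is zero when the two policies match and grows as their action preferences diverge; because KL divergence is asymmetric, swapping the arguments generally gives a different number.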
