J Sleep Res. 2021 Aug;30(4):e13236.

doi: 10.1111/jsr.13236. Epub 2020 Nov 20.

Does insufficient sleep affect how you learn from reward or punishment? Reinforcement learning after 2 nights of sleep restriction

Andreas Gerhardsson¹ ², Danja K Porada³, Johan N Lundström³ ⁴ ⁵ ⁶, John Axelsson¹ ² ³, Johanna Schwarz² ³

Affiliations

¹ Department of Psychology, Stockholm University, Stockholm, Sweden.
² Department of Psychology, Stress Research Institute, Stockholm University, Stockholm, Sweden.
³ Department of Clinical Neuroscience, Karolinska Institute, Stockholm, Sweden.
⁴ Monell Chemical Senses Center, Philadelphia, PA, USA.
⁵ Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA.
⁶ Stockholm University Brain Imaging Centre, Stockholm University, Stockholm, Sweden.

PMID: 33219629
PMCID: PMC8365707
DOI: 10.1111/jsr.13236

Andreas Gerhardsson et al. J Sleep Res. 2021 Aug.

Abstract

To learn from feedback (trial and error) is essential for all species. Insufficient sleep has been found to reduce the sensitivity to feedback as well as increase reward sensitivity. To determine whether insufficient sleep alters learning from positive and negative feedback, healthy participants (n = 32, mean age 29.0 years, 18 women) were tested once after normal sleep (8 hr time in bed for 2 nights) and once after 2 nights of sleep restriction (4 hr/night) on a probabilistic selection task where learning behaviour was evaluated in three ways: as generalised learning, short-term win-stay/lose-shift learning strategies, and trial-by-trial learning rate. Sleep restriction did not alter the sensitivity to either positive or negative feedback on generalised learning. Also, short-term win-stay/lose-shift strategies were not affected by sleep restriction. Similarly, results from computational models that assess the trial-by-trial update of stimuli value demonstrated no difference between sleep conditions after the first block. However, a slower learning rate from negative feedback when evaluating all learning blocks was found after sleep restriction. Despite a marked increase in sleepiness and slowed learning rate for negative feedback, sleep restriction did not appear to alter strategies and generalisation of learning from positive or negative feedback.

Keywords: carrot or stick; feedback-based learning; lack of sleep; reward or punishment; sleep deprivation; valenced feedback.
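The "trial-by-trial learning rate" mentioned in the abstract can be illustrated with a small value-update sketch. This is not the authors' model code: it assumes a common Q-learning variant with separate learning rates for positive and negative prediction errors, and the names `update_value`, `choice_probability`, `alpha_gain`, `alpha_loss`, and `beta` are hypothetical.

```python
import math


def update_value(q, reward, alpha_gain, alpha_loss):
    """Update one stimulus value after a single trial.

    Assumption: positive and negative prediction errors are scaled by
    different learning rates (alpha_gain and alpha_loss).
    """
    delta = reward - q  # prediction error for this trial
    alpha = alpha_gain if delta >= 0 else alpha_loss
    return q + alpha * delta


def choice_probability(q_a, q_b, beta):
    """Softmax probability of choosing stimulus A over B.

    beta is a hypothetical inverse-temperature parameter.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))


# Illustrative feedback sequence: 1 = positive feedback, 0 = negative.
q = 0.0
for reward in [1, 1, 0, 1]:
    q = update_value(q, reward, alpha_gain=0.5, alpha_loss=0.1)
```

With a small `alpha_loss`, a single negative outcome lowers the stored value only slightly, which is one way to picture a slower learning rate from negative feedback.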

PubMed Disclaimer

Conflict of interest statement

The authors declare no competing interests.

Figures

**FIGURE 1**
(a) Symbol pairs in the training and the two learning sets together with winning probability within each pair. Learning criteria for Set 1 and Set 2 were ≥65% A choices for A/B, ≥60% C choices for C/D, and ≥40% E choices for E/F after each block. (b) Trial example from learning phase. In the test phase (not depicted) no feedback was given and symbol pairs were scrambled
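As a rough sketch, the per-block learning criteria in the caption above could be checked as follows. The thresholds are copied from the caption; the data layout (a list of `(pair, chosen_symbol)` tuples) and the names `CRITERIA` and `block_meets_criteria` are assumptions made for illustration.

```python
# Thresholds taken from the Figure 1 caption: pair -> (target symbol, minimum proportion).
CRITERIA = {
    "AB": ("A", 0.65),
    "CD": ("C", 0.60),
    "EF": ("E", 0.40),
}


def block_meets_criteria(choices):
    """Return True if one block satisfies all three criteria.

    choices: list of (pair, chosen_symbol) tuples for the block
    (hypothetical format, e.g. ("AB", "A")).
    """
    for pair, (target, threshold) in CRITERIA.items():
        picks = [chosen for p, chosen in choices if p == pair]
        if not picks:
            return False  # a pair with no trials cannot meet its criterion
        if sum(chosen == target for chosen in picks) / len(picks) < threshold:
            return False
    return True
```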

**FIGURE 2**
Boxplots show observed sleepiness ratings according to the Karolinska Sleepiness Scale (KSS; Åkerstedt & Gillberg, 1990), and subjective stress ratings (Schwarz et al., 2018) for the normal and restricted sleep conditions. Histograms to the right of each boxplot show the posterior distributions of the difference between sleep conditions with highest density intervals (HDI; thick black horizontal line), highest maximum a posteriori probability estimates (MAP; grey solid vertical line), and the regions of practical equivalence (ROPE; red shading) around zero (dotted line). Sleepiness increased strongly but an increase in stress after sleep restriction was not large enough to be conclusively separated from the ROPE. Bars above the histograms show Bayes factors with level of support for either hypothesis (BF₁₀, red; BF₀₁, grey) indicated by length of the bar; black lines indicate thresholds for moderate (BF >3), strong (BF >10), and extreme evidence (BF >100) (Beard et al., 2016)
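The HDI and ROPE summaries described in this caption can be sketched from a list of posterior samples. This is a minimal illustration, not the authors' analysis code: the 95% mass, the ROPE bounds, and the three-way decision rule are assumptions, and `hdi` and `rope_decision` are hypothetical helper names.

```python
import math


def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def rope_decision(samples, rope=(-0.1, 0.1), mass=0.95):
    """Compare the HDI with a region of practical equivalence (ROPE)."""
    lo, hi = hdi(samples, mass)
    if hi < rope[0] or lo > rope[1]:
        return "difference"   # HDI lies entirely outside the ROPE
    if rope[0] <= lo and hi <= rope[1]:
        return "equivalent"   # HDI lies entirely inside the ROPE
    return "undecided"        # HDI and ROPE partly overlap
```

Under this rule, an effect such as the stress increase described above, whose HDI overlaps the ROPE, would come out as "undecided".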

**FIGURE 3**
Boxplots show observed data for win–stay and lose–shift tendencies during the first block of the learning phase. Histograms to the right of each boxplot show the posterior distributions of the difference between sleep conditions with highest density intervals (HDI; thick black horizontal line), highest maximum a posteriori probability estimates (MAP; grey solid vertical line) and the regions of practical equivalence (ROPE; red shading) around zero (dotted line), indicating no meaningful difference between sleep conditions. Bars above the histograms show Bayes factors with level of support for either hypothesis (BF₁₀, red; BF₀₁, grey) indicated by length of the bar and black lines indicate thresholds for moderate (BF >3), strong (BF >10), and extreme evidence (BF >100) (Beard et al., 2016)
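Win-stay and lose-shift tendencies can be computed from a trial sequence roughly as follows. The exact definition used in the paper is not given here, so this sketch assumes that "stay" and "shift" are judged against the previous presentation of the same symbol pair; the function name and trial format are hypothetical.

```python
def win_stay_lose_shift(trials):
    """Return (win_stay, lose_shift) proportions.

    trials: ordered list of (pair, choice, rewarded) tuples
    (hypothetical format, e.g. ("AB", "A", True)).
    """
    last = {}  # pair -> (choice, rewarded) from its most recent presentation
    wins = stays = losses = shifts = 0
    for pair, choice, rewarded in trials:
        if pair in last:
            prev_choice, prev_rewarded = last[pair]
            if prev_rewarded:
                wins += 1
                stays += choice == prev_choice
            else:
                losses += 1
                shifts += choice != prev_choice
        last[pair] = (choice, rewarded)
    win_stay = stays / wins if wins else float("nan")
    lose_shift = shifts / losses if losses else float("nan")
    return win_stay, lose_shift
```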

**FIGURE 4**
Boxplots of the observed test phase data for Choose A (positive feedback) and Avoid B (negative feedback). Histograms to the right of each boxplot show the posterior distributions of the difference between sleep conditions with highest density intervals (HDI; thick black horizontal line), highest maximum a posteriori probability estimates (MAP; grey solid vertical line) and the regions of practical equivalence (ROPE; red shading) around zero (dotted line). There was no meaningful difference in generalised learning after sleep restriction. Bars above the histograms show Bayes factors with level of support for either hypothesis (BF₁₀, red; BF₀₁, grey) indicated by the length of the bar; black lines indicate thresholds for moderate (BF >3), strong (BF >10), and extreme evidence (BF >100) (Beard et al., 2016)

**FIGURE 5**
Posterior distributions of the computational model for the first learning block (left panel) and all learning blocks together (right panel). Boxplots show the estimated individual means drawn from the posterior distribution. Histograms to the right of each boxplot show the inverse probit transformed (φ) posterior distributions with highest density intervals (HDI; thick black horizontal line), highest maximum a posteriori probability estimates (MAP; grey solid vertical line) and the regions of practical equivalence (ROPE; red shading) around zero (dotted line). Bars above the histograms show Bayes factors with level of support for either hypothesis (BF₁₀, red; BF₀₁, grey) indicated by length of the bar and black lines indicate thresholds for moderate (BF >3), strong (BF >10), and extreme (BF >100) evidence (Beard et al., 2016)

References

1. Ahn, W.‐Y., Haines, N., & Zhang, L. (2017). Revealing neurocomputational mechanisms of reinforcement learning and decision‐making with the hBayesDM package. Computational Psychiatry, 1, 24–57.
2. Åkerstedt, T., & Gillberg, M. (1990). Subjective and objective sleepiness in the active individual. The International Journal of Neuroscience, 1, 29–37.
3. Beard, E., Dienes, Z., Muirhead, C., & West, R. (2016). Using Bayes factors for testing hypotheses about intervention effectiveness in addictions research. Addiction, 111, 2230–2247.
4. Elmenhorst, D., Elmenhorst, E. M., Hennecke, E., Kroll, T., Matusch, A., Aeschbach, D., & Bauer, A. (2017). Recovery sleep after extended wakefulness restores elevated A1 adenosine receptor availability in the human brain. Proceedings of the National Academy of Sciences of the United States of America, 114, 4243–4248.
5. Frank, M. J. (2004). By carrot or by stick: Cognitive reinforcement learning in parkinsonism. Science, 306, 1940–1943.
