J Sleep Res. 2021 Aug;30(4):e13236.
doi: 10.1111/jsr.13236. Epub 2020 Nov 20.

Does insufficient sleep affect how you learn from reward or punishment? Reinforcement learning after 2 nights of sleep restriction

Andreas Gerhardsson et al. J Sleep Res. 2021 Aug.

Abstract

To learn from feedback (trial and error) is essential for all species. Insufficient sleep has been found to reduce the sensitivity to feedback as well as increase reward sensitivity. To determine whether insufficient sleep alters learning from positive and negative feedback, healthy participants (n = 32, mean age 29.0 years, 18 women) were tested once after normal sleep (8 hr time in bed for 2 nights) and once after 2 nights of sleep restriction (4 hr/night) on a probabilistic selection task where learning behaviour was evaluated in three ways: as generalised learning, short-term win-stay/lose-shift learning strategies, and trial-by-trial learning rate. Sleep restriction did not alter the sensitivity to either positive or negative feedback on generalised learning. Also, short-term win-stay/lose-shift strategies were not affected by sleep restriction. Similarly, results from computational models that assess the trial-by-trial update of stimuli value demonstrated no difference between sleep conditions after the first block. However, a slower learning rate from negative feedback when evaluating all learning blocks was found after sleep restriction. Despite a marked increase in sleepiness and slowed learning rate for negative feedback, sleep restriction did not appear to alter strategies and generalisation of learning from positive or negative feedback.

Keywords: carrot or stick; feedback-based learning; lack of sleep; reward or punishment; sleep deprivation; valenced feedback.
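
Two of the learning measures named in the abstract, win-stay/lose-shift tendencies and a trial-by-trial learning rate that differs for positive and negative feedback, can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names, the 0/1 reward coding, and the parameters `alpha_gain` and `alpha_loss` are assumptions made for illustration, and the paper's exact definitions may differ.

```python
from typing import List, Sequence, Tuple


def win_stay_lose_shift(choices: Sequence[str], rewards: Sequence[int]) -> Tuple[float, float]:
    """Return (win-stay, lose-shift) proportions for one symbol pair.

    choices: the symbol chosen on each trial (hypothetical coding, e.g. "A"/"B").
    rewards: 1 for positive feedback, 0 for negative feedback on each trial.
    """
    wins = stays = losses = shifts = 0
    # Compare each trial's outcome with the choice made on the next trial.
    for prev_choice, prev_reward, next_choice in zip(choices, rewards, choices[1:]):
        if prev_reward:
            wins += 1
            stays += next_choice == prev_choice
        else:
            losses += 1
            shifts += next_choice != prev_choice
    win_stay = stays / wins if wins else float("nan")
    lose_shift = shifts / losses if losses else float("nan")
    return win_stay, lose_shift


def update_value(q: float, reward: float, alpha_gain: float, alpha_loss: float) -> float:
    """One trial-by-trial value update with separate learning rates.

    A positive prediction error is scaled by alpha_gain and a negative one
    by alpha_loss (both assumed to lie between 0 and 1).
    """
    delta = reward - q  # prediction error
    alpha = alpha_gain if delta >= 0 else alpha_loss
    return q + alpha * delta


if __name__ == "__main__":
    demo_choices: List[str] = ["A", "A", "B", "B", "A"]
    demo_rewards: List[int] = [1, 0, 0, 1, 0]
    print(win_stay_lose_shift(demo_choices, demo_rewards))
    print(update_value(0.5, 1, alpha_gain=0.2, alpha_loss=0.1))
```

In this sketch, a slower learning rate from negative feedback corresponds to a smaller `alpha_loss`: each negative outcome then moves the stored value of a symbol less.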

Conflict of interest statement

The authors declare no competing interests.

Figures

FIGURE 1
(a) Symbol pairs in the training and the two learning sets together with winning probability within each pair. Learning criteria for Set 1 and Set 2 were ≥65% A choices for A/B, ≥60% C choices for C/D, and ≥40% E choices for E/F after each block. (b) Trial example from learning phase. In the test phase (not depicted) no feedback was given and symbol pairs were scrambled.
FIGURE 2
Boxplots show observed sleepiness ratings according to the Karolinska Sleepiness Scale (KSS; Åkerstedt & Gillberg, 1990), and subjective stress ratings (Schwarz et al., 2018) for the normal and restricted sleep conditions. Histograms to the right of each boxplot show the posterior distributions of the difference between sleep conditions with highest density intervals (HDI; thick black horizontal line), highest maximum a posteriori probability estimates (MAP; grey solid vertical line), and the regions of practical equivalence (ROPE; red shading) around zero (dotted line). Sleepiness increased strongly, but the increase in stress after sleep restriction was not large enough to be conclusively separated from the ROPE. Bars above the histograms show Bayes factors with level of support for either hypothesis (BF10, red; BF01, grey) indicated by length of the bar; black lines indicate thresholds for moderate (BF >3), strong (BF >10), and extreme evidence (BF >100) (Beard et al., 2016).
FIGURE 3
Boxplots show observed data for win–stay and lose–shift tendencies during the first block of the learning phase. Histograms to the right of each boxplot show the posterior distributions of the difference between sleep conditions with highest density intervals (HDI; thick black horizontal line), highest maximum a posteriori probability estimates (MAP; grey solid vertical line), and the regions of practical equivalence (ROPE; red shading) around zero (dotted line), indicating no meaningful difference between sleep conditions. Bars above the histograms show Bayes factors with level of support for either hypothesis (BF10, red; BF01, grey) indicated by length of the bar; black lines indicate thresholds for moderate (BF >3), strong (BF >10), and extreme evidence (BF >100) (Beard et al., 2016).
FIGURE 4
Boxplots of the observed test phase data for Choose A (positive feedback) and Avoid B (negative feedback). Histograms to the right of each boxplot show the posterior distributions of the difference between sleep conditions with highest density intervals (HDI; thick black horizontal line), highest maximum a posteriori probability estimates (MAP; grey solid vertical line), and the regions of practical equivalence (ROPE; red shading) around zero (dotted line). There was no meaningful difference in generalised learning after sleep restriction. Bars above the histograms show Bayes factors with level of support for either hypothesis (BF10, red; BF01, grey) indicated by length of the bar; black lines indicate thresholds for moderate (BF >3), strong (BF >10), and extreme evidence (BF >100) (Beard et al., 2016).
FIGURE 5
Posterior distributions of the computational model for the first learning block (left panel) and all learning blocks together (right panel). Boxplots show the estimated individual means drawn from the posterior distribution. Histograms to the right of each boxplot show the inverse probit transformed (φ) posterior distributions with highest density intervals (HDI; thick black horizontal line), highest maximum a posteriori probability estimates (MAP; grey solid vertical line), and the regions of practical equivalence (ROPE; red shading) around zero (dotted line). Bars above the histograms show Bayes factors with level of support for either hypothesis (BF10, red; BF01, grey) indicated by length of the bar; black lines indicate thresholds for moderate (BF >3), strong (BF >10), and extreme evidence (BF >100) (Beard et al., 2016).
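
The captions above summarise each posterior distribution with a highest density interval (HDI) and compare it with a region of practical equivalence (ROPE) around zero. A minimal Python sketch of both summaries, computed from posterior samples, is given here for orientation only. The function names, the default credible mass, and the ROPE bounds in the example are assumptions; the record does not state the values the authors used.

```python
import math
from typing import Sequence, Tuple


def hdi(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Narrowest interval containing at least `mass` of the posterior samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


def rope_share(samples: Sequence[float], low: float, high: float) -> float:
    """Proportion of posterior samples that fall inside the ROPE [low, high]."""
    return sum(low <= x <= high for x in samples) / len(samples)


if __name__ == "__main__":
    # Toy samples of a difference between two conditions (not study data).
    toy = [0, 10, 11, 12, 13, 14, 30, 40, 50, 60]
    print(hdi(toy, mass=0.5))
    print(rope_share(toy, low=9, high=15))
```

One common reading is that a difference is treated as practically equivalent to zero when the HDI lies entirely inside the ROPE, and as meaningful when the HDI lies entirely outside it.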

References

    1. Ahn, W.‐Y., Haines, N., & Zhang, L. (2017). Revealing neurocomputational mechanisms of reinforcement learning and decision‐making with the hBayesDM package. Computational Psychiatry, 1, 24–57.
    2. Åkerstedt, T., & Gillberg, M. (1990). Subjective and objective sleepiness in the active individual. The International Journal of Neuroscience, 1, 29–37.
    3. Beard, E., Dienes, Z., Muirhead, C., & West, R. (2016). Using Bayes factors for testing hypotheses about intervention effectiveness in addictions research. Addiction, 111, 2230–2247.
    4. Elmenhorst, D., Elmenhorst, E. M., Hennecke, E., Kroll, T., Matusch, A., Aeschbach, D., & Bauer, A. (2017). Recovery sleep after extended wakefulness restores elevated A1 adenosine receptor availability in the human brain. Proceedings of the National Academy of Sciences of the United States of America, 114, 4243–4248.
    5. Frank, M. J. (2004). By carrot or by stick: Cognitive reinforcement learning in parkinsonism. Science, 306, 1940–1943.