PLoS Comput Biol. 2016 Jun 20;12(6):e1004953.

doi: 10.1371/journal.pcbi.1004953. eCollection 2016 Jun.

The Computational Development of Reinforcement Learning during Adolescence

Stefano Palminteri¹,², Emma J Kilford¹, Giorgio Coricelli³,⁴, Sarah-Jayne Blakemore¹

Affiliations

¹ Institute of Cognitive Neuroscience, University College London, London, United Kingdom.
² Laboratoire de Neurosciences Cognitive, École Normale Supérieure, Paris, France.
³ Interdepartmental Centre for Mind/Brain Sciences, Università degli Studi di Trento, Trento, Italy.
⁴ Department of Economics, University of Southern California, Los Angeles, California, United States of America.

PMID: 27322574
PMCID: PMC4920542
DOI: 10.1371/journal.pcbi.1004953

Stefano Palminteri et al. PLoS Comput Biol. 2016.

Abstract

Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Task design.**
**(A)** The learning task 2x2 factorial design. Different symbols were used as cues in each context, and symbol to context attribution was randomised across participants. The coloured frames are purely illustrative and represent each of the four context conditions throughout all figures. “Reward” = gain maximisation context; “Punishment” = loss minimisation context; “Partial”: counterfactual feedback was not provided; “Complete”: counterfactual feedback was provided; P_Gain = probability of gaining 1 point; P_Loss = probability of losing 1 point. **(B)** Time course of example trials in the Reward/Partial (top) and Reward/Complete (bottom) conditions. Durations are given in seconds. Fig 1 was adapted by the authors from a figure originally published in [15], licensed under CC BY 4.0.
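The 2x2 design described in Fig 1 can be illustrated with a minimal sketch. It assumes the 75%/25% outcome probabilities implied by the G75/G25 and L75/L25 labels in the later captions, and it assumes that the alternative outcome on each trial is 0 points. The function names (`make_contexts`, `sample_outcome`, `run_trial`) are hypothetical and are not taken from the authors' task code.

```python
import random
from itertools import product

# Assumed probabilities, read off the G75/G25 and L75/L25 labels in the captions.
P_HIGH, P_LOW = 0.75, 0.25


def make_contexts():
    """Return the four contexts of the 2x2 design (valence x feedback)."""
    return list(product(("reward", "punishment"), ("partial", "complete")))


def sample_outcome(valence, is_correct_option, rng):
    """Sample one outcome in points; 0 is the assumed alternative outcome."""
    if valence == "reward":
        # Correct option: gain 1 point with p = 0.75; incorrect: p = 0.25.
        p_gain = P_HIGH if is_correct_option else P_LOW
        return 1 if rng.random() < p_gain else 0
    # Punishment: correct option loses 1 point with p = 0.25; incorrect: p = 0.75.
    p_loss = P_LOW if is_correct_option else P_HIGH
    return -1 if rng.random() < p_loss else 0


def run_trial(context, chose_correct, rng):
    """Return (chosen outcome, unchosen outcome); the latter is None if partial."""
    valence, feedback = context
    r_chosen = sample_outcome(valence, chose_correct, rng)
    r_unchosen = None
    if feedback == "complete":
        # Counterfactual feedback is only displayed in Complete contexts.
        r_unchosen = sample_outcome(valence, not chose_correct, rng)
    return r_chosen, r_unchosen
```

Passing a seeded `random.Random` instance as `rng` keeps simulated sessions reproducible.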

**Fig 2. Computational models and ex-ante model simulations.**
**(A)** The schematic illustrates the computational architecture of the model space. For each context (or state, ‘s’), the agent tracks option values (Q(s,:)), which are used to decide amongst alternative courses of action. In all contexts, the agent is informed about the outcome corresponding to the chosen option (R(c)), which is used to update the chosen option value (Q(s,c)) via a prediction error (δ(c)). This computational module (“factual module”) requires a learning rate (α₁). In the presence of complete feedback, the agent is also informed about the outcome of the unchosen option (R(u)), which is used to update the unchosen option value (Q(s,u)) via a prediction error (δ(u)). This computational module (“counterfactual module”) requires a specific learning rate (α₂). In addition to tracking option value, the agent also tracks the value of the context (V(s)), which is also updated via a prediction error (δ(v)), integrating over all available feedback information (R(c) and, where applicable, R(u)). This computational module (“contextual module”) requires a specific learning rate (α₃). The full model (Model 3) can be reduced to Model 2 by suppressing the contextual module (i.e. assuming α₃ = 0). Model 2 can be reduced to Model 1 (basic Q-learning) by suppressing the counterfactual module (i.e. assuming α₂ = α₃ = 0). **(B)** Bars represent the model estimates of option values (top row) and decision values (bottom row), plotted as a function of the computational models and task contexts. G75 and G25: options associated with 75% and 25% chance of gaining a point, respectively; L75 and L25: options associated with 75% and 25% chance of losing a point, respectively. “Decision value” represents the difference in value between the correct and incorrect options (G75 minus G25 in Reward contexts; L25 minus L75 in Punishment contexts). Fig 2A was adapted by the authors from a figure originally published in [15], licensed under CC BY 4.0.
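The nested model space in Fig 2A can be sketched as a single update rule with three learning rates. The caption does not give the equations, so the specifics below are assumptions: values start at 0, the context value V(s) is updated towards the average of the available outcomes (with Q(s,u) standing in for R(u) under partial feedback), and outcomes are expressed relative to V(s) before the option values are updated. The class name `Agent` and its layout are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    alpha1: float                # factual learning rate
    alpha2: float = 0.0          # counterfactual learning rate (0 in Model 1)
    alpha3: float = 0.0          # contextual learning rate (0 in Models 1 and 2)
    Q: dict = field(default_factory=dict)  # Q[s] = [value of option 0, option 1]
    V: dict = field(default_factory=dict)  # V[s] = value of context s

    def update(self, s, c, r_c, r_u=None):
        """Update values in context s after choosing option c (0 or 1).

        r_c is the chosen outcome; r_u is the unchosen outcome, or None
        when feedback is partial.
        """
        q = self.Q.setdefault(s, [0.0, 0.0])
        v = self.V.setdefault(s, 0.0)
        u = 1 - c

        # Contextual module (assumed form): average the available feedback,
        # using Q(s,u) as a stand-in for R(u) when it is not displayed.
        r_v = (r_c + (r_u if r_u is not None else q[u])) / 2
        v += self.alpha3 * (r_v - v)   # delta(v) = r_v - V(s); no-op if alpha3 = 0
        self.V[s] = v

        # Factual module: prediction error on the chosen option.
        delta_c = r_c - v - q[c]
        q[c] += self.alpha1 * delta_c

        # Counterfactual module: only runs when R(u) is displayed.
        if r_u is not None:
            delta_u = r_u - v - q[u]
            q[u] += self.alpha2 * delta_u
```

Under these assumptions, `Agent(alpha1=0.3)` behaves as basic Q-learning (Model 1), adding a non-zero `alpha2` gives Model 2, and adding a non-zero `alpha3` gives the full Model 3, which mirrors the nesting described in the caption.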

**Fig 3. Baseline model fitting and model comparison.**
**(A)** Choice inverse temperature (β) of each model for adults (dark grey) and adolescents (light grey). **(B)** Posterior probability (PP) of each model for adults (dark grey) and adolescents (light grey). The dotted line indicates chance level (0.33). #P<0.05: two-sided one-sample t-tests; *P<0.001: two-sided independent-samples t-tests. Error bars represent s.e.m.
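The role of the inverse temperature β in Fig 3A can be illustrated with a two-option softmax rule. The caption names only the parameter, so the softmax form is an assumption (it is the conventional choice rule for Q-learning models), and `choice_probability` is a hypothetical helper name.

```python
import math


def choice_probability(q_a, q_b, beta):
    """Probability of choosing option A over option B under a softmax rule.

    With two options the softmax reduces to a logistic function of the
    value difference, scaled by the inverse temperature beta.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))
```

With β = 0 the rule picks either option with probability 0.5 regardless of the learned values, and larger β makes choices follow the value difference more deterministically. The chance level of 0.33 in Fig 3B is the uniform probability over the three candidate models.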

**Fig 4. Correct choice rate.**
**(A)** Learning curves in adolescents (left) and adults (right). The bold lines within the shaded areas represent the actual behavioural data (bold lines represent mean correct choice rate; shaded areas represent s.e.m). The behavioural data are superimposed with the ex-post model-simulated learning curves, estimated using parameters from each age group’s best fitting model (Model 1 for adolescents; Model 3 for adults). The dots represent the model-simulated mean correct choice probabilities. **(B)** Bars represent the correct choice rate improvement (difference in correct choice rate between last and first trials) and the final correct choice rate (last trial) in Reward (leftmost panel) and Punishment (rightmost panel) contexts. Chance level (i.e. no learning) is 0.0 for correct choice rate improvement, and 0.5 for final correct choice rate. Error bars represent s.e.m. *P<0.05: independent samples t-test (2-sided).

**Fig 5. Reaction times.**
**(A)** Reaction time (RT) curves in adolescents (left) and adults (right). The bold lines within the shaded areas represent the mean RT. The shaded areas represent the s.e.m. **(B)** Bars represent the RT reduction (difference in RT between last and first trials) and the final RT (last trial) in the Partial (leftmost panel) and the Complete (rightmost panel) feedback contexts. Error bars represent the s.e.m. *P<0.05: independent-samples t-test (2-sided).

**Fig 6. Post-learning test.**
**(A)** Bars represent the choice rate observed in the post-learning test, in adolescents (left) and adults (right). G75 and G25: options associated with 75% and 25% chance of gaining a point, respectively; L75 and L25: options associated with 75% and 25% chance of losing a point, respectively. The behavioural data are superimposed with coloured dots representing the model-simulated post-learning choices, estimated using parameters from each age group’s best fitting model (Model 1 for adolescents; Model 3 for adults). **(B)** Bars represent cue discrimination, the difference between post-learning choice-rates for Correct vs. Incorrect cues (G75 minus G25 in Reward contexts; L25 minus L75 in Punishment contexts), in Partial (leftmost panel) and Complete (rightmost panel) contexts. Chance level (i.e. no cue discrimination) is 0.0. Error bars represent s.e.m. *P<0.05: independent samples t-test (2-sided).

Cited by

The interpretation of computational model parameters depends on the context.
Eckstein MK, Master SL, Xia L, Dahl RE, Wilbrecht L, Collins AGE. Elife. 2022 Nov 4;11:e75474. doi: 10.7554/eLife.75474. PMID: 36331872
Reinforcement Learning during Adolescence in Rats.
Moin Afshar N, Keip AJ, Taylor JR, Lee D, Groman SM. J Neurosci. 2020 Jul 22;40(30):5857-5870. doi: 10.1523/JNEUROSCI.0910-20.2020. Epub 2020 Jun 29. PMID: 32601244
Anhedonia in Relation to Reward and Effort Learning in Young People with Depression Symptoms.
Frey AL, Kaya MS, Adeniyi I, McCabe C. Brain Sci. 2023 Feb 17;13(2):341. doi: 10.3390/brainsci13020341. PMID: 36831884
Effects of three prophylactic interventions on French middle-schoolers' mental health: protocol for a randomized controlled trial.
Vaillant-Coindard E, Briet G, Lespiau F, Gisclard B, Charbonnier E. BMC Psychol. 2024 Apr 13;12(1):204. doi: 10.1186/s40359-024-01723-8. PMID: 38615007
What do Reinforcement Learning Models Measure? Interpreting Model Parameters in Cognition and Neuroscience.
Eckstein MK, Wilbrecht L, Collins AGE. Curr Opin Behav Sci. 2021 Oct;41:128-137. doi: 10.1016/j.cobeha.2021.06.004. Epub 2021 Jul 3. PMID: 34984213

References

1. Steinberg L. Cognitive and affective development in adolescence. Trends Cogn Sci. 2005;9: 69–74.
2. Blakemore S-J, Robbins TW. Decision-making in the adolescent brain. Nat Neurosci. 2012;15: 1184–91.
3. Sercombe H. Risk, adaptation and the functional teenage brain. Brain Cogn. 2014;89: 61–9. doi: 10.1016/j.bandc.2014.01.001
4. Willoughby T, Good M, Adachi PJC, Hamza C, Tavernier R. Examining the link between adolescent brain development and risk taking from a social–developmental perspective. Brain Cogn. 2013;83: 315–323.
5. Blakemore S-J, Mills KL. Is adolescence a sensitive period for sociocultural processing? Annu Rev Psychol. 2014;65: 187–207. doi: 10.1146/annurev-psych-010213-115202

Grants and funding

WT_/Wellcome Trust/United Kingdom
