Psychol Sci. 2016 Jun;27(6):848-58.
doi: 10.1177/0956797616639301. Epub 2016 Apr 15.

From Creatures of Habit to Goal-Directed Learners: Tracking the Developmental Emergence of Model-Based Reinforcement Learning

Johannes H Decker et al. Psychol Sci. 2016 Jun.

Abstract

Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative strategies has not been well characterized. In this study, children, adolescents, and adults performed a sequential reinforcement-learning task that enabled estimation of model-based and model-free contributions to choice. Whereas a model-free strategy was apparent in choice behavior across all age groups, a model-based strategy was absent in children, became evident in adolescents, and strengthened in adults. These results suggest that recruitment of model-based valuation systems represents a critical cognitive component underlying the gradual maturation of goal-directed behavior.
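The contrast between the two strategies can be illustrated with a minimal sketch. It is not the authors' computational model: the two-ship, two-planet layout follows the task described in the figure captions, but the transition probability `COMMON`, the learning rate `alpha`, and all function names are assumptions chosen for illustration.

```python
# Illustrative sketch only; COMMON and alpha are assumed values, not taken from the paper.
COMMON = 0.7  # assumed probability that a spaceship reaches its usual planet


def model_free_update(q_stage1, ship, reward, alpha=0.5):
    """Model-free: nudge the chosen ship's value toward the reward it led to."""
    updated = list(q_stage1)
    updated[ship] += alpha * (reward - updated[ship])
    return updated


def model_based_values(q_stage2, common=COMMON):
    """Model-based: value each ship by the planets it is likely to reach.

    q_stage2[planet] holds the learned values of the two aliens on that planet.
    Ship 0 is assumed to commonly reach planet 0, and ship 1 planet 1.
    """
    best = [max(aliens) for aliens in q_stage2]
    return [
        common * best[0] + (1 - common) * best[1],
        (1 - common) * best[0] + common * best[1],
    ]


# Scenario: ship 0 is chosen, a rare transition leads to planet 1, and alien 0 pays out.
q1 = model_free_update([0.5, 0.5], ship=0, reward=1)  # ship 0 is credited directly
q2 = [[0.5, 0.5], [0.75, 0.5]]  # alien 0 on planet 1 after one update with alpha = 0.5
mb = model_based_values(q2)     # ship 1 now looks better, since it usually reaches planet 1
```

In this scenario the model-free learner favors repeating ship 0, whereas the model-based learner favors switching to ship 1, which is the qualitative signature the task is designed to separate.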

Keywords: cognitive development; decision making; open data; reinforcement learning.

Conflict of interest statement

Declaration of Conflicting Interests: The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.

Figures

Fig. 1.
Design of the sequential spaceship task (a) and idealized model-free and model-based behavior (b). On each trial, participants chose between two spaceships (first-stage choice), which was followed by a probabilistic transition to a red planet or a purple planet. Then participants chose between two aliens (second-stage choice) and were rewarded with space treasure or not. The probability of winning space treasure is presented as a function of trial for each alien. The bar graphs show, for idealized model-free and model-based learners, the probability of making the same choice on the next trial (i.e., a first-stage stay) as a function of the outcome and transition type (common or rare) of the previous trial.
Fig. 2.
First-stage choice behavior by age. The proportion of first-stage stay choices is graphed as a function of outcome of the previous trial for each age group (a), separately for trials following common and rare transitions. The error bars represent ±1 SEM. The scatterplot (with best-fitting regression line) shows the relationship between the reward-by-transition-type interaction effect (the model-based effect estimates) and age (b). The model-based effect is plotted as the fixed plus the random effects from a regression model with age excluded. The gray area represents ±1 SEM.
Fig. 3.
Second-stage response time results. The bar graphs (a) show response times for choices at the second stage as a function of the preceding transition type for each age group. Error bars represent ±1 SEM. The scatterplots (b; with best-fitting regression lines) show the relationship between the reward-by-transition-type interaction effect (the model-based effect estimates) and the difference in response time between choices following rare transitions and those following common transitions for each age group. The gray bands represent ±1 SEM.
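The first-stage stay measure described in the captions for Fig. 1 and Fig. 2 can be sketched as follows. This is a simplified descriptive calculation and not the regression analysis reported in the paper; the trial record layout (`choice`, `transition`, `reward`) and both function names are hypothetical.

```python
from collections import defaultdict


def stay_proportions(trials):
    """Proportion of first-stage stays, split by the previous trial's reward and transition.

    Each trial is a dict with hypothetical keys:
    'choice' (spaceship index), 'transition' ('common' or 'rare'), 'reward' (0 or 1).
    """
    counts = defaultdict(lambda: [0, 0])  # (reward, transition) -> [stays, total]
    for prev, cur in zip(trials, trials[1:]):
        key = (prev["reward"], prev["transition"])
        counts[key][0] += int(cur["choice"] == prev["choice"])
        counts[key][1] += 1
    return {key: stays / total for key, (stays, total) in counts.items()}


def reward_and_interaction(p):
    """Descriptive contrasts; all four (reward, transition) cells must be present.

    The main effect of reward is the model-free signature; the
    reward-by-transition-type interaction is the model-based signature.
    """
    reward_effect = (
        (p[(1, "common")] + p[(1, "rare")]) - (p[(0, "common")] + p[(0, "rare")])
    ) / 2
    interaction = (p[(1, "common")] - p[(1, "rare")]) - (
        p[(0, "common")] - p[(0, "rare")]
    )
    return reward_effect, interaction


# Toy sequence in which every choice follows the idealized model-based pattern.
trials = [
    {"choice": 0, "transition": "common", "reward": 1},
    {"choice": 0, "transition": "rare", "reward": 1},
    {"choice": 1, "transition": "common", "reward": 0},
    {"choice": 0, "transition": "rare", "reward": 0},
    {"choice": 0, "transition": "common", "reward": 1},
]
p = stay_proportions(trials)
reward_effect, interaction = reward_and_interaction(p)
```

A purely model-free learner would instead show a positive reward effect with an interaction near zero, because it repeats rewarded choices regardless of transition type.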
