Cogn Affect Behav Neurosci. 2015 Sep;15(3):523-36.
doi: 10.3758/s13415-015-0347-6.

Model-based learning protects against forming habits

Claire M Gillan et al. Cogn Affect Behav Neurosci. 2015 Sep.

Abstract

Studies in humans and rodents have suggested that behavior can at times be "goal-directed" (that is, planned and purposeful) and at times "habitual" (that is, inflexible and automatically evoked by stimuli). This distinction is central to conceptions of pathological compulsion, as in drug abuse and obsessive-compulsive disorder. Evidence for the distinction has primarily come from outcome devaluation studies, in which the sensitivity of a previously learned behavior to motivational change is used to assay the dominance of habits versus goal-directed actions. However, little is known about how habits and goal-directed control arise. Specifically, in the present study we sought to reveal the trial-by-trial dynamics of instrumental learning that would promote, and protect against, developing habits. In two complementary experiments with independent samples, participants completed a sequential decision task that dissociated two computational-learning mechanisms, model-based and model-free. We then tested for habits by devaluing one of the rewards that had reinforced behavior. In each case, we found that individual differences in model-based learning predicted the participants' subsequent sensitivity to outcome devaluation, suggesting that an associative mechanism underlies a bias toward habit formation in healthy individuals.

Figures

Fig. 1
Experiment 1: Reinforcement-learning task. (A) Participants entered one of two start states on each trial, which were associated with the receipt of gold and silver coins, each worth 25¢. Participants had 2.5 seconds (s) to make a choice, costing 1¢, which would commonly (70 %) lead them to a certain second state and rarely (30 %) lead them to the alternative second state. No choices were made in the second state; each second state had a unique probability of reward that slowly changed over the course of the experiment. (B) Graph depicting a purely model-free learner, whose behavior is solely predicted by reinforcement history. (C) A purely model-based learner's behavior, in contrast, is predicted by an interaction between reward and transition, such that behavior would mirror the model-free learner only when the transition from the initial choice to the outcome was common. Following rare transitions, a purely model-based learner would show the reverse pattern.
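The qualitative patterns described for panels B and C can be written out as a simple rule. The following is a minimal Python sketch, assuming idealized, noise-free learners; the function name `predicted_to_stay` and the learner labels are hypothetical and do not come from the paper.

```python
def predicted_to_stay(rewarded: bool, common: bool, learner: str) -> bool:
    """Return True if an idealized learner is predicted to repeat its
    previous first-stage choice, given the outcome of the last trial.

    learner: "model_free" or "model_based" (hypothetical labels).
    """
    if learner == "model_free":
        # Model-free: repeat whatever was rewarded, ignoring transition type.
        return rewarded
    if learner == "model_based":
        # Model-based: reward after a common transition favors staying;
        # reward after a rare transition favors switching, because the
        # other action is the more likely route to the rewarded state.
        return rewarded == common
    raise ValueError(f"unknown learner: {learner}")


# Enumerate the four trial types for both idealized learners.
for learner in ("model_free", "model_based"):
    for common in (True, False):
        for rewarded in (True, False):
            print(
                learner,
                "common" if common else "rare",
                "rewarded" if rewarded else "unrewarded",
                "->",
                "stay" if predicted_to_stay(rewarded, common, learner) else "switch",
            )
```

The two learners agree after common transitions and disagree after rare ones, which is why the Reward × Transition interaction is read as the signature of model-based learning.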
Fig. 2
Experiment 1: Devaluation and consumption tests. (A) The 24-trial devaluation stage consisted of presentations of the first-stage choices only; that is, participants did not transition to the second stages and never learned the outcomes of their choices. This ensured that responding during the devaluation test was dependent only on prior learning. They were informed that the task would continue as before, but that they would no longer be shown the results of their choices. (B) After four trials of experience with the concealed trial outcomes, one type of coin was devalued by informing participants that the corresponding container was completely full. (C) This trial was followed by a consumption test, in which participants had 4 s to freely collect coins using their mouse. Next they completed the 20 test trials, in which habits were quantified as the difference between the numbers of responses made to the valued and devalued states
Fig. 3
Experiment 1: Effect sizes (beta weights) from the logistic regression model (Table 1). Significant effects were observed for reward (model-free, p < .001), the Reward × Transition interaction (model-based, p = .020), and the predicted three-way interaction of reward, transition, and devaluation sensitivity (p = .003). rew = reward, trans = transition, dev = devaluation sensitivity
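To make the regression terms concrete, the sketch below builds the predictor values for a single trial. It is an illustration only: the ±1 effect coding, the use of a z-scored devaluation sensitivity, and the function name `trial_regressors` are assumptions, not details confirmed by the caption. The dependent variable is assumed to be whether the participant repeated the previous first-stage choice.

```python
def trial_regressors(rewarded: bool, common: bool, dev_z: float) -> dict:
    """Predictors for one trial of a stay/switch logistic regression.

    rewarded, common: outcome and transition type of the previous trial.
    dev_z: participant-level devaluation sensitivity, assumed z-scored.
    """
    rew = 1.0 if rewarded else -1.0    # assumed effect coding
    trans = 1.0 if common else -1.0    # assumed effect coding
    return {
        "intercept": 1.0,
        "rew": rew,                       # model-free signature
        "trans": trans,
        "dev": dev_z,
        "rew:trans": rew * trans,         # model-based signature
        "rew:dev": rew * dev_z,
        "trans:dev": trans * dev_z,
        "rew:trans:dev": rew * trans * dev_z,  # three-way interaction
    }


# Example: rewarded trial after a rare transition, participant 0.5 SD above the mean.
print(trial_regressors(rewarded=True, common=False, dev_z=0.5))
```

Under this coding, a positive weight on `rew:trans:dev` would mean that participants with higher devaluation sensitivity show a stronger Reward × Transition effect.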
Fig. 4
Experiment 1: Model-based learning and habit formation. (A) Histogram displaying devaluation sensitivity in the entire sample in Experiment 1. Devaluation sensitivity is defined as the difference between the numbers of valued and devalued responses performed in the test stage, with larger numbers indicating greater sensitivity to devaluation. To illustrate the relationship between model-based learning and habit formation, a median split divides the sample into (B) habit (devaluation sensitivity < 1) and (C) goal-directed (devaluation sensitivity > 1) groups. Those who displayed habits at test showed a marked absence of the signature of model-based learning, p < .003
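The devaluation-sensitivity score and the median split can be sketched in a few lines of Python. The response counts below are made up for illustration, and the helper names are hypothetical. Excluding scores that fall exactly on the median is an assumption that mirrors the strict inequalities in the caption.

```python
from statistics import median


def devaluation_sensitivity(valued: int, devalued: int) -> int:
    """Valued minus devalued responses in the test stage (Experiment 1 definition)."""
    return valued - devalued


def median_split(scores):
    """Split scores into habit (below median) and goal-directed (above median).

    Scores equal to the median are placed in neither group (an assumption).
    """
    cut = median(scores)
    habit = [s for s in scores if s < cut]
    goal_directed = [s for s in scores if s > cut]
    return cut, habit, goal_directed


# Made-up (valued, devalued) response counts for five hypothetical participants.
counts = [(10, 10), (12, 8), (20, 0), (9, 11), (15, 5)]
scores = [devaluation_sensitivity(v, d) for v, d in counts]
cut, habit, goal_directed = median_split(scores)
print(scores, cut, habit, goal_directed)
```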
Fig. 5
Experiment 2: Reinforcement-learning task. (A) Participants entered the same starting state on each trial and had 2.5 s to make a choice between two fractal stimuli that always appeared in this state. One fractal commonly (70 %) led to one of the second-stage states and rarely (30 %) led to the other. In contrast to Experiment 1, each second-stage state was uniquely associated with a certain type of coin (gold or silver). (B) For the first 150 trials, reward probabilities (the chance of winning a coin in a given second-stage state) drifted slowly over time according to Gaussian random walks. For the next 50 trials, the reward probabilities stabilized at .9 and .1, for the second-stage states associated with the to-be-devalued and to-remain-valued outcomes, respectively. This served to systematically bias all participants toward making the action that would later be devalued. Devaluation was randomized across coin colors and reward drifts
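The reward schedule in panel B can be sketched as follows. The trial counts (150 drifting, 50 stable) and the final probabilities (.9 and .1) come from the caption; the step standard deviation, the reflecting bounds, the random starting values, and the function names are assumptions chosen only to make the sketch runnable.

```python
import random


def reflect(p: float, lo: float, hi: float) -> float:
    """Reflect a value back into [lo, hi] (assumed boundary rule)."""
    while p < lo or p > hi:
        p = 2 * lo - p if p < lo else 2 * hi - p
    return p


def reward_schedule(n_drift=150, n_stable=50, sd=0.025, lo=0.25, hi=0.75, seed=0):
    """Per-trial reward probabilities as (to_be_devalued, to_remain_valued) pairs.

    sd, lo, hi, and the uniform starting points are illustrative assumptions.
    """
    rng = random.Random(seed)
    p_dev = rng.uniform(lo, hi)
    p_val = rng.uniform(lo, hi)
    schedule = []
    for _ in range(n_drift):
        schedule.append((p_dev, p_val))
        # Independent Gaussian random-walk step for each second-stage state.
        p_dev = reflect(p_dev + rng.gauss(0.0, sd), lo, hi)
        p_val = reflect(p_val + rng.gauss(0.0, sd), lo, hi)
    # Stable phase: bias choices toward the action that will later be devalued.
    schedule.extend([(0.9, 0.1)] * n_stable)
    return schedule


schedule = reward_schedule()
print(len(schedule), schedule[0], schedule[-1])
```

Fixing the seed makes the sketch reproducible; a real task would presumably draw a fresh walk for each participant.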
Fig. 6
Experiment 2: Model-based learning and habit formation. (A) Histogram displaying devaluation sensitivity in the entire sample from Experiment 2. Here, devaluation sensitivity is defined as the proportion of valued choices (over total choices) made at the test stage, with larger numbers indicating greater sensitivity to devaluation. To illustrate the relationship between model-based learning and habit formation, a median split divides the sample into (B) habit (devaluation sensitivity < .6) and (C) goal-directed (devaluation sensitivity > .6) groups. Consistent with Experiment 1, the participants who displayed habits in Experiment 2 (i.e. failed to prefer valued over devalued choices) showed a reduction in the signature of model-based learning, p < .001
