Front Behav Neurosci. 2016 Dec 20:10:234.
doi: 10.3389/fnbeh.2016.00234. eCollection 2016.

Slips of Action and Sequential Decisions: A Cross-Validation Study of Tasks Assessing Habitual and Goal-Directed Action Control

Zsuzsika Sjoerds et al. Front Behav Neurosci.

Abstract

Instrumental learning and decision-making rely on two parallel systems: a goal-directed and a habitual system. In the past decade, several paradigms have been developed to study these systems in animals and humans using, for example, overtraining, devaluation procedures, and sequential decision-making. These different paradigms are thought to measure the same constructs, but cross-validation has rarely been investigated. In this study we compared two widely used paradigms that assess aspects of goal-directed and habitual behavior. We correlated parameters from a two-step sequential decision-making task that assesses model-based (MB) and model-free (MF) learning with a slips-of-action paradigm that assesses the ability to suppress cue-triggered, learnt responses when the outcome has been devalued and is therefore no longer desirable. MB control during the two-step task showed a modest positive correlation with goal-directed devaluation sensitivity, whereas MF control did not show any associations. Interestingly, parameter estimates of MB and goal-directed behavior in the two tasks were positively correlated with higher-order cognitive measures (e.g., visual short-term memory). These cognitive measures seemed to (at least partly) mediate the association between MB control during sequential decision-making and goal-directed behavior after instructed devaluation. This study provides moderate support for a common framework to describe the propensity towards goal-directed behavior as measured with two frequently used tasks. However, we caution that the amount of shared variance between the goal-directed and MB system in both tasks was rather low, suggesting that each task also picks up distinct aspects of goal-directed behavior. Further investigation of the commonalities and differences between the MF and habit systems as measured with these, and other, tasks is needed. Also, a follow-up cross-validation on the neural systems driving these constructs across different paradigms would promote the definition and operationalization of measures of instrumental learning and decision-making in humans.

Keywords: cross-validation; goal-directed; habit; model-based; model-free; reinforcement learning; sequential decision making; slips-of-action.

Figures

Figure 1
Experimental paradigm, three-phase instrumental learning task. (A) Discrimination training phase. In this example, a flamingo stimulus printed on the front of a closed box indicates that pressing the right key will open the box and will be rewarded with a donkey and points inside of the box. Pressing the left key will not be rewarded (empty open box is revealed). (B) Outcome-devaluation test. In this example, two open boxes are presented with a donkey and fish inside. The cross superimposed on the fish signals this outcome is no longer worth any points. The accurate response in this example would be pressing the right key (which yielded the still-valuable donkey outcome during the learning phase). (C) Slips-of-action test. (1) Participants are first presented with the six outcomes. In this example, donkey and cow are superimposed with a cross, indicating that the response leading to these outcomes will now result in subtraction of points (devaluation). The other animal outcomes are still valuable. (2) Afterwards, in rapid succession animal stimuli are presented on the outside of the boxes. Participants are instructed to press the correct key if a stimulus indicates the availability of a still-valuable outcome inside the box (“Go”, example: polar bear stimulus signaling fish outcome), but withhold responding if the outcome inside the box has been devalued (“No-Go”, example: flamingo stimulus signaling donkey outcome).
Figure 2
Experimental paradigm, two-step sequential decision-making task. (A) Example of a trial sequence with timing; (B) state-transition probabilities indicating common and rare transitions; (C) choice patterns that hypothetical fully model-free (MF) and fully model-based (MB) strategies would produce, shown as stay probabilities for first-step choices as a function of reward (reward vs. no reward) and state (common vs. rare). A main effect of reward characterizes MF choice strategies, whereas MB choice strategies show a reward × state interaction.
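The stay-probability analysis described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the trial format (dicts with the hypothetical keys `choice`, `rewarded`, and `common`) and both function names are invented for the example.

```python
from collections import defaultdict


def stay_probabilities(trials):
    """Proportion of repeated first-step choices, split by what happened
    on the previous trial.

    `trials` is a list of dicts with hypothetical keys:
      'choice'   - first-step choice on that trial
      'rewarded' - True if the trial ended in reward
      'common'   - True if the state transition was the common one
    Returns {(rewarded, common): stay probability}.
    """
    counts = defaultdict(lambda: [0, 0])  # [number of stays, number of trials]
    for prev, curr in zip(trials, trials[1:]):
        key = (prev["rewarded"], prev["common"])
        counts[key][0] += int(curr["choice"] == prev["choice"])
        counts[key][1] += 1
    return {key: stays / total for key, (stays, total) in counts.items()}


def reward_and_interaction(p):
    """Contrast scores computed from the four stay probabilities.

    A large reward main effect with no interaction is the pattern the
    caption attributes to MF choice; a reward x state interaction is the
    pattern it attributes to MB choice.
    """
    rew_common, rew_rare = p[(True, True)], p[(True, False)]
    unrew_common, unrew_rare = p[(False, True)], p[(False, False)]
    main_reward = (rew_common + rew_rare) / 2 - (unrew_common + unrew_rare) / 2
    interaction = (rew_common - rew_rare) - (unrew_common - unrew_rare)
    return main_reward, interaction


# Toy sequence in which the agent repeats a choice after reward and
# switches after no reward, regardless of transition type.
toy_trials = [
    {"choice": "A", "rewarded": True, "common": True},
    {"choice": "A", "rewarded": True, "common": False},
    {"choice": "A", "rewarded": False, "common": True},
    {"choice": "B", "rewarded": False, "common": False},
    {"choice": "A", "rewarded": True, "common": True},
]
toy_probs = stay_probabilities(toy_trials)
toy_main, toy_interaction = reward_and_interaction(toy_probs)
```

In this toy sequence the reward contrast is maximal and the interaction contrast is zero, which is the purely MF pattern. Real data would normally be analyzed per participant, typically with a regression or a computational model rather than raw contrasts.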
Figure 3
Results of the instrumental learning task. (A) Instrumental discrimination training phase, displayed as learning over eight blocks. By the eighth block, all participants had learned the correct S-R-O contingencies significantly above chance level. (B) Average percentage of correct responses in the (total) discrimination training phase (left) and the outcome devaluation phase (right). (C) Percentage of responses on still-valuable and devalued trials in the slips-of-action phase (left) and the baseline test phase (right). Error bars: 95% confidence interval.
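The response percentages in panel (C) can be computed from trial-level Go/No-Go data as sketched below. The trial format and function names are hypothetical. The difference score is one plausible way to summarize devaluation sensitivity; this page does not state the formula for the authors' index, so treating it as valuable minus devalued response percentage is an assumption.

```python
def response_rates(trials):
    """Percentage of trials with a key press, split by outcome value.

    `trials` is a list of (outcome_valuable, responded) boolean pairs,
    a hypothetical format chosen for this sketch.
    Returns (pct_valuable, pct_devalued).
    """
    totals = {True: 0, False: 0}
    responses = {True: 0, False: 0}
    for valuable, responded in trials:
        totals[valuable] += 1
        responses[valuable] += int(responded)
    pct_valuable = 100.0 * responses[True] / totals[True]
    pct_devalued = 100.0 * responses[False] / totals[False]
    return pct_valuable, pct_devalued


def difference_score(pct_valuable, pct_devalued):
    """Assumed sensitivity summary: higher values mean responding is
    better restricted to still-valuable outcomes."""
    return pct_valuable - pct_devalued


# Toy data: 3 of 4 valuable trials answered, 1 of 4 devalued trials answered.
toy_slips = (
    [(True, True)] * 3 + [(True, False)]
    + [(False, True)] + [(False, False)] * 3
)
toy_valuable, toy_devalued = response_rates(toy_slips)
toy_score = difference_score(toy_valuable, toy_devalued)
```

A responder who presses equally often on valuable and devalued trials would score zero on this measure, which is the pattern expected when cue-triggered responses are not suppressed after devaluation.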
Figure 4
Stay probabilities in the two-step sequential decision-making task. Stay probabilities in the two-step sequential decision-making task show a reward by state interaction. Error bars: 95% confidence interval.
Figure 5
Scatterplots of MB and MF choice values of the two-step task and devaluation sensitivity of the slips-of-action paradigm. (A) A positive correlation is seen between the MB parameter of the two-step task (βMB) and the devaluation sensitivity index (DSI) of the slips-of-action task: ρ(25) = 0.431, R² = 0.055, p = 0.016. (B) No significant association is seen between the MF parameter of the two-step task (βMF) and the DSI: ρ(25) = 0.172, R² = 0.003, p = 0.205. Dotted lines: 95% confidence interval.
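The ρ values in this caption are rank correlations. A minimal pure-Python sketch of Spearman's ρ follows, computed as the Pearson correlation of average ranks. The function names and toy numbers are illustrative only and are not the study's data; the sketch also does not compute p-values.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up values standing in for a model parameter and a task index.
toy_parameter = [1, 2, 3, 4, 5]
toy_index = [2, 1, 4, 3, 5]
toy_rho = spearman_rho(toy_parameter, toy_index)
```

Because only ranks enter the calculation, any monotonic rescaling of either variable leaves ρ unchanged, which makes it a reasonable choice when parameter estimates are not normally distributed.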
