Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 Feb 11;9(1):39-62.
doi: 10.5334/cpsy.101. eCollection 2025.

Signatures of Perseveration and Heuristic-Based Directed Exploration in Two-Step Sequential Decision Task Behaviour

Affiliations

Signatures of Perseveration and Heuristic-Based Directed Exploration in Two-Step Sequential Decision Task Behaviour

Angela Mariele Brands et al. Comput Psychiatr. .

Abstract

Processes formalized in classic Reinforcement Learning (RL) theory, such as model-based (MB) control, habit formation, and exploration have proven fertile in cognitive and computational neuroscience, as well as computational psychiatry. Dysregulations in MB control and exploration and their neurocomputational underpinnings play a key role across several psychiatric disorders. Yet, computational accounts mostly study these processes in isolation. The current study extended standard hybrid models of a widely-used sequential RL-task (two-step task; TST) employed to measure MB control. We implemented and compared different computational model extensions for this task to quantify potential exploration and perseveration mechanisms. In two independent data sets spanning two different variants of the task, an extended hybrid RL model with a higher-order perseveration and heuristic-based exploration mechanism provided the best fit. While a simpler model with complex perseveration only, was equally well equipped to describe the data, we found a robust positive effect of directed exploration on choice probabilities in stage one of the task. Posterior predictive checks further showed that the extended model reproduced choice patterns present in both data sets. Results are discussed with respect to implications for computational psychiatry and the search for neurocognitive endophenotypes.

Keywords: computational psychiatry; exploration; habits; higher-order perseveration; model-based; neurocomputational endophenotypes; two-step task.

PubMed Disclaimer

Conflict of interest statement

The authors have no competing interests to declare.

Figures

Outline of the two-step task (TST). Transition probabilities from the first stage to the second stage remain the same in both versions of the task. The second stage with a green frame depicts the modified task version employed in data set data1 (Mathar et al., 2022): after making a S2-choice subjects receive feedback in the form of continuous reward magnitudes (rounded to the next integer). The lower S2 stage (orange frame) depicts the classic version (used in data set data2; Gillan et al., 2016), in which the S2 feedback is presented in a binary fashion (rewarded vs. unrewarded based on fluctuating reward probabilities)
Figure 1
Outline of the two-step task (TST). Transition probabilities from the first stage to the second stage remain the same in both versions of the task. The second stage with a green frame depicts the modified task version employed in data set data1 (Mathar et al., 2022): after making a S2-choice subjects receive feedback in the form of continuous reward magnitudes (rounded to the next integer). The lower S2 stage (orange frame) depicts the classic version (used in data set data2; Gillan et al., 2016), in which the S2 feedback is presented in a binary fashion (rewarded vs. unrewarded based on fluctuating reward probabilities).
Stay-Probabilities of S1 choices and difference scores. Upper panel: MB and MF difference scores as defined by Eppinger et al. (2013), bar heights depict mean scores over all participants, error bars show the standard error. Lower panel: Probabilities for S1 choice repetition as a function of reward (rew+: rewarded; rew-: unrewarded) and transition type (common/rare) of the preceding trial. The left plots (green, A) show results from data1; the right plots (orange) show results from data2
Figure 2
Stay-Probabilities of S1 choices and difference scores. Upper panel: MB and MF difference scores as defined by Eppinger et al. (2013), bar heights depict mean scores over all participants, error bars show the standard error. Lower panel: Probabilities for S1 choice repetition as a function of reward (rew+: rewarded; rew-: unrewarded) and transition type (common/rare) of the preceding trial. The left plots (green, A) show results from data1; the right plots (orange) show results from data2.
Model Comparison Results via the Widely Applied Information Criterion (WAIC) for all Q Models (c.f. Table 1). The upper/lower panel (green/orange bar plots) refer to data1 and data2, respectively. Bandit/Trial refer to the model variants with added heuristic-based exploration bonus using stimulus identity/recency, respectively. HOP: model variants with higher order perseveration term; all other versions use a classic FOP term instead (Q, Q+BANDIT, Q+TRIAL)
Figure 3
Model Comparison Results via the Widely Applied Information Criterion (WAIC) for all Q Models (c.f. Table 1). The upper/lower panel (green/orange bar plots) refer to data1 and data2, respectively. Bandit/Trial refer to the model variants with added heuristic-based exploration bonus using stimulus identity/recency, respectively. HOP: model variants with higher order perseveration term; all other versions use a classic FOP term instead (Q, Q+BANDIT, Q+TRIAL).
Posterior Distributions of Group-Level Mean Parameters From Model Q + HOP. Solid gray lines show the 95% highest density interval (HDI) and dots depict the point-estimate of the mean. Panels A and B (green and orange plots) show results on the basis of data sets data1 and data2, respectively
Figure 4
Posterior Distributions of Group-Level Mean Parameters From Model Q + HOP. Solid gray lines show the 95% highest density interval (HDI) and dots depict the point-estimate of the mean. Panels A and B (green and orange plots) show results on the basis of data sets data1 and data2, respectively.
Probabilities of S1 choice repetition as a function of reward and transition type. Y-axis: Stay probabilities for 1st stage choices; data: empirical stay probabilities from data sets data1 (panel A; green) and data2 (panel B; orange). simulation: stay-probabilities from N = 8000 simulated choice sequences per subject, derived from the winning model (Q + HOP).; rew+/–: previous trial was rewarded (+) or unrewarded (–).; common/rare: previous trial followed a common/rare transition, respectively. Error bars in the simulation plots depict the 95% HDI over 8000 simulated data sets
Figure 5
Probabilities of S1 choice repetition as a function of reward and transition type. Y-axis: Stay probabilities for 1st stage choices; data: empirical stay probabilities from data sets data1 (panel A; green) and data2 (panel B; orange). simulation: stay-probabilities from N = 8000 simulated choice sequences per subject, derived from the winning model (Q + HOP).; rew+/–: previous trial was rewarded (+) or unrewarded (–).; common/rare: previous trial followed a common/rare transition, respectively. Error bars in the simulation plots depict the 95% HDI over 8000 simulated data sets.
Associations of model-agnostic and model-derived indices of MB and MF control for data1 (a) and data2 (b). Empty tiles (left panels) indicate non-significant associations. ßrew, ßtrans, ßrew:trans: regression weights for main effects of reward, transition type and their interaction; MBdiff, MFdiff: differences scores of MB and MF influences on S1 stay probabilities respectively; ßMB, ßMF: MB and MF S1 choice parameters from the winning model; ßHOP: S1 higher order perseveration parameter; mean reward: mean reward gained throughout TST (data1: 300 trials, data2: 200 trials). Right panel: association of model-derived MB (ßMB) and habit step-size parameter ∝HOP with mean reward. Circles depict individual participants. Plots in panel A (top row, green) are based on data1, plots in panel B (bottom row, orange) are based on data2
Figure 6
Associations of model-agnostic and model-derived indices of MB and MF control for data1 (a) and data2 (b). Empty tiles (left panels) indicate non-significant associations. ßrew, ßtrans, ßrew:trans: regression weights for main effects of reward, transition type and their interaction; MBdiff, MFdiff: differences scores of MB and MF influences on S1 stay probabilities respectively; ßMB, ßMF: MB and MF S1 choice parameters from the winning model; ßHOP: S1 higher order perseveration parameter; mean reward: mean reward gained throughout TST (data1: 300 trials, data2: 200 trials). Right panel: association of model-derived MB (ßMB) and habit step-size parameter ∝HOP with mean reward. Circles depict individual participants. Plots in panel A (top row, green) are based on data1, plots in panel B (bottom row, orange) are based on data2.
Posterior Density Estimates Based on The Full Sample of Data2. Group-level parameter estimates from model variant Q+HOP derived from fitting the full sample of the original publication (Gillan et al., 2016; N = 548; Experiment 1). The lower panel of Figure 4 shows corresponding results based on data2 (N = 100). Grey dots indicate the mean point-estimate, bars depict the 95%-HDI
Figure 7
Posterior Density Estimates Based on The Full Sample of Data2. Group-level parameter estimates from model variant Q+HOP derived from fitting the full sample of the original publication (Gillan et al., 2016; N = 548; Experiment 1). The lower panel of Figure 4 shows corresponding results based on data2 (N = 100). Grey dots indicate the mean point-estimate, bars depict the 95%-HDI.

References

    1. Adams, R. A., Huys, Q. J. M., & Roiser, J. P. (2016). Computational Psychiatry: Towards a mathematically informed understanding of mental illness. Journal of Neurology, Neurosurgery & Psychiatry, 87(1), 53–63. 10.1136/jnnp-2015-310737 - DOI - PMC - PubMed
    1. Addicott, M. A., Pearson, J. M., Sweitzer, M. M., Barack, D. L., & Platt, M. L. (2017). A Primer on Foraging and the Explore/Exploit Trade-Off for Psychiatry Research. Neuropsychopharmacology, 42(10), Article 10. 10.1038/npp.2017.108 - DOI - PMC - PubMed
    1. Addicott, M. A., Pearson, J. M., Wilson, J., Platt, M. L., & McClernon, F. J. (2013). Smoking and the bandit: A preliminary study of smoker and nonsmoker differences in exploratory behavior measured with a multiarmed bandit task. Experimental and Clinical Psychopharmacology, 21, 66–73. 10.1037/a0030843 - DOI - PMC - PubMed
    1. Akam, T., Costa, R., & Dayan, P. (2015). Simple Plans or Sophisticated Habits? State, Transition and Learning Interactions in the Two-Step Task. PLOS Computational Biology, 11(12), e1004648. 10.1371/journal.pcbi.1004648 - DOI - PMC - PubMed
    1. Balleine, B., & O’Doherty, J. (2010). Human and Rodent Homologies in Action Control: Corticostriatal Determinants of Goal-Directed and Habitual Action. Neuropsychopharmacol, 35, 48–69. 10.1038/npp.2009.131 - DOI - PMC - PubMed

LinkOut - more resources