Cogn Sci. 2018 Jun;42 Suppl 3:783-808.

doi: 10.1111/cogs.12599. Epub 2018 Mar 2.

Predictive Movements and Human Reinforcement Learning of Sequential Action

Roy de Kleijn¹, George Kachergis², Bernhard Hommel¹


PMID: 29498434
PMCID: PMC6001690
DOI: 10.1111/cogs.12599


Roy de Kleijn et al. Cogn Sci. 2018 Jun.


Affiliations

¹ Cognitive Psychology Unit, Leiden University.
² Department of Artificial Intelligence, Radboud University.


Abstract

Sequential action makes up the bulk of human daily activity, and yet much remains unknown about how people learn such actions. In one motor learning paradigm, the serial reaction time (SRT) task, people are taught a consistent sequence of button presses by cueing them with the next target response. However, the SRT task only records keypress response times to a cued target, and thus it cannot reveal the full time course of motion, including predictive movements. This paper describes a mouse movement trajectory SRT task in which the cursor must be moved to a cued location. We replicated keypress SRT results, but also found that predictive movement (movement before the next cue appears) increased during the experiment. Moreover, trajectory analyses revealed that people developed a centering strategy under uncertainty. In a second experiment, we made prediction explicit by no longer cueing targets. Thus, participants had to explore the response alternatives and learn via reinforcement, receiving rewards and penalties for correct and incorrect actions, respectively. Participants were not told whether the sequence of stimuli was deterministic, whether it would repeat, or how long it was. Given the difficulty of the task, it is unsurprising that some learners performed poorly. However, many learners performed remarkably well, and some acquired the full 10-item sequence within 10 repetitions. Comparing the high- and low-performers' detailed results in this reinforcement learning (RL) task with those from the first experiment's cued trajectory SRT task, we found similarities between the two tasks, suggesting that the effects in Experiment 1 are due to predictive rather than reactive processes. Finally, we found that two standard model-free reinforcement learning models fit the high-performing participants, while the four low-performing participants were better fit by a simple negative recency bias model.

Keywords: Implicit motor learning; Movement trajectory; Reinforcement learning; Sequence learning; Sequential action; Serial reaction time task.
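The abstract reports that two standard model-free RL models fit the high performers, and the Figure 12 caption identifies one of them as Q-learning with a learning rate α and a discount factor γ. As a rough illustration of how such a model can play the uncued task of Experiment 2, the Python below is a minimal sketch of a tabular Q-learning agent. Several details are assumptions, not taken from the paper: the illustrative 10-item sequence over four locations, +1 for a correct choice and −1 for each incorrect one, responding until correct, ε-greedy choice, and a state made of the last three correct locations. The function name `run_q_learning` and all default parameter values are hypothetical.

```python
import random
from collections import defaultdict

# Illustrative (hypothetical) 10-item sequence over screen locations 1-4.
SEQUENCE = (4, 2, 3, 1, 3, 2, 4, 3, 2, 1)
ACTIONS = (1, 2, 3, 4)


def run_q_learning(sequence=SEQUENCE, repetitions=80, alpha=0.5, gamma=0.3,
                   epsilon=0.1, history=3, seed=0):
    """Simulate one tabular Q-learning agent; return the error count per repetition.

    Assumed rules (not from the paper): +1 for a correct choice, -1 for each
    incorrect one, and the agent keeps choosing until it hits the target.
    The state is the tuple of the last `history` correct locations.
    """
    rng = random.Random(seed)
    q = defaultdict(float)        # q[(state, action)], all values start at 0
    recent = ()                   # last `history` correct locations
    errors_per_rep = []
    for _ in range(repetitions):
        errors = 0
        for target in sequence:
            state = recent
            while True:
                # epsilon-greedy choice with random tie-breaking
                if rng.random() < epsilon:
                    action = rng.choice(ACTIONS)
                else:
                    best = max(q[(state, a)] for a in ACTIONS)
                    action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
                correct = action == target
                reward = 1.0 if correct else -1.0
                # A miss leaves the agent in the same state; a hit extends the history
                next_state = (recent + (target,))[-history:] if correct else state
                if not correct:
                    errors += 1
                # Q-learning update toward r + gamma * max_a' Q(s', a')
                target_value = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
                q[(state, action)] += alpha * (target_value - q[(state, action)])
                if correct:
                    recent = next_state
                    break
        errors_per_rep.append(errors)
    return errors_per_rep


if __name__ == "__main__":
    errors = run_q_learning()
    print("errors in first 5 repetitions:", errors[:5])
    print("score:", len(SEQUENCE) * len(errors) - sum(errors))
```

Under these assumed rules the score is the number of targets minus the total error count. With `epsilon=0.0`, a state's correct response is locked in the first time it is found, because correct actions always carry positive values and already-tried wrong actions carry negative ones. In this sketch, errors therefore vanish once every state has been visited, which for this sequence happens by the third repetition.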


Figures

**Figure 1**
Experiment 1 training phase RTs and error rates by block. (a) The mean of median RTs by block shows that both conditions sped up over the course of Experiment 1, but that the NB87 condition improved more. Error bars show +/−1 SE. (b) The mean number of errors by block shows that only the NB87 participants made an increasing number of errors. Error bars show +/−1 SE.
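Figures 1 and 2 summarize RTs as a mean of medians. Each participant's RTs are first reduced to one median per block, which damps the influence of occasional very slow responses, and those medians are then averaged across participants. Below is a minimal Python sketch of that aggregation. It assumes a hypothetical dictionary layout (participant → block → list of RTs in ms) and assumes the error bars are the standard error of the participant medians; the captions do not say how the SE was computed. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, median, stdev


def mean_of_median_rts(rts):
    """Mean across participants of each participant's median RT, per block.

    rts: {participant: {block: [rt_ms, ...]}}  (hypothetical layout)
    Returns {block: (mean_of_medians, standard_error)}; SE is NaN with < 2 participants.
    """
    all_blocks = sorted({b for per_block in rts.values() for b in per_block})
    summary = {}
    for b in all_blocks:
        # One median per participant who has RTs in this block
        medians = [median(per_block[b]) for per_block in rts.values() if per_block.get(b)]
        se = stdev(medians) / sqrt(len(medians)) if len(medians) > 1 else float("nan")
        summary[b] = (mean(medians), se)
    return summary
```

For example, two participants whose block-1 medians are 600 ms and 400 ms give a block mean of 500 ms with an SE of 100 ms.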

**Figure 2**
Mean of median RTs by sequence position during the early and late halves of training. Error bars show +/−1 SE.

**Figure 3**
Initial distance to target at target onset. Smaller values indicate movement toward the next target during the ISI (before the stimulus was visible). It is clear that participants in the NB87 condition show increased prediction over time. Error bars show +/−1 SE.

**Figure 4**
Characteristic movements in one trial from each condition. (a) One trial from one participant in the random condition, in which the next location was chosen at random, without repeats. All 11 random participants adopted a similar strategy of re-centering the cursor after each response. This is optimal in the sense that it was impossible to know which location would be highlighted next (t₀ = red, t_end = yellow). (b) A characteristic trial of a participant's movements during the NB87 sequence, beginning at location 4 (lower right) and ending at location 1 (upper left). These isomorphic trajectories can be compared for context effects. Only four NB87 participants showed centering movements in the last half of training.

**Figure 5**
Centering behavior during the ISI. (a) Proportion of time spent in the center of the screen, defined as a 100 × 100 pixel square at the screen center. Centering behavior in the random condition is clearly visible. Error bars show +/−1 SE. (b) Distribution of centering behavior in the random condition for the last half of the experiment. Two groups of participants can be identified: those who center during the ISI and those who do not.
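The centering measure in Figure 5 is the proportion of ISI time the cursor spends inside a 100 × 100 pixel square at the screen center. Below is a minimal Python sketch of that measure. It assumes cursor samples (x, y) in pixels recorded at a constant rate during the ISI, so that the fraction of samples approximates the fraction of time. The screen size is a parameter because the display resolution is not given here, and the function name is hypothetical.

```python
def proportion_in_center(samples, screen_w, screen_h, box=100):
    """Fraction of cursor samples (x, y) inside a box x box pixel square at screen center.

    Assumes samples come from the ISI at a constant sampling rate, so the
    sample fraction stands in for the fraction of time. Edges count as inside.
    """
    if not samples:
        return 0.0
    cx, cy = screen_w / 2, screen_h / 2
    half = box / 2
    inside = sum(1 for x, y in samples if abs(x - cx) <= half and abs(y - cy) <= half)
    return inside / len(samples)
```

On a hypothetical 1000 × 800 screen the square spans x = 450 to 550 and y = 350 to 450, so a sample at (551, 400) falls just outside it.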

**Figure 6**
Averaged trajectories for vertical movements 4‐2 and 3‐1 between 0 and 1,200 ms after ISI onset. (a) Horizontal deviation during movement (i.e., over time) in early training. Both conditions’ trajectories show some centering behavior, bending toward the middle (i.e., right for 3‐1, left for 4‐2). NB87 trajectories show less deviation. (b) Horizontal deviation during movement in late training. The random condition shows more centering behavior, while the NB87 trajectories show little variation except at the end of the movements when they diverge, showing prediction of the subsequent stimulus.

**Figure 7**
The histogram of participants’ final scores after completing 80 sequence repetitions (800 targets) shows a bimodal distribution (reference lines: expected value of an elimination strategy, EV = 0; expected value with perfect knowledge, EV = 800).
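The two reference values in Figure 7 can be checked with a little arithmetic, under scoring rules that are inferred here rather than stated: +1 for a correct response, −1 for each error, responding until correct, and three equally likely candidate locations per target (the next target never repeats the current location). A participant who never repeats a wrong guess is then correct on the first, second, or third attempt with probability 1/3 each, scoring +1, 0, or −1. That gives an expected value of (1 + 0 − 1)/3 = 0 per target, hence 0 over 800 targets, while perfect knowledge earns +1 × 800 = 800. The Python below reproduces this exactly with fractions; the function name is hypothetical.

```python
from fractions import Fraction


def elimination_ev(n_candidates=3, reward=1, penalty=-1):
    """Expected score per target for a guess-without-repeating strategy.

    Assumes the participant responds until correct, never repeats a wrong
    guess, and each remaining candidate is equally likely to be the target.
    """
    p = Fraction(1, n_candidates)   # correct on attempt k+1 with probability 1/n
    return sum(p * (reward + k * penalty) for k in range(n_candidates))


if __name__ == "__main__":
    targets = 80 * 10                       # 80 repetitions of a 10-item sequence
    print(targets * elimination_ev())       # elimination strategy
    print(targets * 1)                      # perfect knowledge
```

Under the same assumed rules, guessing among all four locations would give (1 + 0 − 1 − 2)/4 = −0.5 per target. An EV of 0 is therefore consistent with the three-candidate assumption.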

**Figure 8**
The mean of subjects’ median correct RTs by block shows that high‐performers’ (left panel) RTs improved more than the low‐performers’ (right panel) RTs over training. The mean of subjects’ median incorrect RTs by block shows that the high‐performing group's incorrect RTs actually increased, whereas the low‐performing group's stayed roughly the same across the experiment. Error bars show +/−1 SE.

**Figure 9**
RTs and error rates by median split and sequential position. (a) Mean of subjects’ median correct response times by median split and sequential position. The correct RTs for the two performance groups were not significantly correlated (r = 0.17, t(8) = 0.48, p = 0.65). Error bars show +/−1 SE. (b) The mean number of errors made at each position in the sequence, split by performance group. The errors are highly correlated (r = .79, t(8) = 3.68, p < .01), although note how much worse sequence position 5 was for the low-performing group relative to the next-worst position (8). Low-performers made twice as many errors in position 5 as in position 8, while the high-performing group showed only a 25% increase in errors. Error bars show +/−1 SE.

**Figure 10**
Scaled mean number of errors in Experiment 2 (RL) against scaled correct RTs from Experiment 1's cued SRT paradigm (NB87) by sequence position. The number of errors per position and the correct RTs are significantly correlated (r = 0.64, t(8) = 2.36, p < .05). Error bars show +/−1SE.
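The t values reported in Figures 9 and 10 follow from the correlations via the usual test for a Pearson correlation, t = r·√(n − 2)/√(1 − r²), with df = n − 2. The reported df of 8 implies n = 10 pairs, one per position of the 10-item sequence. The Python check below recomputes them; the function name is hypothetical. It gives about 2.36 for r = 0.64, matching the reported value. For r = 0.17 and r = .79 it gives about 0.49 and 3.64 versus the reported 0.48 and 3.68, small differences consistent with r having been rounded to two decimals.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic (df = n - 2) for testing whether a Pearson correlation r differs from 0."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


if __name__ == "__main__":
    for r in (0.17, 0.79, 0.64):                  # Figures 9a, 9b and 10
        print(r, round(t_from_r(r, n=10), 2))
```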

**Figure 11**
Overview of the experimental setup for the RL models. The plated components interact with each other according to the arrows to simulate the same trial‐and‐error learning process that humans undergo.

**Figure 12**
Relationship between individually optimized parameters for Q‐learning and participants’ final score. (a) The relation between learning rate α and final score attained by participants. Participants with a higher learning rate α attained higher final scores on the reinforcement learning task. (b) The relation between discount factor γ and final score attained by participants. Participants with a higher discount factor γ attained lower final scores on the reinforcement learning task.


Cited by

A Critical Period for Robust Curriculum-Based Deep Reinforcement Learning of Sequential Action in a Robot Arm.
de Kleijn R, Sen D, Kachergis G. Top Cogn Sci. 2022 Apr;14(2):311-326. doi: 10.1111/tops.12595. Epub 2022 Jan 10. PMID: 35005844.
Implicit motor sequence learning using three-dimensional reaching movements with the non-dominant left arm.
Smith CR, Baird JF, Buitendorp J, Horton H, Watkins M, Stewart JC. Exp Brain Res. 2024 Dec;242(12):2715-2726. doi: 10.1007/s00221-024-06934-4. Epub 2024 Oct 8. PMID: 39377917.
The impact of implicit and explicit suggestions that 'there is nothing to learn' on implicit sequence learning.
Vermeylen L, Abrahamse E, Braem S, Rigoni D. Psychol Res. 2021 Jul;85(5):1943-1954. doi: 10.1007/s00426-020-01385-2. Epub 2020 Aug 4. PMID: 32749535.

