Cogn Sci. 2018 Jun;42 Suppl 3:783-808.
doi: 10.1111/cogs.12599. Epub 2018 Mar 2.

Predictive Movements and Human Reinforcement Learning of Sequential Action

Roy de Kleijn et al. Cogn Sci. 2018 Jun.

Abstract

Sequential action makes up the bulk of human daily activity, yet much remains unknown about how people learn such actions. In one motor learning paradigm, the serial reaction time (SRT) task, people are taught a consistent sequence of button presses by cueing them with the next target response. However, the SRT task records only keypress response times to a cued target, so it cannot reveal the full time course of motion, including predictive movements. This paper describes a mouse movement trajectory SRT task in which the cursor must be moved to a cued location. We replicated keypress SRT results, but also found that predictive movement (movement made before the next cue appears) increased during the experiment. Moreover, trajectory analyses revealed that people developed a centering strategy under uncertainty. In a second experiment, we made prediction explicit by no longer cueing targets. Participants therefore had to explore the response alternatives and learn via reinforcement, receiving rewards for correct actions and penalties for incorrect ones. Participants were not told whether the sequence of stimuli was deterministic, whether it would repeat, or how long it was. Given the difficulty of the task, it is unsurprising that some learners performed poorly. However, many learners performed remarkably well, and some acquired the full 10-item sequence within 10 repetitions. Comparing the high- and low-performers' detailed results in this reinforcement learning (RL) task with the first experiment's cued trajectory SRT task, we found similarities between the two tasks, suggesting that the effects in Experiment 1 are due to predictive rather than reactive processes. Finally, we found that two standard model-free reinforcement learning models fit the high-performing participants, while the four low-performing participants are better fit by a simple negative recency bias model.

Keywords: Implicit motor learning; Movement trajectory; Reinforcement learning; Sequence learning; Sequential action; Serial reaction time task.

Figures

Figure 1
Experiment 1 training phase RTs and error rates by block. (a) Mean of median RTs by block shows that both conditions sped up over the course of Experiment 1, but that NB87 improved more. Error bars show ±1 SE. (b) Mean number of errors by block shows that only the NB87 participants made an increasing number of errors. Error bars show ±1 SE.
Figure 2
Mean of median RT by sequence position during the early and late halves of training. Bars show ±1 SE.
Figure 3
Initial distance to target at target onset. Smaller values indicate movement toward the next target during the ISI (before the stimulus was visible). Participants in the NB87 condition show increased prediction over time. Error bars show ±1 SE.
Figure 4
Characteristic movements in one trial from each condition. (a) One trial from one participant in the random condition, in which the next location was chosen at random, without repeats. All 11 random participants adopted a similar strategy of re‐centering the cursor after each response. This is optimal in the sense that it was impossible to know which location would be highlighted next (t_0 = red, t_end = yellow). (b) A characteristic trial of a participant's movements during the NB87 sequence, beginning at location 4 (lower right) and ending at location 1 (upper left). These isomorphic trajectories can be compared for context effects. Only four NB87 participants showed centering movements in the last half of training.
Figure 5
Centering behavior during the ISI. (a) Proportion of time spent in the center of the screen, defined as a 100 × 100 pixel square in the center of the screen. Centering behavior in the random condition is clearly visible. Error bars show ±1 SE. (b) Distribution of centering behavior for the last half of the experiment for the random condition. Two groups of participants can be identified: those who center during the ISI and those who do not.
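The centering measure in panel (a) can be illustrated with a minimal Python sketch. It assumes cursor positions were sampled at a uniform rate, so that the proportion of samples inside the square approximates the proportion of time. The function name `center_proportion` and the 1024 × 768 screen size are hypothetical and do not come from the paper; only the 100 × 100 pixel square is taken from the caption.

```python
from typing import Iterable, Tuple


def center_proportion(
    samples: Iterable[Tuple[float, float]],
    screen_w: int = 1024,   # assumed screen width in pixels (not from the paper)
    screen_h: int = 768,    # assumed screen height in pixels (not from the paper)
    box: int = 100,         # side of the central square, as in the caption
) -> float:
    """Fraction of cursor samples that fall inside the central square.

    Assumes uniformly spaced samples, so the fraction of samples
    approximates the fraction of ISI time spent in the center.
    """
    cx, cy = screen_w / 2, screen_h / 2
    half = box / 2
    points = list(samples)
    if not points:
        return 0.0
    inside = sum(
        1 for x, y in points
        if abs(x - cx) <= half and abs(y - cy) <= half
    )
    return inside / len(points)


# Hypothetical ISI samples: two near the center, two near the corners.
isi_samples = [(512, 384), (520, 390), (100, 100), (900, 700)]
proportion = center_proportion(isi_samples)
```

With the four illustrative samples above, two fall inside the square, so `proportion` is 0.5.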
Figure 6
Averaged trajectories for vertical movements 4‐2 and 3‐1 between 0 and 1,200 ms after ISI onset. (a) Horizontal deviation during movement (i.e., over time) in early training. Both conditions’ trajectories show some centering behavior, bending toward the middle (i.e., right for 3‐1, left for 4‐2). NB87 trajectories show less deviation. (b) Horizontal deviation during movement in late training. The random condition shows more centering behavior, while the NB87 trajectories show little variation except at the end of the movements when they diverge, showing prediction of the subsequent stimulus.
Figure 7
The histogram of participants’ final scores after completing 80 sequence repetitions (800 targets) shows a bimodal distribution. Lines mark the expected value (EV) of an elimination strategy (EV = 0) and of perfect knowledge (EV = 800).
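The two reference values in this caption can be reproduced under one set of assumptions, sketched below in Python. The reward of +1 per correct response, the penalty of -1 per incorrect response, guessing until the target is found, and three candidate locations per target (for example, if the current location is never the next target) are all assumptions made for illustration; the abstract and caption do not state them. The function name is hypothetical.

```python
from fractions import Fraction


def elimination_ev_per_target(n_candidates: int,
                              reward: int = 1,
                              penalty: int = -1) -> Fraction:
    """Expected score per target when guessing without repeating a wrong guess.

    Assumes the target is equally likely to be found on the 1st, 2nd, ...,
    n-th guess, so the expected number of errors is (0 + 1 + ... + (n-1)) / n.
    """
    expected_errors = Fraction(sum(range(n_candidates)), n_candidates)
    return reward + penalty * expected_errors


n_targets = 800  # 80 repetitions of a 10-item sequence, as in the caption

# Assumed: 3 candidate locations per target, +1 / -1 payoffs.
elimination_total = elimination_ev_per_target(3) * n_targets
# Perfect knowledge: every target is hit on the first guess.
perfect_total = 1 * n_targets
```

Under these assumptions the elimination strategy makes one error per target on average, which cancels the reward and gives a total of 0, while perfect knowledge gives 800. Other payoff schemes could yield the same reference values.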
Figure 8
The mean of subjects’ median correct RTs by block shows that high‐performers’ (left panel) RTs improved more than the low‐performers’ (right panel) RTs over training. The mean of subjects’ median incorrect RTs by block shows that the high‐performing group's incorrect RTs actually increased, whereas the low‐performing group's stayed roughly the same across the experiment. Error bars show ±1 SE.
Figure 9
RTs and error rates by median split and sequential position. (a) Mean of subjects’ median correct response times by median split and sequential position. The correct RTs for the two performance groups were not significantly correlated (r = 0.17, t(8) = 0.48, p = 0.65). Error bars show ±1 SE. (b) The mean number of errors made at each position in the sequence, split by performance group. The errors are highly correlated (r = .79, t(8) = 3.68, p < .01), although sequence position 5 was much worse for the low‐performing group relative to the next‐worst position (8). Low‐performers made twice as many errors in position 5 as in position 8, while the high‐performing group showed only a 25% increase in errors. Error bars show ±1 SE.
Figure 10
Scaled mean number of errors in Experiment 2 (RL) against scaled correct RTs from Experiment 1's cued SRT paradigm (NB87) by sequence position. The number of errors per position and the correct RTs are significantly correlated (r = 0.64, t(8) = 2.36, p < .05). Error bars show ±1 SE.
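The t statistics reported alongside the correlations in the captions for Figures 9 and 10 follow the standard test for a Pearson correlation, t = r·sqrt(n − 2) / sqrt(1 − r²), with n − 2 degrees of freedom. The sketch below assumes n = 10 (one data point per sequence position, consistent with the reported 8 degrees of freedom); the function name is hypothetical.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation r against zero.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r**2), with n - 2 degrees of freedom.
    """
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


n_positions = 10  # assumed: one point per sequence position, so df = 8

t_fig10 = t_from_r(0.64, n_positions)   # Figure 10 correlation
t_fig9a = t_from_r(0.17, n_positions)   # Figure 9(a) correlation
t_fig9b = t_from_r(0.79, n_positions)   # Figure 9(b) correlation
```

Recomputed from the rounded r values, the results land close to the reported t values; small differences are expected because the captions report r to only two decimal places.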
Figure 11
Overview of the experimental setup for the RL models. The plated components interact with each other according to the arrows to simulate the same trial‐and‐error learning process that humans undergo.
Figure 12
Relationship between individually optimized parameters for Q‐learning and participants’ final score. (a) The relation between learning rate α and final score attained by participants. Participants with a higher learning rate α attained higher final scores on the reinforcement learning task. (b) The relation between discount factor γ and final score attained by participants. Participants with a higher discount factor γ attained lower final scores on the reinforcement learning task.
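To make the roles of the learning rate α and discount factor γ concrete, here is a minimal tabular Q-learning sketch in Python. It is not the authors' model. The state representation (position in the sequence), the +1 / -1 payoffs, the epsilon-greedy exploration, the single response per position, and the example sequence are all assumptions chosen for illustration.

```python
import random
from typing import List, Tuple


def q_learning(
    sequence: List[int],
    n_actions: int = 4,
    alpha: float = 0.5,      # learning rate
    gamma: float = 0.1,      # discount factor
    epsilon: float = 0.1,    # exploration rate (assumed policy)
    repetitions: int = 80,
    seed: int = 0,
) -> Tuple[List[List[float]], int]:
    """Tabular Q-learning on a repeating sequence of target locations.

    Assumptions (not from the paper): the state is the position in the
    sequence, rewards are +1 (correct) and -1 (incorrect), and the learner
    gives one response per position before moving on.
    """
    rng = random.Random(seed)
    n_states = len(sequence)
    q = [[0.0] * n_actions for _ in range(n_states)]
    score = 0
    for _ in range(repetitions):
        for pos, target in enumerate(sequence):
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)   # explore
            else:
                best = max(q[pos])                  # exploit, random tie-break
                action = rng.choice(
                    [a for a in range(n_actions) if q[pos][a] == best]
                )
            reward = 1 if action == target else -1
            next_pos = (pos + 1) % n_states
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            td_target = reward + gamma * max(q[next_pos])
            q[pos][action] += alpha * (td_target - q[pos][action])
            score += reward
    return q, score


# Hypothetical 10-item sequence over 4 locations (not the paper's sequence).
example_sequence = [0, 2, 1, 3, 0, 1, 2, 0, 3, 1]
q_table, final_score = q_learning(example_sequence)
greedy_policy = [row.index(max(row)) for row in q_table]
```

In this sketch α controls how quickly each estimate moves toward the latest outcome, and γ controls how much the estimated value of the next position feeds back into the current one. Because each response's payoff here depends only on the current position, a large γ adds little useful information, which is at least consistent with the negative relation between γ and final score described in panel (b).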
