Punishment Leads to Greater Sensorimotor Learning But Less Movement Variability Compared to Reward

Affiliations

¹ Department of Mechanical Engineering, University of Delaware, United States.
² Department of Biomedical Engineering, University of Delaware, United States.
³ Department of Kinesiology, McMaster University, Canada.
⁴ Department of Kinesiology, McMaster University, Canada. Electronic address: cartem11@mcmaster.ca.
⁵ Department of Mechanical Engineering, University of Delaware, United States; Department of Biomedical Engineering, University of Delaware, United States; Kinesiology and Applied Physiology, University of Delaware, United States; Interdisciplinary Neuroscience Graduate Program, University of Delaware, United States; Biomechanics and Movement Science Program, University of Delaware, United States; Department of Kinesiology, McMaster University, Canada. Electronic address: cashabackjga@gmail.com.

PMID: 38220127
PMCID: PMC10922623
DOI: 10.1016/j.neuroscience.2024.01.004


Adam M Roth et al. Neuroscience. 2024 Mar 5:540:12-26. doi: 10.1016/j.neuroscience.2024.01.004. Epub 2024 Jan 12.

Abstract

When a musician practices a new song, hitting a correct note sounds pleasant while striking an incorrect note sounds unpleasant. Such reward and punishment feedback has been shown to differentially influence the ability to learn a new motor skill. Recent work has suggested that punishment leads to greater movement variability, which causes greater exploration and faster learning. To further test this idea, we collected data from 102 participants across two experiments. Unlike previous work, in Experiment 1 (n = 68) we found that punishment did not lead to faster learning compared to reward, but did lead to a greater extent of learning. Surprisingly, we also found evidence suggesting that punishment led to less movement variability, which was related to the extent of learning. We then designed a second experiment that did not involve adaptation, allowing us to further isolate the influence of punishment feedback on movement variability. In Experiment 2 (n = 34), we again found that punishment led to significantly less movement variability compared to reward. Collectively, our results suggest that punishment feedback leads to less movement variability. Future work should investigate whether punishment feedback leads to greater knowledge of movement variability and/or increases the sensitivity of updating motor actions.

Keywords: motor learning; movement variability; punishment; reinforcement; reward; sensorimotor adaptation.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

**Figure 1. Experiment Design.**
**A, C)** In both Experiment 1 and 2, participants grasped the handle of a robotic manipulandum and made reaching movements in the horizontal plane. An LCD display projected images (start position, targets) onto a semi-silvered mirror that occluded vision of the hand and upper arm. A) The goal of Experiment 1 was to examine how reward feedback and punishment feedback influence sensorimotor adaptation and movement variability. Participants were instructed to reach from the start position (near white circle) and hit the target circle (far white circle). A long white bar positioned above the target disappeared after participants reached through it, signaling the end of the trial. Participants who experienced the reward landscape received reward feedback (pleasant sound, target expands, monetary reward) if they hit the target. Participants who experienced the punishment landscape received punishment feedback (unpleasant sound, target expands, monetary loss) if they missed the target. We recorded their reach angle (θ) on each trial. Reach angles were normalized to individual baseline movement variability and expressed as a z-score. B) Unbeknownst to participants, we directly manipulated the probability (y-axis) of receiving reward feedback or punishment feedback based on a participant’s normalized reach angle (z-score; x-axis), according to the assigned reward landscape (blue) or punishment landscape (red). Both landscapes encourage participants to change their reach angle toward the angle that maximizes success (θ_opt), by maximizing reward or minimizing punishment, respectively. C) The goal of Experiment 2 was to examine how reward feedback and punishment feedback influenced movement variability, while mitigating the influence of adaptation. Accordingly, we used a motor task that did not require changes in average movement behaviour to successfully complete the task.
Participants were told to reach from the start position (white circle) and stop anywhere within the virtually displayed target (white rectangle). D) Participants received only reward feedback in a block of experimental trials and only punishment feedback in the other block of experimental trials. In the reward block of experimental trials, participants received reward feedback if they successfully stopped within the target. In the punishment block of experimental trials (red), participants were told they would receive punishment feedback if they missed the target.
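The baseline z-score normalization and the feedback landscapes of panels A and B can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code: the caption does not give the landscape functions, so the Gaussian shape of the success landscape and its parameters `z_opt` and `width` are hypothetical placeholders, as are all function names. The sketch only encodes what the caption states: reward feedback is delivered on a hit, punishment feedback on a miss, and both groups share an optimal angle θ_opt.

```python
import math
import statistics


def zscore_to_baseline(angles, baseline_angles):
    """Express reach angles as z-scores relative to an individual's baseline."""
    mu = statistics.mean(baseline_angles)
    sd = statistics.stdev(baseline_angles)  # sample SD of baseline reach angles
    return [(a - mu) / sd for a in angles]


def p_success(z, z_opt=2.0, width=1.0):
    """HYPOTHETICAL success landscape peaking at z_opt.

    The Gaussian shape, z_opt and width are illustrative assumptions only.
    """
    return math.exp(-0.5 * ((z - z_opt) / width) ** 2)


def feedback_probability(z, landscape):
    """Probability that a participant in the given group receives feedback."""
    if landscape == "reward":
        return p_success(z)  # reward feedback is given on a hit
    if landscape == "punishment":
        return 1.0 - p_success(z)  # punishment feedback is given on a miss
    raise ValueError("landscape must be 'reward' or 'punishment'")
```

Under this assumption, moving toward `z_opt` simultaneously raises the chance of reward and lowers the chance of punishment, which is the sense in which both landscapes share one optimal reach angle.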

**Figure 2. Sensorimotor Adaptation in Experiment 1.**
Here we show reach angle (y-axis) per trial (x-axis) for A) a single participant who experienced the reward landscape (blue) and B) a single participant who experienced the punishment landscape (red). Solid grey lines separate baseline trials (trials 1–50), experimental trials (trials 51–400), and washout trials (trials 401–450). Solid circles represent trials on which participants received reward or punishment feedback; unfilled circles represent trials on which they did not receive feedback. C) Average reach angle across participants for each group. Shaded areas represent ± 1 SE. Dashed horizontal grey lines represent the optimal reach angle (θ_opt) that maximizes reward or minimizes loss. D) We characterized learning rate, learning extent, and retention by calculating the average reach angle (y-axis) within each respective 50-trial block (x-axis): early learning (trials 51–100), late learning (trials 350–400), and washout (trials 401–450). Unfilled circles represent individual data. Solid grey circles represent the group mean reach angle. Boxplots represent the 25th, 50th, and 75th percentiles. Participants who experienced the punishment landscape displayed significantly greater reach angles during the late learning block (p < 0.001), and this behaviour carried over to the washout block (p = 0.018). This finding shows that punishment feedback leads to a greater extent of sensorimotor learning.
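The block measures in panel D amount to averaging each participant's reach angles over fixed trial ranges. A minimal sketch, assuming one reach angle per trial stored in a list whose first element is trial 1; the trial ranges are copied from the caption, and the name `block_means` is hypothetical.

```python
import statistics

# Trial ranges as written in the Figure 2 caption (1-indexed, inclusive).
BLOCKS = {
    "early_learning": (51, 100),
    "late_learning": (350, 400),
    "washout": (401, 450),
}


def block_means(reach_angles):
    """Average reach angle within each block; reach_angles[0] is trial 1."""
    return {
        name: statistics.mean(reach_angles[start - 1:end])
        for name, (start, end) in BLOCKS.items()
    }
```

Group-level comparisons (such as the late learning and washout p-values above) would then be run on these per-participant block means.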

**Figure 3. Movement Variability in Experiment 1.**
To assess movement variability, we calculated the standard deviation of trial-by-trial changes in reach angle separately following successful (lighter shades) and unsuccessful (darker shades) trials. A) We found no difference in trial-by-trial movement variability between reward feedback (blue) and punishment feedback (red) following a hit (p = 0.960) or a miss (p = 0.696) in early learning trials. B) Likewise, we found no difference in trial-by-trial movement variability between reward feedback (blue) and punishment feedback (red) following a hit (p = 0.232) or a miss (p = 0.250) in late learning trials. C) Detrended reach angle (y-axis) per trial (x-axis), obtained by subtracting a central moving average (15-trial bin size) from the reach angles shown in (A). Detrending individual data in this way limits the influence of changes in reach aim due to adaptive behaviour. D) As a proxy for movement variability, we measured the standard deviation of the detrended reach angles separately for participants who experienced a reward landscape (blue) or a punishment landscape (red). With the detrended data (15-trial bin size), participants who experienced a punishment landscape displayed significantly lower movement variability in the late learning block than participants who experienced a reward landscape (p = 0.043). The inset displays the p-value (y-axis) for different moving-average bin sizes (x-axis). All but one bin size yielded p < 0.1, and several yielded p < 0.05. These results do not support the hypothesis that punishment feedback leads to faster learning by increasing movement variability. E) We found a significant positive correlation (p = 0.020, ρ = 0.409) between trial-by-trial movement variability (x-axis) and average reach position (y-axis) during the late learning block of the punishment group. F) This monotonically increasing trend held when using the detrended reach angles (p = 0.033, ρ = 0.366).
Note that the trend lines shown in **E, F** are visual aids only and are not meant to suggest a linear relationship between average reach position and movement variability.
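The three calculations in this figure (outcome-conditioned trial-by-trial variability, central moving-average detrending, and a rank correlation) can be sketched as below. These are illustrative assumptions rather than the authors' implementation: we take "following a hit" to mean the change from trial t to t+1 when trial t was a hit, we shrink the moving-average window near the ends of the series, and we assume ρ is Spearman's rank correlation (consistent with the caption's description of a monotonic trend), computed with average ranks for ties. All function names are hypothetical.

```python
import statistics


def variability_after_outcome(angles, hits):
    """SD of the change angles[t+1] - angles[t], split by the outcome on trial t.

    Assumption: 'following a hit' means trial t was a hit.
    """
    after_hit, after_miss = [], []
    for t in range(len(angles) - 1):
        delta = angles[t + 1] - angles[t]
        (after_hit if hits[t] else after_miss).append(delta)
    return statistics.stdev(after_hit), statistics.stdev(after_miss)


def detrend_central_moving_average(angles, window=15):
    """Subtract a centred moving average (odd window) from each reach angle.

    Assumption: the window shrinks near the ends instead of padding the data.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd for a centred moving average")
    half = window // 2
    out = []
    for i, a in enumerate(angles):
        lo, hi = max(0, i - half), min(len(angles), i + half + 1)
        out.append(a - statistics.mean(angles[lo:hi]))
    return out


def _average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

The detrended-variability proxy of panel D would then be `statistics.stdev(detrend_central_moving_average(angles))` over the late learning trials, and repeating it with other odd `window` values mirrors the bin-size sweep in the inset.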

**Figure 4. Movement Variability in Experiment 2.**
A) Successful (filled circles) and unsuccessful (unfilled circles) reaches by an individual participant in the reward feedback (blue) and punishment feedback (red) conditions. B) Corresponding final hand position coordinates (y-axis) along the minor axis of the target for each trial (x-axis). C) We calculated movement variability in each condition separately following successful trials (Hit, dark colours) and unsuccessful trials (Miss, light colours). Final hand position was normalized to baseline and expressed as a z-score. We defined movement variability as the standard deviation of the trial-by-trial change in final hand position. Participants displayed significantly lower movement variability with punishment feedback (red) than with reward feedback (blue), following both hits (p = 0.016) and misses (p = 0.022). D) We calculated the interquartile range (IQR) of final hand positions for each condition. Here we show the IQR ratio between conditions (y-axis). An IQR ratio greater than one (dashed grey line) indicates lower movement variability with punishment feedback than with reward feedback. Participants displayed significantly lower movement variability (p = 0.034) along the minor axis of the target with punishment feedback. E) Participants did not display differences in lag-1 autocorrelation between conditions (p = 0.197), suggesting that reward feedback and punishment feedback have a similar effect on sensorimotor exploration. Solid circles and connecting lines represent mean data for each condition. Hollow circles and connecting lines represent individual data. Boxplots represent the 25th, 50th, and 75th percentiles. Taken together, these results suggest that punishment feedback suppresses movement variability.
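The IQR ratio (panel D) and lag-1 autocorrelation (panel E) can be sketched as follows. Assumptions, none confirmed by the caption: the ratio is reward IQR divided by punishment IQR (inferred from the statement that a ratio above one indicates lower variability under punishment), quartiles use Python's `statistics.quantiles` with `method="inclusive"`, and the autocorrelation uses the common sample estimator. The paper may use different conventions, and the function names are hypothetical.

```python
import statistics


def iqr(values):
    """Interquartile range; the 'inclusive' quartile method is an assumption."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def iqr_ratio(reward_positions, punishment_positions):
    """Assumed orientation: > 1 means a narrower spread under punishment."""
    return iqr(reward_positions) / iqr(punishment_positions)


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a series of final hand positions."""
    m = statistics.mean(x)
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(len(x) - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den
```

A lag-1 autocorrelation near zero indicates that each final hand position is largely independent of the previous one, which is why similar values across conditions are read as similar exploration.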

Grants and funding

U54 GM104941/GM/NIGMS NIH HHS/United States
