eNeuro. 2024 Mar 15;11(3):ENEURO.0383-23.2024.

doi: 10.1523/ENEURO.0383-23.2024. Print 2024 Mar.

Reinforcement Learning during Locomotion

Jonathan M Wood^{1,2}, Hyosub E Kim^{1,2,3,4}, Susanne M Morton^{5,2}

Affiliations

¹ Department of Physical Therapy, University of Delaware, Newark, Delaware 19713.
² Interdisciplinary Graduate Program in Biomechanics & Movement Science, University of Delaware, Newark, Delaware 19713.
³ Department of Psychological and Brain Sciences, University of Delaware, Newark, Delaware 19716.
⁴ School of Kinesiology, University of British Columbia, Vancouver, British Columbia V6T 1Z1, Canada.
⁵ Department of Physical Therapy, University of Delaware, Newark, Delaware 19713; smmorton@udel.edu.

PMID: 38438263
PMCID: PMC10946027
DOI: 10.1523/ENEURO.0383-23.2024

Jonathan M Wood et al. eNeuro. 2024.

Abstract

When learning a new motor skill, people often must use trial and error to discover which movement is best. In the reinforcement learning framework, this concept is known as exploration and has been linked to increased movement variability in motor tasks. For locomotor tasks, however, increased variability decreases upright stability. As such, exploration during gait may jeopardize balance and safety, making reinforcement learning less effective. Therefore, we set out to determine if humans could acquire and retain a novel locomotor pattern using reinforcement learning alone. Young healthy male and female participants walked on a treadmill and were provided with binary reward feedback (indicated by a green checkmark on the screen) that was tied to a fixed monetary bonus, to learn a novel stepping pattern. We also recruited a comparison group who walked with the same novel stepping pattern but did so by correcting for target error, induced by providing real-time veridical visual feedback of steps and a target. In two experiments, we compared learning, motor variability, and two forms of motor memories between the groups. We found that individuals in the binary reward group did, in fact, acquire the new walking pattern by exploring (increasing motor variability). Additionally, while reinforcement learning did not increase implicit motor memories, it resulted in more accurate explicit motor memories compared with the target error group. Overall, these results demonstrate that humans can acquire new walking patterns with reinforcement learning and retain much of the learning over 24 h.

Keywords: gait; motor learning; motor memory; reinforcement learning; reward; variability.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

**Figure 1.**
Experimental paradigm. A, All participants walked on a dual-belt treadmill at a comfortable self-selected pace with a computer monitor in front of them. B, The RPE and TE groups received different feedback during the learning phase. The RPE group received only binary reward feedback, with a check mark and money added to a total when they performed a correct step length. The TE group received real-time feedback of their left step length relative to the pink, horizontal target line. C, All participants walked in three different phases: (1) baseline, where individuals were asked to walk normally; (2) learning, where they gradually learned to walk with a longer left step length with the feedback and instructions depending on group assignment; (3) postlearning, where implicit aftereffects (experiment 1) or explicit retention (experiment 2) were probed, both without visual feedback. Participants in experiment 2 also were tested for explicit retention 24 h later (data not shown). Note that the target window in this figure was taken from a representative participant and is in terms of ΔLSL, but for all participants the target window was ±2 cm.
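
As a rough illustration of the binary reward rule described in panel B, the sketch below marks a step as successful when the left step length falls within ±2 cm of a target. The function name, the use of centimetres for both inputs, the inclusive window boundary, and the example values are assumptions made for illustration; the caption itself specifies only the ±2 cm window and the check-mark feedback.

```python
def binary_reward(left_step_length_cm: float, target_cm: float,
                  half_window_cm: float = 2.0) -> bool:
    """Return True when the step lands inside the target window.

    The boundary is treated as inclusive, which is an assumption.
    """
    return abs(left_step_length_cm - target_cm) <= half_window_cm


# Hypothetical step lengths checked against a hypothetical 66 cm target.
steps_cm = [60.0, 64.5, 66.0, 68.0, 69.1]
rewards = [binary_reward(s, target_cm=66.0) for s in steps_cm]
print(rewards)
```

In this sketch the learner receives no information about the size or direction of the error, only whether the step succeeded, which is what distinguishes the RPE group's feedback from the TE group's continuous display.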

**Figure 2.**
Individual participant data. ΔLSL data for all strides of the learning phase from six exemplar subjects, three each from the TE and RPE groups. Participants in this figure were selected based on their magnitude of exploration (σ_ΔLSL values) during early learning (i.e., the first 50 strides after the target stopped moving). Specifically, we selected individuals who represented the 10th, 50th, and 90th percentiles for each group separately according to our measure of early exploration. Blue dots represent unsuccessful steps, orange dots represent successful steps. The dashed lines represent the target window, centered on 10% ΔLSL (the width of the window was ±2 cm for all participants).

**Figure 3.**
Experiment 1 learning and implicit aftereffects. A, Group average ΔLSL data shown over all strides of experiment 1. Solid colored lines represent group means and shading represents 1 SEM. Gray regions denote the times when no feedback was provided and individuals were asked to “walk normally”. The dashed line represents the center of the target. Rectangular boxes denote the key timepoints of early learning, late learning, and initial and early washout. B, Group averaged ΔLSL at late learning for both groups. Thick horizontal lines represent group means; dots represent individuals; error bars represent ±1 SEM. C, Group average ΔLSL error at early and late learning. Thick lines represent group means; thin lines represent individuals. D, Group average percent success at early and late learning. E, Group average implicit aftereffects. Thick horizontal lines represent group means; dots represent individuals; error bars represent ±1 SEM.

**Figure 4.**
Experiment 1 exploration measurements. A, Exploration across the learning phase, measured as the baseline-normalized standard deviation of ΔLSL. For visualization purposes only, we calculated each individual's motor variability in 18 bins of 50 strides each across the learning phase, and then averaged motor variability among individuals in each group (solid lines) with the shading representing 1 SEM. The dashed rectangle represents the bins when the target was gradually shifting toward 10%. B, Early and late exploration. We calculated σ_ΔLSL at early and late learning timepoints. Thick lines represent group means; thin lines represent individuals. C, Motor variability measured as the standard deviation of trial-to-trial changes after successful and unsuccessful steps (σ_{trial-to-trial}). Thick lines represent group means; thin lines represent individuals.
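
The two variability measures named in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, "baseline-normalized" is assumed here to mean division by the baseline standard deviation, and each trial-to-trial change is assumed to be grouped by the outcome of the stride that precedes it.

```python
from statistics import stdev


def binned_variability(delta_lsl, baseline_sd, bin_size=50):
    """SD of ΔLSL in consecutive bins, divided by baseline SD (assumed normalization)."""
    bins = [delta_lsl[i:i + bin_size] for i in range(0, len(delta_lsl), bin_size)]
    # Skip bins with fewer than two strides, where a sample SD is undefined.
    return [stdev(b) / baseline_sd for b in bins if len(b) >= 2]


def trial_to_trial_sd(delta_lsl, success):
    """SD of stride-to-stride changes, split by the outcome of the preceding stride."""
    after_success, after_failure = [], []
    for i in range(len(delta_lsl) - 1):
        change = delta_lsl[i + 1] - delta_lsl[i]
        (after_success if success[i] else after_failure).append(change)

    def safe_sd(values):
        return stdev(values) if len(values) >= 2 else float("nan")

    return safe_sd(after_success), safe_sd(after_failure)


# Hypothetical data: 900 strides give 18 bins of 50 strides each.
example_delta = [(i % 7) * 0.5 for i in range(900)]
binned = binned_variability(example_delta, baseline_sd=1.0)

# Small hand-checkable case for the trial-to-trial measure.
sd_success, sd_failure = trial_to_trial_sd(
    [0.0, 1.0, 3.0, 2.0, 6.0], [True, False, True, False, True]
)
```

Under these assumptions, larger variability after unsuccessful steps than after successful ones would be the signature of outcome-driven exploration.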

**Figure 5.**
Experiment 2 explicit retention. A, Group average ΔLSL data shown over all strides of experiment 2. Solid colored lines represent group means and shading represents 1 SEM. Gray regions denote the times when no feedback was provided. During retention testing, participants were instructed to “walk like you did at the end of the previous phase.” The dashed line represents the center of the target during the learning phase. B, Group average ΔLSL percent error data for each stride during the immediate and 24 h retention timepoints (rectangles represent the 25 strides of each epoch). 0 represents perfect retention. The inset shows the group average (horizontal lines) and individual (dots) retention levels for the two timepoints. Error bars represent 1 SEM. C, Group average percent retention data for each stride during the immediate and 24 h retention timepoints. All shown in the same manner as in B. The dashed line at 100% represents perfect retention.

Grants and funding

K12 HD055931/HD/NICHD NIH HHS/United States
