Examining the Use of Temporal-Difference Incremental Delta-Bar-Delta for Real-World Predictive Knowledge Architectures

Johannes Günther^{1,2}, Nadia M Ady¹, Alex Kearney¹, Michael R Dawson^{2,3}, Patrick M Pilarski^{1,2,3}

Affiliations

¹ Department of Computing Science, University of Alberta, Edmonton, AB, Canada.
² Alberta Machine Intelligence Institute, Edmonton, AB, Canada.
³ Department of Medicine, University of Alberta, Edmonton, AB, Canada.

PMID: 33501202
PMCID: PMC7805647
DOI: 10.3389/frobt.2020.00034

Johannes Günther et al. Front Robot AI. 2020 Mar 13:7:34. doi: 10.3389/frobt.2020.00034. eCollection 2020.

Abstract

Predictions and predictive knowledge have seen recent success in improving not only robot control but also other applications ranging from industrial process control to rehabilitation. A property that makes these predictive approaches well-suited for robotics is that they can be learned online and incrementally through interaction with the environment. However, a remaining challenge for many prediction-learning approaches is an appropriate choice of prediction-learning parameters, especially parameters that control the magnitude of a learning machine's updates to its predictions (the learning rates or step sizes). Typically, these parameters are chosen based on an extensive parameter search, an approach that neither scales well nor is well-suited for tasks that require changing step sizes due to non-stationarity. To begin to address this challenge, we examine the use of online step-size adaptation using the Modular Prosthetic Limb: a sensor-rich robotic arm intended for use by persons with amputations. Our method of choice, Temporal-Difference Incremental Delta-Bar-Delta (TIDBD), learns and adapts step sizes on a feature level; importantly, TIDBD allows step-size tuning and representation learning to occur at the same time. As a first contribution, we show that TIDBD is a practical alternative to classic Temporal-Difference (TD) learning tuned via an extensive parameter search. Both approaches perform comparably in terms of predicting future aspects of a robotic data stream, but TD only achieves comparable performance with a carefully hand-tuned learning rate, while TIDBD uses a robust meta-parameter and tunes its own learning rates. As a second contribution, our results show that for this particular application TIDBD allows the system to automatically detect patterns characteristic of sensor failures common to a number of robotic applications. As a third contribution, we investigate the sensitivity of classic TD and TIDBD with respect to the initial step-size values on our robotic data set, reaffirming the robustness of TIDBD as shown in previous papers. Together, these results promise to improve the ability of robotic devices to learn from interactions with their environments in a robust way, providing key capabilities for autonomous agents and robots.

Keywords: continual learning; long-term autonomy; prediction; reinforcement learning; robot learning.
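The abstract describes TIDBD as adapting one step size per feature while the predictor itself keeps learning, but it does not state the update rules. The following is a minimal sketch of one semi-gradient TIDBD(λ) update as it is commonly presented in the step-size adaptation literature; it is not taken from this page and should be checked against the full paper. The function name `tidbd_step`, the list-based data layout, and the toy two-feature stream are illustrative assumptions.

```python
import math


def tidbd_step(w, beta, h, z, phi, phi_next, reward, gamma, lam, theta):
    """Apply one semi-gradient TIDBD(lambda) update in place (illustrative sketch).

    w     weights of the linear predictor
    beta  per-feature log step sizes (step size alpha_i = exp(beta_i))
    h     per-feature trace of recent weight updates
    z     per-feature eligibility traces
    theta meta step size shared by all features
    Returns the TD error of this transition.
    """
    v = sum(wi * xi for wi, xi in zip(w, phi))
    v_next = sum(wi * xi for wi, xi in zip(w, phi_next))
    delta = reward + gamma * v_next - v  # TD error

    for i in range(len(w)):
        # Meta update: raise beta_i when the current error agrees with the
        # recent update history for feature i, lower it when they disagree.
        beta[i] += theta * delta * phi[i] * h[i]
        alpha = math.exp(beta[i])
        # Standard accumulating eligibility trace.
        z[i] = gamma * lam * z[i] + phi[i]
        # TD(lambda) weight update with a per-feature step size.
        w[i] += alpha * delta * z[i]
        # Decay the update history (clipped at zero) and add the new update.
        h[i] = h[i] * max(0.0, 1.0 - alpha * phi[i] * z[i]) + alpha * delta * z[i]
    return delta


if __name__ == "__main__":
    # Toy stream (an assumption, not the robot data): feature 0 is a constant
    # bias that predicts the signal, feature 1 flips sign every step.
    n = 2
    w, h, z = [0.0] * n, [0.0] * n, [0.0] * n
    beta = [math.log(0.1)] * n  # initial step size 0.1 for every feature
    for t in range(200):
        phi = [1.0, 1.0 if t % 2 == 0 else -1.0]
        phi_next = [1.0, -phi[1]]
        tidbd_step(w, beta, h, z, phi, phi_next, 1.0, 0.0, 0.0, 0.01)
    print("step sizes:", [math.exp(b) for b in beta])
```

Setting `theta` to 0 leaves `beta` untouched, so the same function then behaves like TD(λ) with fixed per-feature step sizes, which is a convenient way to compare the two learners on the same stream.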

Figures

**Figure 1**
The Modular Prosthetic Limb (MPL), a robot arm with many degrees of freedom and sensors used for the experiments in this work.

**Figure 2**
Decoded percept data from the robot over the 30 min of the experiment. The periods of the arm resting and the periods of the arm moving are clearly distinguishable for the position, velocity and load sensors. The values of the temperature sensors increase over the experiment, with additional increases during the periods of movement.

**Figure 3**
RMSE and violin plots for the experiment in section 4.1. The top pane shows the RMSE for both classic TD and TIDBD for each of the different time periods. The middle and bottom panes show violin plots for the RMSE, for TIDBD and classic TD, respectively. All results are the average over 30 independent runs.

**Figure 4**
Step-size development over the course of the experiment. As TIDBD adapts the step sizes, this distribution will change. Subplot **(A)** shows the step sizes at initialization. Subplot **(B)** shows the step-size distribution after the first movement phase. Subplot **(C)** shows the step-size distribution after the second movement phase. Subplot **(D)** shows the step-size distribution at the end of the experiment.

**Figure 5**
Step-size distribution for the four elbow sensors **(A)** and the remaining 104 sensors **(B)** when the four elbow sensors are stuck. As described in subsection 4.2, the step sizes increase noticeably compared to the original experiment. The largest step sizes are twice as large.

**Figure 6**
Step-size distribution for the four elbow sensors **(A)** and the remaining 104 sensors **(B)** when the four elbow sensors are broken. The step sizes for the four broken sensors are noticeably smaller than in the experiment without broken sensors.

**Figure 7**
**(A)** Accumulated RMSE over the experiment, depending on the initial step size. The first plot shows the overall accumulated error over the whole range of tested step sizes for TD and TIDBD with different meta step sizes θ. While the performance of TD dramatically worsens for small step sizes, TIDBD exhibits more consistent and better behavior for different meta step sizes. Subplot **(B)** zooms in on larger step sizes to highlight the typical bowl-shaped performance line for TD. While the error for TD is slightly smaller with carefully tuned step sizes, TIDBD shows more robust performance with respect to the initial step sizes and the meta step sizes.
