Comput Intell Neurosci. 2021 Nov 12;2021:5590445.
doi: 10.1155/2021/5590445. eCollection 2021.

Learning Intuitive Physics and One-Shot Imitation Using State-Action-Prediction Self-Organizing Maps


Martin Stetter et al. Comput Intell Neurosci. 2021.

Abstract

Human learning and intelligence work differently from the supervised pattern recognition approach adopted in most deep learning architectures. Humans seem to learn rich representations by exploration and imitation, build causal models of the world, and use both to flexibly solve new tasks. We suggest a simple but effective unsupervised model which develops such characteristics. The agent learns to represent the dynamical physical properties of its environment by intrinsically motivated exploration and performs inference on this representation to reach goals. For this, a set of self-organizing maps which represent state-action pairs is combined with a causal model for sequence prediction. The proposed system is evaluated in the cartpole environment. After an initial phase of playful exploration, the agent can execute kinematic simulations of the environment's future and use those for action planning. We demonstrate its performance on a set of several related, but different one-shot imitation tasks, which the agent flexibly solves in an active inference style.
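The core idea of the abstract can be made concrete: a self-organizing map over state-action pairs, where each map unit additionally learns to predict the most likely next state, and multi-step "kinematic simulations" are produced by iterating that one-step prediction. The sketch below is a minimal illustration of this scheme, not the authors' implementation: the class name `SapSom`, the 1-D map topology, the Gaussian neighborhood, and all hyperparameters are assumptions.

```python
import numpy as np

# Minimal sketch of a state-action-prediction SOM (illustrative, assuming
# a 1-D map with a Gaussian neighborhood; not the paper's implementation).
class SapSom:
    def __init__(self, n_units, state_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Each unit stores a prototype for a (state, action) combination ...
        self.w = rng.normal(size=(n_units, state_dim + action_dim))
        # ... and a learned prediction of the next state.
        self.pred = np.zeros((n_units, state_dim))

    def bmu(self, s, a):
        """Best-matching unit for a state-action pair."""
        x = np.concatenate([s, a])
        return int(np.argmin(np.linalg.norm(self.w - x, axis=1))), x

    def train_step(self, s, a, s_next, lr=0.1, sigma=2.0):
        k, x = self.bmu(s, a)
        # Gaussian neighborhood around the winning unit.
        d = np.arange(len(self.w)) - k
        h = np.exp(-d**2 / (2 * sigma**2))[:, None]
        self.w += lr * h * (x - self.w)            # classic SOM update
        self.pred += lr * h * (s_next - self.pred)  # next-state prediction

    def rollout(self, s, actions):
        """Kinematic simulation: iterate one-step predictions over a plan."""
        states = [np.asarray(s, dtype=float)]
        for a in actions:
            k, _ = self.bmu(states[-1], a)
            states.append(self.pred[k].copy())
        return states
```

After training on transitions gathered by random exploration, `rollout` plays the role of the paper's simulated future: it never queries the environment, only the learned per-unit predictions.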


Conflict of interest statement

The authors declare that there are no conflicts of interest regarding the publication of this study.

Figures

Figure 1
(a) Proposed network architecture. A sensorimotor self-organizing map learns to represent state-action combinations, with states and actions encoded by a state SOM and an action SOM, respectively. An activated state-action unit learns to predict the most likely next state ŝ_{t+1} (brown), conditioned on the current state and action (s_t, a_t) it represents. (b) Reduced architecture actually implemented for the demonstrations in the results section (for details, see text). 1D state and action representations are drawn for simplicity.
Figure 2
Directions of motion in the (θ, θ̇) phase plane (arrows) for five complete random episodes. Arrows are located at the states (θ, θ̇) to which they apply. (a) Blue: real directions of motion in the next step as provided by the environment. Red: directions of motion predicted by the network when provided with the same state and action. Predictions approximate real movements very well. (b) Prediction of motions in phase space under virtual left push (blue) and virtual right push (red), respectively (for discussion, see text).
Figure 3
(a) Screenshots of cartpole with identical start state followed by eight left pushes. Top: real time evolution. Bottom: 8-step prediction of time evolution. (b–e) Real (dashed) and predicted (solid) time evolutions of θ and x starting from identical initial states. (b) Time evolution of x and (c) time evolution of θ for the same simulation run. Blue: 8 left pushes; red: 8 right pushes. (d) Time evolution of θ for 5 different random action sequences and cartpole initializations. Traces with the same color correspond to the same simulation run. (e) Time evolution of θ for a longer sequence of length 39: oscillatory actions (3 left pushes followed by alternating blocks of 6 right and 6 left pushes). Major features of the motion are correctly captured in all cases.
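Multi-step predictions of this kind lend themselves to a simple planning loop: simulate candidate action sequences with the learned forward model and keep the one whose predicted trajectory best matches the goal. The sketch below shows a generic random-shooting planner over an arbitrary one-step predictor; the function names `predict` and `cost`, and all parameters, are illustrative assumptions, not the paper's inference procedure.

```python
import numpy as np

def plan_by_rollout(s0, predict, cost, horizon=8, n_candidates=64,
                    actions=(-1.0, 1.0), seed=0):
    """Random-shooting planner over a learned forward model (sketch).

    s0      : current state vector
    predict : one-step model, predict(s, a) -> predicted next state
    cost    : scalar goal cost on a state, lower is better
    """
    rng = np.random.default_rng(seed)
    best_seq, best_cost = None, np.inf
    for _ in range(n_candidates):
        seq = rng.choice(actions, size=horizon)
        s, total = np.asarray(s0, dtype=float), 0.0
        for a in seq:               # kinematic simulation of the future
            s = predict(s, a)
            total += cost(s)
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq  # execute the first action, then replan (MPC style)
```

With a goal cost such as the squared distance to a target angle, this selects pushes that steer the predicted future toward the goal, in the spirit of the active-inference-style control described in the abstract.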
Figure 4
Time evolution of x (a, c) and θ (b, d) under the balancing task, two specific controlled-tilt tasks, and a tilted-balancing task. (a, b) Five exemplary traces per task. Blue: balancing task; red: controlled tilt to the right with θ̇_g = 0.5; green: fast controlled tilt to the left with θ̇_g = 5. (c, d) Ten exemplary traces for the tilted-balancing task with θ_g = 0.15 rad.
Figure 5
(a) Performance analysis for the controlled-tilt task. Blue trace: means and standard deviations (vertical lines) over 20 runs of the actual final angular velocity as a function of the goal angular velocity. Black dashed: bisector line. Red (left axis): mean action excess. Green (right axis): mean number of time steps until the done signal. (b) Performance analysis for the tilted-balancing task. Blue trace: means and standard deviations (vertical lines) over 20 runs of the actual average tilt as a function of the goal tilt. The actual average tilt was calculated as the mean angle over time steps 50–100. Green (right axis): mean number of steps until the done signal.
