Comput Intell Neurosci. 2021 Nov 12;2021:5590445.
doi: 10.1155/2021/5590445. eCollection 2021.

Learning Intuitive Physics and One-Shot Imitation Using State-Action-Prediction Self-Organizing Maps


Martin Stetter et al. Comput Intell Neurosci. 2021.

Abstract

Human learning and intelligence work differently from the supervised pattern recognition approach adopted in most deep learning architectures. Humans seem to learn rich representations by exploration and imitation, build causal models of the world, and use both to flexibly solve new tasks. We suggest a simple but effective unsupervised model which develops such characteristics. The agent learns to represent the dynamical physical properties of its environment by intrinsically motivated exploration and performs inference on this representation to reach goals. For this, a set of self-organizing maps which represent state-action pairs is combined with a causal model for sequence prediction. The proposed system is evaluated in the cartpole environment. After an initial phase of playful exploration, the agent can execute kinematic simulations of the environment's future and use those for action planning. We demonstrate its performance on a set of several related, but different one-shot imitation tasks, which the agent flexibly solves in an active inference style.
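The core idea of the abstract can be made concrete: a self-organizing map over state-action pairs, where each map unit additionally learns to predict the most likely next state, and multi-step "kinematic simulations" are produced by iterating that one-step prediction. The sketch below is a minimal illustration of this scheme, not the authors' implementation: the class name `SapSom`, the 1-D map topology, the Gaussian neighborhood, and all hyperparameters are assumptions.

```python
import numpy as np

# Minimal sketch of a state-action-prediction SOM (illustrative, assuming
# a 1-D map with a Gaussian neighborhood; not the paper's implementation).
class SapSom:
    def __init__(self, n_units, state_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Each unit stores a prototype for a (state, action) combination ...
        self.w = rng.normal(size=(n_units, state_dim + action_dim))
        # ... and a learned prediction of the next state.
        self.pred = np.zeros((n_units, state_dim))

    def bmu(self, s, a):
        """Best-matching unit for a state-action pair."""
        x = np.concatenate([s, a])
        return int(np.argmin(np.linalg.norm(self.w - x, axis=1))), x

    def train_step(self, s, a, s_next, lr=0.1, sigma=2.0):
        k, x = self.bmu(s, a)
        # Gaussian neighborhood around the winning unit.
        d = np.arange(len(self.w)) - k
        h = np.exp(-d**2 / (2 * sigma**2))[:, None]
        self.w += lr * h * (x - self.w)            # classic SOM update
        self.pred += lr * h * (s_next - self.pred)  # next-state prediction

    def rollout(self, s, actions):
        """Kinematic simulation: iterate one-step predictions over a plan."""
        states = [np.asarray(s, dtype=float)]
        for a in actions:
            k, _ = self.bmu(states[-1], a)
            states.append(self.pred[k].copy())
        return states
```

After training on transitions gathered by random exploration, `rollout` plays the role of the paper's simulated future: it never queries the environment, only the learned per-unit predictions.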


Conflict of interest statement

The authors declare that there are no conflicts of interest regarding the publication of this study.

Figures

Figure 1
(a) Proposed network architecture. A sensorimotor self-organizing map learns to represent state-action combinations, with states and actions encoded by a state SOM and an action SOM, respectively. An activated state-action unit learns to predict the most likely next state ŝ_{t+1} (brown), conditioned on the current state and action (s_t, a_t) it represents. (b) Reduced architecture actually implemented for the demonstrations in the results section (for details, see text). 1D state and action representations are drawn for simplicity.
Figure 2
Directions of motion in the (θ, θ̇) phase plane (arrows) for five complete random episodes. Arrows are located at the states (θ, θ̇) to which they apply. (a) Blue: real directions of motion in the next step as provided by the environment. Red: directions of motion predicted by the network when provided with the same state and action. Predictions approximate real movements very well. (b) Prediction of motions in phase space under virtual left push (blue) and virtual right push (red), respectively (for discussion, see text).
Figure 3
(a) Screenshots of cartpole with identical start state followed by eight left pushes. Top: real time evolution. Bottom: 8-step prediction of time evolution. (b–e) Real (dashed) and predicted (solid) time evolutions of θ and x starting from identical initial states. (b) Time evolution of x and (c) time evolution of θ for the same simulation run. Blue: 8 left pushes; red: 8 right pushes. (d) Time evolution of θ for 5 different random action sequences and cartpole initializations. Traces with the same color correspond to the same simulation run. (e) Time evolution of θ for a longer sequence of length 39: oscillatory actions (3 left pushes followed by alternating blocks of 6 right and 6 left pushes). Major features of the motion are correctly captured in all cases.
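Multi-step predictions of this kind lend themselves to a simple planning loop: simulate candidate action sequences with the learned forward model and keep the one whose predicted trajectory best matches the goal. The sketch below shows a generic random-shooting planner over an arbitrary one-step predictor; the function names `predict` and `cost`, and all parameters, are illustrative assumptions, not the paper's inference procedure.

```python
import numpy as np

def plan_by_rollout(s0, predict, cost, horizon=8, n_candidates=64,
                    actions=(-1.0, 1.0), seed=0):
    """Random-shooting planner over a learned forward model (sketch).

    s0      : current state vector
    predict : one-step model, predict(s, a) -> predicted next state
    cost    : scalar goal cost on a state, lower is better
    """
    rng = np.random.default_rng(seed)
    best_seq, best_cost = None, np.inf
    for _ in range(n_candidates):
        seq = rng.choice(actions, size=horizon)
        s, total = np.asarray(s0, dtype=float), 0.0
        for a in seq:               # kinematic simulation of the future
            s = predict(s, a)
            total += cost(s)
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq  # execute the first action, then replan (MPC style)
```

With a goal cost such as the squared distance to a target angle, this selects pushes that steer the predicted future toward the goal, in the spirit of the active-inference-style control described in the abstract.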
Figure 4
Time evolution of x (a, c) and θ (b, d) under the balancing task, two specific controlled-tilt tasks, and a tilted-balancing task. (a, b) Five exemplary traces per task. Blue: balancing task; red: controlled tilt to the right with θ̇_g = 0.5; green: fast controlled tilt to the left with θ̇_g = 5. (c, d) Ten exemplary traces for the tilted-balancing task with θ_g = 0.15 rad.
Figure 5
(a) Performance analysis for the controlled-tilt task. Blue trace: means and standard deviations (vertical lines) over 20 runs of the actual final angular velocity as a function of the goal angular velocity. Black dashed: bisector line. Red (left axis): mean action excess. Green (right axis): mean number of time steps until the done signal. (b) Performance analysis for the tilted-balancing task. Blue trace: means and standard deviations (vertical lines) over 20 runs of the actual average tilt as a function of the goal tilt. The actual average tilt was calculated as the mean angle over time steps 50–100. Green (right axis): mean number of steps until the done signal.
