Expanding the Active Inference Landscape: More Intrinsic Motivations in the Perception-Action Loop

Martin Biehl et al. Front Neurorobot. 2018 Aug 30;12:45. doi: 10.3389/fnbot.2018.00045. eCollection 2018.

Abstract

Active inference is an ambitious theory that treats perception, inference, and action selection of autonomous agents under the heading of a single principle. It suggests biologically plausible explanations for many cognitive phenomena, including consciousness. In active inference, action selection is driven by an objective function that evaluates possible future actions with respect to current, inferred beliefs about the world. At its core, active inference is independent of extrinsic rewards, resulting in a high level of robustness across, e.g., different environments or agent morphologies. In the literature, paradigms that share this independence have been summarized under the notion of intrinsic motivations. In general, and in contrast to active inference, these models of motivation come without a commitment to particular inference and action selection mechanisms. In this article, we study whether the inference and action selection machinery of active inference can also be used by alternatives to the originally included intrinsic motivation. The perception-action loop explicitly relates inference and action selection to the environment and agent memory, and is consequently used as the foundation of our analysis. We reconstruct the active inference approach, locate the original formulation within it, and show how alternative intrinsic motivations can be used while keeping many of the original features intact. Furthermore, we illustrate the connection to universal reinforcement learning by means of our formalism. Active inference research may profit from comparisons of the dynamics induced by alternative intrinsic motivations. Research on intrinsic motivations may profit from an additional way to implement intrinsically motivated agents that also shares the biological plausibility of active inference.

Keywords: active inference; empowerment; free energy principle; intrinsic motivation; perception-action loop; predictive information; universal reinforcement learning; variational inference.


Figures

Figure 1
First two time steps of the Bayesian network representing the perception-action loop (PA-loop). All subsequent time steps are identical to the one from time t = 1 to t = 2.
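The repeating structure of the PA-loop described in this caption can be illustrated by sampling from it. The following sketch assumes the conventional factorisation (environment state E_t emits a sensor value S_t; the agent updates a memory M_t from S_t and emits an action A_t; E_t and A_t determine E_{t+1}); the binary state space, the specific dynamics, and all function names are illustrative assumptions, not taken from the paper.

```python
import random

def env_step(e, a):
    """Environment transition p(e_{t+1} | e_t, a_t): the action toggles the state."""
    return e ^ a  # XOR of two binary values

def sense(e):
    """Sensor channel p(s_t | e_t): a noisy copy of the environment state."""
    return e if random.random() < 0.9 else 1 - e

def agent(m, s):
    """Memory update and action selection: remember the last sensor value, act on it."""
    m_next = s   # new memory is the latest observation
    a = m_next   # toy policy: toggle whenever a 1 is observed
    return m_next, a

def rollout(T=5, seed=0):
    """Sample one trajectory of the loop; every time step has the same structure."""
    random.seed(seed)
    e, m = 1, 0
    trace = []
    for t in range(T):
        s = sense(e)
        m, a = agent(m, s)
        trace.append((t, e, s, a))
        e = env_step(e, a)
    return trace

for t, e, s, a in rollout():
    print(f"t={t}: E={e} S={s} A={a}")
```

After the first transition, each step applies the same kernels, mirroring the caption's remark that all time steps after t = 1 are identical in structure.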
Figure 2
Bayesian network of the generative model with parameters Θ = (Θ1, Θ2, Θ3) and hyperparameters Ξ = (Ξ1, Ξ2, Ξ3). Hatted variables are models/estimates of their non-hatted counterparts in the perception-action loop in Figure 1. An edge that splits up to connect one node to n nodes (e.g., Θ2 to Ê1, Ê2, …) corresponds, under the usual Bayesian network convention, to n edges from that node to each of the targets. Note that, in contrast to the perception-action loop in Figure 1, imagined actions Ât have no parents. They are either set to past values or, for those in the future, a probability distribution over them must be assumed.
Figure 3
Internal generative model with data plugged in up to t = 2, i.e., Ŝ0 = s0, Ŝ1 = s1 and Â1 = a1, as well as the henceforth fixed hyperparameters ξ = (ξ1, ξ2, ξ3). Conditioning on the plugged-in data leads to the posterior distribution q(ŝt:T, ê0:T, ât:T, θ | sa≺t, ξ). Predictions for future sensor values can be obtained by marginalising out the other random variables; e.g., to predict Ŝ2 we would like to obtain q(ŝ2 | s0, s1, a1, ξ). Note, however, that this requires an assumption about the probability distribution over Â2.
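The marginalisation step in this caption can be shown on a toy model: to predict the next sensor value, sum the joint over every other variable, which forces an assumed distribution over the imagined action. The tiny binary model below (one hidden state Ê2, one assumed action Â2, one predicted sensor Ŝ2) and all of its numbers are made-up illustrations, not values from the paper.

```python
from itertools import product

p_e = {0: 0.7, 1: 0.3}   # belief over the hidden state Ê2
p_a = {0: 0.5, 1: 0.5}   # assumed distribution over the imagined action Â2

def p_s(s, e, a):
    """Toy sensor model p(ŝ2 | ê2, â2): sensing is sharper when action matches state."""
    match_prob = 0.9 if e == a else 0.6
    return match_prob if s == e else 1 - match_prob

# q(ŝ2) = Σ_{ê2, â2} p(ŝ2 | ê2, â2) p(ê2) p(â2)
q_s = {s: sum(p_s(s, e, a) * p_e[e] * p_a[a]
              for e, a in product((0, 1), (0, 1)))
       for s in (0, 1)}
print(q_s)  # a normalised predictive distribution over Ŝ2
```

Changing `p_a` changes the prediction, which is exactly why the caption notes that a distribution over Â2 must be assumed before Ŝ2 can be predicted.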
Figure 4
Bayesian network of the approximate posterior factor at t = 2. The variational parameters Φ1, Φ2, Φ3, and ΦE≺t = (ΦE0, ΦE1) are positioned so as to indicate which dependencies and nodes they replace in the generative model in Figure 2.
Figure 5
Bayesian network of the approximate complete posterior of Equation (40) at t = 2 for the future actions ât:T. Only Êt-1, Θ1, Θ2, and the future actions ât:T appear in the predictive factor and influence future variables. In general there is one approximate complete posterior for each possible sequence ât:T of future actions.
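The construction in this caption, one predictive distribution per candidate future action sequence, can be sketched by enumeration. The binary actions, the horizon, and the linear toy dynamics below are illustrative assumptions; only the one-posterior-per-sequence structure is taken from the caption.

```python
from itertools import product

T = 3  # remaining horizon: three future binary actions

def predictive(seq):
    """Toy predictive distribution over a final binary sensor value,
    conditioned on one fixed sequence of future actions."""
    p1 = 0.5
    for a in seq:
        p1 = 0.8 * p1 + 0.2 * a  # made-up dynamics: each action nudges the belief
    return {0: 1 - p1, 1: p1}

# One distribution per possible action sequence â_{t:T}.
posteriors = {seq: predictive(seq) for seq in product((0, 1), repeat=T)}
print(len(posteriors))  # 2**3 = 8 candidate sequences, one predictive posterior each
```

An action-selection objective would then score each of these distributions and pick (or weight) the corresponding sequence.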
Figure 6
Generative model including q(ât:T | sa≺t, ξ) at t = 2, with ŜÂ≺2 influencing the future actions Â2:T. Note that only future actions depend on past sensor values and actions; e.g., action Â1 has no incoming edges. The increased gap between time steps t = 1 and t = 2 indicates that this time step is special in the model. For each time step t there is a corresponding model with the particular relation between the past ŜÂ≺t and Ât:T shifted accordingly.
