Review

J Math Psychol. 2020 Dec;99:102447. doi: 10.1016/j.jmp.2020.102447.

Active inference on discrete state-spaces: A synthesis


Lancelot Da Costa et al. J Math Psychol. 2020 Dec.

Abstract

Active inference is a normative principle underwriting perception, action, planning, decision-making and learning in biological or artificial agents. From its inception, its associated process theory has grown to incorporate complex generative models, enabling simulation of a wide range of complex behaviours. Due to these successive developments, it is often difficult to see how the underlying principle relates to its process theories and practical implementations. In this paper, we try to bridge this gap by providing a complete mathematical synthesis of active inference on discrete state-space models. This technical summary provides an overview of the theory, derives neuronal dynamics from first principles and relates these dynamics to biological processes. Furthermore, this paper provides a fundamental building block needed to understand active inference for mixed generative models, in which continuous sensations inform discrete representations. This paper may be used in several ways: as a guide to outstanding research challenges, as a practical manual for implementing active inference to simulate experimental behaviour, or as a pointer to the various in-silico neurophysiological responses that can be used to make empirical predictions.

Keywords: Active inference; Free energy principle; Markov decision process; Mathematical review; Process theory; Variational Bayesian inference.


Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig. 1
Markov blankets in active inference. This figure illustrates the Markov blanket assumption of active inference. A Markov blanket is a set of variables through which states internal and external to the system interact. Specifically, the system must be such that we can partition it into a Bayesian network of internal states μ, external states η, sensory states o and active states u (μ, o and u are often referred to, together, as particular states), with probabilistic (causal) links in the directions specified by the arrows. All interactions between internal and external states are therefore mediated by the blanket states b. The sensory states represent the sensory information that the body receives from the environment, and the active states express how the body influences the environment. This blanket assumption is quite generic, in that it can be reasonably assumed for a brain as well as for elementary organisms. For example, when considering a bacillus, the sensory states become the cell membrane and the active states comprise the actin filaments of the cytoskeleton. Under the Markov blanket assumption – together with the assumption that the system persists over time (i.e., possesses a non-equilibrium steady state) – a generalised synchrony appears, such that the dynamics of the internal states can be cast as performing inference over the external states (and vice versa) via a minimisation of variational free energy (Friston, 2019, Parr et al., 2020). This coincides with existing approaches to inference, i.e., variational Bayes (Beal, 2003, Bishop, 2006, Blei et al., 2017, Jordan et al., 1998). This can be viewed as the internal states mirroring external states, via sensory states (e.g., perception), and external states mirroring internal states via active states (e.g., a generalised form of self-assembly, autopoiesis or niche construction). Furthermore, under these assumptions, the most likely courses of action can be shown to minimise expected free energy. Note that the external states beyond the system should not be confused with the hidden states of the agent's generative model (which model external states). In fact, the internal states are exactly the parameters (i.e., sufficient statistics) encoding beliefs about hidden states and other latent variables, which model external states in a process of variational free energy minimisation. Hidden and external states may or may not be isomorphic. In other words, an agent uses its internal states to represent hidden states that may or may not exist in the external world.
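As a reader's aid (this display is our notation, not a quotation from the figure): the variational free energy that the caption invokes takes, in standard variational-Bayes form, with q(s) the approximate posterior encoded by the internal states and p(o, s) the generative model,

    F[q] \;=\; \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
         \;=\; \underbrace{D_{\mathrm{KL}}\big[q(s) \,\|\, p(s \mid o)\big]}_{\text{inference error}} \;-\; \underbrace{\ln p(o)}_{\text{log evidence}}
         \;\geq\; -\ln p(o).

Minimising F with respect to q therefore drives the approximate posterior towards the exact posterior (inference) while tightening an upper bound on surprise, -ln p(o), which is the sense in which free energy minimisation underwrites self-evidencing.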
Fig. 2
Example of a discrete state-space generative model. Panel 2a specifies the form of the generative model, which is how the agent represents the world. The generative model is a joint probability distribution over (hidden) states, outcomes and other variables that cause outcomes. In this representation, states unfold in time, causing an observation at each time-step. The likelihood matrix A encodes the probabilities of state–outcome pairs. The policy π specifies which action to perform at each time-step. Note that the agent's preferences may be specified either in terms of states or outcomes. It is important to distinguish between states (resp. outcomes), which are random variables, and the possible values that they can take in S (resp. in O), which we refer to as possible states (resp. possible outcomes). Note that this type of representation comprises a finite number of timesteps, actions, policies, states, outcomes, possible states and possible outcomes. In Panel 2b, the generative model is displayed as a probabilistic graphical model (Bishop, 2006, Jordan et al., 1998, Pearl, 1988, Pearl, 1998) expressed in factor graph form (Loeliger, 2004). The variables in circles are random variables, while squares represent factors, whose specific forms are given in Panel 2a. The arrows represent causal relationships (i.e., conditional probability distributions). The variables highlighted in grey can be observed by the agent, while the remaining variables are inferred through approximate Bayesian inference (see Section 4) and are called hidden or latent variables. Active inference agents perform inference by optimising the parameters of an approximate posterior distribution (see Section 4). Panel 2c specifies how this approximate posterior factorises under a particular mean-field approximation (Tanaka, 1999), although other factorisations may be used (Parr, Markovic et al., 2019, Schwöbel et al., 2018). A glossary of terms used in this figure is available in Table 2. The mathematical yoga of generative models is heavily dependent on Markov blankets. The Markov blanket of a random variable in a probabilistic graphical model comprises those variables that share a common factor with it. Crucially, a variable conditioned upon its Markov blanket is conditionally independent of all other variables. We will use this property extensively (and implicitly) in the text.
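Because the panels themselves are not reproduced here, a minimal numerical sketch of such a generative model may help. This is an illustration only, with invented sizes and values; it is not the authors' reference implementation (which is available as spm_MDP_VB_X.m in the DEM toolbox of SPM):

    import numpy as np

    def softmax(x):
        """Map a vector of log-probabilities to a normalised distribution."""
        e = np.exp(x - x.max())
        return e / e.sum()

    # Toy model: 2 hidden states, 2 outcomes, 2 actions (all values invented).
    A = np.array([[0.9, 0.1],    # likelihood P(o | s); rows index outcomes,
                  [0.1, 0.9]])   # columns index hidden states
    B = np.stack([np.eye(2),                        # action 0 ("stay"): P(s' | s, u)
                  np.array([[0., 1.], [1., 0.]])])  # action 1 ("switch")
    D = np.array([0.5, 0.5])     # prior over the initial hidden state

    # Perception: posterior over the initial state after observing outcome 0,
    # combining likelihood and prior in log space (exact for this single latent
    # variable; in general, a variational approximation is used).
    q_s = softmax(np.log(A[0]) + np.log(D))   # -> approximately [0.9, 0.1]

    # Prediction: beliefs about the next state under the "switch" action.
    q_s_next = B[1] @ q_s                     # -> approximately [0.1, 0.9]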
Fig. 3
Markov blankets and self-evidencing. This schematic illustrates the various interpretations of minimising variational free energy. Recall that the existence of a Markov blanket implies a certain lack of influences among internal, blanket and external states. These independencies have an important consequence: internal and active states are the only states that are not influenced by external states, which means their dynamics (i.e., perception and action) are a function of, and only of, particular states (i.e., internal, sensory and active states) – here, the variational (free energy) bound on surprise. This surprise has a number of interesting interpretations. Given that it is the negative log probability of finding a particle or creature in a particular state, minimising surprise corresponds to maximising the value of a particle's state. This interpretation is licensed by the fact that the states with a high probability are, by definition, attracting states. On this view, one can then spin off an interpretation in terms of reinforcement learning (Barto & Sutton, 1992), optimal control theory (Todorov & Jordan, 2002) and, in economics, expected utility theory (Bossaerts & Murawski, 2015). Indeed, any scheme predicated on the optimisation of some objective function can now be cast in terms of minimising surprise – in terms of perception and action (i.e., the dynamics of internal and active states) – by specifying these optimal values to be the agent's preferences. The minimisation of surprise (i.e., self-information) leads to a series of influential accounts of neuronal dynamics, including the principle of maximum mutual information (Linsker, 1990, Optican and Richmond, 1987), the principles of minimum redundancy and maximum efficiency (Barlow, 1961) and the free energy principle (Friston et al., 2006). Crucially, the average or expected surprise (over time or particular states of being) corresponds to entropy. This means that action and perception look as if they are minimising entropy. This leads us to theories of self-organisation, such as synergetics in physics (Haken, 1978, Kauffman, 1993, Nicolis and Prigogine, 1977) or homeostasis in physiology (Ashby, 1947, Bernard, 1974, Conant and Ashby, 1970). Finally, the probability of any blanket states given a Markov blanket (m) is, on a statistical view, model evidence (MacKay, 1995, MacKay, 2003). This means that all of the above formulations are internally consistent with things like the Bayesian brain hypothesis, evidence accumulation and predictive coding; most of these inherit from Helmholtz's notion of unconscious inference (von Helmholtz & Southall, 1962), later unpacked in terms of perception as hypothesis testing in 20th-century psychology (Gregory, 1980) and machine learning (Dayan et al., 1995).
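The chain of interpretations above rests on two definitions, stated here explicitly in our notation (with m denoting the model, or Markov blanket, in question):

    \underbrace{\Im(o) = -\ln p(o \mid m)}_{\text{surprise (self-information)}}, \qquad
    \underbrace{H = \mathbb{E}_{p(o \mid m)}\big[-\ln p(o \mid m)\big]}_{\text{entropy = expected surprise}}, \qquad
    F \geq \Im(o).

Because F upper-bounds surprise, and entropy is average surprise, dynamics that minimise free energy at each moment look, on average, like entropy minimisation; and since p(o | m) is model evidence, the same dynamics look like evidence maximisation.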
Fig. 4
Expected free energy. This figure illustrates the various ways in which minimising expected free energy can be unpacked (omitting model parameters for clarity). The upper panel casts action and perception as the minimisation of variational and expected free energy, respectively. Crucially, active inference introduces beliefs over policies that enable a formal description of planning as inference (Attias, 2003, Botvinick and Toussaint, 2012, Kaplan and Friston, 2018a). In brief, posterior beliefs about hidden states of the world, under plausible policies, are optimised by minimising a variational (free energy) bound on log evidence. These beliefs are then used to evaluate the expected free energy of allowable policies, from which actions can be selected (Friston, FitzGerald et al., 2017). Crucially, expected free energy subsumes several special cases that predominate in the psychological, machine learning and economics literature. These special cases are disclosed when one removes particular sources of uncertainty from the implicit optimisation problem. For example, if we ignore prior preferences, then the expected free energy reduces to information gain (Lindley, 1956, MacKay, 2003) or intrinsic motivation (Barto et al., 2013, Deci and Ryan, 1985, Oudeyer and Kaplan, 2009). This is mathematically the same as the expected Bayesian surprise and mutual information that underwrite salience in visual search (Itti and Baldi, 2009, Sun et al., 2011) and the organisation of our visual apparatus (Barlow, 1961, Barlow, 1974, Linsker, 1990, Optican and Richmond, 1987). If we now remove ambiguity but reinstate prior preferences, one can effectively treat hidden and observed (sensory) states as isomorphic. This leads to risk-sensitive policies in economics (Fleming and Sheu, 2002, Kahneman and Tversky, 1988) or KL control in engineering (van den Broek et al., 2010). Here, minimising risk corresponds to aligning predicted outcomes to preferred outcomes. If we then remove ambiguity and relative risk of action (i.e., intrinsic value), we are left with extrinsic value or expected utility in economics (Von Neumann & Morgenstern, 1944), which underwrites reinforcement learning and behavioural psychology (Barto & Sutton, 1992). The Bayesian formulation of maximising expected utility under uncertainty is also known as Bayesian decision theory (Berger, 1985). Finally, if we just consider a completely unambiguous world with uninformative priors, expected free energy reduces to the negative entropy of posterior beliefs about the causes of data, in accord with the maximum entropy principle (Jaynes, 1957). The expressions for variational and expected free energy correspond to those described in the main text (omitting model parameters for clarity). They are arranged to illustrate the relationship between complexity and accuracy, which become risk and ambiguity when considering the consequences of action. This means that risk-sensitive policy selection minimises expected complexity or computational cost. The coloured dots above the terms in the equations correspond to the terms that constitute the special cases in the lower panels.
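Omitting model parameters, as in the figure, the risk–ambiguity decomposition referred to above can be written out (our notation; q(o | π) and q(s | π) are the outcome and state distributions predicted under policy π, and p(o) encodes prior preferences):

    G(\pi) \;=\; \underbrace{D_{\mathrm{KL}}\big[q(o \mid \pi) \,\|\, p(o)\big]}_{\text{risk (expected complexity)}}
           \;+\; \underbrace{\mathbb{E}_{q(s \mid \pi)}\big[H[p(o \mid s)]\big]}_{\text{ambiguity}}.

Each special case in the lower panels corresponds to removing one or more of these terms (or part of one), as described above.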
Fig. 5
Possible functional anatomy. This figure summarises a possible (coarse-grained) functional anatomy that could implement belief updating in active inference. The arrows correspond to message passing between different neuronal populations. Here, a visual observation is sampled by the retina, aggregated in first-order sensory thalamic nuclei and processed in the occipital (visual) cortex. The green arrows correspond to message passing of sensory information. This signal is then propagated (via the ventral visual pathway) to inferior and medial temporal lobe structures such as the hippocampus; this allows the agent to go from observed outcomes to beliefs about their most likely causes in state-estimation (perception), which is performed locally. The variational free energy is computed in the striatum. The orange arrows encode message passing of beliefs. Preferences C are attributed to the dorsolateral prefrontal cortex – which is thought to encode representations over prolonged temporal scales (Parr & Friston, 2017b) – consistent with the fact that these are likely to be encoded within higher cortical areas (Friston, Lin et al., 2017). The expected free energy is computed in the medial prefrontal cortex (Friston, FitzGerald et al., 2017) during planning, which leads to inferences about the most plausible policies (decision-making) in the basal ganglia, consistent with the fact that the basal ganglia are thought to underwrite planning and decision-making (Berns and Sejnowski, 1996, Ding and Gold, 2013, Haber, 2003, Jahanshahi et al., 2015, Parr and Friston, 2018b, Thibaut, 2016). The message concerning policy selection is sent to the motor cortex via thalamocortical loops. The most plausible action, which is selected in the motor cortex, is passed on through the spinal cord to trigger a limb movement. Simultaneously, policy-independent state-estimation is performed in the ventrolateral prefrontal cortex, which leads to synaptic plasticity dynamics in the prefrontal cortex, where the synaptic weights encode beliefs about A.
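The computations that the caption assigns to the medial prefrontal cortex and basal ganglia – evaluating the expected free energy of each policy, then forming a softmax posterior over policies – can be sketched in a few lines. This is a toy illustration of the risk + ambiguity decomposition of Fig. 4; all names, sizes and values are invented:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def expected_free_energy(q_s, A, log_C):
        """Risk + ambiguity for one policy's predicted state beliefs q_s."""
        q_o = A @ q_s                          # predicted outcomes q(o | pi)
        risk = q_o @ (np.log(q_o) - log_C)     # KL[q(o | pi) || p(o)]
        H_s = -(A * np.log(A)).sum(axis=0)     # outcome entropy per hidden state
        return risk + q_s @ H_s                # risk + expected ambiguity

    A = np.array([[0.9, 0.1],                  # likelihood P(o | s)
                  [0.1, 0.9]])
    log_C = np.log(np.array([0.8, 0.2]))       # log preference for outcome 0
    predicted = [np.array([0.7, 0.3]),         # hypothetical state beliefs under
                 np.array([0.2, 0.8])]         # two one-step policies
    G = np.array([expected_free_energy(q, A, log_C) for q in predicted])
    q_pi = softmax(-G)                         # policy posterior (decision-making)
    # q_pi favours the first policy, whose predictions better match preferences.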


References

    1. Ackley D.H., Hinton G.E., Sejnowski T.J. A learning algorithm for Boltzmann machines. Cognitive Science. 1985;9(1):147–169. doi: 10.1016/S0364-0213(85)80012-4.
    2. Adams R.A., Stephan K.E., Brown H.R., Frith C.D., Friston K.J. The computational anatomy of psychosis. Frontiers in Psychiatry. 2013;4. doi: 10.3389/fpsyt.2013.00047.
    3. Aitchison L., Lengyel M. With or without you: Predictive coding and Bayesian inference in the brain. Current Opinion in Neurobiology. 2017;46:219–227. doi: 10.1016/j.conb.2017.08.010.
    4. Allenby G.M., Rossi P.E., McCulloch R.E. Hierarchical Bayes models: A practitioner's guide. Journal of Bayesian Applications in Marketing. 2005.
    5. Ashby W.R. Principles of the self-organizing dynamic system. The Journal of General Psychology. 1947;37(2):125–128. doi: 10.1080/00221309.1947.9918144.
