Commun Biol. 2022 Jan 14;5(1):55. doi: 10.1038/s42003-021-02994-2

Canonical neural networks perform active inference

Takuya Isomura et al.

Abstract

This work considers a class of canonical neural networks comprising rate coding models, wherein neural activity and plasticity minimise a common cost function, and plasticity is modulated with a certain delay. We show that such neural networks implicitly perform active inference and learning to minimise the risk associated with future outcomes. Mathematical analyses demonstrate that this biological optimisation can be cast as maximisation of model evidence, or equivalently minimisation of variational free energy, under the well-known form of a partially observed Markov decision process model. This equivalence indicates that the delayed modulation of Hebbian plasticity, accompanied by adaptation of firing thresholds, is a sufficient neuronal substrate to attain Bayes optimal inference and control. We corroborated this proposition using numerical analyses of maze tasks. This theory offers a universal characterisation of canonical neural networks in terms of Bayesian belief updating and provides insight into the neuronal mechanisms underlying planning and adaptive behavioural control.
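To make the core claim concrete, the following is a minimal toy sketch, not the authors' implementation: an arbitrary reconstruction-error cost stands in for the paper's cost function L, and all sizes, rates and the modulatory signal are illustrative assumptions. It only illustrates the structure described above, namely a rate-coding network in which neural activity and synaptic plasticity both perform gradient descent on one shared cost, with plasticity gated by a delayed modulatory signal.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(4, 8))          # synaptic weights (4 neurons, 8 inputs)
    eta_x, eta_W, delay = 0.2, 0.01, 5              # step sizes and modulation delay
    modulator = [1.0] * delay                       # buffer realising the delayed modulation

    def L(x, W, o):
        # Shared cost: reconstruction error of the input plus an activity penalty.
        return 0.5 * np.sum((o - W.T @ x) ** 2) + 0.5 * np.sum(x ** 2)

    for t in range(200):
        o = rng.binomial(1, 0.5, size=8).astype(float)   # observation at time t
        x = np.zeros(4)                                  # neural activity (fast variable)
        for _ in range(20):                              # activity descends L
            x -= eta_x * (-W @ (o - W.T @ x) + x)
        modulator.append(np.exp(-L(x, W, o)))            # stand-in modulatory signal
        m = modulator.pop(0)                             # value from `delay` steps earlier
        dL_dW = -np.outer(x, o - W.T @ x)                # plasticity descends the same L
        W -= eta_W * m * dL_dW                           # Hebbian-like update, delayed modulation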


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Schematic of an external milieu and neural network, and the corresponding Bayesian formulation.
a Interaction between the external milieu and an autonomous system comprising a two-layer neural network. On receiving sensory inputs or observations o(t) that are generated from hidden states s(t), the network activity x(t) generates outputs y(t). Gradient descent on a neural network cost function L determines the dynamics of neural activity and plasticity; thus, L is sufficient to characterise the neural network. The proposed theory affirms that the ensuing neural dynamics are self-organised to encode the posterior beliefs about hidden states and decisions. b Corresponding variational Bayesian formulation. The interaction depicted in a is formulated in terms of a POMDP model, which is parameterised by A, B, C ∈ θ and D, E ∈ λ. Variational free energy minimisation allows an agent to self-organise to encode the hidden states of the external milieu, and to make decisions that minimise future risk. Here, variational free energy F is sufficient to characterise the inferences and behaviours of the agent.
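For orientation, the sketch below spells out the generative process that panel b refers to, using a generic discrete POMDP: A as the likelihood mapping, B as the decision-conditioned state transition, and D and E as the state and decision priors. The matrix sizes and values are arbitrary assumptions for illustration, and the policy mapping C is omitted for brevity; this is not the authors' simulation code.

    import numpy as np

    rng = np.random.default_rng(1)
    n_s, n_o, n_d = 3, 2, 4                              # numbers of states, outcomes, decisions
    A = rng.dirichlet(np.ones(n_o), size=n_s).T          # P(o | s); columns sum to 1
    B = rng.dirichlet(np.ones(n_s), size=(n_d, n_s)).transpose(0, 2, 1)  # P(s' | s, delta)
    D = np.full(n_s, 1 / n_s)                            # prior over initial hidden states
    E = np.full(n_d, 1 / n_d)                            # prior over decisions

    s = rng.choice(n_s, p=D)                             # sample the initial hidden state
    for t in range(5):
        o = rng.choice(n_o, p=A[:, s])                   # observation generated from the hidden state
        delta = rng.choice(n_d, p=E)                     # decision sampled from the prior
        s = rng.choice(n_s, p=B[delta][:, s])            # transition conditioned on the decision
        print(t, o, delta, s)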
Fig. 2
Fig. 2. Factor graph depicting a fictive causality of factors that the generative model hypothesises.
The POMDP model is expressed as a Forney factor graph, based upon the formulation in ref. The arrows from the present risk γ_t (sampled from Γ_t) to past decisions δ_τ optimise the policy in a post hoc manner, so as to minimise future risk. In reality, the current error γ_t is determined by past decisions (top). In contrast, decision making that minimises future risk implies a fictive causality from γ_t to δ_τ (bottom). Inference and learning correspond to the inversion of this generative model. Postdiction of past decisions is formulated as learning of the policy mapping, conditioned on γ_t. Here, A, B and C indicate the conditional probability matrices, and boldface variables are the corresponding posterior beliefs. Moreover, D* and E* indicate the true prior beliefs about hidden states and decisions, while D and E indicate the priors under which the network operates. If and only if D = D* and E = E*, inferences and behaviours are optimal for a given task or set of environmental contingencies; otherwise they are biased.
Fig. 3
Fig. 3. Mathematical equivalence between variational free energy and neural network cost functions, depicted by one-to-one correspondence of their components.
Top: variational free energy transformed from Eq. (5) using Bayes' theorem. Here, the inverse mappings are given by $B^{\mathrm{T}}\,\mathrm{diag}[D]^{-1}$ and $C^{\mathrm{T}}\,\mathrm{diag}[E]^{-1}$, and $D$ and $E$ are the state and decision priors. Bottom: the neural network cost function that is the counterpart of the aforementioned variational free energy. In this equation, $\hat{W}_l := \mathrm{sig}(W_l)$, $\hat{K}_l := \mathrm{sig}(K_l)$ and $\hat{V}_l := \mathrm{sig}(V_l)$ (for $l = 0, 1$) indicate the sigmoid functions of synaptic strengths. Moreover, $\phi_l$ and $\psi_l$ are perturbation terms that characterise the bias in firing thresholds: $\phi_l := \phi_l(W_l, K_l) = h_l - \ln\overline{\hat{W}}_l\,\vec{1} - \ln\overline{\hat{K}}_l\,\vec{1}$ is a function of $W_l$ and $K_l$, while $\psi_l := \psi_l(V_l) = m_l - \ln\overline{\hat{V}}_l\,\vec{1}$ is a function of $V_l$, where the overbar denotes the complement, e.g. $\overline{\hat{W}}_l := 1 - \hat{W}_l$. When $\hat{\omega}_i := \mathrm{sig}(\omega_i)$ is the sigmoid function of $\omega_i$, $\omega_i = \ln\hat{\omega}_i - \ln\overline{\hat{\omega}}_i$ holds for an arbitrary $\omega_i$. Using this relationship, Eq. (7) is transformed into the form presented at the bottom of this figure, which formally corresponds to the variational free energy expressed at the top of this figure. Blue lines show the one-to-one correspondence of their components.
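The sigmoid identity invoked above is standard algebra; written out in the caption's notation (with the overbar denoting the complement):

    \hat{\omega}_i := \mathrm{sig}(\omega_i) = \frac{1}{1 + e^{-\omega_i}}, \qquad
    \overline{\hat{\omega}}_i := 1 - \hat{\omega}_i = \frac{e^{-\omega_i}}{1 + e^{-\omega_i}},
    \qquad\Longrightarrow\qquad
    \ln\hat{\omega}_i - \ln\overline{\hat{\omega}}_i
      = \ln\frac{\hat{\omega}_i}{\overline{\hat{\omega}}_i}
      = \ln e^{\omega_i} = \omega_i .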
Fig. 4
Fig. 4. Simulations of neural networks solving maze tasks.
a Neural network architecture. The agent receives the states (pathway or wall) of the neighbouring 11 × 11 cells as sensory inputs. A decision here represents a four-step sequence of actions (each selected from up, down, left or right), yielding 4^4 = 256 options in total. The panels on the right depict observations and posterior beliefs about hidden states and decisions. b General view of the maze. The maze comprises a discrete state space, wherein white and black cells indicate pathways and walls, respectively. The thick blue cell indicates the current position of the agent, while the thin blue line is its trajectory. Starting from the left, the agent needs to reach the right edge of the maze within T = 2 × 10^4 time steps. c Trajectories of the agent's x-axis position in sessions before (black, session 1) and after (blue, session 100) training. d Duration to reach the goal when the neural network operates under uniform decision priors E_right = E_left = E_up = E_down = 1/256 ≈ 0.0039 (where E_right indicates the prior probability of selecting a decision involving rightward motion in the next step). Blue and red circles indicate successful and failed sessions, respectively. e Failure probability (left) and duration to reach the goal (right) when the neural network operates under three different prior conditions, E_right = 0.0023, 0.0039, 0.0055 (black, blue and cyan, respectively), where E_left = 0.0078 − E_right and E_up = E_down = 0.0039. Each line indicates the average of ten successive sessions. Although the neural network with E_right = 0.0055 exhibits better performance in the early stage, it turns out to overestimate the preference for rightward motion in later stages, even when it approaches a wall. Panel e was obtained with 20 distinct, randomly generated mazes. Shaded areas indicate the standard error. Refer to the Methods section 'Simulations' for further details.
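As a concrete illustration of the decision space and prior conditions described above (this is not the authors' simulation code, and indexing each decision by its first action is an assumption made here for demonstration), the sketch below enumerates the 4^4 = 256 four-step decisions and constructs the uniform and biased decision priors:

    from itertools import product
    import numpy as np

    actions = ["up", "down", "left", "right"]
    decisions = list(product(actions, repeat=4))        # all 256 four-step action sequences
    assert len(decisions) == 256

    E_uniform = np.full(len(decisions), 1 / 256)        # uniform prior, ~0.0039 per decision

    def biased_prior(e_right):
        # Per-decision prior set by the first action of each sequence, as in panel e.
        value = {"right": e_right, "left": 0.0078 - e_right, "up": 0.0039, "down": 0.0039}
        E = np.array([value[d[0]] for d in decisions])
        return E / E.sum()                              # normalise to a proper distribution

    E_biased = biased_prior(0.0055)                     # condition favouring rightward motion
    print(E_uniform.sum(), E_biased[:4])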
Fig. 5
Fig. 5. Estimation of implicit priors enables the prediction of subsequent learning.
a Estimation of the implicit prior E_right (encoded by the threshold factor ψ) under three different prior conditions (black, blue and cyan; cf. Fig. 4). Here, ψ was estimated through Bayesian inference based on sequences of neural activity obtained with ten distinct mazes. Then, E_right was computed via ln E_1 = ψ_1 for each of the 64 elements. The other 192 elements of E_1 (i.e. E_left, E_up, E_down) were also estimated. The sum of all elements of E_1 was normalised to 1. b Prediction of the learning process within previously unexperienced, randomly generated mazes. Using the estimated E, we reconstructed the computational architecture (i.e. neural network) of the agent. We then simulated the adaptation process of the agent's behaviour using the reconstructed neural network and computed the trajectory of the probability of failing to reach the goal within T = 2 × 10^4 time steps. The resulting learning trajectories (solid lines) predict the learning trajectories of the original agent (dashed lines) under the three prior conditions, in the absence of observed neural responses and behaviours. Lines and shaded areas indicate the mean and standard error, respectively. Inset panels compare the failure probability of the original and reconstructed agents after learning (average over sessions 51–100), within ten previously unexperienced mazes. Refer to the Methods section 'Data analysis' for further details.
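The prior-recovery step in panel a can be sketched as follows. This takes the caption's relation ln E_1 = ψ_1 and the normalisation at face value; the ψ values below are random placeholders standing in for the estimate inferred from neural activity, and treating the first 64 entries as the rightward decisions is an arbitrary indexing assumption.

    import numpy as np

    rng = np.random.default_rng(2)
    psi_1 = rng.normal(loc=-5.5, scale=0.3, size=256)   # placeholder estimate, one value per decision

    E_1 = np.exp(psi_1)                                 # invert ln E_1 = psi_1 elementwise
    E_1 /= E_1.sum()                                    # normalise so the prior sums to 1

    E_right = E_1[:64]                                  # e.g. the 64 decisions whose next step is rightward
    print(E_1.sum(), E_right.mean())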

References

    1. Linsker R. Self-organization in a perceptual network. Computer. 1988;21:105–117.
    2. Dayan P, Hinton GE, Neal RM, Zemel RS. The Helmholtz machine. Neural Comput. 1995;7:889–904.
    3. Sutton RS, Barto AG. Reinforcement Learning (MIT Press, 1998).
    4. Bishop CM. Pattern Recognition and Machine Learning (Springer, 2006).
    5. Friston KJ, Kilner J, Harrison L. A free energy principle for the brain. J. Physiol. Paris. 2006;100:70–87.
