Commun Biol. 2022 Jan 14;5(1):55. doi: 10.1038/s42003-021-02994-2

Canonical neural networks perform active inference

Takuya Isomura et al.

Abstract

This work considers a class of canonical neural networks comprising rate coding models, wherein neural activity and plasticity minimise a common cost function, and plasticity is modulated with a certain delay. We show that such neural networks implicitly perform active inference and learning to minimise the risk associated with future outcomes. Mathematical analyses demonstrate that this biological optimisation can be cast as maximisation of model evidence, or equivalently minimisation of variational free energy, under the well-known form of a partially observed Markov decision process model. This equivalence indicates that the delayed modulation of Hebbian plasticity, accompanied by adaptation of firing thresholds, is a sufficient neuronal substrate to attain Bayes optimal inference and control. We corroborated this proposition using numerical analyses of maze tasks. This theory offers a universal characterisation of canonical neural networks in terms of Bayesian belief updating and provides insight into the neuronal mechanisms underlying planning and adaptive behavioural control.
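To make the core claim concrete, the following is a minimal toy sketch, not the authors' implementation: an arbitrary reconstruction-error cost stands in for the paper's cost function L, and all sizes, rates and the modulatory signal are illustrative assumptions. It only illustrates the structure described above, namely a rate-coding network in which neural activity and synaptic plasticity both perform gradient descent on one shared cost, with plasticity gated by a delayed modulatory signal.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(4, 8))          # synaptic weights (4 neurons, 8 inputs)
    eta_x, eta_W, delay = 0.2, 0.01, 5              # step sizes and modulation delay
    modulator = [1.0] * delay                       # buffer realising the delayed modulation

    def L(x, W, o):
        # Shared cost: reconstruction error of the input plus an activity penalty.
        return 0.5 * np.sum((o - W.T @ x) ** 2) + 0.5 * np.sum(x ** 2)

    for t in range(200):
        o = rng.binomial(1, 0.5, size=8).astype(float)   # observation at time t
        x = np.zeros(4)                                  # neural activity (fast variable)
        for _ in range(20):                              # activity descends L
            x -= eta_x * (-W @ (o - W.T @ x) + x)
        modulator.append(np.exp(-L(x, W, o)))            # stand-in modulatory signal
        m = modulator.pop(0)                             # value from `delay` steps earlier
        dL_dW = -np.outer(x, o - W.T @ x)                # plasticity descends the same L
        W -= eta_W * m * dL_dW                           # Hebbian-like update, delayed modulation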


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Schematic of an external milieu and neural network, and the corresponding Bayesian formulation.
a Interaction between the external milieu and an autonomous system comprising a two-layer neural network. On receiving sensory inputs or observations o(t) that are generated from hidden states s(t), the network activity x(t) generates outputs y(t). Gradient descent on a neural network cost function L determines the dynamics of neural activity and plasticity; thus, L is sufficient to characterise the neural network. The proposed theory affirms that the ensuing neural dynamics are self-organised to encode the posterior beliefs about hidden states and decisions. b Corresponding variational Bayesian formulation. The interaction depicted in a is formulated in terms of a POMDP model, which is parameterised by A, B, C ∈ θ and D, E ∈ λ. Variational free energy minimisation allows an agent to self-organise to encode the hidden states of the external milieu, and to make decisions that minimise future risk. Here, variational free energy F is sufficient to characterise the inferences and behaviours of the agent.
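For orientation, the sketch below spells out the generative process that panel b refers to, using a generic discrete POMDP: A as the likelihood mapping, B as the decision-conditioned state transition, and D and E as the state and decision priors. The matrix sizes and values are arbitrary assumptions for illustration, and the policy mapping C is omitted for brevity; this is not the authors' simulation code.

    import numpy as np

    rng = np.random.default_rng(1)
    n_s, n_o, n_d = 3, 2, 4                              # numbers of states, outcomes, decisions
    A = rng.dirichlet(np.ones(n_o), size=n_s).T          # P(o | s); columns sum to 1
    B = rng.dirichlet(np.ones(n_s), size=(n_d, n_s)).transpose(0, 2, 1)  # P(s' | s, delta)
    D = np.full(n_s, 1 / n_s)                            # prior over initial hidden states
    E = np.full(n_d, 1 / n_d)                            # prior over decisions

    s = rng.choice(n_s, p=D)                             # sample the initial hidden state
    for t in range(5):
        o = rng.choice(n_o, p=A[:, s])                   # observation generated from the hidden state
        delta = rng.choice(n_d, p=E)                     # decision sampled from the prior
        s = rng.choice(n_s, p=B[delta][:, s])            # transition conditioned on the decision
        print(t, o, delta, s)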
Fig. 2
Fig. 2. Factor graph depicting a fictive causality of factors that the generative model hypothesises.
The POMDP model is expressed as a Forney factor graph, based upon the formulation in ref. The arrows from the present risk γ_t (sampled from Γ_t) to past decisions δ_τ optimise the policy in a post hoc manner, so as to minimise future risk. In reality, the current error γ_t is determined by past decisions (top). In contrast, decision making that minimises future risk implies a fictive causality from γ_t to δ_τ (bottom). Inference and learning correspond to the inversion of this generative model. Postdiction of past decisions is formulated as learning of the policy mapping, conditioned on γ_t. Here, A, B and C indicate the conditional probability matrices, and boldface variables are the corresponding posterior beliefs. Moreover, D* and E* indicate the true prior beliefs about hidden states and decisions, while D and E indicate the priors under which the network operates. If and only if D = D* and E = E*, inferences and behaviours are optimal for a given task or set of environmental contingencies; otherwise they are biased.
Fig. 3
Fig. 3. Mathematical equivalence between variational free energy and neural network cost functions, depicted by one-to-one correspondence of their components.
Top: variational free energy transformed from Eq. (5) using Bayes' theorem. Here, the inverse mappings are given by $B^{\mathrm{T}}\,\mathrm{diag}[D]^{-1}$ and $C^{\mathrm{T}}\,\mathrm{diag}[E]^{-1}$, and $D$ and $E$ are the state and decision priors. Bottom: the neural network cost function that is the counterpart of the aforementioned variational free energy. In this equation, $\hat{W}_l := \mathrm{sig}(W_l)$, $\hat{K}_l := \mathrm{sig}(K_l)$ and $\hat{V}_l := \mathrm{sig}(V_l)$ (for $l = 0, 1$) indicate the sigmoid functions of synaptic strengths. Moreover, $\phi_l$ and $\psi_l$ are perturbation terms that characterise the bias in firing thresholds: $\phi_l := \phi_l(W_l, K_l) = h_l - \ln\overline{\hat{W}}_l\,\vec{1} - \ln\overline{\hat{K}}_l\,\vec{1}$ is a function of $W_l$ and $K_l$, while $\psi_l := \psi_l(V_l) = m_l - \ln\overline{\hat{V}}_l\,\vec{1}$ is a function of $V_l$, where the overbar denotes the complement, e.g. $\overline{\hat{W}}_l := 1 - \hat{W}_l$. When $\hat{\omega}_i := \mathrm{sig}(\omega_i)$ is the sigmoid function of $\omega_i$, $\omega_i = \ln\hat{\omega}_i - \ln\overline{\hat{\omega}}_i$ holds for an arbitrary $\omega_i$. Using this relationship, Eq. (7) is transformed into the form presented at the bottom of this figure, which formally corresponds to the variational free energy expressed at the top of this figure. Blue lines show the one-to-one correspondence of their components.
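The sigmoid identity invoked above is standard algebra; written out in the caption's notation (with the overbar denoting the complement):

    \hat{\omega}_i := \mathrm{sig}(\omega_i) = \frac{1}{1 + e^{-\omega_i}}, \qquad
    \overline{\hat{\omega}}_i := 1 - \hat{\omega}_i = \frac{e^{-\omega_i}}{1 + e^{-\omega_i}},
    \qquad\Longrightarrow\qquad
    \ln\hat{\omega}_i - \ln\overline{\hat{\omega}}_i
      = \ln\frac{\hat{\omega}_i}{\overline{\hat{\omega}}_i}
      = \ln e^{\omega_i} = \omega_i .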
Fig. 4
Fig. 4. Simulations of neural networks solving maze tasks.
a Neural network architecture. The agent receives the states (pathway or wall) of the neighbouring 11 × 11 cells as sensory inputs. A decision here represents a four-step sequence of actions (each selected from up, down, left or right), yielding 4^4 = 256 options in total. The panels on the right depict observations and posterior beliefs about hidden states and decisions. b General view of the maze. The maze comprises a discrete state space, wherein white and black cells indicate pathways and walls, respectively. The thick blue cell indicates the current position of the agent, while the thin blue line is its trajectory. Starting from the left, the agent needs to reach the right edge of the maze within T = 2 × 10^4 time steps. c Trajectories of the agent's x-axis position in sessions before (black, session 1) and after (blue, session 100) training. d Duration to reach the goal when the neural network operates under uniform decision priors E_right = E_left = E_up = E_down = 1/256 ≈ 0.0039 (where E_right indicates the prior probability of selecting a decision involving rightward motion in the next step). Blue and red circles indicate successful and failed sessions, respectively. e Failure probability (left) and duration to reach the goal (right) when the neural network operates under three different prior conditions, E_right = 0.0023, 0.0039, 0.0055 (black, blue and cyan, respectively), where E_left = 0.0078 − E_right and E_up = E_down = 0.0039. Each line indicates the average of ten successive sessions. Although the neural network with E_right = 0.0055 exhibits better performance in the early stage, it turns out to overestimate the preference for rightward motion in later stages, even when it approaches a wall. Panel e was obtained with 20 distinct, randomly generated mazes. Shaded areas indicate the standard error. Refer to the Methods section 'Simulations' for further details.
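As a concrete illustration of the decision space and prior conditions described above (this is not the authors' simulation code, and indexing each decision by its first action is an assumption made here for demonstration), the sketch below enumerates the 4^4 = 256 four-step decisions and constructs the uniform and biased decision priors:

    from itertools import product
    import numpy as np

    actions = ["up", "down", "left", "right"]
    decisions = list(product(actions, repeat=4))        # all 256 four-step action sequences
    assert len(decisions) == 256

    E_uniform = np.full(len(decisions), 1 / 256)        # uniform prior, ~0.0039 per decision

    def biased_prior(e_right):
        # Per-decision prior set by the first action of each sequence, as in panel e.
        value = {"right": e_right, "left": 0.0078 - e_right, "up": 0.0039, "down": 0.0039}
        E = np.array([value[d[0]] for d in decisions])
        return E / E.sum()                              # normalise to a proper distribution

    E_biased = biased_prior(0.0055)                     # condition favouring rightward motion
    print(E_uniform.sum(), E_biased[:4])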
Fig. 5
Fig. 5. Estimation of implicit priors enables the prediction of subsequent learning.
a Estimation of the implicit prior E_right (encoded by the threshold factor ψ) under three different prior conditions (black, blue and cyan; cf. Fig. 4). Here, ψ was estimated through Bayesian inference based on sequences of neural activity obtained with ten distinct mazes. Then, E_right was computed via ln E_1 = ψ_1 for each of the 64 elements. The other 192 elements of E_1 (i.e. E_left, E_up, E_down) were also estimated. The sum of all elements of E_1 was normalised to 1. b Prediction of the learning process within previously unexperienced, randomly generated mazes. Using the estimated E, we reconstructed the computational architecture (i.e. neural network) of the agent. We then simulated the adaptation process of the agent's behaviour using the reconstructed neural network and computed the trajectory of the probability of failing to reach the goal within T = 2 × 10^4 time steps. The resulting learning trajectories (solid lines) predict the learning trajectories of the original agent (dashed lines) under the three prior conditions, in the absence of observed neural responses and behaviours. Lines and shaded areas indicate the mean and standard error, respectively. Inset panels compare the failure probability of the original and reconstructed agents after learning (average over sessions 51–100), within ten previously unexperienced mazes. Refer to the Methods section 'Data analysis' for further details.
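The prior-recovery step in panel a can be sketched as follows. This takes the caption's relation ln E_1 = ψ_1 and the normalisation at face value; the ψ values below are random placeholders standing in for the estimate inferred from neural activity, and treating the first 64 entries as the rightward decisions is an arbitrary indexing assumption.

    import numpy as np

    rng = np.random.default_rng(2)
    psi_1 = rng.normal(loc=-5.5, scale=0.3, size=256)   # placeholder estimate, one value per decision

    E_1 = np.exp(psi_1)                                 # invert ln E_1 = psi_1 elementwise
    E_1 /= E_1.sum()                                    # normalise so the prior sums to 1

    E_right = E_1[:64]                                  # e.g. the 64 decisions whose next step is rightward
    print(E_1.sum(), E_right.mean())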

References

    1. Linsker R. Self-organization in a perceptual network. Computer. 1988;21:105–117.
    2. Dayan P, Hinton GE, Neal RM, Zemel RS. The Helmholtz machine. Neural Comput. 1995;7:889–904.
    3. Sutton RS, Barto AG. Reinforcement Learning (MIT Press, 1998).
    4. Bishop CM. Pattern Recognition and Machine Learning (Springer, 2006).
    5. Friston KJ, Kilner J, Harrison L. A free energy principle for the brain. J. Physiol. Paris. 2006;100:70–87.
