Structure learning enhances concept formation in synthetic Active Inference agents

Victorita Neacsu et al. PLoS One. 2022 Nov 14;17(11):e0277199. doi: 10.1371/journal.pone.0277199. eCollection 2022.

Abstract

Humans display astonishing skill in learning about the environment in which they operate. They assimilate a rich set of affordances and interrelations among different elements in particular contexts, and form flexible abstractions (i.e., concepts) that can be generalised and leveraged with ease. To capture these abilities, we present a deep hierarchical Active Inference model of goal-directed behaviour, and the accompanying belief update schemes implied by maximising model evidence. Using simulations, we elucidate the potential mechanisms that underlie and influence concept learning in a spatial foraging task. We show that the representations formed as a result of foraging reflect environmental structure in a way that is enhanced and nuanced by Bayesian model reduction, a special case of structure learning that typifies learning in the absence of new evidence. Synthetic agents learn associations and form concepts about environmental context and configuration through inferential, parametric, and structure learning processes, three processes that can produce a diversity of beliefs and belief structures. Furthermore, the ensuing representations reflect symmetries for environments with identical configurations.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Fig 1
Fig 1. Graphical depiction of the generative model.
This deep (temporal) generative model has two hierarchical levels. At the lower level there are two hidden state factors: Location and context. These generate outcomes in three outcome modalities: Location, reward, and context (i.e., room cue). At the higher level, there is one hidden state factor and outcome modality: Context (room identity); the link between the higher and lower level is via the context factor. Latent states at the higher level generate initial states for the lower level, which themselves unfold to generate a sequence of outcomes. Lower levels cycle for a sequence of 5 time-steps for each transition of the higher level, and there are 5 epochs in the higher level for every iteration. This scheduling endows the generative model with a deep temporal structure. The likelihood A is a matrix whose elements are the probability of an outcome under every combination of hidden states. B represents probabilistic transitions between hidden states, which depend on actions determined by policies π. C specifies prior preferences and D specifies priors over initial states. Cat denotes a categorical probability distribution. Dir denotes a Dirichlet distribution (the conjugate prior of the Cat distribution). Please see Table 1 for a glossary of terms.
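The model components named in the caption (likelihood A, transitions B, preferences C, initial-state priors D, with Dirichlet priors over likelihood counts) can be sketched for the lower level roughly as follows. This is an illustrative NumPy sketch only, not the authors' implementation; the dimensions (a 9-location room, 16 contexts, 5 actions) and all variable names are assumptions based on the task described in the figures.

```python
import numpy as np

# Assumed dimensions for illustration: 9 sampling locations per room
# and 16 possible contexts (rooms), as in the paper's foraging task.
num_locations, num_contexts, num_actions = 9, 16, 5

# A: likelihood mapping for the location modality, P(outcome | location, context).
# Here location is observed veridically in every context (identity mapping).
A_location = np.zeros((num_locations, num_locations, num_contexts))
for c in range(num_contexts):
    A_location[:, :, c] = np.eye(num_locations)

# a: Dirichlet concentration parameters over the context-cue likelihood;
# uniform counts encode an initially flat (unlearned) state-outcome mapping.
a_context = np.ones((num_contexts, num_locations, num_contexts))

# B: probabilistic transitions between hidden states, one matrix per action.
# Identity placeholders here; real moves would permute locations.
B_location = np.stack([np.eye(num_locations)] * num_actions)

# C: prior preferences (log-probabilities) over reward outcomes:
# the agent prefers 'reward' (index 0) over 'no reward' (index 1).
C_reward = np.array([3.0, 0.0])

# D: priors over initial states - fixed start location, uniform over contexts.
D_location = np.eye(num_locations)[0]
D_context = np.ones(num_contexts) / num_contexts
```

Each categorical distribution (columns of A and B, the vectors D) is normalised; the Dirichlet counts `a_context` are not probabilities but pseudo-counts whose normalisation yields the expected likelihood.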
Fig 2
Fig 2. Example paths and room types.
a) Examples of simulated paths or policies that agents could choose in one of the 16 possible rooms. The agents are allowed to make 4 moves, regardless of whether they find the reward. b) Sample rooms (out of 16 possible rooms). Rooms 2 and 15, as well as rooms 3 and 16, share the same contextual cue (colour) and reward location (locations 1 and 9, respectively). Every higher-level block involves foraging five rooms (whose identity is unknown) and exploring each of them in turn for four time-steps at the lower level.
Fig 3
Fig 3. Schematic overview of belief updating.
Left panel: Belief updates defining Active Inference: State-estimation, policy evaluation and action selection. These belief updates are expressed in terms of expectations, which play the role of sufficient statistics for these categorical variables. Right panel: Here, the expectations that are updated are assigned to various brain areas. This depiction is purely schematic, and its purpose is to illustrate a rudimentary functional anatomy implied by the functional form of the belief updating. Here, we have assigned observed outcomes to the occipital cortex, given its involvement in visual processing of spatial location [72,73], whereas reward outcomes are assigned to the inferotemporal cortex given its contributions to forming stimulus-reward associations [74]. Hidden states encoding the context have been associated with the hippocampal formation [75,76], and the remaining states encoding sampling location have been assigned to the parietal cortex, given its role in the encoding of multiple action-based spatial representations [–79]. The evaluation of policies, in terms of their expected free energy, has been placed in the ventral prefrontal cortex. Expectations about policies per se and the precision of these beliefs have been associated with striatal and ventral tegmental areas, respectively, to indicate a putative role for dopamine in encoding precision [4]. The arrows denote message passing among the sufficient statistics of each factor or marginal. First and second digits in the superscript (e.g., o(1),1) indicate the hierarchical level and modality, respectively. Please see glossary in Table 1 and [4] for a detailed explanation of the equations and notation.
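The state-estimation step in the left panel amounts to combining log messages (prior and likelihood) and passing them through a softmax to recover posterior expectations over a categorical hidden state. A minimal single-factor sketch, where the function names and the toy 4-state likelihood are illustrative assumptions rather than the paper's code:

```python
import numpy as np

def softmax(x):
    """Normalised exponential (stable via max-subtraction)."""
    e = np.exp(x - x.max())
    return e / e.sum()

def update_state_beliefs(log_prior, A, o):
    """One categorical state-estimation step: add the log likelihood of the
    observed outcome o to the log prior over hidden states, then renormalise
    (a fixed-point iteration of the variational belief update)."""
    log_likelihood = np.log(A[o, :] + 1e-16)  # row o of the likelihood matrix
    return softmax(log_prior + log_likelihood)

# Example: 4 hidden states, outcome 1 observed under a noisy likelihood.
A = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.7, 0.1],
              [0.1, 0.1, 0.1, 0.7]])
prior = np.ones(4) / 4
posterior = update_state_beliefs(np.log(prior), A, o=1)
```

With a flat prior the posterior simply tracks the likelihood row, peaking on the state most consistent with the observation; with an informative prior the two log terms trade off.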
Fig 4
Fig 4. Example alternative models and flow chart depicting simulations.
a)-d) Example alternative models (i.e., hypotheses). The generative process (also Alternative model 1) and alternative hypotheses were subject to Bayesian Model Reduction, focusing on the likelihood mappings encoding the context modality. Matrices represent a mapping from context states (columns) to context outcomes (rows); this can be thought of as room identity (s) to room colour (o). a) Note that the identity matrix defined in the generative process is also used as an alternative hypothesis for model comparison. b) The second hypothesis assigns each of the identical pairs of rooms a 50% probability. c) The third hypothesis represents rooms 2 & 15 as being Room 15 and rooms 3 & 16 as Room 16. d) In the fourth hypothesis, rooms 15 and 16 do not exist and have a uniform distribution over all the other potential rooms; that is, rooms 15 and 16 are equally likely to have any other possible identity. e) Flow chart depicting the core simulations for all 120 agents: 60 undergoing the ‘BMR’ condition, and 60 in the ‘No BMR’ condition.
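Comparing such reduced hypotheses against the full model has a closed form for Dirichlet-distributed likelihood parameters: the change in free energy depends only on the multivariate beta functions of the full and reduced priors and their corresponding posteriors. A minimal sketch, assuming the standard Dirichlet Bayesian model reduction formula from this literature (function names are illustrative):

```python
from math import lgamma

def log_beta(alpha):
    """Log of the multivariate beta function for Dirichlet parameters."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))

def delta_F(prior, posterior, reduced_prior):
    """Change in (negative) free energy when replacing the full Dirichlet
    prior with a reduced prior, given posterior counts accumulated under the
    full prior. Positive values indicate the reduced (simpler) hypothesis
    has greater evidence."""
    counts = [q - p for q, p in zip(posterior, prior)]       # data counts
    reduced_posterior = [r + c for r, c in zip(reduced_prior, counts)]
    return (log_beta(reduced_posterior) + log_beta(prior)
            - log_beta(reduced_prior) - log_beta(posterior))

# Example: a flat prior over two outcomes, with 9 observations of outcome 2.
# A reduced prior consistent with the data is favoured; one that contradicts
# the data is rejected.
supported = delta_F([1, 1], [1, 10], [1, 8])   # > 0: accept reduction
contradicted = delta_F([1, 1], [1, 10], [8, 1])  # < 0: reject reduction
```

This captures the sense in which BMR "typifies learning in the absence of new evidence": the comparison reuses the already-accumulated counts rather than requiring further observations.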
Fig 5
Fig 5. Average performance with learning.
a) Progressive increase in performance, scored by the amount of reward gained per block. For each higher-level block, five rooms at the lower level are explored. The performance is averaged over 20 simulated agents for each of the training settings: 10, 20, 30, 40, and 50 blocks (i.e., 50, 100, 150, 200, and 250 rooms respectively). Please note that the dashed blue line illustrates a cap in performance, represented by total reward gathered per block, averaged over 20 fully knowledgeable agents foraging for N = 50 blocks (i.e., agents that start with fully precise likelihood matrices). As agents progress through the simulations, they accumulate more reward per trial. The concavity at the beginning of training reflects exploratory behaviour; i.e., intrinsic value predominates over the extrinsic value of rewards. b) Learning: The progressive updates to the concentration parameters over state-outcome associations, from a uniform distribution to a more precise one, representing concept formation. The agent forages for N = 50 blocks at the higher level (i.e., 250 lower-level trials). The middle trial represents the end of block 25 (at the higher level).
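The parametric learning this caption describes can be sketched as accumulation of Dirichlet counts: after each observation, the co-occurrence of the observed outcome with the posterior beliefs over hidden states is added to the concentration parameters, sharpening the expected likelihood over trials. A toy sketch; the dimensions, learning rate, and function name are illustrative assumptions:

```python
import numpy as np

def update_concentration(a, outcome, state_beliefs, lr=1.0):
    """Accumulate Dirichlet concentration parameters for a likelihood
    mapping: add the posterior state beliefs to the row of the observed
    outcome (outcome is one-hot, so only one row is incremented)."""
    a = a.copy()
    a[outcome, :] += lr * state_beliefs
    return a

# Start from uniform (imprecise) counts over 3 outcomes x 3 hidden states.
a = np.ones((3, 3))
beliefs = np.array([0.8, 0.1, 0.1])  # fairly confident the state is state 0
for _ in range(10):                  # repeatedly observe outcome 0
    a = update_concentration(a, outcome=0, state_beliefs=beliefs)

# Normalising the counts gives the expected likelihood, which has sharpened
# toward a precise mapping P(outcome 0 | state 0).
A_hat = a / a.sum(axis=0)
```

The move from uniform to precise counts here mirrors the progression shown in panel b): concept formation as the gradual concentration of state-outcome associations.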
Fig 6
Fig 6. Likelihood mappings from hidden states to context outcomes, before and after BMR, and how these learned mappings affect concept formation for three different agents.
Matrices represent the likelihood mapping from context states (i.e., columns) to context outcomes (i.e., rows). a) The process generating the actual state-outcome mappings (left) and the uniform concentration parameters that agents start with (right). b) Likelihood matrices for the three agents (averaged over all locations) at the end of 2, 20, and 50 training blocks at the higher level of foraging (from left to right). c) Likelihood matrices after BMR, showing the reduced set of state-outcome associations (i.e., likelihood) for the context factor. d) Information gain for the context modality before and after BMR for each of the three agents. e) Comparison of information gain before and after BMR, averaged over agents for each condition; light blue bars denote information gain after BMR whereas light green bars denote information gain before BMR (i.e., after N training blocks).
Fig 7
Fig 7. Performance comparison between agents undergoing BMR versus continuing with the posteriors accumulated after a specified number of training trials (at the lower level).
Each asterisk represents an agent; circles represent performance averaged over agents at a specified number of training trials in their respective condition (BMR vs. No BMR). Agents with n training trials are assigned to one of the two conditions and then continue to forage for another 100 (lower-level) trials. a) Total reward gained: agents undergoing BMR perform almost at peak, even before foraging through all of the 16 rooms. b) The number of times the reward was (not) found: agents in the ‘No BMR’ condition spend more time foraging without finding the reward. Performance improves for agents in both conditions as they undergo more training trials.
Fig 8
Fig 8. Possible representations of similarity between the rooms for three different agents after BMR and the most frequently chosen hypothesis during BMR.
a) Likelihood matrices representing the reduced posterior concentration parameters. The matrices represent the context state-outcome mappings, with rows representing the context state and columns representing the context outcome. The likelihood mapping for the first agent shows rooms 2&15 as having the identity of context 15, and rooms 3&16 as being context 16. The second agent’s beliefs show that rooms 3&16 have the identity of context (room) 16, and rooms 2&15 the identity of context 2. The third agent believes that there is an equal probability for the rooms that are identical in terms of their configuration: rooms 2&15 can be either context 2 or 15, and rooms 3&16 are equally likely to be either context (room) 3 or 16. b) The percentage of time the hypothesis with an equal (‘50–50’) probability for the rooms with identical configurations was chosen by the agents, for different numbers of training blocks. At N = 50 (i.e., after 50 higher-level training blocks) this hypothesis is chosen 100% of the time; that is, all 20 agents trained for 50 higher-level blocks selected this hypothesis as the most parsimonious, explaining the observations with the least model complexity. c) The negative log evidence for the twelve alternative hypotheses/models (x axis) for the entire set of agents (y axis, 120 agents). Model 7 appears to consistently have the greatest evidence (i.e., least free energy).
Fig 9
Fig 9. Neural activity for a synthetic agent in two rooms with identical configurations.
In these epochs, the agent forages the two rooms in the same manner–that is, it follows the same trajectory of locations. During the last two steps, the agent encounters the reward and stays with the reward for one more step. Please see main text above for more details. a) Room 2 b) Room 15.
Fig 10
Fig 10. Performance comparison between an agent with a strong preference for reward (C = 3) versus an agent with a weaker preference (C = 0.5).
a) Likelihood mappings after 20 training blocks, including a fully knowledgeable agent (right). b) Total reward accumulated over 20 (higher level) blocks. c) Comparison between the two agents, in terms of the information gain associated with the context modality.

References

    1. Da Costa L, Parr T, Sajid N, Veselic S, Neacsu V, Friston K. Active inference on discrete state-spaces: A synthesis. Journal of Mathematical Psychology. 2020;99:102447. doi: 10.1016/j.jmp.2020.102447
    2. Friston K, Buzsáki G. The Functional Anatomy of Time: What and When in the Brain. Trends Cogn Sci. 2016;20(7):500–11. doi: 10.1016/j.tics.2016.05.001
    3. Friston K, FitzGerald T, Rigoli F, Schwartenbeck P, O’Doherty J, Pezzulo G. Active inference and learning. Neuroscience & Biobehavioral Reviews. 2016;68:862–79.
    4. Friston K, FitzGerald T, Rigoli F, Schwartenbeck P, Pezzulo G. Active Inference: A Process Theory. Neural Computation. 2017;29(1):1–49. doi: 10.1162/NECO_a_00912
    5. Friston K, Schwartenbeck P, Fitzgerald T, Moutoussis M, Behrens T, Dolan R. The anatomy of choice: active inference and agency. Front Hum Neurosci. 2013;7(598). doi: 10.3389/fnhum.2013.00598
