Dissociating model architectures from inference computations
- PMID: 40673431
- DOI: 10.1080/17588928.2025.2532604
Abstract
Parr et al. (2025) examine how autoregressive and deep temporal models differ in their treatment of non-Markovian sequence modelling. Building on this, we highlight the need to dissociate model architectures (i.e., how the predictive distribution factorises) from the computations invoked at inference. We demonstrate that deep temporal computations can be mimicked by autoregressive models through structured context access during iterative inference. Using a transformer trained on next-token prediction, we show that inducing hierarchical temporal factorisation during iterative inference maintains predictive capacity while instantiating fewer computations. This emphasises that the processes for constructing and refining predictions are not necessarily bound to their underlying model architectures.
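The idea of structuring context access during iterative inference can be sketched in miniature. The snippet below is purely illustrative and does not reproduce the paper's actual model or masking scheme: it shows an autoregressive loop in which each step conditions on a hierarchically factorised subset of the history (a fine-grained recent window plus exponentially spaced older positions), rather than the full context. The functions `hierarchical_context` and `generate`, and the choice of exponential spacing, are assumptions for the sake of the example.

```python
def hierarchical_context(tokens, max_recent=4):
    """Select a hierarchically factorised subset of the context:
    the most recent `max_recent` tokens, plus exponentially spaced
    older positions. This mimics a deep temporal hierarchy's coarse
    access to the distant past. (Illustrative scheme only.)"""
    n = len(tokens)
    keep = set(range(max(0, n - max_recent), n))  # fine-grained recent window
    step = max_recent
    pos = n - max_recent
    while pos > 0:
        pos -= step
        if pos >= 0:
            keep.add(pos)   # sparser access further into the past
        step *= 2           # spacing doubles at each "level"
    return [tokens[i] for i in sorted(keep)]

def generate(model, prompt, n_steps):
    """Iterative autoregressive inference in which each step conditions
    only on the structured context subset, not the full history.
    `model` is any callable mapping a context list to the next token."""
    seq = list(prompt)
    for _ in range(n_steps):
        ctx = hierarchical_context(seq)
        seq.append(model(ctx))
    return seq
```

With a 20-token history, `hierarchical_context` retains only 6 positions (4 recent, 2 distant), so each inference step instantiates fewer computations than full-context attention while preserving a path to the distant past.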
Keywords: Deep temporal structures; language models; structured context access; transformers.