[Preprint]. 2025 Aug 20:arXiv:2508.15013v1.

Goals and the Structure of Experience

Nadav Amir et al. arXiv.

Abstract

Purposeful behavior is a hallmark of natural and artificial intelligence. Its acquisition is often believed to rely on world models, comprising both descriptive (what is) and prescriptive (what is desirable) aspects that identify and evaluate states of affairs in the world, respectively. Canonical computational accounts of purposeful behavior, such as reinforcement learning, posit distinct components of a world model comprising a state representation (descriptive aspect) and a reward function (prescriptive aspect). However, an alternative possibility, which has not yet been computationally formulated, is that these two aspects instead co-emerge interdependently from an agent's goal. Here, we describe a computational framework of goal-directed state representation in cognitive agents, in which the descriptive and prescriptive aspects of a world model co-emerge from agent-environment interaction sequences, or experiences. Drawing on Buddhist epistemology, we introduce a construct of goal-directed, or telic, states, defined as classes of goal-equivalent experience distributions. Telic states provide a parsimonious account of goal-directed learning in terms of the statistical divergence between behavioral policies and desirable experience features. We review empirical and theoretical literature supporting this novel perspective and discuss its potential to provide a unified account of behavioral, phenomenological and neural dimensions of purposeful behaviors across diverse substrates.
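
The abstract's framing of goal-directed learning as minimizing a statistical divergence between behavioral policies and desirable experience features can be illustrated with a minimal sketch. The Python snippet below assumes discrete experiences, a KL divergence measured from the goal-derived distribution to the policy-induced one, and invented route probabilities; none of these specific choices are taken from the paper.

```python
import numpy as np

# Minimal sketch (not the paper's formulation): experiences are discrete,
# the goal induces a "desirable" distribution over them, and each policy
# induces its own experience distribution. Goal-directed learning is then
# framed as reducing the divergence between the two.

def kl_bits(p, q, eps=1e-12):
    """KL(p || q) in bits for discrete distributions p and q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * (np.log2(p + eps) - np.log2(q + eps))))

# Hypothetical example: three routes; the goal makes the first two equally
# desirable and the third irrelevant.
desired = np.array([0.5, 0.5, 0.0])
policy_a = np.array([0.45, 0.45, 0.10])   # close to the goal distribution
policy_b = np.array([0.10, 0.20, 0.70])   # far from the goal distribution

for name, policy in [("policy_a", policy_a), ("policy_b", policy_b)]:
    print(name, "divergence from goal:", round(kl_bits(desired, policy), 3))
```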

Keywords: goals; state representation; world models.


Figures

Figure 1: Telic states are sensitive to the goal.
(Left) All six possible routes, or experiences, are equally desirable with respect to the goal of reaching the park and can therefore be clustered into a single telic state, S0. (Right) For the goal of getting a cup of coffee on the way to the park, these experiences are split into two telic states: S1, containing the routes that do not pass by the local cafe, and S2, containing those that do.
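
As a concrete companion to this caption, the sketch below groups hypothetical route descriptions into telic states by a goal-dependent equivalence relation. The route names and features are invented for illustration; only the clustering logic reflects the caption.

```python
# Illustrative sketch of Figure 1's clustering: routes are goal-equivalent
# when they agree on the features the goal cares about. Route data are invented.
routes = [
    {"name": "route_1", "reaches_park": True, "passes_cafe": False},
    {"name": "route_2", "reaches_park": True, "passes_cafe": False},
    {"name": "route_3", "reaches_park": True, "passes_cafe": False},
    {"name": "route_4", "reaches_park": True, "passes_cafe": True},
    {"name": "route_5", "reaches_park": True, "passes_cafe": True},
    {"name": "route_6", "reaches_park": True, "passes_cafe": True},
]

def telic_states(experiences, goal_features):
    """Partition experiences into classes that agree on all goal-relevant features."""
    states = {}
    for exp in experiences:
        key = tuple(exp[f] for f in goal_features)
        states.setdefault(key, []).append(exp["name"])
    return states

# Goal: reach the park -- all six routes fall into a single telic state (S0).
print(telic_states(routes, ["reaches_park"]))
# Goal: coffee on the way to the park -- routes split into two telic states (S1, S2).
print(telic_states(routes, ["reaches_park", "passes_cafe"]))
```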
Figure 2: Agentic vs. telic models of learning.
(Left) Traditional models view agents as the fundamental learning units, with dedicated neural modules for state and value estimation: states are typically assumed to be estimated from the previous state and the current observation, while evaluation relies on stored values for different states, updated on the basis of a privileged set of observations that are considered "rewarding". (Right) In the telic states framework, purposeful behavior stems from goals, which induce telic states encapsulating descriptive and evaluative aspects of experience. Goals are represented across multiple brain regions and may even extend beyond the brain, e.g., when multiple agents pursue a joint goal.
Figure 3: Dual-goal navigation task.
Each tile shows 500 one-dimensional random-walk trajectories of length T = 30, generated by a Gaussian policy parameterized by the mean (μ) and standard deviation (σ) of the position-update step (x-axis) across time (y-axis). Regions of interest R and L consist of line segments centered around xR = 2 and xL = −2, shown as green and red lines, respectively, at T = 30. Trajectories reaching one of the goals are plotted in the corresponding color, illustrating the relationship between policy parameters and goal-reaching likelihoods. The default policy, (μ0, σ0) = (0, 1), shown in the center gray tile, is equally likely to reach R and L.
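
The task in this caption can be approximately reproduced with a short simulation. The snippet below assumes one plausible reading: the position is the cumulative sum of i.i.d. Gaussian steps, and a trajectory "reaches" a region when its final position at T = 30 falls inside a unit-length segment around the region's center. The specific policies and region width are illustrative guesses, not the paper's exact settings.

```python
import numpy as np

# Sketch of the dual-goal navigation task under illustrative assumptions:
# a 1-D random walk whose steps are drawn from N(mu, sigma^2), with a
# trajectory counted as reaching R (or L) if its final position lies in a
# unit-length segment around the region's center.
T, N_TRAJ = 30, 500
R_CENTER, L_CENTER, HALF_WIDTH = 2.0, -2.0, 0.5

def reach_probs(mu, sigma, rng):
    """Monte Carlo estimates of P(end in R) and P(end in L) for policy (mu, sigma)."""
    steps = rng.normal(mu, sigma, size=(N_TRAJ, T))
    final = steps.sum(axis=1)  # final position of each trajectory
    p_r = float(np.mean(np.abs(final - R_CENTER) <= HALF_WIDTH))
    p_l = float(np.mean(np.abs(final - L_CENTER) <= HALF_WIDTH))
    return p_r, p_l

rng = np.random.default_rng(0)
print(reach_probs(0.0, 1.0, rng))    # default policy: roughly symmetric between R and L
print(reach_probs(0.07, 0.5, rng))   # drift toward R with lower noise favors region R
```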
Figure 4: Telic state representation learning for a navigation task with shifting goals.
Points in (μ, σ) policy space are colored by the difference between their probability of reaching unit-length regions, R and L, centered around 2 and −2 respectively, at time T = 30. Top left: telic states SL and SR (outlined by red and green dashed lines, respectively) consist of policies that are more likely to reach the corresponding region by a threshold of ϵ = 0.1 or more. Contour lines indicate isometric policy-complexity levels, relative to the default policy π0: (μ0, σ0) = (0, 1) (black dot), for a capacity bound of δ = 1 bit. Green and red dots show the information projection of π0 onto SR and SL respectively, i.e., the policy in each telic state closest to π0 in KL divergence. Top right: shifting the center of R to 2.5 renders SR unreachable from π0 with δ-bounded policy complexity. The policy πM: (μM, σM) (yellow dot) is the one closest to SR while still within the complexity capacity of the agent. Bottom left: SR is split by inserting an intermediate telic state, SM, centered around μM. By construction, the nearest distribution to π0 in SM, in the KL sense (orange dot), is within the agent's complexity capacity. Bottom right: both SM and SR are reachable with respect to the agent's new default policy, πM: (μM, σM) = (1.37, 1.15) (see Algorithm 1 for details); the new telic state representation {S0, SL, SM, SR} is telic controllable with respect to π0 = (0, 1), δ = 1, and N = 1.
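
The information-projection step in this caption can be mimicked with a coarse grid search. The sketch below makes two simplifying assumptions that are not taken from the paper: policy complexity is measured as the per-step KL divergence (in bits) between Gaussian step distributions, and reach probabilities are computed in closed form from the Gaussian final-position distribution rather than simulated.

```python
import numpy as np
from math import erf, sqrt

# Sketch of the information projection of the default policy onto a telic
# state: among policies belonging to the state, find the one with minimal
# KL divergence to the default and compare it against the capacity bound delta.
# The complexity measure (per-step Gaussian KL, in bits) and the grid search
# are illustrative assumptions, not the paper's construction.
T, HALF_WIDTH = 30, 0.5
R_CENTER, L_CENTER = 2.0, -2.0

def region_prob(center, mu, sigma):
    """P(final position in a unit segment around `center`), with final ~ N(T*mu, T*sigma^2)."""
    m, s = T * mu, sqrt(T) * sigma
    cdf = lambda x: 0.5 * (1 + erf((x - m) / (s * sqrt(2))))
    return cdf(center + HALF_WIDTH) - cdf(center - HALF_WIDTH)

def kl_bits(mu, sigma, mu0=0.0, sigma0=1.0):
    """KL( N(mu, sigma^2) || N(mu0, sigma0^2) ) per step, in bits."""
    nats = np.log(sigma0 / sigma) + (sigma**2 + (mu - mu0)**2) / (2 * sigma0**2) - 0.5
    return nats / np.log(2)

def project_onto_state(in_state, delta=1.0):
    """Grid-search the policy in a telic state that is closest in KL to the default."""
    best = None
    for mu in np.linspace(-1.0, 1.0, 81):
        for sigma in np.linspace(0.2, 3.0, 57):
            if in_state(mu, sigma):
                kl = kl_bits(mu, sigma)
                if best is None or kl < best[0]:
                    best = (kl, mu, sigma)
    if best is None:
        return None
    kl, mu, sigma = best
    return {"mu": mu, "sigma": sigma, "kl_bits": kl, "within_capacity": kl <= delta}

# Telic state S_R: policies more likely to reach R than L by at least eps = 0.1.
in_s_r = lambda mu, sigma: region_prob(R_CENTER, mu, sigma) - region_prob(L_CENTER, mu, sigma) >= 0.1
print(project_onto_state(in_s_r, delta=1.0))
```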
Figure 5: Complexity-granularity curves.
Each line shows the policy-complexity capacity, relative to the default policy π0 = (0, 1) (ordinate), required to reach the corresponding telic state at a given representational granularity level, quantified by the negative log of the sensitivity parameter ϵ (abscissa). Dashed gray lines show the values used in the dual-goal navigation example: δ = 1 (horizontal) and ϵ = 0.1 (vertical).
Figure 6: Goal-complexity tradeoff curves.
The probability of reaching each telic state as a function of policy complexity. Left: an agent with default policy π0: (μ0, σ0) = (0, 1) is unable to reach telic state SR with a complexity capacity limit of δ = 1 (gray vertical line). Right: with π1: (μ1, σ1) = (1.09, 1.24) as its default policy, the agent can reach both SM and SR with the same policy-complexity capacity.
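
A goal-complexity tradeoff curve of the kind shown here can be traced by sweeping the complexity budget and, for each budget, taking the best reach probability among policies within it. The snippet below reuses the same illustrative assumptions as the Figure 4 sketch (per-step Gaussian KL as the complexity measure, closed-form reach probabilities) and is not the paper's computation.

```python
import numpy as np
from math import erf, sqrt

# Sketch of a goal-complexity tradeoff curve: the best achievable probability
# of ending in region R as a function of the policy-complexity budget, under
# the same illustrative assumptions as before (per-step Gaussian KL in bits,
# closed-form Gaussian final-position distribution).
T, R_CENTER, HALF_WIDTH = 30, 2.0, 0.5

def reach_r(mu, sigma):
    """P(final position in a unit segment around R), with final ~ N(T*mu, T*sigma^2)."""
    m, s = T * mu, sqrt(T) * sigma
    cdf = lambda x: 0.5 * (1 + erf((x - m) / (s * sqrt(2))))
    return cdf(R_CENTER + HALF_WIDTH) - cdf(R_CENTER - HALF_WIDTH)

def kl_bits(mu, sigma, mu0=0.0, sigma0=1.0):
    """Per-step KL( N(mu, sigma^2) || N(mu0, sigma0^2) ) in bits."""
    nats = np.log(sigma0 / sigma) + (sigma**2 + (mu - mu0)**2) / (2 * sigma0**2) - 0.5
    return nats / np.log(2)

grid = [(mu, sig) for mu in np.linspace(-1, 1, 81) for sig in np.linspace(0.1, 2.0, 39)]
for delta in (0.25, 0.5, 1.0, 2.0):
    feasible = [reach_r(mu, sig) for mu, sig in grid if kl_bits(mu, sig) <= delta]
    print(f"delta = {delta:>4} bits: best P(reach R) ~ {max(feasible):.2f}")
```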
