Adapt Behav. 2023 Jun;31(3):197-212.
doi: 10.1177/10597123221095880. Epub 2022 Jun 9.

What's a good prediction? Challenges in evaluating an agent's knowledge


Alex Kearney et al. Adapt Behav. 2023 Jun.

Abstract

Constructing general knowledge by learning task-independent models of the world can help agents solve challenging problems. However, both constructing and evaluating such models remain an open challenge. The most common approach to evaluating models is to assess their accuracy with respect to observable values. However, the prevailing reliance on estimator accuracy as a proxy for the usefulness of the knowledge has the potential to lead us astray. We demonstrate the conflict between accuracy and usefulness through a series of illustrative examples, including both a thought experiment and an empirical example in Minecraft, using the General Value Function (GVF) framework. Having identified challenges in assessing an agent's knowledge, we propose an alternate evaluation approach that arises naturally in the online continual learning setting: we recommend evaluation by examining internal learning processes, specifically the relevance of a GVF's features to the prediction task at hand. This paper contributes a first look into evaluation of predictions through their use, an integral component of predictive knowledge that has so far gone unexplored.
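The GVF framework named above frames knowledge as predictions of discounted sums of a cumulant signal under some behaviour. As a hedged illustration of the idea only, not the authors' implementation, a TD(0) update for a linear GVF might look like the following (all names are hypothetical):

```python
import numpy as np

def gvf_td_update(w, phi, phi_next, cumulant, gamma, alpha):
    """One TD(0) update for a linear general value function (GVF).

    w        -- weight vector (the GVF's learned answer)
    phi      -- feature vector for the current observation
    phi_next -- feature vector for the next observation
    cumulant -- the signal being accumulated (e.g., a touch sensor reading)
    gamma    -- continuation (discounting) value for this step
    alpha    -- learning step-size
    """
    delta = cumulant + gamma * (w @ phi_next) - (w @ phi)  # TD error
    return w + alpha * delta * phi

# Toy usage: with gamma = 0 the GVF predicts the next cumulant value,
# so the estimate should converge to the constant cumulant of 1.
phi = np.array([1.0, 0.0, 0.0, 0.0])
w = np.zeros(4)
for _ in range(200):
    w = gvf_td_update(w, phi, phi, cumulant=1.0, gamma=0.0, alpha=0.1)
prediction = w @ phi  # approaches 1.0
```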

Keywords: Reinforcement learning; agent knowledge; general value functions.


Conflict of interest statement

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
Using the limited senses available to it, the agent must construct an abstraction such that it can understand a world it can never completely see. One way of constructing an agent’s knowledge of the world is by predicting what would happen if the agent behaved a certain way. (a) Often an agent cannot observe the true state of the environment; e.g., an agent in a room may only observe what it can see in front of itself and whether it has bumped into something. (b) Using limited sight and touch sensation, we can phrase basic spatial awareness as making predictions about moving around the room: e.g., “can I touch something in front of me?” or “how far is the nearest wall to my left?” (c) A prediction about bumping is used to construct a touch prediction, the output of which is used as the target for the touch-left and touch-right predictions. Adapted from Ring (2021).
Figure 2.
Two estimates of the same signal. The cumulant c is indicated by the grey square pulse, and its return G_t is presented as a dotted line. Two hypothetical estimates of the return are presented in green and orange.
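For concreteness, the return G_t in the caption is the discounted sum of future cumulant values, G_t = c_{t+1} + γ·c_{t+2} + γ²·c_{t+3} + …. A small sketch (hypothetical names, not the paper's code) computing this for a square-pulse cumulant:

```python
def discounted_return(cumulants, gamma):
    """Return G_t at each step: G_t = c_{t+1} + gamma * G_{t+1}.

    Computed backwards through the sequence; the final step has no
    future cumulants, so its return is left at zero.
    """
    G = [0.0] * len(cumulants)
    g_next = 0.0
    for t in reversed(range(len(cumulants) - 1)):
        G[t] = cumulants[t + 1] + gamma * g_next
        g_next = G[t]
    return G

# Square-pulse cumulant: off for 5 steps, on for 5, off again
pulse = [0.0] * 5 + [1.0] * 5 + [0.0] * 5
returns = discounted_return(pulse, gamma=0.9)
# The return rises ahead of the pulse and decays back toward zero
```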
Figure 3.
A visual representation of our agent approximating the visual input by sub-sampling 100 random pixels. (a) The visual input to the agent, totaling 320×480 pixels. (b) Visualization of the image sub-sampled to 100 random pixels.
Figure 4.
Cumulative Recent Unsigned Projected Error Estimate (RUPEE) over 250,000 time-steps for the ‘touch-left’ and ‘touch-right’ predictions, averaged over 30 independent trials. (a) Cumulative RUPEE for the tile-coded touch estimate (green) and bias-bit touch estimate (orange). The tracking estimate accumulates error at a slower rate than the anticipatory prediction. Evaluating based on RUPEE alone, we would be led to believe that the tracking model is best, despite it leading to catastrophic prediction error when used to inform touch-left and touch-right (c.f. Figure 5). The anticipatory touch estimate accumulates more error throughout the experiment despite being the better estimator for informing the touch-left and touch-right predictions. (b) Cumulative RUPEE for touch-left and touch-right estimates that use as their cumulant the tile-coded (green) and bias-bit (orange) touch estimates. Estimates dependent on the tracking GVF for learning have a greater cumulative error than the GVFs dependent on the tile-coded GVF. Error is accumulated at roughly the same rate as for the anticipatory GVFs, making it challenging to distinguish which prediction is better, despite wildly different outcomes when comparing predictions to ground truth (c.f. Figure 5). The error of the lower-order models does not always determine their effectiveness in informing further learning.
Figure 5.
Each sub-figure depicts estimates of each of the GVFs in our networks for 150 examples of the agent approaching a wall and then turning left. Five examples of the trajectory are drawn from each of 30 independent trials; results presented are averaged over the 150 examples of the same trajectory. (a) Tile-coded touch estimate (green) and bias-bit touch estimate (orange). (b) Touch-right estimates that use as their cumulant the tile-coded (green) and bias-bit (orange) touch estimates. (c) Touch-left estimates that use as their cumulant the tile-coded (green) and bias-bit (orange) touch estimates.
Figure 6.
The average active step-sizes for each layer of both the prediction and tracking networks, averaged over 30 independent trials. Error bars are standard error of the mean. (a) The average active step-size for both touch predictions. Anticipatory prediction in green; tracking-based prediction in orange. (b) Average active step-size for the touch-left and touch-right predictions. Anticipatory predictions in green; tracking-based predictions in orange.
Figure 7.
The average weighted feature relevance α|w̄| for each layer of both the prediction and tracking networks. Each is run over 30 independent trials. Error bars are standard error of the mean. (a) Average weighted feature relevance for the touch predictions. (b) Average weighted feature relevance for the touch-left and touch-right predictions. Anticipatory predictions in green; tracking-based predictions in orange.
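One plausible reading of the relevance measure in this caption scores each feature by the product of its adapted step-size and the magnitude of its learned weight. A hypothetical sketch of that reading (not the paper's exact computation; all names are assumptions):

```python
import numpy as np

def weighted_feature_relevance(alphas, weights):
    """Per-feature relevance as step-size times weight magnitude.

    Features that a step-size adaptation method deems useful retain
    larger step-sizes; weighting by |w| folds in how much each feature
    actually contributes to the prediction.
    """
    return np.asarray(alphas) * np.abs(np.asarray(weights))

alphas = np.array([0.10, 0.01, 0.05])   # learned per-feature step-sizes
weights = np.array([2.0, -5.0, 0.1])    # learned GVF weights
rel = weighted_feature_relevance(alphas, weights)
# The first feature dominates: large step-size and a sizeable weight
```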
