Foveal vision reduces neural resources in agent-based game learning

Runping Chen et al.

Front Neurosci. 2025 Mar 11;19:1547264. doi: 10.3389/fnins.2025.1547264. eCollection 2025.

Abstract

Efficient processing of information is crucial for the optimization of neural resources in both biological and artificial visual systems. In this paper, we study the efficiency that may be obtained via the use of a fovea. Using biologically motivated agents, we study visual information processing, learning, and decision making in a controlled artificial environment, namely the Atari Pong video game. We compare the resources necessary to play Pong between agents with and without a fovea. Our study shows that a fovea can significantly reduce the neural resources required, in the form of the number of neurons, synapses, and computations, while maintaining performance at playing Pong. To our knowledge, this is the first study in which an agent must simultaneously optimize its visual system along with its decision making and action generation capabilities. That is, the visual system is integral to a complete agent.
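As a rough, hedged illustration of how such resource counts might be tallied, the sketch below counts neurons, synapses (weights), and multiply-accumulate operations for fully connected networks; the layer widths are hypothetical and are not the architectures used in the paper.

```python
# Illustrative only: the layer widths are hypothetical, not the paper's architectures.
def count_resources(layer_sizes):
    """Tally neurons, synapses (weights), and multiply-accumulates (MACs)
    for a fully connected network with the given layer widths."""
    neurons = sum(layer_sizes[1:])  # units in all non-input layers
    synapses = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    macs = synapses  # one multiply-accumulate per weight per forward pass
    return {"neurons": neurons, "synapses": synapses, "macs": macs}

# A larger single-resolution style network vs. a smaller multi-resolution style one.
print(count_resources([40 * 40, 512, 256, 3]))            # full frame flattened
print(count_resources([16 * 16 + 12 * 12, 256, 128, 3]))  # small periphery + fovea
```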

Keywords: multi-resolution sensory integration; neural resources; neuromorphic computing; reinforcement learning; visual neuroscience.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Schematic diagram of the single- and multi-resolution Deep Q-Network (DQN) agents that learn to play Pong. OpenAI Gym Pong video frames were preprocessed into a peripheral, low-resolution but full frame and a fovea-like, high-resolution but zoomed frame (active areas are denoted in black). These frames from up to four time steps (labeled "t, t − 1, ...") were sparse coded by a Locally Competitive Algorithm (LCA). The DQN agent derived actions from these outputs. The single-resolution agent (A) exclusively received visual input from the periphery, then determined the paddle action, which was sent back to the OpenAI Gym environment (black "Action" arrow). The multi-resolution agent (B) received input from both the periphery and the fovea, returned the paddle action to OpenAI Gym, and additionally updated the foveal movement parameters used internally by the agent at the next time step (black "Action" and red "Move" arrows, respectively).
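A minimal sketch of the peripheral/foveal preprocessing described above, assuming the gymnasium Atari Pong environment and OpenCV, with illustrative resolutions and gaze placement (the paper's exact preprocessing may differ):

```python
import gymnasium as gym
import cv2

env = gym.make("ALE/Pong-v5")        # requires ale-py and the Atari ROMs
frame, _ = env.reset()               # RGB frame of shape (210, 160, 3)
gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)

# Peripheral view: the whole frame downsampled to a coarse grid (16 x 16 here, illustratively).
periphery = cv2.resize(gray, (16, 16), interpolation=cv2.INTER_AREA)

# Foveal view: a small, full-resolution window (20 x 20 here) cropped around a gaze point.
gaze_y, gaze_x = 105, 80             # hypothetical fovea center, e.g. near the ball
half = 10
fovea = gray[gaze_y - half:gaze_y + half, gaze_x - half:gaze_x + half]

print(periphery.shape, fovea.shape)  # (16, 16) (20, 20)
```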
Figure 2
Detailed structure of the single- and multi-resolution agents. The data flow of a multi-resolution agent with a 16 × 16 periphery, 20 × 20 fovea, and n-frame input is shown as an example. The preprocessed frame sequence was input into the corresponding foveal and peripheral LCA and CNN networks to extract spatiotemporal features. The single-resolution agent (black arrows) used only peripheral features (i.e., the full frame at a fixed resolution) to select the paddle action. The multi-resolution agent (black and red arrows) combined peripheral and foveal features, fovea movement, and fovea position to select a paddle action and a fovea movement for the next time step. *: The single-resolution agent had two fully connected layers to select the paddle action, while the multi-resolution agent had three.
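As a sketch of this multi-resolution data flow, assuming PyTorch and purely illustrative feature and layer sizes (not the paper's), a head that combines peripheral and foveal features with the fovea state and outputs both paddle and fovea-movement values might look like:

```python
import torch
import torch.nn as nn

class MultiResolutionHead(nn.Module):
    """Illustrative head: dimensions and layer counts are assumptions, not the paper's."""
    def __init__(self, periph_dim=256, fovea_dim=256, state_dim=4,
                 n_paddle_actions=3, n_fovea_moves=5):
        super().__init__()
        in_dim = periph_dim + fovea_dim + state_dim
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        self.paddle_q = nn.Linear(128, n_paddle_actions)  # values for paddle actions
        self.fovea_q = nn.Linear(128, n_fovea_moves)      # values for fovea movements

    def forward(self, periph_feat, fovea_feat, fovea_state):
        x = torch.cat([periph_feat, fovea_feat, fovea_state], dim=-1)
        h = self.trunk(x)
        return self.paddle_q(h), self.fovea_q(h)

head = MultiResolutionHead()
paddle, move = head(torch.randn(1, 256), torch.randn(1, 256), torch.randn(1, 4))
print(paddle.shape, move.shape)  # torch.Size([1, 3]) torch.Size([1, 5])
```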
Figure 3
Comparing the LCA reconstruction loss at different times. (A) LCA reconstruction loss vs. time. Periphery = 40 × 40 for all reconstructions in this figure. (B) and (C) Training curves of agents trained at time step 10 and time step 25. (B) Single-resolution agent {40, 2}. (C) Multi-resolution agent {16, 12, 2}. Note that the agents train well at either time step 10 or 25.
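For reference, a bare-bones, non-spiking LCA iteration that produces a sparse code and the kind of reconstruction loss tracked in (A) is sketched below; the dictionary, threshold, and step size are illustrative assumptions.

```python
import numpy as np

def lca_sparse_code(x, Phi, lam=0.1, tau=0.1, n_steps=25):
    """Minimal Locally Competitive Algorithm sketch (illustrative parameters).
    x: input vector of shape (d,); Phi: dictionary (d, k) with unit-norm columns."""
    u = np.zeros(Phi.shape[1])                 # membrane potentials
    G = Phi.T @ Phi - np.eye(Phi.shape[1])     # lateral inhibition (Gram matrix minus identity)
    b = Phi.T @ x                              # feed-forward drive
    for _ in range(n_steps):
        a = np.where(np.abs(u) > lam, u - lam * np.sign(u), 0.0)  # soft threshold
        u += tau * (b - u - G @ a)             # LCA dynamics (Euler step)
    a = np.where(np.abs(u) > lam, u - lam * np.sign(u), 0.0)
    recon_loss = 0.5 * np.sum((x - Phi @ a) ** 2)  # reconstruction loss, as in panel (A)
    return a, recon_loss

rng = np.random.default_rng(0)
Phi = rng.standard_normal((256, 400))
Phi /= np.linalg.norm(Phi, axis=0)             # unit-norm dictionary columns
code, loss = lca_sparse_code(rng.standard_normal(256), Phi)
print(code.nonzero()[0].size, loss)
```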
Figure 4
Win score heatmap of single- and multi-resolution agents for different input qualities. The best agent win scores are plotted, and bicubic interpolation is used to fill the remaining areas. The color bar denotes win score values. Markers (up triangle, square, plus, down triangle, diamond, and x) in the heatmap correspond to agents with different input qualities and hyperparameters; their training curves, test results, and win score distributions are shown in Figures 5 and 6. (A) Heatmap of the single-resolution agent, exhibiting low win scores with periphery ≤ 16 × 16 and 1-frame input, but higher win scores at high resolution and long history length. (B) Heatmap of the multi-resolution agent with 1 frame (bottom) and 2 frames (top) as input, exhibiting high win scores even with periphery = 8 × 8 and 1-frame input.
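The interpolation used to fill the heatmap can be approximated with matplotlib's bicubic image interpolation; the grid values below are placeholders, not the paper's win scores.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder win scores on a coarse (resolution x history-length) grid; not the paper's data.
win_scores = np.array([[ 2.0,  8.0, 14.0],
                       [10.0, 16.0, 19.0],
                       [15.0, 19.0, 21.0]])

fig, ax = plt.subplots()
im = ax.imshow(win_scores, origin="lower", interpolation="bicubic", cmap="viridis")
ax.set_xlabel("input frames (history length)")
ax.set_ylabel("peripheral resolution")
fig.colorbar(im, label="win score")
plt.show()
```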
Figure 5
Training curves and test results of single- and multi-resolution agents. Model weights are saved every 40 episodes during training and evaluated for 20 games. Evaluation results are plotted with shading indicating the standard deviation. The weights with the best evaluation result are tested for another 1,000 games, and test results are plotted with error bars indicating the standard deviation. Single-resolution agent {16, 2} (A), {40, 2} (B), {40, 4} (C). Multi-resolution agent {16, 20, 1} (D), {16, 12, 2} (E), {16, 20, 2} (F).
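The checkpointing and evaluation schedule described here can be expressed roughly as follows; train_episode, evaluate, save_weights, and the agent object are stand-in stubs, not the paper's training code.

```python
import random

# Stand-in stubs for the real training/evaluation routines (illustrative only).
def train_episode(agent):      return None
def evaluate(agent, n_games):  return [random.uniform(-21, 21) for _ in range(n_games)]
def save_weights(agent, ep):   return {"episode": ep}

agent = object()
checkpoints = []

# Save weights every 40 training episodes and evaluate each checkpoint for 20 games.
for episode in range(1, 2001):
    train_episode(agent)
    if episode % 40 == 0:
        ckpt = save_weights(agent, episode)
        scores = evaluate(agent, n_games=20)
        checkpoints.append((ckpt, sum(scores) / len(scores)))

# Test the best checkpoint for another 1,000 games; report mean and std of the win score.
best_ckpt, _ = max(checkpoints, key=lambda c: c[1])
test_scores = evaluate(agent, n_games=1000)
mean = sum(test_scores) / len(test_scores)
std = (sum((s - mean) ** 2 for s in test_scores) / len(test_scores)) ** 0.5
print(best_ckpt, mean, std)
```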
Figure 6
The distribution of single- and multi-resolution agents' test results. (A) Single-resolution agent. (B) Multi-resolution agent.
Figure 7
Multi-resolution agents use fewer resources than the single-resolution agent to achieve the same win score. Circle: single-resolution agent. Diamond: multi-resolution agent. The function y = a - bx^c + d is used to fit the win scores of the single-resolution agent (dashed line) and the multi-resolution agent (solid line). (A) Neurons. (B) Synapses. (C) FLOPs.
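The caption's fit can be reproduced with a standard nonlinear least-squares routine; the (resources, win score) points and initial guesses below are placeholders, and only the functional form y = a - bx^c + d follows the caption.

```python
import numpy as np
from scipy.optimize import curve_fit

def win_score_fit(x, a, b, c, d):
    """Saturating fit y = a - b * x**c + d of win score vs. resources (form from the caption)."""
    return a - b * np.power(x, c) + d

# Placeholder (resources, win score) points; not the paper's measurements.
x = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
y = np.array([5.0, 12.0, 17.0, 19.5, 20.5])

params, _ = curve_fit(win_score_fit, x, y, p0=[21.0, 200.0, -0.3, 0.0], maxfev=10000)
print(dict(zip("abcd", params)))
```

Note that a and d enter the model only through their sum, so the fit determines a + d jointly rather than each offset separately.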


