Auton Agent Multi Agent Syst. 2021;35(2):17. doi: 10.1007/s10458-021-09497-8. Epub 2021 Apr 19.

Playing Atari with few neurons: Improving the efficacy of reinforcement learning by decoupling feature extraction and decision making

Giuseppe Cuccu et al.

Abstract

We propose a new method for learning compact state representations and policies separately but simultaneously for policy approximation in vision-based applications such as Atari games. Approaches based on deep reinforcement learning typically map pixels directly to actions to enable end-to-end training. Internally, however, the deep neural network bears the responsibility of both extracting useful information and making decisions based on it, two objectives which can be addressed independently. Separating the image processing from the action selection allows for a better understanding of each task individually, as well as potentially finding smaller policy representations, which is of independent interest. Our approach learns state representations using a compact encoder based on two novel algorithms: (i) Increasing Dictionary Vector Quantization builds a dictionary of state representations which grows in size over time, allowing our method to address new observations as they appear in an open-ended online-learning context; and (ii) Direct Residuals Sparse Coding encodes observations as a function of the dictionary, aiming for maximal information inclusion by disregarding reconstruction error and maximizing code sparsity. As the dictionary grows, however, the encoder produces increasingly larger inputs for the neural network; this issue is addressed with a new variant of the Exponential Natural Evolution Strategies algorithm which adapts the dimensionality of its probability distribution over the course of the run. We test our system on a selection of Atari games using tiny neural networks of only 6 to 18 neurons (depending on each game's controls). These are still capable of achieving results that are not much worse than, and occasionally superior to, the state of the art in direct policy search, which uses two orders of magnitude more neurons.
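As a concrete illustration of the encoding step, here is a minimal Python sketch of a greedy residual-based sparse coder in the spirit of Direct Residuals Sparse Coding. The dot-product similarity, the non-negative residual update, and the sparsity budget k are assumptions made for illustration; the abstract does not specify these details.

    import numpy as np

    def drsc_encode(observation, dictionary, k=5):
        # Greedily pick the centroid most similar to what is still
        # unexplained, flag it in a binary code, and subtract it
        # (clamped at zero) from the residual.
        residual = np.asarray(observation, dtype=float).ravel()
        code = np.zeros(len(dictionary))
        for _ in range(k):                    # sparsity budget (assumed)
            sims = np.array([c @ residual for c in dictionary])
            sims[code > 0] = -np.inf          # do not reselect a centroid
            best = int(np.argmax(sims))
            if sims[best] <= 0.0:             # nothing left worth explaining
                break
            code[best] = 1.0
            residual = np.maximum(residual - dictionary[best], 0.0)
        return code                           # sparse binary input for the tiny network

Note that, as the abstract states, the coder disregards reconstruction error: centroids are selected for the information they carry about the observation, not to minimize a reconstruction distance.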

Keywords: Evolutionary algorithms; Game playing; Learning agent capabilities; Neuroevolution.

Figures

Fig. 1
System diagram. At each generation the optimizer (1) generates sets of weights (2) for the neural network controller (3). Each network is evaluated episodically against the environment (4). At each step the environment sends an observation (5) to an external compressor (6), which produces a compact encoding (7). The network uses that encoding as input. Independently, the compressor selects observations (8) for its training set (9). At the end of the episode, the environment returns the fitness (cumulative reward; 10) to the optimizer for training (neuroevolution; 11). Compressor training (12) takes place in between generations
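In code, one generation of this loop could look roughly as follows. This is a hypothetical sketch: the ask/tell optimizer interface, the Gym-style env.step signature, and the compressor method names are illustrative assumptions, not APIs from the paper.

    def run_generation(optimizer, compressor, env, build_network):
        # One generation of the decoupled system in Fig. 1.
        fitnesses = []
        for weights in optimizer.ask():          # (1)-(2) sample weight sets
            net = build_network(weights)         # (3) instantiate the tiny controller
            obs, done, total_reward = env.reset(), False, 0.0
            while not done:                      # (4) evaluate episodically
                code = compressor.encode(obs)    # (5)-(7) compact encoding as network input
                compressor.maybe_store(obs)      # (8)-(9) select observations for training
                obs, reward, done, info = env.step(net.act(code))
                total_reward += reward
            fitnesses.append(total_reward)       # (10) fitness = cumulative reward
        optimizer.tell(fitnesses)                # (11) neuroevolution update
        compressor.train()                       # (12) compressor training between generations

The key design point is visible in the structure: the compressor is trained on its own schedule, between generations, so the network being evolved never has to learn feature extraction itself.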
Fig. 2
Trained centroids. Samples of centroids trained with IDVQ during runs on different games. Notice how the first centroid typically captures the initial state of the game, often identifiable as the background. By design, the following centroids then represent sprites that have changed w.r.t. that first image, thus identifying active elements of the game, such as avatars, enemies, and interactive props. Colors are inverted for printing convenience
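The growth rule behind this caption can be sketched in a few lines, reusing the drsc_encode sketch above. The first observation is stored whole, which is why the first centroid tends to be the background; later centroids store only the non-negative residual, i.e. what changed. The mean-residual novelty threshold is an assumption made for illustration.

    import numpy as np

    def idvq_update(dictionary, observation, threshold=0.1, k=5):
        # Sketch of Increasing Dictionary Vector Quantization: grow the
        # dictionary whenever the current centroids leave too much of
        # the observation unexplained.
        obs = np.asarray(observation, dtype=float).ravel()
        if not dictionary:
            dictionary.append(obs.copy())          # first centroid: full frame (background)
            return
        code = drsc_encode(obs, dictionary, k)     # encode with the current dictionary
        recon = sum(c for c, bit in zip(dictionary, code) if bit)
        residual = np.maximum(obs - recon, 0.0)    # what the dictionary cannot express
        if residual.mean() > threshold:            # enough novelty? (criterion assumed)
            dictionary.append(residual)            # new centroid: only what changed

Because new centroids are residuals rather than raw frames, they naturally isolate the active elements (avatars, enemies, props) that the caption points out.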
