Action-driven contrastive representation for reinforcement learning
- PMID: 35303031
- PMCID: PMC8932622
- DOI: 10.1371/journal.pone.0265456
Abstract
In reinforcement learning, reward-driven feature learning directly from high-dimensional images faces two challenges: sample efficiency for solving control tasks and generalization to unseen observations. Prior works have addressed these issues by learning representations from pixel inputs. However, these representations were either vulnerable to the high diversity inherent in environments or failed to capture the characteristics needed to solve control tasks. To attenuate these problems, we propose a novel contrastive representation method, the Action-Driven Auxiliary Task (ADAT), which forces a representation to concentrate on the features essential for deciding actions and to ignore control-irrelevant details. Using the augmented state-action dictionary of ADAT, the agent learns a representation that maximizes agreement between observations sharing the same actions. The proposed method significantly outperforms model-free and model-based algorithms on Atari and OpenAI ProcGen, benchmarks widely used to measure sample efficiency and generalization.
Conflict of interest statement
The authors have declared that no competing interests exist.