Discovering state-of-the-art reinforcement learning algorithms
- PMID: 41125136
- PMCID: PMC12695655
- DOI: 10.1038/s41586-025-09761-x
Abstract
Humans and other animals use powerful reinforcement learning (RL) mechanisms that have been discovered by evolution over many generations of trial and error. By contrast, artificial agents typically learn using handcrafted learning rules. Despite decades of interest, the goal of autonomously discovering powerful RL algorithms has proven to be elusive [1-6]. Here we show that it is possible for machines to discover a state-of-the-art RL rule that outperforms manually designed rules. This was achieved by meta-learning from the cumulative experiences of a population of agents across a large number of complex environments. Specifically, our method discovers the RL rule by which the agent's policy and predictions are updated. In our large-scale experiments, the discovered rule surpassed all existing rules on the well-established Atari benchmark and outperformed a number of state-of-the-art RL algorithms on challenging benchmarks that it had not seen during discovery. Our findings suggest that the RL algorithms required for advanced artificial intelligence may soon be automatically discovered from the experiences of agents, rather than manually designed.
© 2025. The Author(s).
Conflict of interest statement
Competing interests: A patent application(s) directed to aspects of the work described has been filed and is pending as of the date of manuscript submission. Google LLC has ownership and potential commercial interests in the work described.
References
- Kirsch, L., van Steenkiste, S. & Schmidhuber, J. Improving generalization in meta reinforcement learning using learned objectives. In Proc. International Conference on Learning Representations (ICLR, 2020).
- Kirsch, L. et al. Introducing symmetries to black box meta reinforcement learning. In Proc. AAAI Conference on Artificial Intelligence 36, 7202–7210 (Association for the Advancement of Artificial Intelligence, 2022).
- Oh, J. et al. Discovering reinforcement learning algorithms. In Proc. Adv. Neural Inf. Process. Syst. 33, 1060–1070 (NeurIPS, 2020).
- Xu, Z. et al. Meta-gradient reinforcement learning with an objective discovered online. In Proc. Adv. Neural Inf. Process. Syst. 33, 15254–15264 (NeurIPS, 2020).
- Houthooft, R. et al. Evolved policy gradients. In Proc. Adv. Neural Inf. Process. Syst. 31, 5405–5414 (NeurIPS, 2018).
