MuDE: Multi-agent decomposed reward-based exploration
- PMID: 39111159
- DOI: 10.1016/j.neunet.2024.106565
Abstract
In cooperative multi-agent reinforcement learning, agents jointly optimize a centralized value function based on rewards shared by all agents and learn decentralized policies through value function decomposition. Although such a learning framework is considered effective, estimating individual contributions from the shared rewards, which is essential for learning highly cooperative behaviors, is difficult. The problem becomes more challenging when reinforcement and punishment, which respectively increase and decrease specific behaviors of agents, coexist, because maximizing reinforcement and minimizing punishment can often conflict in practice. This study proposes a novel exploration scheme called multi-agent decomposed reward-based exploration (MuDE), which preferentially explores the action spaces associated with positive sub-rewards based on a modified reward decomposition scheme, thereby reaching action spaces that existing exploration schemes cannot. We evaluate MuDE on a challenging set of StarCraft II micromanagement tasks and modified predator-prey tasks extended to include reinforcement and punishment. The results show that MuDE accurately estimates sub-rewards and outperforms state-of-the-art approaches in both convergence speed and win rate.
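To make the core idea concrete, the sketch below illustrates (under loose assumptions, not the authors' implementation) how a shared team reward might be decomposed into positive (reinforcement) and negative (punishment) sub-reward estimates per agent, with exploration biased toward actions whose estimated positive sub-reward is high. All names here (SubRewardEstimator, explore_action, beta) are hypothetical; the paper learns the decomposition with neural networks rather than the crude tabular split shown.

```python
import numpy as np

class SubRewardEstimator:
    """Tabular stand-in for a learned per-agent sub-reward decomposition."""

    def __init__(self, n_agents, n_actions, lr=0.1):
        self.pos = np.zeros((n_agents, n_actions))  # reinforcement estimates
        self.neg = np.zeros((n_agents, n_actions))  # punishment estimates
        self.lr = lr

    def update(self, actions, team_reward):
        # Split the shared reward by sign and spread it evenly across agents;
        # a crude proxy for the learned decomposition described in the paper.
        pos_part = max(team_reward, 0.0) / len(actions)
        neg_part = min(team_reward, 0.0) / len(actions)
        for i, a in enumerate(actions):
            self.pos[i, a] += self.lr * (pos_part - self.pos[i, a])
            self.neg[i, a] += self.lr * (neg_part - self.neg[i, a])


def explore_action(estimator, agent_id, q_values, beta=1.0, rng=None):
    """Sample an action with its probability boosted by the positive
    sub-reward estimate, so exploration favors reinforcing regions."""
    rng = rng or np.random.default_rng()
    logits = q_values + beta * estimator.pos[agent_id]  # exploration bias
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(q_values), p=probs)


# Usage: two agents, four actions each, one shared reward signal.
est = SubRewardEstimator(n_agents=2, n_actions=4)
actions = [explore_action(est, i, np.zeros(4)) for i in range(2)]
est.update(actions, team_reward=1.0)   # positive outcome: reinforcement
est.update(actions, team_reward=-0.5)  # negative outcome: punishment
```

The point of biasing sampling with the positive sub-reward, rather than the raw shared reward, is that actions whose reinforcement component is masked by a co-occurring punishment still receive exploration pressure.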
Keywords: Exploration; Multi-agent reinforcement learning; Reward decomposition.
Copyright © 2024 The Authors. Published by Elsevier Ltd. All rights reserved.
Conflict of interest statement
Declaration of competing interest The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Byunghyun Yoo has a patent pending with the Electronics and Telecommunications Research Institute.