Neural Netw. 2024 Nov:179:106565.
doi: 10.1016/j.neunet.2024.106565. Epub 2024 Jul 22.

MuDE: Multi-agent decomposed reward-based exploration

Free article

Byunghyun Yoo et al. Neural Netw. 2024 Nov.
Abstract

In cooperative multi-agent reinforcement learning, agents jointly optimize a centralized value function based on the rewards shared by all agents and learn decentralized policies through value function decomposition. Although such a learning framework is considered effective, estimating each agent's individual contribution from the shared rewards, which is essential for learning highly cooperative behaviors, is difficult. The problem becomes more challenging when reinforcement and punishment, which respectively increase or decrease specific behaviors of agents, coexist, because maximizing reinforcement and minimizing punishment can often conflict in practice. This study proposes a novel exploration scheme called multi-agent decomposed reward-based exploration (MuDE), which preferentially explores the action spaces associated with positive sub-rewards based on a modified reward decomposition scheme, thus effectively exploring action spaces not reachable by existing exploration schemes. We evaluate MuDE on a challenging set of StarCraft II micromanagement tasks and modified predator-prey tasks extended to include reinforcement and punishment. The results show that MuDE accurately estimates sub-rewards and outperforms state-of-the-art approaches in both convergence speed and win rates.
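
The two ideas named in the abstract, value function decomposition and exploration biased toward positive sub-rewards, can be illustrated with a minimal sketch. This is not the MuDE algorithm, whose details the abstract does not give. It assumes an additive (VDN-style) decomposition and an epsilon-greedy rule whose exploratory branch samples actions by a softmax over estimated positive sub-rewards. The function names `joint_value` and `biased_explore`, the `temperature` parameter, and all inputs are hypothetical.

```python
import math
import random


def joint_value(per_agent_q):
    """Additive (VDN-style) decomposition, assumed here for illustration:
    the centralized value is the sum of the per-agent utilities."""
    return sum(per_agent_q)


def biased_explore(q_values, pos_sub_rewards, epsilon, temperature=1.0, rng=random):
    """Epsilon-greedy selection for one agent (hypothetical sketch).

    q_values        -- the agent's utility estimate for each action
    pos_sub_rewards -- assumed estimate of the positive sub-reward per action
    epsilon         -- probability of taking the exploratory branch
    """
    actions = range(len(q_values))
    if rng.random() >= epsilon:
        # Exploit: take the greedy action under the agent's own utilities.
        return max(actions, key=lambda a: q_values[a])
    # Explore: sample with a softmax over positive sub-reward estimates,
    # so actions linked to larger positive sub-rewards are tried more often.
    top = max(pos_sub_rewards)  # subtract the max for numerical stability
    weights = [math.exp((r - top) / temperature) for r in pos_sub_rewards]
    return rng.choices(list(actions), weights=weights, k=1)[0]
```

With `epsilon=0.0` the function reduces to greedy selection. With `epsilon=1.0` every step samples from the sub-reward softmax, and a lower `temperature` concentrates exploration on the actions with the highest estimated positive sub-reward.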

Keywords: Exploration; Multi-agent reinforcement learning; Reward decomposition.

Conflict of interest statement

Declaration of competing interest: The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Byunghyun Yoo has a patent pending to Electronics and Telecommunications Research Institute.