MuDE: Multi-agent decomposed reward-based exploration
- PMID: 39111159
- DOI: 10.1016/j.neunet.2024.106565
Abstract
In cooperative multi-agent reinforcement learning, agents jointly optimize a centralized value function based on rewards shared by all agents and learn decentralized policies through value function decomposition. Although such a learning framework is considered effective, estimating individual contributions from the shared rewards, which is essential for learning highly cooperative behaviors, is difficult. The problem becomes more challenging when reinforcement and punishment, which respectively increase and decrease specific behaviors of agents, coexist, because maximizing reinforcement and minimizing punishment can often conflict in practice. This study proposes a novel exploration scheme called multi-agent decomposed reward-based exploration (MuDE), which preferentially explores the action spaces associated with positive sub-rewards based on a modified reward decomposition scheme, thereby reaching action spaces that existing exploration schemes cannot. We evaluate MuDE on a challenging set of StarCraft II micromanagement tasks and modified predator-prey tasks extended to include reinforcement and punishment. The results show that MuDE accurately estimates sub-rewards and outperforms state-of-the-art approaches in both convergence speed and win rate.
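To make the core idea concrete, the sketch below illustrates (under loose assumptions, not the authors' implementation) how a shared team reward might be decomposed into positive (reinforcement) and negative (punishment) sub-reward estimates per agent, with exploration biased toward actions whose estimated positive sub-reward is high. All names here (SubRewardEstimator, explore_action, beta) are hypothetical; the paper learns the decomposition with neural networks rather than the crude tabular split shown.

```python
import numpy as np

class SubRewardEstimator:
    """Tabular stand-in for a learned per-agent sub-reward decomposition."""

    def __init__(self, n_agents, n_actions, lr=0.1):
        self.pos = np.zeros((n_agents, n_actions))  # reinforcement estimates
        self.neg = np.zeros((n_agents, n_actions))  # punishment estimates
        self.lr = lr

    def update(self, actions, team_reward):
        # Split the shared reward by sign and spread it evenly across agents;
        # a crude proxy for the learned decomposition described in the paper.
        pos_part = max(team_reward, 0.0) / len(actions)
        neg_part = min(team_reward, 0.0) / len(actions)
        for i, a in enumerate(actions):
            self.pos[i, a] += self.lr * (pos_part - self.pos[i, a])
            self.neg[i, a] += self.lr * (neg_part - self.neg[i, a])


def explore_action(estimator, agent_id, q_values, beta=1.0, rng=None):
    """Sample an action with its probability boosted by the positive
    sub-reward estimate, so exploration favors reinforcing regions."""
    rng = rng or np.random.default_rng()
    logits = q_values + beta * estimator.pos[agent_id]  # exploration bias
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(q_values), p=probs)


# Usage: two agents, four actions each, one shared reward signal.
est = SubRewardEstimator(n_agents=2, n_actions=4)
actions = [explore_action(est, i, np.zeros(4)) for i in range(2)]
est.update(actions, team_reward=1.0)   # positive outcome: reinforcement
est.update(actions, team_reward=-0.5)  # negative outcome: punishment
```

The point of biasing sampling with the positive sub-reward, rather than the raw shared reward, is that actions whose reinforcement component is masked by a co-occurring punishment still receive exploration pressure.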
Keywords: Exploration; Multi-agent reinforcement learning; Reward decomposition.
Copyright © 2024 The Authors. Published by Elsevier Ltd. All rights reserved.
Conflict of interest statement
Declaration of competing interest The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Byunghyun Yoo has a patent pending with the Electronics and Telecommunications Research Institute.