PLoS One. 2023 Sep 14;18(9):e0291545. doi: 10.1371/journal.pone.0291545. eCollection 2023.

Sample-efficient multi-agent reinforcement learning with masked reconstruction


Jung In Kim et al. PLoS One. 2023.

Abstract

Deep reinforcement learning (DRL) is a powerful approach that combines reinforcement learning (RL) and deep learning to address complex decision-making problems in high-dimensional environments. Although DRL has been remarkably successful, its low sample efficiency necessitates extensive training time and large amounts of data to learn optimal policies, and this limitation is even more pronounced in multi-agent reinforcement learning (MARL). Various studies have therefore sought to make DRL more sample-efficient. In this study, we propose an approach that combines a masked reconstruction task with QMIX (M-QMIX). By introducing the masked reconstruction task as an auxiliary task, we aim to improve sample efficiency, a fundamental limitation of RL in multi-agent systems. To validate the proposed method, we conducted experiments on the StarCraft II micromanagement benchmark using 11 scenarios: five easy, three hard, and three super hard. We deliberately limited the number of training time steps in each scenario to highlight sample efficiency. The proposed method outperforms QMIX in eight of the 11 scenarios. These results provide strong evidence that the proposed method is more sample-efficient than QMIX and that it effectively addresses this limitation of DRL in multi-agent systems.
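The abstract specifies the method only at a high level. As a rough illustration of the idea, the PyTorch sketch below combines a QMIX-style TD loss with a masked-reconstruction auxiliary loss; every name, tensor shape, and the weighting term aux_weight is an assumption for illustration, not the authors' released code.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only: names, shapes, and the auxiliary weight are
# assumptions, not the paper's implementation.

def m_qmix_loss(q_agents, q_agents_next_max, mixer, target_mixer,
                state, next_state, reward, done,
                recon_pred, recon_target, aux_weight=1.0, gamma=0.99):
    """QMIX TD loss plus a masked-reconstruction auxiliary loss."""
    # Monotonic mixing of per-agent Q-values into a joint value (QMIX).
    q_tot = mixer(q_agents, state)                        # [batch, 1]
    with torch.no_grad():
        q_tot_next = target_mixer(q_agents_next_max, next_state)
        td_target = reward + gamma * (1.0 - done) * q_tot_next
    td_loss = F.mse_loss(q_tot, td_target)

    # Auxiliary task: the online network's prediction from masked
    # observations is regressed onto the target network's output
    # (assumed form of the reconstruction objective).
    recon_loss = F.mse_loss(recon_pred, recon_target.detach())

    return td_loss + aux_weight * recon_loss
```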


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Overall architecture of the proposed method, which combines QMIX with a masked reconstruction task.
The masked reconstruction task consists of a target network and an online network; the gray boxes represent the three recurrent networks.
Fig 2. (a) Overall framework of QMIX. The output values obtained from each agent network are monotonically mixed to generate a joint action-value function. (b) Agent network architecture. The network takes an individual agent's current observation and last action as inputs and outputs the corresponding Q-values for that agent.
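Fig 2(a) refers to QMIX's monotonic mixing. For readers unfamiliar with QMIX, the following is a minimal sketch of such a mixing network: state-conditioned hypernetworks emit non-negative weights, which makes the joint value monotonic in every agent's Q-value. Layer sizes and names are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    """Sketch of a QMIX-style mixing network; sizes are illustrative."""

    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks map the global state to the mixer's weights.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: [batch, n_agents], state: [batch, state_dim]
        b = agent_qs.size(0)
        # abs() keeps the mixing weights non-negative, enforcing
        # monotonicity of Q_tot in every agent's Q-value.
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2            # [batch, 1, 1]
        return q_tot.view(b, 1)
```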
Fig 3. Comparison between M-QMIX and QMIX on all super hard maps.
Fig 4. Comparison between M-QMIX and QMIX on all hard maps.
Fig 5. Comparison between M-QMIX and QMIX on all easy maps.
Fig 6. Performance of M-QMIX under different masking ratios.
Fig 7. Performance of M-QMIX under different momentum values.
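Figs 6 and 7 ablate the masking ratio and the momentum coefficient of the target network shown in Fig 1. The sketch below shows one common realization of these two components, assuming random feature masking and an exponential-moving-average target update; neither detail is confirmed by this page.

```python
import torch

def mask_observations(obs, mask_ratio=0.5):
    """Zero out a random fraction of observation features
    (mask_ratio plays the role ablated in Fig 6; the exact
    masking scheme is an assumption)."""
    keep = (torch.rand_like(obs) >= mask_ratio).float()
    return obs * keep

@torch.no_grad()
def momentum_update(target_net, online_net, momentum=0.99):
    """Exponential-moving-average update of the target network from
    the online network (momentum as ablated in Fig 7)."""
    for tp, op in zip(target_net.parameters(), online_net.parameters()):
        tp.data.mul_(momentum).add_(op.data, alpha=1.0 - momentum)
```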
