PLoS One. 2019 Sep 11;14(9):e0222215. doi: 10.1371/journal.pone.0222215. eCollection 2019.

Multi-agent reinforcement learning with approximate model learning for competitive games


Young Joon Park et al. PLoS One. 2019.

Abstract

We propose a method for learning multi-agent policies to compete against multiple opponents. The method consists of recurrent neural network-based actor-critic networks and deterministic policy gradients that promote cooperation between agents through communication. The learning process does not require access to opponents' parameters or observations because the agents are trained separately from the opponents. The actor networks enable the agents to communicate via forward and backward paths, while the critic network helps train the actors by delivering gradient signals based on each agent's contribution to the global reward. Moreover, to address the nonstationarity caused by the evolving policies of the other agents, we propose approximate model learning that uses auxiliary prediction networks to model the state transitions, the reward function, and opponent behavior. In the test phase, we use competitive multi-agent environments to demonstrate, by comparison, the usefulness and superiority of the proposed method in terms of learning efficiency and goal achievement. The comparison results show that the proposed method outperforms the alternatives.
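To make the abstract's architecture concrete, the following is a minimal sketch of one agent's recurrent actor-critic with auxiliary prediction heads for approximate model learning. It is written in PyTorch (the abstract does not name a framework), and the class name, layer sizes, and head layout (next observation, reward, and opponent action) are illustrative assumptions rather than the authors' implementation.

    import torch
    import torch.nn as nn

    class RecurrentActorCritic(nn.Module):
        """Sketch of a recurrent actor-critic agent with auxiliary
        prediction heads for approximate model learning (illustrative,
        not the paper's code)."""

        def __init__(self, obs_dim, act_dim, hidden=64):
            super().__init__()
            # Recurrent encoder over the observation sequence.
            self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
            # Deterministic policy head: recurrent state -> action.
            self.actor = nn.Linear(hidden, act_dim)
            # Critic head: scores the state-action pair.
            self.critic = nn.Linear(hidden + act_dim, 1)
            # Auxiliary prediction heads (approximate model learning):
            # next observation, reward, and opponent action.
            self.next_obs_head = nn.Linear(hidden + act_dim, obs_dim)
            self.reward_head = nn.Linear(hidden + act_dim, 1)
            self.opponent_head = nn.Linear(hidden, act_dim)

        def forward(self, obs_seq):
            # obs_seq: (batch, time, obs_dim)
            h, _ = self.rnn(obs_seq)
            h_last = h[:, -1]                              # last recurrent state
            action = torch.tanh(self.actor(h_last))        # deterministic action
            ha = torch.cat([h_last, action], dim=-1)
            q_value = self.critic(ha)                      # state-action value
            aux = {
                "next_obs": self.next_obs_head(ha),        # predicted next observation
                "reward": self.reward_head(ha),            # predicted reward
                "opp_action": torch.tanh(self.opponent_head(h_last)),  # predicted opponent action
            }
            return action, q_value, aux

    # Example with hypothetical dimensions:
    # net = RecurrentActorCritic(obs_dim=8, act_dim=2)
    # action, q, aux = net(torch.randn(4, 10, 8))

In training, the auxiliary outputs would be fit with, for example, mean-squared-error losses against observed transitions, rewards, and opponent actions, added to the usual deterministic-policy-gradient and temporal-difference objectives; this extra supervision is one way to read the abstract's approximate model learning for coping with nonstationarity.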


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Overview of the proposed CTRL when two adversarial teams exist.
Each team is trained independently.
Fig 2
Fig 2. Architectures of the CTRL with an AMLAPN.
In the training phase, the observations are sequentially processed by the actor and critic along the arrows. The gradient signals are propagated in the reverse direction of the arrows. The shaded regions represent the auxiliary prediction networks for the approximate model learning. The number of units is shown in parentheses.
Fig 3
Fig 3. Illustrations of the experimental environment for four scenarios: physical deception (top left), keep-away (top right), predator-prey (bottom left), and complex predator-prey (bottom right).
Fig 4
Fig 4. Learning curves for the four competitive scenarios: episode rewards in the physical deception (top left), keep-away (top right), predator-prey (bottom left), and complex predator-prey (bottom right) scenarios. Each bar cluster represents the converged episode reward at the end of training. The shaded region is a 95% confidence interval across the different random seeds.
Fig 5
Fig 5. Relative performances in round-robin tournament evaluations: the performances of team A trained by the four methods (a), and the performances of team B trained by the four methods (b). Each bar cluster shows the score for a set of competing policies; a higher score is better for the agent.
Fig 6
Fig 6. Learning curves in the partial-observation environments: episode rewards in the predator-prey (left) and complex predator-prey (right) scenarios. Each bar cluster represents the converged episode reward at the end of training. The shaded region is a 95% confidence interval across the different random seeds.
Fig 7
Fig 7. Average relative performances with partial observation in round-robin tournament evaluations: the performances of team A trained by the four methods (a), and the performances of team B trained by the four methods (b). Each bar cluster shows the score for a set of competing policies; a higher score is better for the agent.


