Investigation of independent reinforcement learning algorithms in multi-agent environments

Ken Ming Lee et al. Front Artif Intell. 2022 Sep 20;5:805823. doi: 10.3389/frai.2022.805823. eCollection 2022.

Abstract

Independent reinforcement learning algorithms have no theoretical guarantees for finding the best policy in multi-agent settings. However, in practice, prior works have reported good performance with independent algorithms in some domains and poor performance in others. Moreover, a comprehensive study of the strengths and weaknesses of independent algorithms is lacking in the literature. In this paper, we carry out an empirical comparison of the performance of independent algorithms on seven PettingZoo environments that span the three main categories of multi-agent environments, i.e., cooperative, competitive, and mixed. For the cooperative setting, we show that independent algorithms can perform on par with multi-agent algorithms in fully observable environments, while adding recurrence improves the learning of independent algorithms in partially observable environments. In the competitive setting, independent algorithms can perform on par with or better than multi-agent algorithms, even in more challenging environments. We also show that agents trained via independent algorithms learn to perform well individually, but fail to learn to cooperate with allies and compete with enemies in mixed environments.
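
For readers unfamiliar with the experimental setup, the sketch below illustrates the "independent control" loop that the compared independent algorithms share: each agent selects its action from its own observation only, with no centralized critic and no information shared between agents. It uses PettingZoo's parallel API on the Simple Reference environment; the module name (simple_reference_v3), the exact reset/step signatures, and the random stand-in policy are illustrative assumptions that depend on the installed PettingZoo version, and this is not the authors' training code.

    # Minimal sketch of independent control in a PettingZoo environment.
    # Assumptions: PettingZoo parallel API (recent releases); the module may be
    # named simple_reference_v2 in older releases. The random policy is a
    # placeholder for any independent learner (e.g., a separate DQN or PPO per agent).
    from pettingzoo.mpe import simple_reference_v3

    env = simple_reference_v3.parallel_env(max_cycles=25)
    observations, infos = env.reset(seed=0)

    def independent_policy(agent, observation, action_space):
        # Stand-in policy: acts only on this agent's own observation.
        return action_space.sample()

    while env.agents:
        # Each agent chooses an action independently from its own observation.
        actions = {
            agent: independent_policy(agent, observations[agent], env.action_space(agent))
            for agent in env.agents
        }
        observations, rewards, terminations, truncations, infos = env.step(actions)

    env.close()

By contrast, the multi-agent baselines discussed in the paper (e.g., MAPPO) additionally condition a centralized value function on information from all agents during training.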

Keywords: artificial intelligence; deep learning; machine learning; multi-agent reinforcement learning; reinforcement learning.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Training curves of various algorithms in two cooperative environments. For every algorithm, the solid line represents the mean reward per episode, while the shaded region represents the 95% confidence interval around the mean. (A) Shows the training curve for the Simple Reference environment, (B) shows the training curve for the Space Invaders environment.
Figure 2
Training curves of various algorithms in Space Invaders, comparing when individual rewards are given (blue) to when team rewards are given (orange). (A) Shows training curve of DQN, (B) shows training curve of MAPPO, (C) shows training curve of RMAPPO.
Figure 3
Performance of various algorithms when playing against other algorithms in the Boxing environment. (A) Shows the number of games won as the first player, (B) shows the number of games won as the second player, (C) shows the overall win rate percentage.
Figure 4
Performance of various algorithms when playing against other algorithms in Pong. (A) Shows the number of games won as the first player, (B) shows the number of games won as the second player, (C) shows the overall win rate percentage.
Figure 5
Performance of various algorithms when playing against other algorithms in the Space War environment. (A) Shows the number of games won as the first player, (B) shows the number of games won as the second player, (C) shows the overall win rate percentage.
Figure 6
Training curves of various algorithms in Simple Tag, a predator-prey environment. (A) Shows the reward of a predator (all predators obtain the same reward), (B) shows the reward of the prey.
Figure 7
Training curves of various algorithms in the Simple Adversary environment. (A) Shows the reward of the adversary, (B) shows the reward of a cooperative agent (both cooperative agents obtain the same reward).
Figure 8
Comparing DQN with (blue) and without (orange) agent indicators in the (A) Simple Reference and (B) Space Invaders environments.
Figure 9
Performance of various algorithms when playing against other algorithms in Pong without agent indicators across 3 seeds. (A) Shows the number of games won as the first player, (B) shows the number of games won as the second player, (C) shows the overall win rate percentage.

