Investigation of independent reinforcement learning algorithms in multi-agent environments

Ken Ming Lee et al. Front Artif Intell. 2022 Sep 20;5:805823. doi: 10.3389/frai.2022.805823. eCollection 2022.

Abstract

Independent reinforcement learning algorithms have no theoretical guarantees for finding the best policy in multi-agent settings. However, in practice, prior works have reported good performance with independent algorithms in some domains and poor performance in others. Moreover, a comprehensive study of the strengths and weaknesses of independent algorithms is lacking in the literature. In this paper, we carry out an empirical comparison of the performance of independent algorithms on seven PettingZoo environments that span the three main categories of multi-agent environments, i.e., cooperative, competitive, and mixed. For the cooperative setting, we show that independent algorithms can perform on par with multi-agent algorithms in fully observable environments, while adding recurrence improves the learning of independent algorithms in partially observable environments. In the competitive setting, independent algorithms can perform on par with or better than multi-agent algorithms, even in more challenging environments. We also show that agents trained via independent algorithms learn to perform well individually, but fail to learn to cooperate with allies and compete with enemies in mixed environments.
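
For readers unfamiliar with the experimental setup, the sketch below illustrates the "independent control" loop that the compared independent algorithms share: each agent selects its action from its own observation only, with no centralized critic and no information shared between agents. It uses PettingZoo's parallel API on the Simple Reference environment; the module name (simple_reference_v3), the exact reset/step signatures, and the random stand-in policy are illustrative assumptions that depend on the installed PettingZoo version, and this is not the authors' training code.

    # Minimal sketch of independent control in a PettingZoo environment.
    # Assumptions: PettingZoo parallel API (recent releases); the module may be
    # named simple_reference_v2 in older releases. The random policy is a
    # placeholder for any independent learner (e.g., a separate DQN or PPO per agent).
    from pettingzoo.mpe import simple_reference_v3

    env = simple_reference_v3.parallel_env(max_cycles=25)
    observations, infos = env.reset(seed=0)

    def independent_policy(agent, observation, action_space):
        # Stand-in policy: acts only on this agent's own observation.
        return action_space.sample()

    while env.agents:
        # Each agent chooses an action independently from its own observation.
        actions = {
            agent: independent_policy(agent, observations[agent], env.action_space(agent))
            for agent in env.agents
        }
        observations, rewards, terminations, truncations, infos = env.step(actions)

    env.close()

By contrast, the multi-agent baselines discussed in the paper (e.g., MAPPO) additionally condition a centralized value function on information from all agents during training.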

Keywords: artificial intelligence; deep learning; machine learning; multi-agent reinforcement learning; reinforcement learning.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Training curves of various algorithms in two cooperative environments. For every algorithm, the solid line represents the mean reward per episode, while the shaded region represents the 95% confidence interval around the mean. (A) Shows the training curve for the Simple Reference environment, (B) shows the training curve for the Space Invaders environment.
Figure 2
Training curves of various algorithms in Space Invaders, comparing when individual rewards are given (blue) to when team rewards are given (orange). (A) Shows training curve of DQN, (B) shows training curve of MAPPO, (C) shows training curve of RMAPPO.
Figure 3
Performance of various algorithms when playing against other algorithms in the Boxing environment. (A) Shows the number of games won as the first player, (B) shows the number of games won as the second player, (C) shows the overall win rate percentage.
Figure 4
Performance of various algorithms when playing against other algorithms in Pong. (A) Shows the number of games won as the first player, (B) shows the number of games won as the second player, (C) shows the overall win rate percentage.
Figure 5
Performance of various algorithms when playing against other algorithms in the Space War environment. (A) Shows the number of games won as the first player, (B) shows the number of games won as the second player, (C) shows the overall win rate percentage.
Figure 6
Training curves of various algorithms in Simple Tag, a predator-prey environment. (A) Shows the reward of a predator (all predators obtain the same reward), (B) shows the reward of the prey.
Figure 7
Training curves of various algorithms in the Simple Adversary environment. (A) Shows the reward of the adversary, (B) shows the reward of a cooperative agent (both cooperative agents obtain the same reward).
Figure 8
Comparing DQN with (blue) and without (orange) agent indicators in the (A) Simple Reference and (B) Space Invaders environments.
Figure 9
Performance of various algorithms when playing against other algorithms in Pong without agent indicators across 3 seeds. (A) Shows the number of games won as the first player, (B) shows the number of games won as the second player, (C) shows the overall win rate percentage.

