PLoS One. 2017 Apr 5;12(4):e0172395. doi: 10.1371/journal.pone.0172395. eCollection 2017.

Multiagent cooperation and competition with deep reinforcement learning

Ardi Tampuu et al.

Abstract

The evolution of cooperation and competition can arise when multiple adaptive agents share a biological, social, or technological niche. In the present work we study how cooperation and competition emerge between autonomous agents that learn by reinforcement while using only their raw visual input as the state representation. In particular, we extend the Deep Q-Learning framework to multiagent environments to investigate the interaction between two learning agents in the well-known video game Pong. By manipulating the classical rewarding scheme of Pong, we show how competitive and collaborative behaviors emerge. We also describe the progression from competitive to collaborative behavior as the incentive to cooperate is increased. Finally, we show that learning by playing against another adaptive agent, instead of against a hard-wired algorithm, results in more robust strategies. The present work shows that Deep Q-Networks can become a useful tool for studying decentralized learning in multiagent systems coping with high-dimensional environments.
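
To make the decentralized setup concrete, the following minimal sketch shows two independent Deep Q-Learning agents sharing one Pong screen, each choosing its own action and storing its own transitions. The class names and environment API (DummyPong, IndependentDQN) are illustrative stand-ins rather than the authors’ code; a random dummy emulator replaces the real game so that the snippet runs on its own.

    import random
    import numpy as np

    class DummyPong:
        """Stand-in for the real two-player Pong emulator (hypothetical API).
        Returns a raw frame, one reward per player, and an end-of-game flag."""
        def reset(self):
            return np.zeros((84, 84), dtype=np.float32)

        def step(self, action_left, action_right):
            frame = np.random.rand(84, 84).astype(np.float32)
            rewards = (0.0, 0.0)  # nonzero only when the ball goes out of play
            done = random.random() < 0.01
            return frame, rewards, done

    class IndependentDQN:
        """Each agent keeps its own replay memory and its own (omitted)
        Q-network, as in the decentralized extension described above."""
        def __init__(self, n_actions):
            self.n_actions = n_actions
            self.memory = []

        def act(self, frame, epsilon=0.05):
            # Epsilon-greedy; the greedy branch would query this agent's Q-network.
            return random.randrange(self.n_actions)

        def remember(self, frame, action, reward, next_frame, done):
            self.memory.append((frame, action, reward, next_frame, done))

    env = DummyPong()
    left, right = IndependentDQN(3), IndependentDQN(3)
    frame = env.reset()
    for _ in range(1000):
        a_l, a_r = left.act(frame), right.act(frame)  # both see the same raw screen
        next_frame, (r_l, r_r), done = env.step(a_l, a_r)
        left.remember(frame, a_l, r_l, next_frame, done)
        right.remember(frame, a_r, r_r, next_frame, done)
        frame = env.reset() if done else next_frame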


Conflict of interest statement

Competing Interests: We gratefully acknowledge the support of NVIDIA Corporation with the donation of one GeForce GTX TITAN X GPU used for this research. This does not alter our adherence to PLOS ONE policies on sharing data and materials.

Figures

Fig 1. Evolution of the behavior of the competitive agents during training.
(a) The number of paddle-bounces increases, indicating that the players get better at catching the ball. (b) The frequency of the ball hitting the upper and lower walls decreases slowly with training. The first 10 epochs are omitted from the plot because the agents made very few paddle-bounces and the metric was very noisy. (c) Serving time decreases abruptly in the early stages of training: the agents learn to put the ball back into play. Serving time is measured in frames.
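
The three statistics in this caption (paddle-bounces, wall-bounces relative to paddle-bounces, and serving time in frames) could be computed from a per-frame event log along the lines of the sketch below; the event names are hypothetical and not taken from the paper’s code.

    def behavioral_stats(events):
        """Compute the Fig 1 statistics from a per-frame event log.
        `events` is a hypothetical list with one entry per frame, e.g.
        'paddle_bounce', 'wall_bounce', 'out_of_play', 'serve', or 'none'."""
        paddle = events.count("paddle_bounce")
        wall = events.count("wall_bounce")
        # Panel (b): wall-bounces normalized by paddle-bounces; noisy when
        # paddle-bounces are rare, which is why early epochs are omitted.
        wall_per_paddle = wall / paddle if paddle else float("nan")
        # Panel (c): frames elapsed between a point ending and the next serve.
        serve_times, waiting_since = [], None
        for t, event in enumerate(events):
            if event == "out_of_play":
                waiting_since = t
            elif event == "serve" and waiting_since is not None:
                serve_times.append(t - waiting_since)
                waiting_since = None
        avg_serve = sum(serve_times) / len(serve_times) if serve_times else float("nan")
        return paddle, wall_per_paddle, avg_serve

    log = ["serve"] + ["paddle_bounce", "none", "wall_bounce"] * 4 + ["out_of_play", "none", "none", "serve"]
    print(behavioral_stats(log))  # -> (4, 1.0, 3.0)
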
Fig 2. A competitive game—game situations and the Q-values predicted by the agents.
A) The left player predicts that the right player will not reach the ball, as it is rapidly moving upwards. B) A change in the direction of the ball causes the left player’s reward expectation to drop. C) The players understand that the ball will inevitably go out of play. See supporting information for videos illustrating other game situations and the corresponding agents’ Q-values.
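
The curves behind this figure plot, at every frame, each agent’s value estimate max_a Q(s, a). Below is a small sketch of how such Q-value trajectories could be extracted, assuming each trained network is exposed as a callable from a frame to a vector of per-action Q-values (that interface is an assumption, not the authors’ API).

    import numpy as np

    def q_trajectories(frames, q_left, q_right):
        """For each frame, record both agents' value estimates max_a Q(s, a),
        the quantities plotted over time in Figs 2 and 4. `q_left` and
        `q_right` are hypothetical callables: frame -> array of Q-values."""
        left = [float(np.max(q_left(f))) for f in frames]
        right = [float(np.max(q_right(f))) for f in frames]
        return left, right

    frames = [np.zeros((84, 84)) for _ in range(10)]
    fake_q = lambda f: np.random.randn(3)  # stand-in for a trained Q-network
    print(q_trajectories(frames, fake_q, fake_q))
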
Fig 3. Evolution of the behavior of the collaborative agents during training.
(a) The number of paddle-bounces increases as the players get better at reaching the ball. (b) The frequency of the ball hitting the upper and lower walls decreases significantly with training. The first 10 epochs are omitted from the plot because the agents made very few paddle-bounces and the metric was very noisy. (c) Serving takes a long time: the agents learn to postpone putting the ball into play.
Fig 4. Cooperative game—game situations and the Q-values predicted by the agents.
A) The ball is moving slowly and the future reward expectation is not very low: the agents do not expect to miss slow balls. B) The ball is moving faster and the reward expectation is much more negative: the agents expect to miss the ball in the near future. C) The ball is inevitably going out of play. Both agents’ reward expectations drop accordingly. See supporting information for videos illustrating other game situations and the corresponding agents’ Q-values.
Fig 5. Progression of behavioral statistics when passing from the competitive to the collaborative rewarding scheme.
Each blue dot corresponds to the average of one game. The red line depicts the average across games (also given in S2 Table). (a) The game lasts longer when the agents have a strong incentive to collaborate. (b) Forcing the agents to collaborate decreases the proportion of angled shots that bounce off the walls before reaching the opposite player. Notice that the two aberrant values for ρ = −0.75 correspond to games where the agents never reach the collaborative strategy of keeping the ball alive by passing it horizontally. (c) Serving time decreases when agents receive stronger positive rewards for scoring.
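
The parameter ρ interpolates between the two rewarding schemes: when the ball goes out of play, the player who missed it receives −1 and the other player receives ρ, so ρ = 1 recovers classical competitive Pong while ρ = −1 makes every lost ball costly for both players. A minimal sketch of that payoff rule (the function name and signature are illustrative):

    def pong_rewards(rho, out_on_left):
        """Reward pair (left, right) at the moment the ball goes out of play.
        rho = +1: fully competitive (classical scoring).
        rho = -1: fully cooperative (losing the ball hurts both players).
        Sketch of the rewarding scheme described in the paper."""
        if out_on_left:  # the left player missed the ball
            return (-1.0, rho)
        return (rho, -1.0)

    # Sweep from fully competitive to fully cooperative, including the
    # rho = -0.75 setting mentioned in panel (b).
    for rho in (1.0, 0.5, 0.0, -0.5, -0.75, -1.0):
        print(rho, pong_rewards(rho, out_on_left=True))
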
Fig 6. Results of games between multiplayer DQN, single-player DQN and four hand-coded algorithms.
The values are averages over 10 games with different random seeds. The score difference is the points scored by the first-mentioned agent minus the points scored by the second-mentioned agent. (a) Performance of Multi DQN and Single DQN against each other and against the hand-coded agent with N = 4, as a function of training time. (b) Scores of Single DQN and Multi DQN agents against four versions of a hand-coded agent that tries to keep the center of its paddle level with the ball. N is the number of frames a selected action is repeated before a new action is chosen (smaller N allows faster reactions and makes a stronger opponent).
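
The hand-coded opponent can be thought of as a ball-tracking policy wrapped in an action repeater. The sketch below is an illustrative reconstruction under that reading (names and API are hypothetical), not the published implementation.

    def track_ball(ball_y, paddle_center_y):
        """Move so that the paddle's center stays level with the ball."""
        if ball_y < paddle_center_y:
            return "up"
        if ball_y > paddle_center_y:
            return "down"
        return "stay"

    class ActionRepeater:
        """Repeat each chosen action for N frames before deciding again;
        smaller N means more frequent reactions, hence a stronger opponent."""
        def __init__(self, n):
            self.n, self.t, self.action = n, 0, "stay"

        def act(self, ball_y, paddle_center_y):
            if self.t % self.n == 0:
                self.action = track_ball(ball_y, paddle_center_y)
            self.t += 1
            return self.action

    agent = ActionRepeater(n=4)  # the hand-coded agent with N = 4 from panel (a)
    for ball_y in (10, 14, 16, 13, 9, 8):
        print(agent.act(ball_y, paddle_center_y=12))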
