Review

Biomimetics (Basel). 2025 Jun 6;10(6):375.

doi: 10.3390/biomimetics10060375.

Multi-Agent Reinforcement Learning in Games: Research and Applications

Haiyang Li¹, Ping Yang¹, Weidong Liu¹, Shaoqiang Yan¹, Xinyi Zhang¹, Donglin Zhu²

Affiliations

¹ High-Tech Institute of Xi'an, Xi'an 710038, China.
² School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China.

PMID: 40558344
PMCID: PMC12190516
DOI: 10.3390/biomimetics10060375

Haiyang Li et al. Biomimetics (Basel). 2025.

Abstract

Biological systems, ranging from ant colonies to neural ecosystems, exhibit remarkable self-organizing intelligence. Inspired by these phenomena, this study investigates how bio-inspired computing principles can bridge game-theoretic rationality and multi-agent adaptability. It systematically reviews the convergence of multi-agent reinforcement learning (MARL) and game theory, elucidating the innovative potential of this integrated paradigm for collective intelligent decision-making in dynamic open environments. Building upon stochastic game and extensive-form game frameworks, we establish a methodological taxonomy across three dimensions (value function optimization, policy gradient learning, and online search planning), thereby clarifying the evolutionary logic and innovation trajectories of algorithmic advancements. Focusing on complex smart city scenarios, including intelligent transportation coordination and UAV swarm scheduling, we identify technical breakthroughs in MARL applications for policy space modeling and distributed decision optimization. By incorporating bio-inspired optimization approaches, the investigation particularly highlights evolutionary computation mechanisms for dynamic strategy generation in search planning, alongside population-based learning paradigms for enhancing exploration efficiency in policy refinement. The findings reveal core principles governing how groups make optimal choices in complex environments while mapping the technological development pathways created by blending cross-disciplinary methods to enhance multi-agent systems.

Keywords: evolutionary computation; game theory; multi-agent reinforcement learning; stochastic games.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

**Figure 1**
Smart city application scenarios.

**Figure 2**
Modeling Markov Decision Process interactions between RL agents and the environment.

**Figure 3**
Joint action space optimization and state transition mechanism in multi-agent stochastic games.

**Figure 4**
Multi-stage decision tree modeling under incomplete information in extensive-form games.

**Figure 5**
Monte Carlo Tree Search algorithm flow and equilibrium approximation.

**Figure 6**
Dynamic programming value iteration illustration.

**Figure 7**
Comparative study of parameter optimization paradigms in reinforcement learning. (a) Decoupled parallel optimization and (b) population-based policy co-evolution. Different-colored bars represent distinct policy individuals in both subfigures.

**Figure 8**
Monte Carlo Tree Search process.

**Figure 9**
Deconstruction of the rolling horizon evolutionary algorithm process for real-time adversarial decision-making.
