Review
Biomimetics (Basel). 2025 Jun 6;10(6):375.
doi: 10.3390/biomimetics10060375.

Multi-Agent Reinforcement Learning in Games: Research and Applications

Haiyang Li et al. Biomimetics (Basel).

Abstract

Biological systems, ranging from ant colonies to neural ecosystems, exhibit remarkable self-organizing intelligence. Inspired by these phenomena, this study investigates how bio-inspired computing principles can bridge game-theoretic rationality and multi-agent adaptability. It systematically reviews the convergence of multi-agent reinforcement learning (MARL) and game theory, and examines the potential of this integrated paradigm for collective intelligent decision-making in dynamic open environments. Building on the stochastic game and extensive-form game frameworks, we establish a methodological taxonomy along three dimensions (value function optimization, policy gradient learning, and online search planning), thereby clarifying the evolutionary logic and innovation trajectories of algorithmic advances. Focusing on complex smart city scenarios, including intelligent transportation coordination and UAV swarm scheduling, we identify technical breakthroughs in MARL applications for policy space modeling and distributed decision optimization. Drawing on bio-inspired optimization approaches, the review highlights evolutionary computation mechanisms for dynamic strategy generation in search planning, alongside population-based learning paradigms for improving exploration efficiency in policy refinement. The findings reveal core principles governing how groups make optimal choices in complex environments and map the technological development pathways created by blending cross-disciplinary methods to enhance multi-agent systems.
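
As a rough illustration of the "value function optimization" dimension in a stochastic game setting, the sketch below runs value iteration over the joint action space of a tiny two-agent game. It is not taken from the review: the two states, the coordination reward, the deterministic transitions, and the names `step` and `value_iteration` are all hypothetical, and the game is the fully cooperative (shared-reward) special case, where maximizing over joint actions is sufficient. General-sum or zero-sum stochastic games would need an equilibrium computation at each state instead of a plain max.

```python
from itertools import product

# Hypothetical two-state, two-agent team game; all numbers are illustrative.
STATES = [0, 1]
ACTIONS = [0, 1]   # each agent independently picks 0 or 1
GAMMA = 0.9        # discount factor (assumed)


def step(state, joint_action):
    """Return (next_state, shared_reward) for a joint action.

    Assumed dynamics: if the agents pick different actions they fall back
    to state 0 with no reward. If they match, they reach or stay in
    state 1, and earn reward 1 only when already in state 1.
    """
    a1, a2 = joint_action
    if a1 != a2:
        return 0, 0.0
    return 1, (1.0 if state == 1 else 0.0)


def value_iteration(tol=1e-8):
    """Iterate the Bellman optimality backup over joint actions."""
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {}
        for s in STATES:
            outcomes = (step(s, ja) for ja in product(ACTIONS, repeat=2))
            new_values[s] = max(r + GAMMA * values[s2] for s2, r in outcomes)
        if max(abs(new_values[s] - values[s]) for s in STATES) < tol:
            return new_values
        values = new_values


if __name__ == "__main__":
    print(value_iteration())
```

Under these assumed dynamics the values approach 9 for state 0 and 10 for state 1, since coordinating in state 1 yields reward 1 forever (1 / (1 - 0.9) = 10) and state 0 is one discounted step away from it.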

Keywords: evolutionary computation; game theory; multi-agent reinforcement learning; stochastic games.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1. Smart city application scenarios.
Figure 2. Modeling Markov Decision Process interactions between RL agents and the environment.
Figure 3. Joint action space optimization and state transfer mechanism in multi-agent stochastic games.
Figure 4. Multi-stage decision tree modeling under incomplete information in extended games.
Figure 5. Monte Carlo Tree Search algorithm flow and equilibrium approximation.
Figure 6. Dynamic planning value iteration illustration.
Figure 7. Comparative study of parameter optimization paradigms in reinforcement learning. (a) Decoupled parallel optimization and (b) population-based policy co-evolution. Different-colored bars represent distinct policy individuals in both subfigures.
Figure 8. Monte Carlo Tree Search process.
Figure 9. Deconstruction of rolling time–domain evolutionary algorithmic processes for real-time adversarial decision-making.
