Optimistic sequential multi-agent reinforcement learning with motivational communication

Anqi Huang¹, Yongli Wang², Xiaoliang Zhou³, Haochen Zou³, Xu Dong³, Xun Che³

Affiliations

¹ School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China. Electronic address: anqihuang@njust.edu.cn.
² School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China. Electronic address: yongliwang@njust.edu.cn.
³ School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.

PMID: 39068677
DOI: 10.1016/j.neunet.2024.106547

Anqi Huang et al. Neural Netw. 2024 Nov;179:106547. doi: 10.1016/j.neunet.2024.106547. Epub 2024 Jul 22.

Abstract

Centralized Training with Decentralized Execution (CTDE) is a prevalent paradigm in fully cooperative Multi-Agent Reinforcement Learning (MARL). Existing algorithms often face two major problems. First, independent policies tend to underestimate the potential value of actions, which leads to convergence on sub-optimal Nash Equilibria (NE). Second, some communication paradigms add complexity to the learning process, making it harder to focus on the essential content of the messages. To address these challenges, we propose a novel method called Optimistic Sequential Soft Actor Critic with Motivational Communication (OSSMC). The key idea of OSSMC is to use a greedy-driven approach to explore the potential value of individual policies; these estimates, named optimistic Q-values, serve as an upper bound on the Q-value of the current policy. We then integrate a sequential update mechanism with the agents' optimistic Q-values, aiming to ensure monotonic improvement in the joint policy optimization process. Moreover, we establish a motivational communication module for each agent that disseminates motivational messages to promote cooperative behaviors. Finally, we employ a value regularization strategy from the Soft Actor Critic (SAC) method to maximize entropy and improve exploration. OSSMC was evaluated on a series of challenging benchmarks. Empirical results demonstrate that OSSMC not only outperforms current baseline algorithms but also converges faster.
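The abstract says that independent policies, which average over their partners' behaviour, can underestimate an action's potential value and settle on a sub-optimal equilibrium, while an optimistic value acts as an upper bound. A toy two-agent matrix game makes this concrete. The sketch below is a hypothetical illustration and not the authors' OSSMC implementation. The payoff matrix (styled after the classic climbing game), the function names, and the two-step "optimistic pick, then best response" rule are all assumptions made for this example. It leaves out the soft actor-critic, entropy and communication components entirely.

```python
# Hypothetical illustration only: a toy two-agent cooperative matrix game,
# NOT the OSSMC algorithm. PAYOFF[i][j] is the shared team reward when
# agent 1 plays action i and agent 2 plays action j (values chosen for illustration).
PAYOFF = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


def average_value(payoff, action):
    """Value of an action when the partner acts uniformly at random."""
    row = payoff[action]
    return sum(row) / len(row)


def optimistic_value(payoff, action):
    """Best team reward reachable with this action.

    For ANY partner policy, the expected reward of the action is at most
    this maximum, so it upper-bounds that expectation.
    """
    return max(payoff[action])


def independent_average_play(payoff):
    """Each agent independently picks the action with the best averaged value."""
    a1 = max(range(len(payoff)), key=lambda i: average_value(payoff, i))
    # Agent 2 evaluates from its own perspective via the transposed matrix.
    transposed = [list(col) for col in zip(*payoff)]
    a2 = max(range(len(transposed)), key=lambda j: average_value(transposed, j))
    return a1, a2


def optimistic_sequential_play(payoff):
    """Agent 1 picks optimistically; agent 2 then best-responds to that choice."""
    a1 = max(range(len(payoff)), key=lambda i: optimistic_value(payoff, i))
    a2 = max(range(len(payoff[a1])), key=lambda j: payoff[a1][j])
    return a1, a2


if __name__ == "__main__":
    for name, play in [("independent/average", independent_average_play),
                       ("optimistic/sequential", optimistic_sequential_play)]:
        a1, a2 = play(PAYOFF)
        print(f"{name}: joint action ({a1}, {a2}) -> reward {PAYOFF[a1][a2]}")
```

With this matrix, the averaging rule settles on joint action (2, 2) with reward 5, because the heavy penalty of -30 drags down the averaged value of action 0. The optimistic rule followed by a sequential best response reaches (0, 0) with reward 11. The example only shows the intuition behind optimism and ordered updates in a one-step game. It says nothing about how OSSMC estimates optimistic Q-values in practice.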

Keywords: Motivational communication; Multi-agent reinforcement learning; Multi-agent system; Policy gradient; Reinforcement learning.

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
