Attention-Based Fault-Tolerant Approach for Multi-Agent Reinforcement Learning Systems

Shanzhi Gu et al. Entropy (Basel). 2021 Aug 31;23(9):1133. doi: 10.3390/e23091133.

Abstract

The aim of multi-agent reinforcement learning systems is to provide interacting agents with the ability to collaboratively learn and adapt to the behavior of other agents. Typically, an agent receives private observations that provide only a partial view of the true state of the environment. In realistic settings, however, a harsh environment might cause one or more agents to exhibit arbitrarily faulty or malicious behavior, which may be enough to make the current coordination mechanisms fail. In this paper, we study a practical scenario for multi-agent reinforcement learning systems that considers the security issues arising in the presence of agents with arbitrarily faulty or malicious behavior. The previous state-of-the-art work on coping with extremely noisy environments was designed on the assumption that the noise intensity of the environment is known in advance. When the noise intensity changes, that method has to adjust the configuration of the model to learn in the new environment, which limits its practical applications. To overcome these difficulties, we present an Attention-based Fault-Tolerant (FT-Attn) model, which selects not only correct but also relevant information for each agent at every time step in noisy environments. The multi-head attention mechanism enables the agents to learn effective communication policies through experience, concurrently with their action policies. Empirical results show that FT-Attn beats previous state-of-the-art methods in some extremely noisy environments in both cooperative and competitive scenarios, coming much closer to the upper-bound performance. Furthermore, FT-Attn exhibits a more general fault-tolerance ability and does not rely on prior knowledge about the noise intensity of the environment.
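As an illustrative sketch only (not the authors' implementation), the attention-based filtering step described above can be pictured as scaled dot-product multi-head attention over the agents' encoded observations: each agent queries every agent's encoding, and the learned attention weights can down-weight information coming from faulty agents. The function name, dimensions, and NumPy framing below are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(enc, W_q, W_k, W_v):
    """enc: (n_agents, d) encoded observations.
    W_q, W_k, W_v: (num_heads, d, d // num_heads) projection weights.
    Each agent attends over all agents' encodings; per-head outputs
    are concatenated into an aggregated message per agent."""
    num_heads, d, d_h = W_q.shape
    out = []
    for h in range(num_heads):
        Q = enc @ W_q[h]  # (n_agents, d_h) queries
        K = enc @ W_k[h]  # (n_agents, d_h) keys
        V = enc @ W_v[h]  # (n_agents, d_h) values
        # (n_agents, n_agents) weights: row i is agent i's attention
        # over all agents; training can drive weights on faulty
        # agents' entries toward zero.
        attn = softmax(Q @ K.T / np.sqrt(d_h), axis=-1)
        out.append(attn @ V)
    return np.concatenate(out, axis=-1)  # (n_agents, d)

n_agents, d, heads = 3, 8, 4
enc = rng.normal(size=(n_agents, d))
W_q = rng.normal(size=(heads, d, d // heads))
W_k = rng.normal(size=(heads, d, d // heads))
W_v = rng.normal(size=(heads, d, d // heads))
msg = multi_head_attention(enc, W_q, W_k, W_v)
print(msg.shape)  # (3, 8): one filtered message vector per agent
```

In the paper's architecture the output of this filtering module would then feed a Q-network; here the projections are random, so the sketch only demonstrates the data flow, not learned fault tolerance.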

Keywords: attention mechanism; fault tolerance; multi-agent; reinforcement learning.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
An illustration of our scenario: the modified predator and prey problem. Three slower predators learn to cooperate to capture a faster prey, with obstacles impeding the way. However, when both Predator 2 and Predator 3 obtain the wrong relative position of the prey, the learning process becomes extremely difficult, since they must learn to trust Predator 1 while also learning their action policies. None of the agents knows whether it is faulty.
Figure 2
FT-Attn is composed of three modules: an encoder, a multi-head attention-based information-filtering module for fault tolerance, and a Q-network.
Figure 3
(Left) An illustration of the modified cooperative navigation problem: the gifted agent (red circle) can correctly observe all three landmarks (grey squares), while the other agents (blue and green circles) receive the wrong locations of the landmarks. (Right) An illustration of the modified predator and prey problem: the gifted predator (red circle) can correctly observe the position of the prey, while the other two predators receive the wrong location of the prey.
Figure 4
(Left) Learning curves for all models in the alternating version of the modified cooperative navigation scenario. (Right) Learning curves for all models in the dynamic version of the modified cooperative navigation scenario.
Figure 5
Cross-comparison between FT-Attn and the baseline methods in terms of the predator score across the different versions of the modified predator and prey scenario.
Figure 6
Attention entropy for each head over the course of training for the three agents in the dynamic version of the modified cooperative navigation scenario. From left to right: attention entropy of Agent 1, Agent 2, and Agent 3.
Figure 7
Learning curves of FT-Attn in the dynamic version of the modified cooperative navigation scenario with different numbers of attention heads.
Figure 8
Attention weights generated by FT-Attn in the fixed case of the modified cooperative navigation scenario when N is set to 5. Scenarios 1 to 4 are listed from left to right. Scenario 1: only the observation of Agent 1 is correct; Scenario 2: the observations of Agents 2 and 3 are correct; Scenario 3: the observations of Agents 1, 2, and 4 are correct; Scenario 4: the observations of Agents 1, 2, 3, and 4 are correct.
