Attention-Based Fault-Tolerant Approach for Multi-Agent Reinforcement Learning Systems

Shanzhi Gu et al. Entropy (Basel). 2021 Aug 31;23(9):1133. doi: 10.3390/e23091133.

Abstract

The aim of multi-agent reinforcement learning systems is to provide interacting agents with the ability to collaboratively learn and adapt to the behavior of other agents. Typically, an agent receives private observations that provide only a partial view of the true state of the environment. In realistic settings, however, a harsh environment might cause one or more agents to exhibit arbitrarily faulty or malicious behavior, which may be enough to make the current coordination mechanisms fail. In this paper, we study a practical scenario for multi-agent reinforcement learning systems that considers the security issues arising in the presence of agents with arbitrarily faulty or malicious behavior. The previous state-of-the-art work on coping with extremely noisy environments was designed on the assumption that the noise intensity of the environment is known in advance. When the noise intensity changes, that method has to adjust the configuration of the model to learn in the new environment, which limits its practical applications. To overcome these difficulties, we present an Attention-based Fault-Tolerant (FT-Attn) model, which selects not only correct but also relevant information for each agent at every time step in noisy environments. The multi-head attention mechanism enables the agents to learn effective communication policies through experience, concurrently with their action policies. Empirical results show that FT-Attn beats previous state-of-the-art methods in some extremely noisy environments in both cooperative and competitive scenarios, coming much closer to the upper-bound performance. Furthermore, FT-Attn exhibits a more general fault-tolerance ability and does not rely on prior knowledge about the noise intensity of the environment.
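As an illustrative sketch only (not the authors' implementation), the attention-based filtering step described above can be pictured as scaled dot-product multi-head attention over the agents' encoded observations: each agent queries every agent's encoding, and the learned attention weights can down-weight information coming from faulty agents. The function name, dimensions, and NumPy framing below are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(enc, W_q, W_k, W_v):
    """enc: (n_agents, d) encoded observations.
    W_q, W_k, W_v: (num_heads, d, d // num_heads) projection weights.
    Each agent attends over all agents' encodings; per-head outputs
    are concatenated into an aggregated message per agent."""
    num_heads, d, d_h = W_q.shape
    out = []
    for h in range(num_heads):
        Q = enc @ W_q[h]  # (n_agents, d_h) queries
        K = enc @ W_k[h]  # (n_agents, d_h) keys
        V = enc @ W_v[h]  # (n_agents, d_h) values
        # (n_agents, n_agents) weights: row i is agent i's attention
        # over all agents; training can drive weights on faulty
        # agents' entries toward zero.
        attn = softmax(Q @ K.T / np.sqrt(d_h), axis=-1)
        out.append(attn @ V)
    return np.concatenate(out, axis=-1)  # (n_agents, d)

n_agents, d, heads = 3, 8, 4
enc = rng.normal(size=(n_agents, d))
W_q = rng.normal(size=(heads, d, d // heads))
W_k = rng.normal(size=(heads, d, d // heads))
W_v = rng.normal(size=(heads, d, d // heads))
msg = multi_head_attention(enc, W_q, W_k, W_v)
print(msg.shape)  # (3, 8): one filtered message vector per agent
```

In the paper's architecture the output of this filtering module would then feed a Q-network; here the projections are random, so the sketch only demonstrates the data flow, not learned fault tolerance.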

Keywords: attention mechanism; fault tolerance; multi-agent; reinforcement learning.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
An illustration of our scenario: the modified predator and prey problem. Three slower predators learn to cooperate to capture a faster prey, with obstacles impeding the way. However, when both Predator 2 and Predator 3 obtain the wrong relative position of the prey, the learning process becomes extremely difficult, since they must learn to trust Predator 1 while also learning their action policies. None of the agents knows whether it is faulty.
Figure 2
FT-Attn is composed of three modules: an encoder, a multi-head attention-based information-filtering module for fault tolerance, and a Q-network.
Figure 3
(Left) An illustration of the modified cooperative navigation problem: the gifted agent (red circle) can correctly observe all three landmarks (grey squares), while the other agents (blue and green circles) receive the wrong locations of the landmarks. (Right) An illustration of the modified predator and prey problem: the gifted predator (red circle) can correctly observe the position of the prey, while the other two predators receive the wrong location of the prey.
Figure 4
(Left) Learning curves for all models in the alternating version of the modified cooperative navigation scenario. (Right) Learning curves for all models in the dynamic version of the modified cooperative navigation scenario.
Figure 5
Cross-comparison between FT-Attn and the baseline methods in terms of the predator score across the different versions of the modified predator and prey scenario.
Figure 6
Attention entropy for each head over the course of training for the three agents in the dynamic version of the modified cooperative navigation scenario. From left to right: attention entropy of Agent 1, Agent 2, and Agent 3.
Figure 7
Learning curves of FT-Attn in the dynamic version of the modified cooperative navigation scenario with different numbers of attention heads.
Figure 8
Attention weights generated by FT-Attn in the fixed case of the modified cooperative navigation scenario when N is set to 5. Scenarios 1 to 4 are listed from left to right. Scenario 1: only the observation of Agent 1 is correct; Scenario 2: the observations of Agents 2 and 3 are correct; Scenario 3: the observations of Agents 1, 2, and 4 are correct; Scenario 4: the observations of Agents 1, 2, 3, and 4 are correct.
