Entropy. 2019 Mar 19;21(3):294. doi: 10.3390/e21030294.

Learning to Cooperate via an Attention-Based Communication Neural Network in Decentralized Multi-Robot Exploration


Mingyang Geng et al. Entropy (Basel).

Abstract

In the decentralized multi-robot exploration problem, the robots have to cooperate effectively to map an unknown environment as quickly as possible without a centralized controller. Over the past few decades, a set of "human-designed" cooperation strategies have been proposed to address this problem, such as the well-known frontier-based approach. However, many real-world settings, especially those that change constantly, are too complex for humans to design efficient decentralized strategies. This paper presents a novel approach, the Attention-based Communication neural network (CommAttn), to "learn" cooperation strategies automatically in the decentralized multi-robot exploration problem. The communication neural network enables the robots to learn cooperation strategies through explicit communication. Moreover, the attention mechanism we introduce can precisely determine whether communication is necessary for each pair of agents by considering the relevance of each received message, which enables the robots to communicate only with the necessary partners. Empirical results on a simulated multi-robot disaster exploration scenario demonstrate that our proposal outperforms both traditional "human-designed" methods and competing "learning-based" methods in the exploration task.
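The per-pair relevance gating described in the abstract can be illustrated with scaled dot-product attention over the other agents' hidden states, pruning partners whose attention weight is too low. This is a minimal sketch of the idea only, not the paper's implementation: the function names, the fixed pruning threshold, and the use of raw hidden states as both queries and keys are all illustrative assumptions.

```python
import numpy as np

def attention_weights(query, keys):
    """Scaled dot-product attention: softmax over how relevant each
    other agent's message (key) is to the querying agent."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)
    e = np.exp(scores - scores.max())   # numerically stable softmax
    return e / e.sum()

def aggregate_messages(hidden_states, agent_idx, threshold=0.2):
    """Fuse the other agents' hidden states, weighted by attention.
    Partners whose weight falls below `threshold` (an illustrative
    cutoff) are dropped, so the agent communicates only with the
    necessary partners."""
    query = hidden_states[agent_idx]
    others = np.delete(hidden_states, agent_idx, axis=0)
    w = attention_weights(query, others)
    mask = w >= threshold               # prune low-relevance partners
    if not mask.any():
        return np.zeros_like(query)     # no communication needed
    w = w[mask] / w[mask].sum()         # renormalise surviving weights
    return w @ others[mask]
```

In a full system the queries and keys would come from learned projections trained end-to-end with reinforcement learning; the sketch only shows how attention weights can double as a communication gate.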

Keywords: attention mechanism; deep reinforcement learning; dynamic environments; multi-robot exploration.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
A six-robot exploration scenario; the signals indicate the correct communication pattern: Robot 1 should communicate with Robot 0 to avoid repeated exploration; Robot 1 should also communicate with Robot 3 to avoid conflicting target areas; Robots 2, 4, and 5 should communicate less with the others to avoid interference.
Figure 2
The architecture of the Attention-based Communication neural network (CommAttn): each agent’s input comprises two parts: the local observation (the environment knowledge within the agent’s vision range) and the trajectory (the agent’s past positions).
Figure 3
The experimental environment, in which the number of blocks changes dynamically.
Figure 4
The exploration rate within 30 s for different numbers of robots.
Figure 5
As the frequency at which blocks are added increases, CommAttn maintains a more stable exploration efficiency than the baseline “pre-designed” methods.
Figure 6
An illustrative scenario (the initial environment and the corresponding actions of all agents) showing how CommAttn successfully handles dynamic environments (newly introduced blocks).
Figure 7
The optimal actions of CommAttn and the sub-optimal actions of the coordinated frontier-based approach after unexpected blocks are introduced into the environment.
Figure 8
The variation of the agents’ summed scores during training for CommAttn and the baseline “learning” methods.
Figure 9
The relationship among the success rate, the vision range, and the communication range of CommNet.
Figure 10
The values of each agent’s hidden state s_j from the decoder in the static exploration environment.
Figure 11
The average norm of communication vectors in the static environment.
