Review

Attention Mechanisms and Their Applications to Complex Systems

Adrián Hernández et al. Entropy (Basel). 2021 Feb 26;23(3):283. doi: 10.3390/e23030283.
Abstract

Deep learning models and graphics processing units have completely transformed the field of machine learning. Recurrent neural networks and long short-term memory networks have been successfully used to model and predict complex systems. However, these classic models do not perform sequential reasoning, a process that guides a task based on perception and memory. In recent years, attention mechanisms have emerged as a promising solution to these problems. In this review, we describe the key aspects of attention mechanisms and some relevant attention techniques and point out why they are a remarkable advance in machine learning. Then, we illustrate some important applications of these techniques in the modeling of complex systems.

Keywords: attention; complex and dynamical systems; deep learning; neural networks; self-attention; sequential reasoning.
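
The abstract frames attention as a process in which a task (query) is guided by a set of elements (values) of a source or memory. As a purely illustrative aid, not code from the review, the following minimal NumPy sketch of scaled dot-product attention shows how alignment scores between a query and the keys are normalized into weights and used to form a weighted sum of the values; the dimensions and random inputs are assumptions for the toy example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, keys, values):
    """Scaled dot-product attention (illustrative sketch).

    query:  (d,)     -- the current task/query vector
    keys:   (n, d)   -- one key per source element
    values: (n, d_v) -- one value per source element
    Returns the attention-weighted sum of the values and the weights.
    """
    d = keys.shape[-1]
    scores = keys @ query / np.sqrt(d)   # alignment scores, shape (n,)
    weights = softmax(scores)            # attention distribution over the source
    context = weights @ values           # weighted sum of values, shape (d_v,)
    return context, weights

# Toy example: a query attends over three source elements.
rng = np.random.default_rng(0)
q = rng.normal(size=4)
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
context, weights = attention(q, K, V)
print(weights)  # sums to 1; shows how much each source element contributes
```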


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Multilayer neural network.
Figure 2. Temporal structure of a recurrent neural network.
Figure 3. Attention diagram. Attention as a sequential process of reasoning in which the task (query) is guided by a set of elements (values) of the source (or memory).
Figure 4. An encoder–decoder network.
Figure 5. An encoder–decoder network with attention.
Figure 6. A matrix of alignment scores. It represents how much of each input state should be considered when deciding the next state and generating the output.
Figure 7. Multi-headed attention. Self-attention process performed in parallel h times in different subspaces. The output values are concatenated and projected to a final value. (A minimal sketch of this process follows the figure list.)
Figure 8. Diagram of the input features attention mechanism.
Figure 9. Diagram of the temporal attention mechanism.
Figure 10. Basic diagram of a memory network. For each input, the attention mechanism integrates a weighted sum over the memory vectors.
Figure 11. Self-attention graph. The self-attention component calculates how much each input vector contributes to form each output vector.
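
Figures 7 and 11 describe self-attention performed in parallel over h subspaces, with the head outputs concatenated and projected to a final value. The sketch below is an illustrative NumPy rendering of that process, not an implementation from the review; the random projection matrices stand in for learned parameters, and the sequence length and dimensions are arbitrary assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, heads, rng):
    """Self-attention run in parallel over `heads` subspaces (cf. Figure 7).

    X: (n, d) sequence of n input vectors of dimension d.
    Random projections stand in for learned parameters in this sketch.
    """
    n, d = X.shape
    assert d % heads == 0
    d_h = d // heads
    outputs = []
    for _ in range(heads):
        # Per-head projections into a lower-dimensional subspace.
        Wq = rng.normal(size=(d, d_h))
        Wk = rng.normal(size=(d, d_h))
        Wv = rng.normal(size=(d, d_h))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        # (n, n): how much each input vector contributes to each output vector.
        weights = softmax(Q @ K.T / np.sqrt(d_h))
        outputs.append(weights @ V)  # (n, d_h)
    # Concatenate the head outputs and project to the final value.
    W_out = rng.normal(size=(d, d))
    return np.concatenate(outputs, axis=-1) @ W_out

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))   # 5 input vectors of dimension 8
Y = multi_head_self_attention(X, heads=2, rng=rng)
print(Y.shape)                # (5, 8)
```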
