A Brief Review of Facial Emotion Recognition Based on Visual Information

Byoung Chul Ko

Sensors (Basel). 2018 Jan 30;18(2):401. doi: 10.3390/s18020401.
Abstract

Facial emotion recognition (FER) is an important topic in the fields of computer vision and artificial intelligence owing to its significant academic and commercial potential. Although FER can be conducted using multiple sensors, this review focuses on studies that exclusively use facial images, because visual expressions are one of the main information channels in interpersonal communication. This paper provides a brief review of research in the field of FER conducted over the past decades. First, conventional FER approaches are described, along with a summary of the representative categories of FER systems and their main algorithms. Deep-learning-based FER approaches using deep networks that enable "end-to-end" learning are then presented. This review also focuses on an up-to-date hybrid deep-learning approach that combines a convolutional neural network (CNN) for the spatial features of individual frames with long short-term memory (LSTM) for the temporal features of consecutive frames. In the later part of this paper, a brief review of publicly available evaluation metrics is given, and a comparison with benchmark results, which serve as a standard for the quantitative comparison of FER studies, is described. This review can serve as a brief guidebook for newcomers to the field of FER, providing basic knowledge and a general understanding of the latest state-of-the-art studies, as well as for experienced researchers looking for productive directions for future work.

Keywords: conventional FER; convolutional neural networks; deep-learning-based FER; facial action coding system; facial action unit; facial emotion recognition; long short-term memory.


Conflict of interest statement

The author declares no conflict of interest.

Figures

Figure 1
Procedure used in conventional FER approaches: from the input images (a), the face region and facial landmarks are detected (b), spatial and temporal features are extracted from the face components and landmarks (c), and the facial expression is determined as one of the facial categories using pre-trained pattern classifiers (d) (face images are taken from the CK+ dataset [10]).
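As a concrete illustration of steps (b) through (d), the following minimal Python sketch detects the face region with an OpenCV Haar cascade, extracts hand-crafted HOG features from the cropped face (standing in for the landmark- and component-based features of step (c)), and passes them to a pre-trained SVM classifier. The feature choice, parameter values, and training data are illustrative assumptions, not the specific method of any study surveyed in the paper.

# Illustrative conventional FER pipeline (hypothetical parameters and data):
# (b) detect the face region, (c) extract hand-crafted spatial features,
# (d) classify with a pre-trained pattern classifier.
import cv2
from skimage.feature import hog
from sklearn.svm import SVC

# (b) Face detection with OpenCV's bundled frontal-face Haar cascade.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_features(gray_image):
    """Detect the largest face and return HOG features of the cropped region."""
    faces = cascade.detectMultiScale(gray_image, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep largest detection
    face = cv2.resize(gray_image[y:y + h, x:x + w], (64, 64))
    # (c) Hand-crafted spatial features (HOG here; LBP, Gabor filters, and
    # landmark geometry are common alternatives in the surveyed literature).
    return hog(face, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# (d) A pre-trained pattern classifier; a linear SVM is one typical choice.
# X_train / y_train are assumed to hold features and emotion labels (e.g. CK+):
# clf = SVC(kernel="linear").fit(X_train, y_train)
# emotion = clf.predict([extract_features(gray_frame)])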
Figure 2
Procedure of CNN-based FER approaches: (a) the input images are convolved with filters in the convolution layers; (b) from the convolution results, feature maps are constructed, and max-pooling (subsampling) layers lower their spatial resolution; (c) fully connected neural-network layers are applied behind the convolutional layers; and (d) a single facial expression is recognized from the softmax output (face images are taken from the CK+ dataset [10]).
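A minimal PyTorch sketch of this procedure is given below. The layer sizes, the 48x48 grayscale input, and the seven emotion classes are illustrative assumptions, not an architecture prescribed by the review.

# Minimal CNN for FER in PyTorch, mirroring steps (a)-(d) of Figure 2.
import torch
import torch.nn as nn

class FERCNN(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # (a) convolution
            nn.ReLU(),
            nn.MaxPool2d(2),                              # (b) max-pooling: 48 -> 24
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 24 -> 12
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 128),                 # (c) fully connected layers
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = FERCNN()
logits = model(torch.randn(1, 1, 48, 48))   # one grayscale face crop
probs = torch.softmax(logits, dim=1)        # (d) softmax over emotion classes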
Figure 3
Examples of various facial emotions and AUs: (a) basic emotions (sad, fearful, and angry) (face images are taken from the CE dataset [17]); (b) compound emotions (happily surprised, happily disgusted, and sadly fearful) (face images are taken from the CE dataset [17]); (c) spontaneous expressions (face images are taken from YouTube); and (d) AUs of the upper and lower face (face images are taken from the CK+ dataset [10]).
Figure 4
The basic structure of an LSTM, adapted from [50]. (a) One LSTM cell contains four interacting layers: the cell state, an input gate layer, a forget gate layer, and an output gate layer. (b) The repeating module of cells in an LSTM.
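For reference, the interactions in (a) are described by the standard LSTM cell equations, where \sigma is the logistic sigmoid, \odot the element-wise product, and [h_{t-1}, x_t] the concatenation of the previous hidden state and the current input:

\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{(input gate)} \\
\tilde{C}_t &= \tanh(W_C [h_{t-1}, x_t] + b_C) && \text{(candidate cell state)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell state update)} \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(C_t) && \text{(hidden state)}
\end{aligned}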
Figure 5
Overview of the general hybrid deep-learning framework for FER, adapted from [53]. The outputs of the CNNs and LSTMs are aggregated in a fusion network to produce a per-frame prediction.
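A minimal PyTorch sketch of such a framework follows: a CNN encodes the spatial features of each frame, an LSTM models their temporal evolution across frames, and a small fusion layer combines both streams into a per-frame emotion prediction. All dimensions, the late-fusion design, and the seven emotion classes are illustrative assumptions, not the exact architecture of [53].

# Minimal sketch of a hybrid CNN-LSTM FER framework (Figure 5).
import torch
import torch.nn as nn

class HybridFER(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=64, num_classes=7):
        super().__init__()
        # Per-frame CNN: spatial features of an individual frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 12 * 12, feat_dim),
        )
        # LSTM over the sequence of frame features: temporal features.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Fusion of the spatial and temporal streams into per-frame predictions.
        self.fusion = nn.Linear(feat_dim + hidden_dim, num_classes)

    def forward(self, frames):                    # frames: (B, T, 1, 48, 48)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)      # (B, T, feat_dim)
        temporal, _ = self.lstm(feats)                             # (B, T, hidden_dim)
        return self.fusion(torch.cat([feats, temporal], dim=-1))   # (B, T, classes)

model = HybridFER()
logits = model(torch.randn(2, 16, 1, 48, 48))   # 2 clips of 16 grayscale frames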
Figure 6
Examples of nine representative databases related to FER. Databases (a) through (g) support 2D still images and 2D video sequences, and databases (h) and (i) support 3D video sequences.

References

    1. Mehrabian A. Communication without words. Psychol. Today. 1968;2:53–56.
    2. Kaulard K., Cunningham D.W., Bülthoff H.H., Wallraven C. The MPI facial expression database—A validated database of emotional and conversational facial expressions. PLoS ONE. 2012;7:e32321. doi: 10.1371/journal.pone.0032321.
    3. Dornaika F., Raducanu B. Efficient facial expression recognition for human robot interaction; Proceedings of the 9th International Work-Conference on Artificial Neural Networks on Computational and Ambient Intelligence; San Sebastián, Spain. 20–22 June 2007; pp. 700–708.
    4. Bartneck C., Lyons M.J. HCI and the face: Towards an art of the soluble; Proceedings of the International Conference on Human-Computer Interaction: Interaction Design and Usability; Beijing, China. 22–27 July 2007; pp. 20–29.
    5. Hickson S., Dufour N., Sud A., Kwatra V., Essa I.A. Eyemotion: Classifying facial expressions in VR using eye-tracking cameras. arXiv. 2017. arXiv:1707.07204v2.