Sensors (Basel). 2025 Jun 16;25(12):3762.
doi: 10.3390/s25123762.

Facial Landmark-Driven Keypoint Feature Extraction for Robust Facial Expression Recognition

Jaehyun So et al. Sensors (Basel). 2025.

Abstract

Facial expression recognition (FER) is a core technology that enables computers to understand and react to human emotions. In image-based FER, face alignment algorithms are an important preprocessing step because they normalize face images in scale, rotation, and translation, which improves FER accuracy. Recent FER studies have also actively leveraged the feature maps computed by face alignment networks to enhance performance. However, previous studies could not effectively exploit information from the facial regions most important for FER, because they either used facial landmarks only during preprocessing or relied solely on the feature maps of the face alignment networks. In this paper, we propose the use of Keypoint Features extracted from feature maps at the coordinates of facial landmarks. To utilize Keypoint Features effectively, we further propose a Keypoint Feature regularization method that perturbs landmarks during training for robustness, and an attention mechanism that emphasizes all Keypoint Features using representative Keypoint Features derived from a nasal base landmark, which carries information about the whole face. We performed experiments on the AffectNet, RAF-DB, and FERPlus datasets using a simply designed network to validate the effectiveness of the proposed method. The proposed method achieved 68.17% on AffectNet-7, 64.87% on AffectNet-8, 93.16% on RAF-DB, and 91.44% on FERPlus. Furthermore, the network pretrained on AffectNet-8 achieved improved accuracies of 94.04% on RAF-DB and 91.66% on FERPlus. These results demonstrate that the proposed Keypoint Features achieve results comparable to those of existing methods, highlighting their potential for enhancing FER performance through the effective utilization of key facial region features.
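As a rough illustration of the central idea, the following is a minimal PyTorch-style sketch (not the authors' published code) of how per-landmark feature vectors could be sampled from a feature map at predicted landmark coordinates. The function name, tensor shapes, and the choice of bilinear sampling are assumptions made for this example.

    import torch
    import torch.nn.functional as F

    def extract_keypoint_features(feature_map, landmarks, input_size):
        """Sample one feature vector per facial landmark.
        feature_map: (B, C, H, W) computed by the backbone / face alignment network.
        landmarks:   (B, N, 2) x,y landmark coordinates in input-image pixels.
        input_size:  side length of the square input image.
        Returns (B, N, C) Keypoint Features."""
        # Assumes the feature map spatially covers the full input image.
        # Normalize pixel coordinates to [-1, 1], as required by grid_sample.
        grid = landmarks / (input_size - 1) * 2.0 - 1.0      # (B, N, 2)
        grid = grid.unsqueeze(2)                             # (B, N, 1, 2)
        # Bilinearly sample the feature map at each landmark location.
        sampled = F.grid_sample(feature_map, grid,
                                mode='bilinear', align_corners=True)  # (B, C, N, 1)
        return sampled.squeeze(-1).permute(0, 2, 1)          # (B, N, C)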

Keywords: deep neural network; face alignment; facial expression recognition; feature attention.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1. Flow of the proposed FER method. Global Features are computed from the backbone and face alignment networks. Keypoint Features are then extracted from these Global Features based on facial landmark coordinates from the face alignment network, followed by refinement using Representative Keypoint Feature Attention and, finally, classification.
Figure 2. Overview of the NKF structure for FER. The Global Feature Network detects facial landmarks and computes feature maps. The Keypoint Feature Network extracts the Keypoint Features from the Global Features at the facial landmark coordinates. The Keypoint Features are then refined by a Representative Keypoint Feature Attention module and finally classified for FER.
Figure 3. t-SNE-based [43] 2D projection of Keypoint Features extracted from test dataset samples after training the proposed network on the RAF-DB training set. The points at the top left represent the locations of the 68 mean landmarks of the 3D Morphable Model [36]. The projection results at the top right display only red dots, excluding blue dots for ease of visual analysis, and are placed at the corresponding locations. The blue box at the bottom shows an enlarged view of the distribution of the Keypoint Features corresponding to the outer face contour, and the red box shows an enlarged view of the distribution of the Keypoint Features corresponding to the upper lip.
Figure 4. Generating inaccurately perturbed landmarks by adding random offsets during the training phase. Keypoint Features are extracted at these intentionally perturbed landmark coordinates during training, which enhances the network's robustness and generalization and makes it less sensitive to potential errors in landmark detection.
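The perturbation described in Figure 4 can be summarized in a few lines. The sketch below is illustrative only; the offset range and the uniform noise distribution are assumptions, not the paper's settings.

    import torch

    def perturb_landmarks(landmarks, max_offset_px=3.0, training=True):
        """Add small random offsets to landmark coordinates during training so
        that Keypoint Features become robust to landmark detection errors.
        landmarks: (B, N, 2) pixel coordinates; max_offset_px is illustrative."""
        if not training:
            return landmarks
        # Uniform noise in [-max_offset_px, +max_offset_px] per coordinate.
        noise = (torch.rand_like(landmarks) * 2.0 - 1.0) * max_offset_px
        return landmarks + noise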
Figure 5. t-SNE [40] 2D projection illustrating Keypoint Feature distributions for samples from the RAF-DB test dataset, before and after RKFA. On the left, the upper distribution shows the state before RKFA and the lower distribution shows the state after RKFA is applied. The red, blue, and green boxes show enlarged distributions of the nasal base Keypoint Features, the face outline Keypoint Features, and the refined Keypoint Features after applying RKFA, respectively.
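The following sketch shows one plausible form of Representative Keypoint Feature Attention consistent with the description above: the Keypoint Feature at the nasal base landmark acts as a representative query that re-weights all Keypoint Features. The module structure, linear projections, and sigmoid gating are assumptions for illustration, not the published architecture.

    import torch
    import torch.nn as nn

    class RepresentativeKeypointAttention(nn.Module):
        """Hypothetical RKFA-style module: emphasize all Keypoint Features
        using the representative feature at the nasal base landmark."""
        def __init__(self, dim, nose_index):
            super().__init__()
            self.nose_index = nose_index        # index of the nasal base landmark
            self.query = nn.Linear(dim, dim)
            self.key = nn.Linear(dim, dim)

        def forward(self, kp_feats):            # kp_feats: (B, N, C)
            rep = kp_feats[:, self.nose_index]  # (B, C) representative Keypoint Feature
            q = self.query(rep).unsqueeze(1)    # (B, 1, C)
            k = self.key(kp_feats)              # (B, N, C)
            # Similarity between the representative feature and every Keypoint Feature.
            attn = torch.sigmoid((q * k).sum(-1, keepdim=True) / k.shape[-1] ** 0.5)
            return kp_feats * attn              # emphasized Keypoint Features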
Figure 6. Confusion matrices of NKF evaluated on the FER datasets. (a) AffectNet-7 showing 68.17% accuracy of the NKF model, (b) AffectNet-8 showing 64.87% accuracy, (c) RAF-DB showing 93.12% accuracy, (d) FERPlus showing 91.44% accuracy, (e) RAF-DB showing 94.04% accuracy pretrained on AffectNet-8, and (f) FERPlus showing 91.66% accuracy pretrained on AffectNet-8.
Figure 7. FLOPs versus accuracy: (a) accuracy on RAF-DB; (b) accuracy on AffectNet-7. The red, pink, blue, yellow, black, and green symbols denote NKF, POSTER++ [2], DDAMFN [30], LFNSB [20], TransFER [56], and DAN [54], respectively.
Figure 8. Three types of facial landmark selections used to extract Keypoint Features. Blue dots indicate unused landmarks; red dots indicate used landmarks. 'Half' denotes the selection of landmarks at odd-numbered indices, 'Inner' the selection of landmarks within the face, and 'Full' the selection of all landmarks.
Figure 9. Examples of Keypoint Features. A green dot indicates a facial landmark, a red dot indicates a nasal base landmark, and a blue dot indicates the center location of a feature map.
Figure 10. Examples of failure cases. The top row displays the input images with the predicted facial landmarks overlaid. The bottom row presents the same input images along with the predicted facial expressions. The green dots denote the predictions of the face alignment network.

References

    1. Zheng C., Mendieta M., Chen C. POSTER: A Pyramid Cross-Fusion Transformer Network for Facial Expression Recognition; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops; Paris, France, 2–6 October 2023; pp. 3146–3155.
    2. Mao J., Xu R., Yin X., Chang Y., Nie B., Huang A., Wang Y. POSTER++: A simpler and stronger Facial Expression Recognition network. Pattern Recognit. 2024;148:110951. doi: 10.1016/j.patcog.2024.110951.
    3. Zhang Y., Wang C., Ling X., Deng W. Learn from all: Erasing attention consistency for noisy label facial expression recognition; Proceedings of the European Conference on Computer Vision (ECCV); Tel Aviv, Israel, 23–27 October 2022; pp. 418–434.
    4. Zhao Z., Liu Q., Zhou F. Robust lightweight Facial Expression Recognition network with label distribution training; Proceedings of the AAAI Conference on Artificial Intelligence (AAAI); Online, 2–9 February 2021; pp. 3510–3519.
    5. Mollahosseini A., Hasani B., Mahoor M.H. AffectNet: A database for facial expression, valence, and arousal computing in the wild. IEEE Trans. Affect. Comput. 2017;10:18–31. doi: 10.1109/TAFFC.2017.2740923.
