AD-VAE: Adversarial Disentangling Variational Autoencoder

Adson Silva et al. Sensors (Basel). 2025 Mar 4;25(5):1574. doi: 10.3390/s25051574.
Abstract

Face recognition (FR) is a less intrusive biometric technology with applications such as security, surveillance, and access control systems. FR remains challenging, especially when only a single image per person is available as the gallery and when dealing with variations in pose, illumination, and occlusion. Deep learning techniques based on variational autoencoders (VAEs) and generative adversarial networks (GANs) have shown promising results in recent years, with approaches such as patch-VAE, VAE-GAN for 3D indoor scene synthesis, and hybrid VAE-GAN models. However, in single sample per person face recognition (SSPP FR), learning robust and discriminative features that preserve the subject's identity remains difficult. To address these issues, we propose a novel framework called AD-VAE, designed specifically for SSPP FR, which combines VAE and GAN techniques. AD-VAE learns to build representative, identity-preserving prototypes from both controlled and in-the-wild datasets, effectively handling variations in pose, illumination, and occlusion. The method uses four networks: an encoder and a decoder similar to a VAE, a generator that receives the encoder output plus noise to generate an identity-preserving prototype, and a discriminator that operates as a multi-task network. AD-VAE outperforms all tested state-of-the-art face recognition techniques, demonstrating its robustness. The framework achieves superior results on four controlled benchmark datasets (AR, E-YaleB, CAS-PEAL, and FERET), with recognition rates of 84.9%, 94.6%, 94.5%, and 96.0%, respectively, and remarkable performance on the uncontrolled LFW dataset, with a recognition rate of 99.6%. AD-VAE shows promising potential for future research and real-world applications.

Keywords: GAN; face recognition; single sample.


Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
The first part of the proposed AD-VAE, which works as a variational adversarial autoencoder. Here x denotes an image from the data X, and x_dec denotes the decoder reconstruction of x. The encoder Enc takes image x as input and produces two outputs, the mean (μ) and the log-variance (σ), which define the parameters of a normal distribution N(μ, σ). From N(μ, σ) we draw a latent vector c ∼ N(μ, σ), which serves as input to the decoder Dec, which outputs the reconstruction x_dec.
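As a rough illustration (not the authors' code), the sampling step in the caption above can be sketched with the standard VAE reparameterization trick; the latent dimension of 4 is a hypothetical choice for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, logvar, rng):
    """Reparameterization trick: draw c ~ N(mu, sigma) as mu + sigma * eps,
    with eps ~ N(0, I), so the sample stays differentiable w.r.t. mu/logvar."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Hypothetical encoder outputs for one image x (latent dimension 4).
mu = np.zeros(4)       # mean produced by Enc
logvar = np.zeros(4)   # log-variance produced by Enc (sigma = 1)

c = sample_latent(mu, logvar, rng)   # latent vector fed to the decoder Dec
print(c.shape)                       # (4,)
```

In a real VAE, mu and logvar would come from a trained encoder network; the trick matters because sampling directly from N(μ, σ) would block gradient flow to the encoder.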
Figure 2
The second part of the proposed AD-VAE architecture, where x denotes an image from the SSPP data X, x_rp denotes the real prototype of image x, and x̂ is the prototype generated from image x. The pre-trained (first-part) encoder Enc produces the mean μ and log-variance σ of x. From the distribution N(μ, σ) we draw a latent vector c ∼ N(μ, σ), which is concatenated with a noise vector z ∼ N(0, 1) to serve as the input to the generator Gen, which outputs the prototype x̂ of x. The discriminator D (1) determines the identity and variation of x; (2) determines the identity and variation of x̂ and whether it is real or fake; and (3) determines whether x_rp is real or fake.
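A minimal sketch of how the generator input described above could be assembled, assuming hypothetical latent and noise dimensions of 4 each (the paper's actual sizes may differ):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions chosen for illustration only.
latent_dim, noise_dim = 4, 4

c = rng.standard_normal(latent_dim)  # latent c ~ N(mu, sigma) from the pre-trained encoder
z = rng.standard_normal(noise_dim)   # noise z ~ N(0, 1)

gen_input = np.concatenate([c, z])   # combined vector fed to the generator Gen
print(gen_input.shape)               # (8,)
```

Concatenating an identity-bearing latent with free noise is a common conditioning pattern in GAN generators: c carries the subject's identity while z supplies the variability the generator can explore.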
Figure 3
Prototypes generated by AD-VAE: (a) the sample image with variations, (b) the generated prototype of image (a), and (c) the real prototype of image (a). On the right side, the dataset name and the variation are indicated.

