Self-Supervised Anomaly Detection from Anomalous Training Data via Iterative Latent Token Masking

Ashay Patel et al.

IEEE Int Conf Comput Vis Workshops. 2023 Dec 25;2023:2394-2402. doi: 10.1109/ICCVW60793.2023.00254.
Abstract

Anomaly detection and segmentation are important tasks across sectors ranging from medical image analysis to industrial quality control. However, current unsupervised approaches require that the training data contain no anomalies, a requirement that can be especially challenging to satisfy in many medical imaging scenarios. In this paper, we propose Iterative Latent Token Masking, a self-supervised framework derived from a robust-statistics point of view that translates iterative model fitting with M-estimators to the task of anomaly detection. This allows unsupervised methods to be trained on datasets heavily contaminated with anomalous images. Our method builds on prior work combining a Transformer with a Vector Quantized-Variational Autoencoder (VQ-VAE) for anomaly detection, an approach with state-of-the-art performance when trained on normal (non-anomalous) data. More importantly, we utilise the token-masking capabilities of Transformers to filter suspected anomalous tokens out of each training sample's sequence in an iterative self-supervised process, thus overcoming the difficulties posed by highly anomalous training data. Our work also highlights shortfalls in current state-of-the-art self-supervised, self-trained and unsupervised models when faced with small proportions of anomalous training data. We evaluate our method on whole-body PET data and additionally demonstrate its wider applicability to common computer vision tasks using the industrial MVTec dataset. Across varying levels of anomalous training data, our method shows superior performance over several state-of-the-art models, drawing attention to the potential of this approach.
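
The robust-statistics analogy can be made concrete with a standard M-estimator fit computed by iteratively reweighted least squares: at each iteration, samples with large residuals receive smaller weights, mirroring how suspected anomalous tokens are progressively masked out of training. The sketch below (Python/NumPy) is purely illustrative of that analogy rather than the paper's algorithm; the Huber threshold and the toy data are arbitrary choices.

    import numpy as np

    def irls_huber_mean(x, delta=1.5, n_iter=10):
        """Robust (Huber) estimate of a location parameter from contaminated data."""
        mu = np.median(x)                                       # robust initialisation
        for _ in range(n_iter):
            r = x - mu                                          # residuals under the current fit
            scale = np.median(np.abs(r)) + 1e-8                 # robust scale estimate (MAD)
            z = np.abs(r) / scale
            w = np.minimum(1.0, delta / np.maximum(z, 1e-12))   # Huber weights: outliers get w < 1
            mu = np.sum(w * x) / np.sum(w)                      # reweighted update
        return mu

    # Contaminated sample: mostly "normal" data plus a few large anomalies.
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(0.0, 1.0, 95), rng.normal(8.0, 0.5, 5)])
    print(irls_huber_mean(data))  # stays close to 0 despite the contamination

Just as these weights suppress the influence of outlying samples on the fitted estimate, the masked tokens stop contaminated image regions from influencing the next round of Transformer training.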

Figures

Figure 1. Our approach can take anomalous training data without prior segmentation labels and train a Transformer to progressively remove the influence of anomalies from training such that, during inference, the model can detect new anomalies.
Figure 2. Proposed Iterative Latent Token Masking training pipeline. Training samples are quantized using the trained VQ-VAE's encoder, and a trained Transformer replaces suspected anomalous tokens via multiple resamplings. After decoding the multiple latent codes with dropout, a KDE anomaly map is generated. This map is then thresholded and downsampled to give a binary mask with the same dimensions as the latent space. These binary masks are then rasterized and used to mask tokens in the next Transformer training iteration (as shown in the green box). A minimal code sketch of this cycle follows the figure captions.
Figure 3. Anomaly detection performance of our proposed method over training cycles for the MVTec data (left) and PET data (right). Results show performance using a constant Transformer latent-token resampling threshold (red) and a threshold that decreases over iterations (blue).
Figure 4. Anomaly detection performance of our proposed methods alongside state-of-the-art models for comparison, showing the change in AUROC and AUPRO with varying levels of anomalous training data.
Figure 5. Rows from top to bottom display: (1st) input image; (2nd) ground-truth segmentation; (3rd, 4th) anomaly map as the reconstruction residual for the AE and the VAE, respectively; (5th, 6th) abnormality map output for NSA and STKD, respectively; (7th) abnormality map as a KDE using our self-supervised training approach. Results are provided for 5 random samples from the PET dataset.
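
To complement the Figure 2 caption, the following is a minimal sketch of one Iterative Latent Token Masking cycle under stated assumptions: PyTorch is assumed, vqvae.encode_to_tokens, vqvae.decode_tokens and transformer.resample_unlikely_tokens are hypothetical wrappers standing in for the trained VQ-VAE and Transformer interfaces, and the sampling count, kernel bandwidth and thresholds are illustrative values rather than those used in the paper.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def update_token_mask(image, vqvae, transformer, n_samples=8,
                          resample_thresh=0.01, kde_sigma=0.1, anomaly_thresh=0.5):
        # `image` is assumed to be a (C, H, W) tensor from the training set.
        # 1. Quantize the training image into a sequence of latent tokens
        #    (hypothetical wrapper around the trained VQ-VAE encoder).
        tokens, latent_hw = vqvae.encode_to_tokens(image)                 # (L,), (h, w)

        # 2.-3. Let the Transformer resample low-likelihood tokens, then decode
        #    each resampled latent code with dropout active, several times.
        recons = torch.stack([
            vqvae.decode_tokens(
                transformer.resample_unlikely_tokens(tokens, resample_thresh),
                dropout=True)
            for _ in range(n_samples)])                                   # (n, C, H, W)

        # 4. Pixel-wise anomaly map from an (unnormalised) Gaussian kernel density:
        #    low density of the input value under the reconstructions = high anomaly.
        diffs = (recons - image.unsqueeze(0)) / kde_sigma
        density = torch.exp(-0.5 * diffs ** 2).mean(dim=0).mean(dim=0)    # (H, W)
        anomaly_map = 1.0 - density

        # 5. Threshold and downsample the map to the latent-grid resolution.
        binary = (anomaly_map > anomaly_thresh).float()[None, None]       # (1, 1, H, W)
        latent_mask = F.interpolate(binary, size=latent_hw, mode="nearest")

        # 6. Flatten ("rasterize") into a per-token mask; True marks tokens to be
        #    excluded from the next Transformer training iteration.
        return latent_mask.flatten().bool()

The returned mask plays the role of the green box in Figure 2: masked tokens are left out of the next Transformer training round, so suspected anomalous regions stop contributing to the learned model of normality.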
