J Imaging. 2023 Sep 18;9(9):191. doi: 10.3390/jimaging9090191.

Limitations of Out-of-Distribution Detection in 3D Medical Image Segmentation


Anton Vasiliuk et al. J Imaging. 2023.

Abstract

Deep learning models perform unreliably when the data come from a distribution different from the training one. In critical applications such as medical imaging, out-of-distribution (OOD) detection methods help to identify such data samples, preventing erroneous predictions. In this paper, we further investigate OOD detection effectiveness when applied to 3D medical image segmentation. We designed several OOD challenges representing clinically occurring cases and found that none of the methods achieved acceptable performance. Methods not dedicated to segmentation failed severely in the designed setups; the best mean false-positive rate at a 95% true-positive rate (FPR) was 0.59. Segmentation-dedicated methods still achieved suboptimal performance, with the best mean FPR being 0.31 (lower is better). To demonstrate this suboptimality, we developed a simple method called Intensity Histogram Features (IHF), which performed comparably or better in the same challenges, with a mean FPR of 0.25. Our findings highlight the limitations of the existing OOD detection methods with 3D medical images and present a promising avenue for improving them. To facilitate research in this area, we release the designed challenges as a publicly available benchmark and formulate practical criteria to test the generalization of OOD detection beyond the suggested benchmark. We also propose IHF as a solid baseline to contest emerging methods.
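The FPR at 95% TPR metric quoted above can be computed directly from per-sample OOD scores. A minimal sketch, under the usual convention that OOD samples are the positive class and higher scores indicate OOD (the function name and threshold choice are ours, for illustration only):

```python
import numpy as np

def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR on in-distribution data when the threshold flags 95% of OOD samples.

    Assumes higher score = more OOD-like; OOD is the positive class.
    """
    # Threshold at the 5th percentile of OOD scores, so 95% of OOD
    # samples score at or above it (TPR = 0.95).
    threshold = np.percentile(ood_scores, 5)
    # FPR = fraction of ID samples wrongly flagged as OOD.
    return float(np.mean(np.asarray(id_scores) >= threshold))
```

For well-separated scores the metric approaches 0; for fully overlapping score distributions it approaches 0.95, which is why values like 0.59 signal near-useless detectors.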

Keywords: anomaly detection; computed tomography; magnetic resonance imaging; out-of-distribution detection; segmentation.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Examples of CT images (representative axial slices) from different simulated OOD sources in our benchmark.
Figure 2
Examples of MRI images (representative axial slices) from different simulated OOD sources in our benchmark.
Figure 3
The proposed OOD detection method, called Intensity Histogram Features (IHF). It consists of three steps: calculating an m-dimensional vector of histogram bin values from the preprocessed image (Step 1), fitting and applying PCA to the resulting feature vectors (Step 2), and calculating the Mahalanobis distance between a test vector and the ID sample distribution (Step 3). We apply IHF to the 3D images and illustrate the process using 2D axial slices for simplicity. (* PCA is fitted once on all training data.)
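The three steps in the caption can be sketched roughly as follows. This is a minimal NumPy sketch under our own assumptions, not the authors' implementation: images are assumed intensity-normalized to [0, 1], and the function names, the fixed histogram range, and the small covariance ridge are ours.

```python
import numpy as np

def fit_ihf(train_images, m=100, v=0.95):
    """Fit IHF-style statistics on in-distribution (ID) training images."""
    # Step 1: m-dimensional histogram feature vector per image.
    feats = np.stack([np.histogram(img, bins=m, range=(0.0, 1.0), density=True)[0]
                      for img in train_images])
    mean = feats.mean(axis=0)
    X = feats - mean
    # Step 2: PCA via SVD, keeping enough components to explain fraction v
    # of the variance (PCA is fitted once on all training data).
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    var = S**2 / (len(feats) - 1)
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), v)) + 1
    comps = Vt[:k]
    Z = X @ comps.T
    # Covariance of the projected ID features, with a small ridge for stability.
    cov = np.atleast_2d(np.cov(Z, rowvar=False)) + 1e-6 * np.eye(k)
    return mean, comps, np.linalg.inv(cov), Z.mean(axis=0)

def ihf_score(img, mean, comps, cov_inv, z_mean, m=100):
    """Step 3: Mahalanobis distance of a test image to the ID distribution."""
    f = np.histogram(img, bins=m, range=(0.0, 1.0), density=True)[0]
    d = (f - mean) @ comps.T - z_mean
    return float(np.sqrt(d @ cov_inv @ d))
```

A test image whose intensity histogram deviates from the ID distribution receives a larger distance, so thresholding this score yields an OOD detector.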
Figure 4
Dependence of IHF on its two hyperparameters: the number of histogram bins (m) and the explained variance in PCA (v). We give the results for both IHF variants on both the CT and MRI setups.
Figure 5
FPR under synthetically distorted data for every distortion severity level. The blue line indicates the method's average trend across the presented challenges, with a 95% confidence interval. The other UE methods (MCD, Ensemble, and G-ODIN) are excluded, since their average trends are similar to Entropy's.

