Z Med Phys. 2022 Aug;32(3):346-360.
doi: 10.1016/j.zemedi.2021.11.004. Epub 2022 Jan 10.

Investigation of biases in convolutional neural networks for semantic segmentation using performance sensitivity analysis


Daniel Güllmar et al. Z Med Phys. 2022 Aug.

Abstract

The application of deep neural networks for segmentation in medical imaging has gained substantial interest in recent years. In many cases, this variant of machine learning has been shown to outperform conventional segmentation approaches. However, little is known about its general applicability; in particular, robustness against image modifications (e.g., intensity variations, contrast variations, spatial alignment) has hardly been investigated. Data augmentation is often used to compensate for sensitivity to such changes, although its effectiveness has not yet been studied systematically. The goal of this study was therefore to systematically investigate the sensitivity of deep-learning-based medical image segmentation to variations in the input data. The approach was tested with two publicly available segmentation frameworks (DeepMedic and TractSeg). For DeepMedic, performance was evaluated against ground-truth data; for TractSeg, the STAPLE technique was employed. In both cases, the sensitivity analysis revealed a significant dependence of segmentation performance on input variations. The effects of different data augmentation strategies were also shown, making this type of analysis a useful tool for selecting appropriate augmentation parameters. The proposed analysis should be applied to any deep learning image segmentation approach, unless the sensitivity to input variations can be derived directly from the network.
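The figure captions below report segmentation performance as the Dice similarity coefficient (DSC). As a reminder of how that metric behaves, here is a minimal sketch for binary masks; the function name and the empty-mask convention (DSC = 1 when both masks are empty) are our assumptions, not taken from the paper.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient (DSC) between two binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * intersection / total

# Example: two overlapping 1D "masks" sharing 2 of their 3 voxels each
a = np.array([0, 1, 1, 1, 0])
b = np.array([0, 0, 1, 1, 1])
print(dice_coefficient(a, b))  # 2*2/(3+3) ≈ 0.667
```

DSC ranges from 0 (no overlap) to 1 (perfect overlap), which is why the heat maps in Figures 3-8 are bounded in that interval.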

Keywords: Convolutional neural network; Data augmentation; Semantic image segmentation; Sensitivity analysis.


Figures

Figure 1
Diagram of the workflow of the network performance sensitivity analysis, illustrated for a rotation with two degrees of freedom (rotation around the x- and y-axis, respectively). The test data set, including an MR volume and its corresponding ground-truth label, is first modified within a chosen parameter range using a specified transfer function. The modified test data sets are subsequently fed into a trained neural network, which performs the segmentation. In the final step, the resulting CNN segmentations are compared to the corresponding ground truth label volumes using the performance metric and the values of the latter are displayed as a function of the two parameters as a heat map.
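The workflow described in the Figure 1 caption can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the authors' code: `toy_segmenter` stands in for the trained CNN, the rotation grid and synthetic volume are invented for the demo, and the axis mapping for the x/y rotations is arbitrary.

```python
import numpy as np
from scipy.ndimage import rotate

def dice(pred, truth):
    """Dice similarity coefficient for two boolean masks."""
    inter = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 2.0 * inter / total if total else 1.0

def toy_segmenter(volume):
    # Stand-in for a trained CNN: simple intensity threshold.
    return volume > 0.5

def sensitivity_map(volume, label, angles_x, angles_y):
    """DSC heat map over a grid of x/y input rotations (Fig. 1 workflow)."""
    grid = np.zeros((len(angles_x), len(angles_y)))
    for i, ax in enumerate(angles_x):
        for j, ay in enumerate(angles_y):
            # Apply the same transform to image and ground-truth label;
            # linear interpolation for intensities, nearest for labels.
            v = rotate(volume, ax, axes=(1, 2), reshape=False, order=1)
            l = rotate(label.astype(float), ax, axes=(1, 2), reshape=False, order=0)
            v = rotate(v, ay, axes=(0, 2), reshape=False, order=1)
            l = rotate(l, ay, axes=(0, 2), reshape=False, order=0)
            # Segment the modified input and score it against the
            # correspondingly transformed ground truth.
            grid[i, j] = dice(toy_segmenter(v), l > 0.5)
    return grid

# Synthetic test volume: a bright cube on a dark background.
vol = np.zeros((20, 20, 20))
vol[5:15, 5:15, 5:15] = 1.0
lab = vol > 0.5
heat = sensitivity_map(vol, lab, angles_x=[-10, 0, 10], angles_y=[-10, 0, 10])
```

Plotting `heat` with the two angle axes yields a map analogous to those in Figures 3-8; the unrotated center entry should score highest, with DSC falling off as the input is rotated away from the training orientation.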
Figure 2
Horizontal box plots of DSC scores for CNN segmentation using the five trained networks: without data augmentation (CNNnoaug) and with the four applied data augmentation methods (CNNrotaug, CNNrotaug2, CNNintaug, CNNrotaug+intaug), presented separately for all three studied test samples (EUH, EMCC, IBSR). For these plots, only the DSC values derived from test data without additional data manipulation (rotation or intensity manipulation) were used.
Figure 3
Sensitivity maps illustrating the CNN segmentation performance averaged over 11 test data sets from each test sample (rows: EUH, EMCC, IBSR) as a 2D function of x and y image rotations. The columns represent the four differently trained networks (columns: CNNnoaug, CNNrotaug, CNNrotaug2, CNNrotaug+intaug).
Figure 4
Sensitivity maps of CNN segmentation performance averaged over 14 test data sets from each test sample (rows: EUH, EMCC, IBSR) as a 2D function of image intensity offset and scale factor. The columns represent the three differently trained networks (columns: CNNnoaug, CNNrotaug, CNNrotaug+intaug).
Figure 5
Sensitivity map CNNrotaug(EMCC) (taken from Fig. 3) and exemplary coronal slices of subject #1 from the EMCC test sample, with the ground truth label (blue) and a good (DSC = 0.94, (B)), a bad (DSC = 0.42, (A)), and an intermediate (DSC = 0.74, (C)) performing CNNrotaug segmentation (yellow) superimposed on the image. Mean DSC for the sensitivity map was 0.80 ± 0.18.
Figure 6
Heat maps of the DSC distribution with contour plots, resulting from the white matter tract segmentation as a function of input data rotations around the x- and y-axis (representing head nodding and wobbling), averaged across all white matter segments (n = 72) for 6 of the 7 subjects studied. The scaling of the color bar legend refers to the heat maps.
Figure 7
Heat maps of the DSC distribution with contour plots for several pairs of selected left- and right-sided white matter tracts (AF = arcuate fascicle, ATR = anterior thalamic radiation, CG = cingulum, CST = corticospinal tract, FPT = fronto-pontine tract, FX = fornix, SLF3 = superior longitudinal fascicle III, UF = uncinate fascicle). The DSC distribution is displayed as a function of input data rotation around the x- and y-axis (head nodding and wobbling), averaged across 7 subjects. The scaling of the color bar legend refers to the heat maps.
Figure 8
Representation of the segmentation performance of the corticospinal tract (CST), measured by DSC, as a function of the rotation of the input data around the x- and y-axis (head nodding and wobbling) for 6 of the 7 investigated subjects. The scaling of the color bar legend refers to the heat maps.
Figure 9
Boxplots of DSC values for a selected number of pairs of white matter tracts (same as in Figure 7), including AF, ATR, CG, CST, FPT, FX, SLF3, and UF. Each subplot shows the results for network weights of version V1.0.0 and V1.1.0 for the basic parameter configuration without artificial rotation as well as for the individual maxima in the investigated parameter space (x- and y-rotation).
