Z Med Phys. 2022 Aug;32(3):346-360.
doi: 10.1016/j.zemedi.2021.11.004. Epub 2022 Jan 10.

Investigation of biases in convolutional neural networks for semantic segmentation using performance sensitivity analysis


Daniel Güllmar et al. Z Med Phys. 2022 Aug.

Abstract

The application of deep neural networks for segmentation in medical imaging has gained substantial interest in recent years. In many cases, this variant of machine learning has been shown to outperform conventional segmentation approaches. However, little is known about its general applicability; in particular, robustness against image modifications (e.g., intensity variations, contrast variations, spatial alignment) has hardly been investigated. Data augmentation is often used to compensate for sensitivity to such changes, although its effectiveness has not yet been studied systematically. The goal of this study was therefore to systematically investigate the sensitivity of deep-learning-based medical image segmentation to variations in the input data. The approach was tested with two publicly available segmentation frameworks (DeepMedic and TractSeg). For DeepMedic, performance was evaluated against ground-truth data; for TractSeg, the STAPLE technique was employed. In both cases, the sensitivity analysis revealed a significant dependence of segmentation performance on input variations. The effects of different data augmentation strategies were also shown, making this type of analysis a useful tool for selecting appropriate augmentation parameters. The proposed analysis should be applied to any deep learning image segmentation approach, unless the sensitivity to input variations can be derived directly from the network.
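The figure captions below report segmentation performance as the Dice similarity coefficient (DSC). As a reminder of how that metric behaves, here is a minimal sketch for binary masks; the function name and the empty-mask convention (DSC = 1 when both masks are empty) are our assumptions, not taken from the paper.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient (DSC) between two binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * intersection / total

# Example: two overlapping 1D "masks" sharing 2 of their 3 voxels each
a = np.array([0, 1, 1, 1, 0])
b = np.array([0, 0, 1, 1, 1])
print(dice_coefficient(a, b))  # 2*2/(3+3) ≈ 0.667
```

DSC ranges from 0 (no overlap) to 1 (perfect overlap), which is why the heat maps in Figures 3-8 are bounded in that interval.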

Keywords: Convolutional neural network; Data augmentation; Semantic image segmentation; Sensitivity analysis.


Figures

Figure 1
Diagram of the workflow of the network performance sensitivity analysis, illustrated for a rotation with two degrees of freedom (rotation around the x- and y-axis, respectively). The test data set, including an MR volume and its corresponding ground-truth label, is first modified within a chosen parameter range using a specified transfer function. The modified test data sets are subsequently fed into a trained neural network, which performs the segmentation. In the final step, the resulting CNN segmentations are compared to the corresponding ground truth label volumes using the performance metric and the values of the latter are displayed as a function of the two parameters as a heat map.
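The workflow described in the Figure 1 caption can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the authors' code: `toy_segmenter` stands in for the trained CNN, the rotation grid and synthetic volume are invented for the demo, and the axis mapping for the x/y rotations is arbitrary.

```python
import numpy as np
from scipy.ndimage import rotate

def dice(pred, truth):
    """Dice similarity coefficient for two boolean masks."""
    inter = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 2.0 * inter / total if total else 1.0

def toy_segmenter(volume):
    # Stand-in for a trained CNN: simple intensity threshold.
    return volume > 0.5

def sensitivity_map(volume, label, angles_x, angles_y):
    """DSC heat map over a grid of x/y input rotations (Fig. 1 workflow)."""
    grid = np.zeros((len(angles_x), len(angles_y)))
    for i, ax in enumerate(angles_x):
        for j, ay in enumerate(angles_y):
            # Apply the same transform to image and ground-truth label;
            # linear interpolation for intensities, nearest for labels.
            v = rotate(volume, ax, axes=(1, 2), reshape=False, order=1)
            l = rotate(label.astype(float), ax, axes=(1, 2), reshape=False, order=0)
            v = rotate(v, ay, axes=(0, 2), reshape=False, order=1)
            l = rotate(l, ay, axes=(0, 2), reshape=False, order=0)
            # Segment the modified input and score it against the
            # correspondingly transformed ground truth.
            grid[i, j] = dice(toy_segmenter(v), l > 0.5)
    return grid

# Synthetic test volume: a bright cube on a dark background.
vol = np.zeros((20, 20, 20))
vol[5:15, 5:15, 5:15] = 1.0
lab = vol > 0.5
heat = sensitivity_map(vol, lab, angles_x=[-10, 0, 10], angles_y=[-10, 0, 10])
```

Plotting `heat` with the two angle axes yields a map analogous to those in Figures 3-8; the unrotated center entry should score highest, with DSC falling off as the input is rotated away from the training orientation.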
Figure 2
Horizontal box plots of DSC scores for CNN segmentation using the five trained networks: without data augmentation (CNNnoaug) and with the four applied data augmentation methods (CNNrotaug, CNNrotaug2, CNNintaug, CNNrotaug+intaug), presented separately for all three studied test samples (EUH, EMCC, IBSR). For these plots, only the DSC values derived from test data without additional data manipulation (rotation or intensity manipulation) were used.
Figure 3
Sensitivity maps illustrating the CNN segmentation performance averaged over 11 test data sets from each test sample (rows: EUH, EMCC, IBSR) as a 2D function of x and y image rotations. The columns represent the four differently trained networks (columns: CNNnoaug, CNNrotaug, CNNrotaug2, CNNrotaug+intaug).
Figure 4
Sensitivity maps of CNN segmentation performance averaged over 14 test data sets from each test sample (rows: EUH, EMCC, IBSR) as a 2D function of image intensity offset and scale factor. The columns represent the three differently trained networks (columns: CNNnoaug, CNNrotaug, CNNrotaug+intaug).
Figure 5
Sensitivity map CNNrotaug(EMCC) (taken from Fig. 3) and exemplary coronal slices of subject #1 from the EMCC test sample, with the ground truth label (blue) and a good (DSC = 0.94, (B)), a bad (DSC = 0.42, (A)), and an intermediate (DSC = 0.74, (C)) performing CNNrotaug segmentation (yellow) superimposed on the image. Mean DSC for the sensitivity map was 0.80 ± 0.18.
Figure 6
Heat maps of the DSC distribution with contour plots, resulting from the white matter tract segmentation as a function of input data rotations around the x- and y-axis (representing head nodding and wobbling), averaged across all white matter segments (n = 72) for 6 of the 7 subjects studied. The scaling of the color bar legend refers to the heat maps.
Figure 7
Heat maps of the DSC distribution with contour plots for several pairs of selected left- and right-sided white matter tracts (AF = arcuate fascicle, ATR = anterior thalamic radiation, CG = cingulum, CST = corticospinal tract, FPT = fronto-pontine tract, FX = fornix, SLF3 = superior longitudinal fascicle III, UF = uncinate fascicle). The DSC distribution is displayed as a function of input data rotation around the x- and y-axis (head nodding and wobbling), averaged across 7 subjects. The scaling of the color bar legend refers to the heat maps.
Figure 8
Representation of the segmentation performance of the corticospinal tract (CST), measured by DSC, as a function of the rotation of the input data around the x- and y-axis (head nodding and wobbling) for 6 of the 7 investigated subjects. The scaling of the color bar legend refers to the heat maps.
Figure 9
Boxplots of DSC values for a selected number of pairs of white matter tracts (same as in Figure 7), including AF, ATR, CG, CST, FPT, FX, SLF3, and UF. Each subplot shows the results for network weights of version V1.0.0 and V1.1.0 for the basic parameter configuration without artificial rotation as well as for the individual maxima in the investigated parameter space (x- and y-rotation).
