Sensors (Basel). 2022 Mar 29;22(7):2623. doi: 10.3390/s22072623.

Semantic Segmentation Using Pixel-Wise Adaptive Label Smoothing via Self-Knowledge Distillation for Limited Labeling Data

Sangyong Park et al. Sensors (Basel). 2022.

Abstract

To achieve high performance, most deep convolutional neural networks (DCNNs) require a large amount of training data with ground-truth labels. However, creating ground-truth labels for semantic segmentation takes more time, human effort, and cost than for tasks such as classification and object detection, because a label is required for every pixel in an image. Hence, there is a practical demand for training semantic segmentation DCNNs with a limited amount of data. In general, training DCNNs on limited data is problematic because the networks easily overfit the training data, which reduces their accuracy. Here, we propose a new regularization method called pixel-wise adaptive label smoothing (PALS) via self-knowledge distillation to stably train semantic segmentation networks in the practical situation in which only a limited amount of training data is available. To mitigate the problem caused by limited training data, our method fully exploits the internal statistics of pixels within an input image: it generates a pixel-wise aggregated probability distribution using a similarity matrix that encodes the affinities between all pairs of pixels. To further increase accuracy, we add one-hot-encoded distributions of the ground-truth labels to these aggregated distributions to obtain the final soft labels. We demonstrate the effectiveness of our method on the Cityscapes and Pascal VOC2012 datasets using 10%, 30%, 50%, and 100% of the training data. Across various quantitative and qualitative comparisons, our method yields more accurate results than previous methods. Specifically, on the Cityscapes test set, our method achieved mIoU improvements of 0.076%, 1.848%, 1.137%, and 1.063% for 10%, 30%, 50%, and 100% of the training data, respectively, compared with cross-entropy loss using one-hot-encoded ground-truth labels.
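The soft-label construction described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `pals_soft_labels`, the cosine-similarity affinity, the softmax normalization of the similarity matrix, and the fixed mixing weight `alpha` are all assumptions made for illustration (the paper varies the weighting over training iterations, and real feature maps would be far larger than this flattened toy example).

```python
import numpy as np

def pals_soft_labels(features, one_hot, alpha):
    """Sketch of pixel-wise adaptive label smoothing.

    features: (N, D) per-pixel feature vectors (pixels flattened to N rows)
    one_hot:  (N, C) one-hot-encoded ground-truth labels
    alpha:    mixing weight in [0, 1] for the ground-truth term
    Returns an (N, C) array of soft labels.
    """
    # Normalize features and build the pairwise (cosine) similarity matrix.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    sim = f @ f.T  # (N, N) affinities between all pairs of pixels

    # Turn similarities into row-stochastic aggregation weights (softmax).
    weights = np.exp(sim - sim.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)

    # Aggregate label distributions over similar pixels.
    aggregated = weights @ one_hot  # (N, C)

    # Weighted sum of the one-hot ground truth and the aggregated distribution.
    return alpha * one_hot + (1.0 - alpha) * aggregated
```

Because each row of `weights` and each row of `one_hot` sums to one, every resulting soft label is itself a valid probability distribution, so it can be used directly as the target of a cross-entropy or KL-divergence loss.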

Keywords: limited training data; regularization; self-knowledge distillation; semantic segmentation.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
A schematic flowchart of our method. Our method aggregates distributions based on pair-wise feature similarity and generates a pixel-wise soft label as a weighted sum of the one-hot-encoded ground-truth label and the aggregated distribution for each pixel, with weights that vary over training iterations.
Figure 2
Comparative results of methods trained using various ratios of limited training data (10%, 30%, 50%, and 100%). The value below each result is the mIoU.
Figure 3
Overview of the proposed method, which is categorized into training and test paths. Blue and red arrows represent training and test paths, respectively.
Figure 4
Process of our PALS module.
Figure 5
Process of PA module, where (·) denotes the downsampling operation.
Figure 6
Results of the comparison of various methods using limited training data for DeepLab-V3+ [10] with the Xception65 [76] network on the Cityscapes dataset. (a) Input image. (b) Ground-truth image. (c) CE [10] result. (d) CP [22] result. (e) LS [20] result. (f) Our result.
Figure 7
Results of the comparison of various methods using limited training data for DeepLab-V3+ [10] with the ResNet18 [77] network on the Cityscapes dataset. (a) Input image. (b) Ground-truth image. (c) CE [10] result. (d) CP [22] result. (e) LS [20] result. (f) Our result.
Figure 8
Results of the comparison of various methods using limited training data for DeepLab-V3+ [10] with the Xception65 [76] network on the Pascal VOC2012 dataset. (a) Input image. (b) Ground-truth image. (c) CE [10] result. (d) CP [22] result. (e) LS [20] result. (f) Our result.

References

    1. Zeng W., Luo W., Suo S., Sadat A., Yang B., Casas S., Urtasun R. End-To-End Interpretable Neural Motion Planner; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Long Beach, CA, USA. 15–20 June 2019; pp. 8652–8661.
    2. Philion J., Fidler S. Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D; Proceedings of the European Conference on Computer Vision (ECCV); Glasgow, UK. 23–28 August 2020.
    3. Cherabier I.F., Schönberger J.L., Oswald M.R., Pollefeys M., Geiger A. Learning Priors for Semantic 3D Reconstruction; Proceedings of the European Conference on Computer Vision (ECCV); Munich, Germany. 8–14 September 2018; pp. 314–330.
    4. Ronneberger O., Fischer P., Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation; Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI); Munich, Germany. 5–9 October 2015; pp. 234–241.
    5. Srivastava A., Jha D., Chanda S., Pal U., Johansen H.D., Johansen D., Riegler M.A., Ali S., Halvorsen P. MSRF-Net: A Multi-Scale Residual Fusion Network for Biomedical Image Segmentation. arXiv. 2021. arXiv:2105.07451. doi: 10.1109/JBHI.2021.3138024.
