2024 Sep 25;10:e2250. doi: 10.7717/peerj-cs.2250. eCollection 2024.

Road surface semantic segmentation for autonomous driving


Huaqi Zhao et al. PeerJ Comput Sci.

Abstract

Although semantic segmentation is widely employed in autonomous driving, its performance in segmenting road surfaces falls short in complex traffic environments. This study proposes a frequency-based semantic segmentation transformer (FSSFormer), motivated by the sensitivity of semantic segmentation to frequency information. Specifically, we propose a weight-sharing factorized attention mechanism that selects important frequency features, improving the segmentation of overlapping targets. Moreover, to address boundary information loss, we use a cross-attention method that combines spatial and frequency features to recover detailed pixel information. To improve segmentation accuracy in complex road scenarios, we adopt a parallel-gated feedforward network to encode positional information. Extensive experiments demonstrate that FSSFormer improves mIoU by 2% over existing segmentation methods on the Cityscapes dataset.
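
To make the frequency-selection idea concrete, the following is a minimal PyTorch sketch that re-weights a feature map in the frequency domain with a learnable per-frequency gate. The module name (FrequencyGate), the gating scheme, and all shapes are our illustrative assumptions; the paper's actual weight-sharing factorized attention is more elaborate.

    import torch
    import torch.nn as nn

    class FrequencyGate(nn.Module):
        """Re-weight a spatial feature map (B, C, H, W) in the frequency domain.

        Illustrative sketch only; not the paper's FSSFormer implementation.
        """

        def __init__(self, channels: int, height: int, width: int):
            super().__init__()
            # One learnable weight per channel and per rFFT frequency bin, so
            # the network can emphasize or suppress frequency components.
            self.gate = nn.Parameter(torch.ones(channels, height, width // 2 + 1))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Real 2-D FFT over the spatial dimensions (H, W).
            spec = torch.fft.rfft2(x, norm="ortho")
            # Select important frequency features by element-wise gating.
            spec = spec * self.gate
            # Inverse transform back to the original spatial resolution.
            return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")

    feat = torch.randn(2, 64, 32, 32)        # dummy backbone feature map
    out = FrequencyGate(64, 32, 32)(feat)    # same shape as the input
    print(out.shape)                         # torch.Size([2, 64, 32, 32])

In a full model such a gate would sit inside each transformer block alongside spatial attention; here it only illustrates the transform, gate, and inverse-transform pattern.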

Keywords: Cross-attention combining spatial and frequency features; Parallel-gated feedforward network; Semantic segmentation; Transformer; Weight-sharing factorized attention.


Conflict of interest statement

Rui Wang is employed by Dongfeng District People’s Court.

Figures

Figure 1. An example of a complex urban scene. This image was taken by our team.
Figure 2. The architecture of the proposed segmentation method.
Figure 3. The structure of the dynamic frequency capture module.
Figure 4. The structure of the cross-attention combining spatial and frequency features.
Figure 5. The structure of the linear attention operator.
Figure 6. The structure of the parallel-gated feedforward network module.
Figure 7. Parameter analysis of the group number M of the important frequency feature extraction method. The red data point shows that the mIoU peaked at 73.38% when the number of groups of low-frequency capture kernels was four.
Figure 8. Parameter analysis of the cross-feature space size of the cross-attention method combining spatial and frequency features. The red data point shows that the mIoU peaked at 73.38% when the cross-feature space size was 12 × 12.
Figure 9. Parameter analysis of the depth-wise convolutions of the parallel-gated feedforward network segmentation method. The red data point shows that the mIoU was 73.38% when G was 1,024.
Figure 10. Changes in mIoU, precision, and recall in the ablation experiments.
Figure 11. The mIoU of semantic segmentation methods at different iterations on Cityscapes.
Figure 12. The mIoU of semantic segmentation methods at different iterations on COCO-Stuff.
Figure 13. Visualization results of semantic segmentation methods in complex road scenes. (A) Original image; segmentation results of (B) RTFormer, (C) DeepLabV3+, (D) SegFormer, (E) FCN, (F) PSPNet, (G) BiseNet2, (H) ICNet, and (I) our method.
Figure 14. Visualization results of semantic segmentation methods in simple road scenes. (A) Original image; segmentation results of (B) RTFormer, (C) DeepLabV3+, (D) SegFormer, (E) FCN, (F) PSPNet, (G) BiseNet2, (H) ICNet, and (I) our method.

References

    1. Achanta R, Shaji A, Smith K, Lucchi A, Fua P, Süsstrunk S. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2012;34(11):2274–2282. doi: 10.1109/TPAMI.2012.120.
    2. Boykov YY, Jolly M-P. Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. Proceedings Eighth IEEE International Conference on Computer Vision (ICCV 2001); Piscataway: IEEE; 2001. pp. 105–112.
    3. Caesar H, Uijlings J, Ferrari V. COCO-Stuff: thing and stuff classes in context. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Piscataway: IEEE; 2018. pp. 1209–1218.
    4. Chen C-FR, Fan Q, Panda R. CrossViT: cross-attention multi-scale vision transformer for image classification. Proceedings of the IEEE/CVF International Conference on Computer Vision; Piscataway: IEEE; 2021. pp. 357–366.
    5. Chen L-C, Papandreou G, Schroff F, Adam H. Rethinking atrous convolution for semantic image segmentation. 2017. arXiv preprint.
