2024 Sep 25;10:e2250. doi: 10.7717/peerj-cs.2250. eCollection 2024.

Road surface semantic segmentation for autonomous driving


Huaqi Zhao et al. PeerJ Comput Sci.

Abstract

Although semantic segmentation is widely employed in autonomous driving, its performance in segmenting road surfaces falls short in complex traffic environments. This study proposes a frequency-based semantic segmentation transformer (FSSFormer), motivated by the sensitivity of semantic segmentation to frequency information. Specifically, we propose a weight-sharing factorized attention mechanism that selects important frequency features, improving the segmentation of overlapping targets. Moreover, to address boundary information loss, we use a cross-attention method that combines spatial and frequency features to recover detailed pixel information. To improve segmentation accuracy in complex road scenarios, we adopt a parallel-gated feedforward network to encode positional information. Extensive experiments demonstrate that FSSFormer improves mIoU by 2% over existing segmentation methods on the Cityscapes dataset.
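
To make the frequency-selection idea concrete, the following is a minimal PyTorch sketch that re-weights a feature map in the frequency domain with a learnable per-frequency gate. The module name (FrequencyGate), the gating scheme, and all shapes are our illustrative assumptions; the paper's actual weight-sharing factorized attention is more elaborate.

    import torch
    import torch.nn as nn

    class FrequencyGate(nn.Module):
        """Re-weight a spatial feature map (B, C, H, W) in the frequency domain.

        Illustrative sketch only; not the paper's FSSFormer implementation.
        """

        def __init__(self, channels: int, height: int, width: int):
            super().__init__()
            # One learnable weight per channel and per rFFT frequency bin, so
            # the network can emphasize or suppress frequency components.
            self.gate = nn.Parameter(torch.ones(channels, height, width // 2 + 1))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Real 2-D FFT over the spatial dimensions (H, W).
            spec = torch.fft.rfft2(x, norm="ortho")
            # Select important frequency features by element-wise gating.
            spec = spec * self.gate
            # Inverse transform back to the original spatial resolution.
            return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")

    feat = torch.randn(2, 64, 32, 32)        # dummy backbone feature map
    out = FrequencyGate(64, 32, 32)(feat)    # same shape as the input
    print(out.shape)                         # torch.Size([2, 64, 32, 32])

In a full model such a gate would sit inside each transformer block alongside spatial attention; here it only illustrates the transform, gate, and inverse-transform pattern.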

Keywords: Cross-attention combining spatial and frequency features; Parallel-gated feedforward network; Semantic segmentation; Transformer; Weight-sharing factorized attention.


Conflict of interest statement

Rui Wang is employed by Dongfeng District People’s Court.

Figures

Figure 1. An example of a complex urban scene. This image was taken by our team.
Figure 2. The architecture of the proposed segmentation method.
Figure 3. The structure of the dynamic frequency capture module.
Figure 4. The structure of the cross-attention combining spatial and frequency features.
Figure 5. The structure of the linear attention operator.
Figure 6. The structure of the parallel-gated feedforward network module.
Figure 7. Parameter analysis of the group number M of the important frequency feature extraction method. The red data point shows that the mIoU peaked at 73.38% when the number of groups of low-frequency capture kernels was four.
Figure 8. Parameter analysis of the cross-feature space size of the cross-attention method combining spatial and frequency features. The red data point shows that the mIoU peaked at 73.38% when the cross-feature space size was 12 × 12.
Figure 9. Parameter analysis of the depth-wise convolutions of the parallel-gated feedforward network segmentation method. The red data point shows that the mIoU was 73.38% when G was 1,024.
Figure 10. Changes in mIoU, precision, and recall in the ablation experiments.
Figure 11. The mIoU of semantic segmentation methods at different iterations on Cityscapes.
Figure 12. The mIoU of semantic segmentation methods at different iterations on COCO-Stuff.
Figure 13. Visualization results of semantic segmentation methods in complex road scenes. (A) Original image; segmentation results of (B) RTFormer, (C) DeepLabV3+, (D) SegFormer, (E) FCN, (F) PSPNet, (G) BiseNet2, (H) ICNet, and (I) our method.
Figure 14. Visualization results of semantic segmentation methods in simple road scenes. (A) Original image; segmentation results of (B) RTFormer, (C) DeepLabV3+, (D) SegFormer, (E) FCN, (F) PSPNet, (G) BiseNet2, (H) ICNet, and (I) our method.

References

    1. Achanta R, Shaji A, Smith K, Lucchi A, Fua P, Süsstrunk S. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2012;34(11):2274–2282. doi: 10.1109/TPAMI.2012.120.
    2. Boykov YY, Jolly M-P. Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. Proceedings Eighth IEEE International Conference on Computer Vision (ICCV 2001); Piscataway: IEEE; 2001. pp. 105–112.
    3. Caesar H, Uijlings J, Ferrari V. COCO-Stuff: thing and stuff classes in context. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Piscataway: IEEE; 2018. pp. 1209–1218.
    4. Chen C-FR, Fan Q, Panda R. CrossViT: cross-attention multi-scale vision transformer for image classification. Proceedings of the IEEE/CVF International Conference on Computer Vision; Piscataway: IEEE; 2021. pp. 357–366.
    5. Chen L-C, Papandreou G, Schroff F, Adam H. Rethinking atrous convolution for semantic image segmentation. 2017. arXiv preprint.
