Sensors (Basel). 2023 Apr 10;23(8):3853.

doi: 10.3390/s23083853.

Improved Mask R-CNN Multi-Target Detection and Segmentation for Autonomous Driving in Complex Scenes

Shuqi Fang¹, Bin Zhang¹, Jingyu Hu¹

PMID: 37112194
PMCID: PMC10146362
DOI: 10.3390/s23083853

Shuqi Fang et al. Sensors (Basel). 2023.

Affiliation

¹ School of Electronic and Automation, Guilin University of Electronic Technology, Guilin 541004, China.

Abstract

Vision-based target detection and segmentation is an important research topic for environment perception in autonomous driving, but mainstream algorithms suffer from low detection accuracy and poor mask segmentation quality when handling multiple targets in complex traffic scenes. To address this problem, this paper improves Mask R-CNN by replacing the ResNet backbone with ResNeXt, which uses group convolution, to strengthen the model's feature extraction capability. Furthermore, a bottom-up path enhancement strategy is added to the Feature Pyramid Network (FPN) to achieve feature fusion, while an efficient channel attention (ECA) module is added to the backbone feature extraction network to optimize the high-level, low-resolution semantic feature maps. Finally, the smooth L1 bounding box regression loss is replaced by the CIoU loss to speed up model convergence and reduce error. Experimental results show that the improved Mask R-CNN achieves 62.62% mAP for target detection and 57.58% mAP for segmentation on the publicly available CityScapes autonomous driving dataset, which are 4.73% and 3.96% better, respectively, than the original Mask R-CNN. Transfer experiments show that the model also detects and segments well in each traffic scenario of the publicly available BDD autonomous driving dataset.
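The abstract names the CIoU loss but does not spell it out. As a minimal sketch, assuming the commonly published definition (IoU minus a normalized center-distance term and an aspect-ratio term), the loss for one predicted box and one ground-truth box could be computed as follows. The function name `ciou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are illustrative assumptions, not details taken from the paper.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; eps only guards against
    division by zero.
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))
    ) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

Unlike smooth L1, which penalizes each coordinate independently, this loss stays informative for non-overlapping boxes because the center-distance term still grows as the boxes move apart.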

Keywords: CIoU; FPN; Mask R-CNN; ResNeXt; autonomous driving; efficient channel attention module; environment perception; multi-target.
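To make the group convolution used by ResNeXt concrete, the short sketch below counts the weights of a 3×3 convolution with and without grouping. The channel width of 128 and the 32 groups are illustrative assumptions (a common ResNeXt setting); the abstract does not state the configuration used in the paper, and `conv_params` is a hypothetical helper.

```python
def conv_params(c_in, c_out, kernel, groups=1):
    """Number of weights in a 2D convolution layer (bias ignored).

    With grouping, each output channel only sees c_in / groups
    input channels, so the weight count shrinks by that factor.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return kernel * kernel * (c_in // groups) * c_out


# Same layer width, dense versus split into 32 groups.
dense = conv_params(128, 128, 3)               # 147456 weights
grouped = conv_params(128, 128, 3, groups=32)  # 4608 weights
```

The saving is what lets a ResNeXt block use wider layers than the ResNet block it replaces at a comparable parameter budget.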

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Mask R-CNN model structure. Adapted from Ref. [22].

**Figure 2**
Backbone feature extraction network. Adapted from Ref. [24].

**Figure 3**
Principle of anchor generation. Adapted from Ref. [9].

**Figure 4**
Bilinear interpolation. Adapted from Ref. [13]. The orange arrow points to the center point obtained after the region has been quadratically divided.

**Figure 5**
Improved Mask R-CNN model structure: the orange part shows the improvement points proposed in this paper.

**Figure 6**
ResNet (**left**) and ResNeXt (**right**).

**Figure 7**
Feature Pyramid Network topology. Adapted from Ref. [24].

**Figure 8**
Improved FPN structure. Adapted from Ref. [29].

**Figure 9**
Structure of the Efficient Channel Attention (ECA) module. Adapted from Ref. [30].

**Figure 11**
Loss function graph for each experimental group.

**Figure 13**
Test results of Mask R-CNN on CityScapes test dataset.

**Figure 14**
Improved Mask R-CNN test results on CityScapes test dataset.

**Figure 16**
Mask R-CNN test results on BDD dataset.

**Figure 17**
Improved Mask R-CNN test results on BDD dataset.

**Figure 18**
Dark scene images in BDD dataset.

**Figure 19**
Mask R-CNN test results on dark scene images of BDD dataset.

**Figure 20**
Improved Mask R-CNN test results on dark scene images of BDD dataset.

**Figure 21**
Rain and snow scene images for BDD dataset.

**Figure 22**
Mask R-CNN test results on rain and snow scene images of BDD dataset.

**Figure 23**
Improved Mask R-CNN test results on rain and snow scene images of BDD dataset.
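The channel attention module shown in Figure 9 can be sketched in a few lines. This is a minimal pure-Python illustration, assuming the commonly described ECA design: global average pooling per channel, a 1D convolution across neighboring channels, a sigmoid gate, and channel-wise rescaling. The adaptive kernel-size rule with `gamma=2` and `b=1` follows the widely used default and is an assumption here, as are the function names and the nested-list tensor layout.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: (log2(C) + b) / gamma, forced to be odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(feature, weights):
    """Apply channel attention to a feature map.

    feature: list of C channels, each an H x W nested list of floats.
    weights: 1D kernel of odd length k (learned in a real network).
    """
    pad = len(weights) // 2

    # Squeeze: one average value per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]
    padded = [0.0] * pad + pooled + [0.0] * pad

    # Local cross-channel interaction followed by a sigmoid gate.
    gates = []
    for c in range(len(pooled)):
        s = sum(w * padded[c + j] for j, w in enumerate(weights))
        gates.append(1 / (1 + math.exp(-s)))

    # Excite: rescale every value in a channel by that channel's gate.
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(feature, gates)]
```

Because the gate for each channel depends only on its `k` nearest neighbors, the module adds just `k` weights per layer, which is why it is described as efficient.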

Cited by

Deep learning for automated boundary detection and segmentation in organ donation photography.
Kourounis G, Elmahmudi AA, Thomson B, Nandi R, Tingle SJ, Glover EK, Thompson E, Mahendran B, Connelly C, Gibson B, Bates L, Sheerin NS, Hunter J, Ugail H, Wilson C. Innov Surg Sci. 2024 Aug 20:iss-2024-0022. doi: 10.1515/iss-2024-0022. Online ahead of print. PMID: 40568340.
YOLO-SDL: a lightweight wheat grain detection technology based on an improved YOLOv8n model.
Qiu Z, Wang F, Wang W, Li T, Jin X, Qing S, Shi Y. Front Plant Sci. 2024 Nov 19;15:1495222. doi: 10.3389/fpls.2024.1495222. eCollection 2024. PMID: 39634063.
Optimized Design of EdgeBoard Intelligent Vehicle Based on PP-YOLOE.
Yao C, Liu X, Wang J, Cheng Y. Sensors (Basel). 2024 May 16;24(10):3180. doi: 10.3390/s24103180. PMID: 38794034.
Wind Speed Prediction Based on Error Compensation.
Jiao X, Zhang D, Wang X, Tian Y, Liu W, Xin L. Sensors (Basel). 2023 May 19;23(10):4905. doi: 10.3390/s23104905. PMID: 37430818.
Multi-target detection and tracking based on CRF network and spatio-temporal attention for sports videos.
Chen X, Zhang H, Shankar A, Bhushan B, Joshi K. Sci Rep. 2025 Feb 25;15(1):6808. doi: 10.1038/s41598-025-89929-7. PMID: 40000758.

References

1. Grigorescu S., Trasnea B., Cocias T., Macesanu G. A survey of deep learning techniques for autonomous driving. J. Field Robot. 2022;37:362–386. doi: 10.1002/rob.21918.
2. Janai J., Güney F., Behl A., Geiger A. Computer vision for autonomous vehicles: Problems, datasets and state of the art. Found. Trends® Comput. Graph. Vis. 2020;12:1–308. doi: 10.1561/0600000079.
3. Su L., Sun Y.-X., Yuan S.-Z. A survey of instance segmentation research based on deep learning. CAAI Trans. Intell. Syst. 2022;17:16.
4. Redmon J., Divvala S., Girshick R., Farhadi A. You only look once: Unified, real-time object detection; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Las Vegas, NV, USA. 27–30 June 2016; pp. 779–788.
5. Liu W., Anguelov D., Erhan D., Szegedy C., Reed S., Fu C.-Y., Berg A.C. SSD: Single shot multibox detector; Proceedings of the Computer Vision–ECCV 2016: 14th European Conference; Amsterdam, The Netherlands. 11–14 October 2016; Berlin/Heidelberg, Germany: Springer International Publishing; 2016. Part I.
