Sensors (Basel). 2023 Apr 10;23(8):3853.
doi: 10.3390/s23083853.

Improved Mask R-CNN Multi-Target Detection and Segmentation for Autonomous Driving in Complex Scenes

Shuqi Fang et al. Sensors (Basel).

Abstract

Vision-based target detection and segmentation is an important research topic for environment perception in autonomous driving, but mainstream detection and segmentation algorithms suffer from low detection accuracy and poor mask quality when handling multiple targets in complex traffic scenes. To address this problem, this paper improves Mask R-CNN by replacing the ResNet backbone with ResNeXt, which uses group convolution, to strengthen the model's feature extraction capability. Furthermore, a bottom-up path enhancement strategy was added to the Feature Pyramid Network (FPN) to achieve feature fusion, and an efficient channel attention (ECA) module was added to the backbone feature extraction network to optimize the high-level, low-resolution semantic feature maps. Finally, the smooth L1 bounding box regression loss was replaced by CIoU loss to speed up model convergence and reduce the error. Experimental results showed that the improved Mask R-CNN achieved 62.62% mAP for target detection and 57.58% mAP for segmentation on the publicly available CityScapes autonomous driving dataset, which were 4.73% and 3.96% higher, respectively, than the original Mask R-CNN. Transfer experiments showed that the model also detects and segments well across the traffic scenarios of the publicly available BDD autonomous driving dataset.
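
To make the CIoU loss mentioned in the abstract concrete, the following is a minimal sketch of the standard Complete-IoU formulation for a single pair of axis-aligned boxes. It is not the authors' implementation. The function name `ciou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are assumptions made for illustration.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss for two boxes given as (x1, y1, x2, y2). Illustrative only."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


if __name__ == "__main__":
    print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(ciou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap
    print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # no overlap
```

Unlike smooth L1 loss, which penalizes each coordinate independently, this loss still yields a useful signal when the boxes do not overlap, because the center-distance term remains nonzero.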

Keywords: CIoU; FPN; Mask R-CNN; ResNeXt; autonomous driving; efficient channel attention module; environment perception; multi-target.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Mask R-CNN model structure. Adapted from Ref. [22].
Figure 2. Backbone feature extraction network. Adapted from Ref. [24].
Figure 3. Principle of anchor generation. Adapted from Ref. [9].
Figure 4. Bilinear interpolation. Adapted from Ref. [13]. The orange arrow points to the center point obtained after the region has been divided into four.
Figure 5. Improved Mask R-CNN model structure. The orange parts show the improvements proposed in this paper.
Figure 6. ResNet (left) and ResNeXt (right).
Figure 7. Feature Pyramid Network topology. Adapted from Ref. [24].
Figure 8. Improved FPN structure. Adapted from Ref. [29].
Figure 9. Structure of the Efficient Channel Attention module. Adapted from Ref. [30].
Figure 10. CityScapes dataset.
Figure 11. Loss function graph for each experimental group.
Figure 12. CityScapes test dataset.
Figure 13. Mask R-CNN test results on CityScapes test dataset.
Figure 14. Improved Mask R-CNN test results on CityScapes test dataset.
Figure 15. BDD test dataset.
Figure 16. Mask R-CNN test results on BDD dataset.
Figure 17. Improved Mask R-CNN test results on BDD dataset.
Figure 18. Dark scene images in BDD dataset.
Figure 19. Mask R-CNN test results on dark scene images of BDD dataset.
Figure 20. Improved Mask R-CNN test results on dark scene images of BDD dataset.
Figure 21. Rain and snow scene images in BDD dataset.
Figure 22. Mask R-CNN test results on rain and snow scene images of BDD dataset.
Figure 23. Improved Mask R-CNN test results on rain and snow scene images of BDD dataset.

References

    1. Grigorescu S., Trasnea B., Cocias T., Macesanu G. A survey of deep learning techniques for autonomous driving. J. Field Robot. 2022;37:362–386. doi: 10.1002/rob.21918.
    2. Janai J., Güney F., Behl A., Geiger A. Computer vision for autonomous vehicles: Problems, datasets and state of the art. Found. Trends® Comput. Graph. Vis. 2020;12:1–308. doi: 10.1561/0600000079.
    3. Su L., Sun Y.-X., Yuan S.-Z. A survey of instance segmentation research based on deep learning. CAAI Trans. Intell. Syst. 2022;17:16.
    4. Joseph R., Santosh D., Ross G., Ali F. You only look once: Unified, real-time object detection; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Las Vegas, NV, USA. 27–30 June 2016; pp. 779–788.
    5. Liu W., Anguelov D., Erhan D., Szegedy C., Reed S., Fu C.-F., Berg A.C. SSD: Single shot multibox detector; Proceedings of the Computer Vision–ECCV 2016: 14th European Conference; Amsterdam, The Netherlands. 11–14 October 2016; Berlin/Heidelberg, Germany: Springer International Publishing; 2016. Part I.
