Spatial Attention Fusion for Obstacle Detection Using MmWave Radar and Vision Sensor

Shuo Chang et al.

Sensors (Basel). 2020 Feb 11;20(4):956. doi: 10.3390/s20040956.

Abstract

For autonomous driving, it is important to detect obstacles accurately at all scales for safety. In this paper, we propose a new spatial attention fusion (SAF) method for obstacle detection using a mmWave radar and a vision sensor, where the sparsity of radar points is considered in the proposed SAF. The proposed fusion method can be embedded in the feature-extraction stage, leveraging the features of the mmWave radar and the vision sensor effectively. Based on the SAF, an attention weight matrix is generated to fuse the vision features, which differs from concatenation fusion and element-wise add fusion. Moreover, the proposed SAF can be trained in an end-to-end manner together with recent deep learning object detection frameworks. In addition, we build a generation model that converts radar points to radar images for neural network training. Numerical results suggest that the newly developed fusion method achieves superior performance on a public benchmark. The source code will be released on GitHub.
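
The core of the method described above is an attention weight matrix derived from the radar features and applied to the vision features. The following is a minimal PyTorch sketch of such a spatial attention fusion block, assuming 256-channel feature maps (as in Figure 8); the layer layout, kernel sizes, and channel reduction are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a spatial attention fusion (SAF) block, assuming PyTorch.
# The exact layer sizes and kernel choices are assumptions for illustration.
import torch
import torch.nn as nn


class SpatialAttentionFusion(nn.Module):
    """Fuse radar and vision feature maps via a spatial attention weight matrix."""

    def __init__(self, radar_channels: int = 256):
        super().__init__()
        # Reduce the radar feature map to a single-channel spatial attention map.
        self.attention = nn.Sequential(
            nn.Conv2d(radar_channels, radar_channels // 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(radar_channels // 4, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),  # attention weights in (0, 1)
        )

    def forward(self, radar_feat: torch.Tensor, vision_feat: torch.Tensor) -> torch.Tensor:
        # radar_feat, vision_feat: (N, C, H, W) with matching spatial size.
        weight = self.attention(radar_feat)   # (N, 1, H, W)
        return vision_feat * weight           # broadcast over channels


# Usage example with random tensors standing in for backbone features.
radar_feat = torch.randn(1, 256, 64, 64)
vision_feat = torch.randn(1, 256, 64, 64)
fused = SpatialAttentionFusion()(radar_feat, vision_feat)
print(fused.shape)  # torch.Size([1, 256, 64, 64])
```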

Keywords: Autonomous Driving; MmWave Radar; Obstacle Detection; Spatial Attention Fusion; Vision.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Rendering results in the camera coordinate system for radar points (top row) and LiDAR points (bottom row). As shown in the figure, the radar returns are sparse compared with the LiDAR points. Different colors of the projected points represent different depth values.
Figure 2
Three different fusion schemes using mmWave radar and vision sensor for obstacle detection.
Figure 3
Annotations of the front camera in the nuScenes dataset [37]. Top row: the original annotations provided by nuScenes, shown as black 3D bounding boxes. Middle row: 2D annotations generated by converting the 3D bounding boxes. Bottom row: 2D annotations generated by our proposed method.
Figure 4
The radar image generation model.
Figure 5
The two kinds of rendering cases involved in the rendering process of the radar image generation model.
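
The captions above do not detail the generation model, so the following is only a minimal sketch of how sparse radar points could be rendered into a radar image, assuming the points are already in the camera frame and a pinhole projection; the patch size, the depth-as-intensity encoding, and the function name render_radar_image are illustrative assumptions rather than the paper's rendering cases.

```python
# Minimal sketch: project 3D radar points onto the image plane and splat each
# point as a small patch whose intensity encodes its depth (assumption).
import numpy as np


def render_radar_image(points_xyz: np.ndarray, K: np.ndarray,
                       image_hw: tuple, patch: int = 4) -> np.ndarray:
    """Render sparse radar points (camera coordinates) into a radar image.

    points_xyz: (N, 3) points already transformed into the camera frame.
    K:          (3, 3) camera intrinsic matrix.
    image_hw:   (height, width) of the output radar image.
    """
    h, w = image_hw
    radar_img = np.zeros((h, w), dtype=np.float32)
    for x, y, z in points_xyz:
        if z <= 0:                         # skip points behind the camera
            continue
        u, v, _ = (K @ np.array([x, y, z])) / z   # pinhole projection
        u, v = int(round(u)), int(round(v))
        if 0 <= u < w and 0 <= v < h:
            # Splat a small square patch; pixel value encodes depth.
            radar_img[max(0, v - patch):v + patch, max(0, u - patch):u + patch] = z
    return radar_img


# Usage example with a toy intrinsic matrix and two radar points.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
pts = np.array([[1.0, 0.2, 10.0], [-2.0, 0.1, 25.0]])
img = render_radar_image(pts, K, (480, 640))
```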
Figure 6
The proposed spatial attention fusion-based fully convolutional one-stage network (SAF-FCOS) for obstacle detection. The FCOS-Pre. block denotes the prediction head used in the FCOS detection framework, and P3 stands for Phase 3.
Figure 7
Different fusion blocks in the feature fusion scheme. From left to right: Multiply Fusion (MUL) block, Element-Wise Add Fusion (ADD) block, Concatenation Fusion (CAT) block, and Spatial Attention Fusion (SAF) block.
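
For comparison with the SAF block sketched after the abstract, below is a minimal PyTorch sketch of the three baseline fusion blocks named in this figure; the 1x1 convolution that restores the channel count after concatenation is an assumption for illustration.

```python
# Minimal sketches of the baseline fusion blocks (MUL, ADD, CAT), assuming
# PyTorch and radar/vision feature maps of equal shape (N, C, H, W).
import torch
import torch.nn as nn


def mul_fusion(radar_feat: torch.Tensor, vision_feat: torch.Tensor) -> torch.Tensor:
    # Multiply Fusion (MUL): element-wise product of the two feature maps.
    return radar_feat * vision_feat


def add_fusion(radar_feat: torch.Tensor, vision_feat: torch.Tensor) -> torch.Tensor:
    # Element-Wise Add Fusion (ADD): element-wise sum of the two feature maps.
    return radar_feat + vision_feat


class CatFusion(nn.Module):
    # Concatenation Fusion (CAT): stack along channels, then reduce back to C
    # channels with a 1x1 convolution (the reduction step is an assumption).
    def __init__(self, channels: int = 256):
        super().__init__()
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, radar_feat: torch.Tensor, vision_feat: torch.Tensor) -> torch.Tensor:
        return self.reduce(torch.cat([radar_feat, vision_feat], dim=1))
```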
Figure 8
Visualization of part of the radar feature, part of the vision feature, the spatial attention matrix, and part of the fused feature in SAF-FCOS. The radar, vision, and fused feature maps each have 256 channels; due to space constraints, only 16 channels are shown.
Figure 9
Detection performance comparison between FCOS [10] and SAF-FCOS in sunny, rainy, and night scenes. Top row: detection results of FCOS; bottom row: detection results of SAF-FCOS. The comparison demonstrates that the proposed SAF-FCOS performs better on small and distant obstacles.
Figure 10
Loss and AP curves for FCOS and SAF-FCOS over training iterations.

References

    1. Redmon J., Divvala S., Girshick R., Farhadi A. You only look once: Unified, real-time object detection; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016); Las Vegas, NV, USA. 27–30 June 2016; pp. 779–788.
    2. Liu W., Anguelov D., Erhan D. SSD: Single shot multibox detector; Proceedings of the European Conference on Computer Vision Workshops (ECCV 2016); Amsterdam, The Netherlands. 11–14 October 2016; pp. 21–37.
    3. Girshick R., Donahue J., Darrell T., Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014); Columbus, OH, USA. 23–28 June 2014; pp. 580–587.
    4. Girshick R. Fast R-CNN; Proceedings of the IEEE International Conference on Computer Vision (ICCV 2015); Santiago, Chile. 7–13 December 2015; pp. 1440–1448.
    5. He K.M., Zhang X.Y., Ren S.Q., Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015;37(9):1904–1916. doi: 10.1109/TPAMI.2015.2389824.
