Sensors (Basel). 2021 Jun 8;21(12):3964. doi: 10.3390/s21123964.

Transfer Learning Based Semantic Segmentation for 3D Object Detection from Point Cloud


Muhammad Imad et al. Sensors (Basel).

Abstract

Three-dimensional object detection using LiDAR point cloud data is an indispensable part of autonomous driving perception systems. Point cloud-based 3D object detection achieves higher accuracy than camera-based detection, particularly at night. However, most LiDAR-based 3D object detection methods work in a supervised manner, which means their state-of-the-art performance relies heavily on large-scale, well-labeled datasets, while such annotated datasets can be expensive to obtain and are only available for limited scenarios. Transfer learning is a promising approach to reduce the need for large-scale training datasets, but existing transfer learning object detectors are designed primarily for 2D rather than 3D object detection. In this work, we use 3D point cloud data more effectively by representing the scene as a bird's-eye-view (BEV) map and propose transfer learning based point cloud semantic segmentation for 3D object detection. The proposed model minimizes the need for large-scale training datasets and consequently reduces training time. First, a preprocessing stage filters the raw point cloud data into a BEV map within a specific field of view. Second, the transfer learning stage uses knowledge from a previously learned classification task (with more data available for training) and generalizes it to the semantic segmentation-based 2D object detection task. Finally, the 2D detection results from the BEV image are back-projected into 3D in the postprocessing stage. We verify the results on two datasets, the KITTI 3D object detection dataset and the Ouster LiDAR-64 dataset, demonstrating that the proposed method is highly competitive in terms of mean average precision (mAP up to 70%) while running at more than 30 frames per second (FPS).
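The preprocessing stage described above (filtering the raw point cloud to a BEV map within a field of view) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the channel assignment (height, intensity, density), the metric ranges, and the 0.1 m resolution are all assumptions chosen to resemble common BEV encodings for KITTI-style LiDAR data.

```python
import numpy as np

def pointcloud_to_bev_rgb(points, x_range=(0.0, 50.0), y_range=(-25.0, 25.0),
                          z_range=(-2.0, 1.25), resolution=0.1):
    """Discretize a LiDAR point cloud into a bird's-eye-view RGB map.

    points: (N, 4) array of [x, y, z, intensity] in the LiDAR frame.
    The three channels encode max height, max intensity, and log point
    density per cell (channel layout and ranges are assumptions).
    """
    # Keep only points inside the chosen field of view.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]) &
            (points[:, 2] >= z_range[0]) & (points[:, 2] < z_range[1]))
    pts = points[mask]

    h = int((x_range[1] - x_range[0]) / resolution)
    w = int((y_range[1] - y_range[0]) / resolution)
    bev = np.zeros((h, w, 3), dtype=np.float32)

    # Map metric coordinates to pixel indices.
    xi = ((pts[:, 0] - x_range[0]) / resolution).astype(np.int32)
    yi = ((pts[:, 1] - y_range[0]) / resolution).astype(np.int32)

    # Height channel: maximum height per cell, normalized to [0, 1].
    z_norm = (pts[:, 2] - z_range[0]) / (z_range[1] - z_range[0])
    np.maximum.at(bev[:, :, 0], (xi, yi), z_norm)

    # Intensity channel: maximum reflectance per cell.
    np.maximum.at(bev[:, :, 1], (xi, yi), pts[:, 3])

    # Density channel: log-normalized point count per cell.
    counts = np.zeros((h, w), dtype=np.float32)
    np.add.at(counts, (xi, yi), 1.0)
    bev[:, :, 2] = np.minimum(1.0, np.log1p(counts) / np.log(64.0))
    return bev
```

The resulting three-channel image can be fed to a standard 2D segmentation network, which is what makes the classification-to-segmentation transfer described in the abstract possible.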

Keywords: 3D object detection; point cloud processing; semantic segmentation; transfer learning.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Overview of the proposed 3D object detection architecture. The proposed model directly uses LiDAR-based bird's-eye-view (BEV) images to estimate and localize 3D bounding volumes. The whole pipeline consists of a preprocessing module, a deep learning module, and a back-projection module.
Figure 2
Schematic representation of the bird's-eye-view RGB map.
Figure 3
Transfer learning from classification to segmentation.
Figure 4
Schematic illustration of an encoder-decoder architecture. The left-hand side is a bird's-eye-view RGB map that is passed through a series of computational layers, and the right-hand side is the output decoder feature map. The arrows are skip connections, through which encoder features are concatenated directly into the decoder.
Figure 5
Comparison between a model trained from scratch and a model initialized with pretrained classification weights.
Figure 6
Comparison between a model trained from scratch and a model initialized with pretrained classification weights: (a,b) show predictions using pretrained weights; (c,d) show predictions using the model trained from scratch.
Figure 7
Visualization results on the KITTI dataset using the proposed method. Subfigures (a–d) show the ground truth on the right-hand side and the output results on the left-hand side. The images show that the proposed model performs well in different scenarios.
Figure 8
Subfigures (a–d) show samples of predictions on the left-hand side and extracted contours for the car class on the right-hand side. Subfigures (a,b) show simple scenarios, and (c,d) show images where the proposed model achieves accurate results in more complex scenarios with rotated bounding boxes.
Figure 9
Qualitative results of the proposed model on the KITTI 3D object detection dataset in the LiDAR frame.
Figure 10
Qualitative results of the proposed model on the Ouster LiDAR-64 dataset in the LiDAR frame.
Figure 11
Performance comparison. This plot shows mean average precision (mAP) against run-time (FPS) in the LiDAR frame. We compare the proposed model with existing 3D object detection models and measure its performance on a dedicated embedded platform (Intel PC) with real-time efficiency.
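The back-projection module in the pipeline (Figure 1) recovers 3D boxes from rotated 2D detections on the BEV image by inverting the grid discretization. A minimal sketch, under assumed parameters: the 0.1 m resolution, the metric ranges, and the fixed ground level and object height priors are illustrative, not taken from the paper, since the BEV image discards exact elevation.

```python
def bev_box_to_3d(px, py, pw, pl, yaw, x_min=0.0, y_min=-25.0,
                  resolution=0.1, z_bottom=-1.7, height=1.5):
    """Back-project a rotated 2D box detected on the BEV map into 3D.

    (px, py) is the box center in BEV pixels, (pw, pl) its size in
    pixels, and yaw its rotation in radians. z_bottom and height are
    assumed priors (e.g. a fixed car height). Returns the 3D box as
    (x, y, z, w, l, h, yaw) in metric LiDAR coordinates.
    """
    x = x_min + px * resolution   # pixel row -> forward distance (m)
    y = y_min + py * resolution   # pixel col -> lateral offset (m)
    w = pw * resolution           # box width in meters
    l = pl * resolution           # box length in meters
    z = z_bottom + height / 2.0   # box center height above ground prior
    return (x, y, z, w, l, height, yaw)
```

With contours extracted from the segmentation mask (as in Figure 8), a rotated rectangle fitted to each contour supplies (px, py, pw, pl, yaw), and this inverse mapping places the detection back into the LiDAR frame.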
