Object Detection and Depth Estimation Approach Based on Deep Convolutional Neural Networks

Huai-Mu Wang et al.

Sensors (Basel). 2021 Jul 12;21(14):4755. doi: 10.3390/s21144755.

Abstract

In this paper, we present a real-time object detection and depth estimation approach based on deep convolutional neural networks (CNNs). We improve object detection by incorporating transfer connection blocks (TCBs), in particular to detect small objects in real time. For depth estimation, we introduce binocular vision into a monocular disparity estimation network and use the epipolar constraint to improve prediction accuracy. Finally, we integrate the two-dimensional (2D) location of each detected object with the depth information to achieve real-time detection and depth estimation. The results demonstrate that the proposed approach outperforms conventional methods.

Keywords: deep learning; depth estimation; object detection; stereo vision.
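
As a rough illustration of the fusion step described in the abstract (combining each 2D detection with the estimated disparity), the Python/NumPy sketch below converts a disparity map to metric depth via the standard stereo relation Z = f * B / d and assigns each detected box the median depth inside it. This is not the authors' code: the focal length and baseline are placeholder values of the kind produced by stereo calibration (typical KITTI numbers), and the box format is assumed.

    import numpy as np

    FOCAL_PX = 721.5    # assumed focal length in pixels (KITTI-like placeholder)
    BASELINE_M = 0.54   # assumed stereo baseline in meters (KITTI-like placeholder)

    def disparity_to_depth(disparity):
        """Convert a disparity map (in pixels) to metric depth: Z = f * B / d."""
        depth = np.full(disparity.shape, np.inf)
        valid = disparity > 0  # zero disparity means infinitely far / no match
        depth[valid] = FOCAL_PX * BASELINE_M / disparity[valid]
        return depth

    def attach_depth(boxes, depth_map):
        """Pair each 2D box (x1, y1, x2, y2) with the median depth inside it."""
        out = []
        for (x1, y1, x2, y2) in boxes:
            patch = depth_map[int(y1):int(y2), int(x1):int(x2)]
            finite = patch[np.isfinite(patch)]
            out.append(((x1, y1, x2, y2),
                        float(np.median(finite)) if finite.size else None))
        return out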


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Flowchart of the proposed approach.
Figure 2. Flowchart of the improved TCB used for the modified RefineDet.
Figure 3. TCB block model with incorporated squeeze-and-excitation flow (a code sketch of a standard SE block follows this figure list).
Figure 4. Modified network structure with binocular images. Stereo image pairs are used to generate the disparity maps.
Figure 5. Comparison of the object detection results. (a) Output of the original RefineDet. (b) Output of our approach.
Figure 6. Reducing the number of classes to speed up object detection.
Figure 7. mAP comparison of the original RefineDet and our approach on the KITTI dataset. (a) Evaluation of RefineDet. (b) Evaluation of our approach.
Figure 8. mAP comparison of the original RefineDet and our approach on the BDD100K dataset. (a) Evaluation of RefineDet. (b) Evaluation of our approach.
Figure 9. mAP comparison of the original RefineDet and our approach on our own dataset. (a) Evaluation of RefineDet. (b) Evaluation of our approach.
Figure 10. A stereo image pair and the estimated disparity map. (a) Left image. (b) Right image. (c) Estimated disparity map.
Figure 11. Three depth map predictions on the KITTI dataset. (a) Traffic scene 1 with an approaching vehicle. (b) Traffic scene 2 with faraway small vehicles. (c) Traffic scene 3 with a roadside-parked vehicle.
Figure 12. Disparity maps of two stereo image pairs generated by our approach (bottom left) show clear improvements over the original lightweight network (bottom right). (a) Scene 1. (b) Scene 2.
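
Figure 3 refers to a squeeze-and-excitation (SE) flow incorporated into the TCB. For reference, the sketch below is a standard SE block (global average pooling, a bottleneck MLP, sigmoid channel gating) in PyTorch, not the paper's exact TCB wiring; the reduction ratio of 16 is the common default rather than a value reported by the authors.

    import torch
    import torch.nn as nn

    class SqueezeExcitation(nn.Module):
        """Standard SE block: squeeze spatially, excite per channel, rescale."""

        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: (B, C, H, W) -> (B, C, 1, 1)
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),  # per-channel gates in (0, 1)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, _, _ = x.shape
            gates = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
            return x * gates  # reweight feature channels by their gates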

