Deep Learning for Real-Time 3D Multi-Object Detection, Localisation, and Tracking: Application to Smart Mobility

Antoine Mauri et al.

Sensors (Basel). 2020 Jan 18;20(2):532. doi: 10.3390/s20020532.

Abstract

In core computer vision tasks, we have witnessed significant advances in object detection, localisation and tracking. However, there are currently no methods that detect, localise and track objects in road environments while taking real-time constraints into account. In this paper, our objective is to develop a deep learning multi-object detection and tracking technique applied to road smart mobility. Firstly, we propose an effective detector based on YOLOv3, which we adapt to our context. Subsequently, to localise the detected objects successfully, we put forward an adaptive method aiming to extract 3D information, i.e., depth maps. To do so, a comparative study is carried out over two approaches: Monodepth2 for monocular vision and MADNet for stereoscopic vision. These approaches are then evaluated over datasets containing depth information in order to discern the solution that performs best under real-time conditions. Object tracking is necessary in order to mitigate the risk of collisions. Unlike traditional tracking approaches, which require target initialisation beforehand, our approach uses information from object detection and distance estimation to initialise targets and track them afterwards. Specifically, we propose to improve the SORT approach for 3D object tracking by introducing an extended Kalman filter to better estimate the position of objects. Extensive experiments carried out on the KITTI dataset show that our proposal outperforms state-of-the-art approaches.
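To make the localisation and tracking steps more concrete, the sketch below (not the authors' code) back-projects a detection to 3D camera coordinates with the pinhole model and feeds it to a constant-velocity Kalman filter over position and velocity, in the spirit of the 3D extension of SORT described above. The intrinsics, noise levels and helper names (backproject, Track3D) are illustrative assumptions; with this linear motion and measurement model the filter reduces to a standard Kalman filter, whereas the paper introduces an extended Kalman filter.

    import numpy as np

    # Example intrinsics in the KITTI ballpark (assumed values, not the paper's).
    fx, fy, cx, cy = 721.5, 721.5, 609.6, 172.9

    def backproject(u, v, z):
        """Pinhole back-projection of pixel (u, v) at depth z (metres) to camera coordinates."""
        return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

    class Track3D:
        """Constant-velocity filter over the state [x, y, z, vx, vy, vz].
        With this linear model it is a standard Kalman filter; the paper's
        extended Kalman filter would instead linearise a nonlinear model
        around the current estimate."""

        def __init__(self, p0, dt=0.1):
            self.x = np.hstack([p0, np.zeros(3)])                 # initial state from first detection
            self.P = np.eye(6)                                    # state covariance
            self.F = np.eye(6)
            self.F[:3, 3:] = dt * np.eye(3)                       # constant-velocity motion model
            self.H = np.hstack([np.eye(3), np.zeros((3, 3))])     # we measure 3D position only
            self.Q = 0.01 * np.eye(6)                             # process noise (assumed)
            self.R = 0.1 * np.eye(3)                              # measurement noise (assumed)

        def predict(self):
            self.x = self.F @ self.x
            self.P = self.F @ self.P @ self.F.T + self.Q
            return self.x[:3]                                     # predicted 3D position

        def update(self, z_meas):
            y = z_meas - self.H @ self.x                          # innovation
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)              # Kalman gain
            self.x = self.x + K @ y
            self.P = (np.eye(6) - K @ self.H) @ self.P

        @property
        def speed(self):
            return float(np.linalg.norm(self.x[3:]))              # speed in m/s

    if __name__ == "__main__":
        # Toy usage: a detection centred at pixel (640, 200) with 15 m median depth.
        track = Track3D(backproject(640, 200, 15.0))
        track.predict()
        track.update(backproject(642, 201, 14.8))                 # next-frame measurement
        print("estimated speed (m/s):", round(track.speed, 2))

In a complete pipeline, each frame's detections would first be associated to the predicted positions of the existing tracks (e.g., by 3D distance or overlap, as in SORT) before the update step; the displayed speed then follows from the velocity components of the state.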

Keywords: 3D multi-object; deep learning; distance estimation; localisation; object detection; smart mobility; tracking.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Overview of the proposed system, composed of three main components: object detection, depth estimation and localisation, and tracking.
Figure 2
Mask R-CNN results (the detector returns the mask of each detected object together with its class and confidence score).
Figure 3
Results of YOLOv3 [2] (the detector returns the position, class and confidence score of each detected object).
Figure 4
Performance comparison between YOLOv3 (left) and SSD (right) detectors.
Figure 5
Results of disparity maps obtained using monocular approaches: (a) original image; (b) MonoResMatch; (c) SfmLearner; (d) Monodepth and (e) Monodepth2.
Figure 6
Results of disparity maps obtained using stereoscopic approaches: (a) original image; (b) stereo-baseline approach; (c) stereo-WLS filter and (d) MADNet.
Figure 7
RMSE error over a sequence of around 900 frames from the KITTI dataset. The blue, orange, green and red curves correspond respectively to the results of the Monodepth, Monodepth2, MonoResMatch and SfmLearner approaches.
Figure 8
RMSE error over a sequence of around 900 frames from the KITTI dataset. The blue, orange and green curves correspond respectively to the results of the stereo-WLS filter, stereo-baseline and MADNet approaches.
Figure 9
Object detection and localisation results over a sample from the KITTI dataset.
Figure 10
Result of the tracking approach over an indoor scene using the stereo images provided by the Intel RealSense D435 sensor. The left frame is acquired at instant t=1 and the right one at instant t+3. At t=1, we assign an ID to each detected object. Then, at t+3, the tracklet is validated and we display the estimated speed.
Figure 11
Tracking results in the road environment.
Figure 12
Tracking results over a sequence from the KITTI dataset. Top: four RGB frames from a road sequence acquired at different times, with the corresponding tracking boxes of the moving objects and their speed values. Bottom: the maps proposed in this work as a synthetic tool that provides a simple yet comprehensive overview of the motions and distances of the tracked objects.

References

    1. Mukojima H., Deguchi D., Kawanishi Y., Ide I., Murase H., Ukai M., Nagamine N., Nakasone R. Moving camera background-subtraction for obstacle detection on railway tracks; Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP); Phoenix, AZ, USA. 25–28 September 2016; pp. 3967–3971.
    2. Yanan S., Hui Z., Li L., Hang Z. Rail Surface Defect Detection Method Based on YOLOv3 Deep Learning Networks; Proceedings of the 2018 IEEE Chinese Automation Congress (CAC); Xi’an, China. 30 November–2 December 2018; pp. 1563–1568.
    3. Khemmar R., Gouveia M., Decoux B., Ertaud J.Y. Real Time Pedestrian and Object Detection and Tracking-based Deep Learning. Application to Drone Visual Tracking; Proceedings of the International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision; Plzen, Czechia. 18–22 May 2019.
    4. Chen Z., Khemmar R., Decoux B., Atahouet A., Ertaud J.Y. Real Time Object Detection, Tracking, and Distance and Motion Estimation based on Deep Learning: Application to Smart Mobility; Proceedings of the 2019 Eighth International Conference on Emerging Security Technologies (EST); Colchester, UK. 22–24 July 2019; pp. 1–6.
    5. Yang S., Baum M. Extended Kalman filter for extended object tracking; Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); New Orleans, LA, USA. 5–9 March 2017; pp. 4386–4390.
