Sensors (Basel). 2022 Aug 15;22(16):6090. doi: 10.3390/s22166090.

EVtracker: An Event-Driven Spatiotemporal Method for Dynamic Object Tracking

Shixiong Zhang et al.

Abstract

An event camera is a novel bio-inspired sensor that effectively compensates for the shortcomings of current frame cameras, which include high latency, low dynamic range, motion blur, etc. Rather than capturing images at a fixed frame rate, an event camera produces an asynchronous signal by measuring the brightness change of each pixel. Consequently, an appropriate algorithm framework that can handle the unique data types of event-based vision is required. In this paper, we propose a dynamic object tracking framework using an event camera to achieve long-term stable tracking of event objects. One of the key novel features of our approach is to adopt an adaptive strategy that adjusts the spatiotemporal domain of event data. To achieve this, we reconstruct event images from high-speed asynchronous streaming data via online learning. Additionally, we apply the Siamese network to extract features from event data. In contrast to earlier models that only extract hand-crafted features, our method provides powerful feature description and a more flexible reconstruction strategy for event data. We assess our algorithm in three challenging scenarios: 6-DoF (six degrees of freedom), translation, and rotation. Unlike fixed cameras in traditional object tracking tasks, all three tracking scenarios involve the simultaneous violent rotation and shaking of both the camera and objects. Results from extensive experiments suggest that our proposed approach achieves superior accuracy and robustness compared to other state-of-the-art methods. Without reducing time efficiency, our novel method exhibits a 30% increase in accuracy over other recent models. Furthermore, results indicate that event cameras are capable of robust object tracking, which is a task that conventional cameras cannot adequately perform, especially for super-fast motion tracking and challenging lighting situations.

Keywords: event-based camera; object tracking; spatiotemporal method.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
This figure shows how the event camera captures event data. When a new event is triggered, the camera updates only the coordinates of the activated pixel rather than the full image. An event generally comprises four values: a timestamp, the pixel coordinates (x, y), and a polarity. As the figure shows, events are generated asynchronously: event data are dense in the temporal domain and sparse in the spatial domain. (Data source adapted with permission from [10]. December 2020, Elsevier).
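For readers unfamiliar with this format, the sketch below (ours, not the authors' code) shows one plausible way to represent such an asynchronous stream in Python, with each event stored as a (timestamp, x, y, polarity) record and ordered along the time axis.

from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    t: float  # timestamp (e.g., in microseconds)
    x: int    # pixel column
    y: int    # pixel row
    p: int    # polarity: +1 for a brightness increase, -1 for a decrease

def sort_by_time(events: List[Event]) -> List[Event]:
    # Events arrive asynchronously; order them along the time axis.
    return sorted(events, key=lambda e: e.t)

stream = [Event(12.0, 34, 56, +1), Event(5.0, 34, 57, -1)]
print(sort_by_time(stream))
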
Figure 2
Overview of our reconstruction strategy for the event image. The event streams represent an asynchronous flow of events updated along a time series, and each event frame is a candidate event image. A shallow convolutional neural network (CNN) extracts features from the candidates; a stronger heat-map response indicates better feature quality, and the corresponding candidate is output as the event image. In this case, PN provides higher-quality event images with better texture features.
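The Python sketch below illustrates the selection idea only; the accumulation routine, the gradient-energy score standing in for the shallow CNN, and the candidate counts are our own illustrative assumptions, not the authors' implementation.

import numpy as np

def accumulate(events, shape, n):
    # Accumulate the first n events (t, x, y, p) into a 2D count image.
    frame = np.zeros(shape, dtype=np.float32)
    for t, x, y, p in events[:n]:
        frame[y, x] += 1.0
    return frame

def feature_score(frame):
    # Stand-in for the shallow CNN heat map: gradient energy as a rough
    # proxy for how much edge/texture structure the candidate frame carries.
    gy, gx = np.gradient(frame)
    return float(np.mean(np.hypot(gx, gy)))

def best_event_frame(events, shape, candidate_counts=(1000, 3000, 5000)):
    # Score each candidate accumulation and keep the strongest response.
    candidates = [accumulate(events, shape, n) for n in candidate_counts]
    scores = [feature_score(f) for f in candidates]
    return candidates[int(np.argmax(scores))]
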
Figure 3
This figure shows the real system using our proposed framework: the event camera outputs event flows, and a dynamic adaptive strategy reconstructs the event stream into a sequence of event images over time. Our tracking algorithm tracks the object in the event image sequence; when tracking fails, the detector is used to reinitialize the tracker.
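A schematic sketch of this loop is given below; the reconstruct, tracker, and detector callables and the failure threshold are hypothetical placeholders, not the paper's code.

def run_pipeline(event_slices, reconstruct, tracker, detector, init_box):
    # event_slices: iterable of event batches ordered in time.
    # reconstruct(events) -> event image; tracker(image, box) -> (box, confidence);
    # detector(image) -> box. All three are placeholders.
    box = init_box
    results = []
    for events in event_slices:
        image = reconstruct(events)            # adaptive event-image reconstruction
        box, confidence = tracker(image, box)  # track the object in the event image
        if box is None or confidence < 0.2:    # assumed failure check (threshold is illustrative)
            box = detector(image)              # re-detect and reinitialize the tracker
        results.append(box)
    return results
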
Figure 4
This figure shows an event-camera-based tracker using the Siamese network; features of the target image and the search image are extracted with a shared-weight network.
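The sketch below shows the general shared-weight, cross-correlation pattern such a tracker follows; it assumes PyTorch, and the backbone layers and input sizes are illustrative choices rather than the architecture used in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # One backbone, applied to both inputs, so the weights are shared.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3),
        )

    def forward(self, target, search):
        z = self.backbone(target)   # features of the target (exemplar) image
        x = self.backbone(search)   # features of the search image
        # Cross-correlate: the target features act as a convolution kernel,
        # and the peak of the response map locates the object in the search image.
        return F.conv2d(x, z)

net = SiameseSketch()
response = net(torch.randn(1, 1, 64, 64), torch.randn(1, 1, 128, 128))
print(response.shape)  # the peak of this response map gives the target position
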
Figure 5
(a) shows the original image when the fan is stationary. (b) shows the image taken with an RGB camera when the fan rotates at high speed; the fan can no longer be observed due to motion blur. (c) shows the event data captured with an event camera while the fan rotates at high speed; we visualize the event data for comparison, and a clear target structure without background is captured. This comparison shows that the event camera effectively avoids motion blur.
Figure 6
Tracking objects of different shapes and sizes. Each row shows the tracking of a single object: (a–d) show the tracking results for the book, (e–h) for the cup, and (i–l) for the drone. Our method overcomes challenges such as dramatic scene changes and the object moving out of view, and shows good robustness.
Figure 7
We compare our proposed EVtracker with e-TLD on challenging event data. (a,c) show the visualization results of e-TLD; (b,d) show the visualization results of our EVtracker. Our method balances the event resolution and spatial resolution of the event data well, and its tracking results are more stable and robust than those of e-TLD. The white and green boxes represent the tracking bounding boxes.
Figure 8
Here we use three strategies to reconstruct the event image. (a) shows the event frame reconstructed using the strategy of [28], (b) shows the event image reconstructed using the strategy of [29], and (c) shows the event image reconstructed using our proposed strategy. Visually, the accumulation of too many event pixels in (a) blurs the object, while (b) is too sparse for the object to be observed. Our event image has a clear object structure and edges. The green box represents the tracking object initialized in the first frame.
Figure 9
Here, we use three sets of visualized results for analysis and comparison. (a–c) show the results of gathering event data with fixed time windows; (d–f) show the results of gathering event data with a fixed number of event pixels; (g–i) show the results of gathering event data and tracking the object with our dynamic adaptive strategy. The green box represents the tracking bounding box.
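The toy Python sketch below contrasts the three accumulation strategies compared in Figures 8 and 9; the density-based stopping rule and all thresholds are illustrative assumptions, not the authors' adaptive criterion.

import numpy as np

def _accumulate(events, shape):
    # Accumulate events (t, x, y, p) into a 2D count image.
    frame = np.zeros(shape, dtype=np.float32)
    for t, x, y, p in events:
        frame[y, x] += 1.0
    return frame

def by_time_window(events, shape, dt):
    # Fixed time window: take every event within dt of the first one.
    t0 = events[0][0]
    return _accumulate([e for e in events if e[0] - t0 <= dt], shape)

def by_event_count(events, shape, n):
    # Fixed event count: take the first n events regardless of elapsed time.
    return _accumulate(events[:n], shape)

def adaptive(events, shape, density_target=0.05):
    # Toy adaptive rule: stop once enough distinct pixels have fired,
    # so fast scenes use short windows and slow scenes use longer ones.
    frame = np.zeros(shape, dtype=np.float32)
    for t, x, y, p in events:
        frame[y, x] += 1.0
        if (frame > 0).mean() >= density_target:
            break
    return frame
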

References

    1. Mitrokhin A., Fermüller C., Parameshwara C., Aloimonos Y. Event-based moving object detection and tracking. Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); Madrid, Spain, 1–5 October 2018; pp. 1–9.
    2. Chen G., Cao H., Conradt J., Tang H., Rohrbein F., Knoll A. Event-based neuromorphic vision for autonomous driving: A paradigm shift for bio-inspired visual sensing and perception. IEEE Signal Process. Mag. 2020;37:34–49. doi: 10.1109/MSP.2020.2985815.
    3. Gehrig M., Shrestha S.B., Mouritzen D., Scaramuzza D. Event-based angular velocity regression with spiking networks. Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA); Paris, France, 31 May–31 August 2020; pp. 4195–4202.
    4. Deng Y., Chen H., Li Y. MVF-Net: A Multi-view Fusion Network for Event-based Object Classification. IEEE Trans. Circuits Syst. Video Technol. 2021.
    5. Gallego G., Delbruck T., Orchard G.M., Bartolozzi C., Taba B., Censi A., Leutenegger S., Davison A., Conradt J., Daniilidis K., et al. Event-based Vision: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2020.
