Visual Feature Learning on Video Object and Human Action Detection: A Systematic Review
- PMID: 35056238
- PMCID: PMC8781209
- DOI: 10.3390/mi13010072
Abstract
Video object and human action detection are applied in many fields, such as video surveillance and face recognition. Video object detection comprises object classification and object localization within the frame. Human action recognition is the detection of human actions. Video detection is usually more challenging than image detection, since video frames are often blurrier than still images. Video detection also faces other difficulties, such as defocus, motion blur, and partial occlusion. Current video detection technology can achieve real-time detection or highly accurate detection on blurry video frames. In this paper, various video object and human action detection approaches are reviewed and discussed, many of which have achieved state-of-the-art results. We mainly review and discuss classic video detection methods based on supervised learning. In addition, frequently used video object detection and human action recognition datasets are reviewed. Finally, a summary of video detection is presented; for example, video object and human action detection methods can be classified into frame-by-frame (frame-based) detection, key-frame-extraction detection, and temporal-information-based detection, and the main methods for exploiting temporal information from adjacent video frames are optical flow, Long Short-Term Memory (LSTM), and convolution across adjacent frames.
Keywords: LSTM; deep learning; human action recognition; optical flow; temporal information; video dataset; video object detection.
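As a rough illustration of the key-frame-extraction category named in the abstract, the sketch below picks key frames by simple frame differencing. The abstract does not specify any such procedure: the function names `mean_abs_diff` and `select_key_frames`, the difference measure, and the `threshold` parameter are all assumptions made for this example, and frames are modeled as plain nested lists of grayscale intensities.

```python
from typing import List, Sequence

# A grayscale frame as rows of pixel intensities (assumed representation).
Frame = Sequence[Sequence[float]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def select_key_frames(frames: Sequence[Frame], threshold: float) -> List[int]:
    """Return indices of frames that differ enough from the last key frame.

    Hypothetical heuristic: the first frame is always a key frame; a later
    frame becomes a key frame when its mean absolute difference from the
    most recent key frame exceeds `threshold`.
    """
    if not frames:
        return []
    keys = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[keys[-1]], frames[i]) > threshold:
            keys.append(i)
    return keys


# Four tiny 2x2 frames: a small change, a large change, then a small change.
frames = [
    [[0, 0], [0, 0]],
    [[1, 1], [1, 1]],
    [[10, 10], [10, 10]],
    [[11, 11], [11, 11]],
]
key_indices = select_key_frames(frames, threshold=5.0)
```

Under this heuristic, a detector would run only on the frames listed in `key_indices`, and its results would be propagated to the frames in between. Frame-by-frame detection corresponds to running the detector on every index instead.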
Conflict of interest statement
The authors declare no conflict of interest.