Sensors (Basel). 2018 Apr 2;18(4):1063.
doi: 10.3390/s18041063.

Delving Deep into Multiscale Pedestrian Detection via Single Scale Feature Maps

Xinchuan Fu et al. Sensors (Basel). 2018.

Abstract

The standard pipeline in pedestrian detection slides a pedestrian model over an image feature pyramid to detect pedestrians at different scales. In this pipeline, feature pyramid construction is time-consuming and becomes the bottleneck for fast detection. Recently, a method called multiresolution filtered channels (MRFC) was proposed, which uses only single-scale feature maps to achieve fast detection. However, two shortcomings limit the accuracy of MRFC. One is that the receptive field correspondence across scales is weak. The other is that the features used are not scale invariant. In this paper, two solutions are proposed to address these shortcomings, respectively. Specifically, scale-aware pooling is proposed to improve receptive field correspondence, and a soft decision tree is proposed to relieve the scale variance problem. When coupled with an efficient sliding window classification strategy, our detector achieves fast detection speed together with state-of-the-art accuracy.

Keywords: boosted decision tree; pedestrian detection; receptive field correspondence; scale invariance; soft decision tree.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Different multiscale detection strategies. (a) Dense image pyramid and single classifier; (b) Sparse image pyramid and single classifier; (c) Single image scale and sparse classifier pyramid; (d) Sparse image pyramid and sparse classifier pyramid; (e) Single image scale and single classifier.
Figure 2
Illustration of the receptive field correspondence problem in the MRFC method. (a) The yellow circles represent the receptive fields of a feature. Note that for pedestrians of different scales the area of the circle does not change, which is unreasonable; (b) Ideally, the receptive field of a feature resizes according to the scale of the pedestrian. Note that the areas are different for circles of different colors.
Figure 3
Illustration of the feature extraction process of our method. (a) Feature maps of different scales are divided into the same number of cells, whose sizes vary with the pedestrian size; (b) Features are extracted by average pooling over different regions, each composed of one or more cells; (c) Top: Pooling in feature gradient maps is equivalent to computing the difference of two shifted pooling regions, which has an effect similar to Non-Neighboring Features (NNF). Bottom: Some discriminative features in DICs.
Figure 4
Acceleration strategy for computing integral maps. (a) Naive approach. Every feature map needs to be integrated; (b) Our method. Only the original 10 feature maps need to be integrated.
Figure 5
Comparison between hard decision trees and soft decision trees. The blue, yellow and green nodes denote hard decision nodes, soft decision nodes and leaf nodes, respectively. The red arrows denote the flow of sample weights. (a) The hard decision tree is composed of hard decision nodes and leaf nodes. For a given sample, a hard decision node directs all of the sample's weight to one of its children; (b) The root node of the soft decision tree is a soft decision node, which directs the sample weight to both of its children according to the sample size. Given a large sample, the soft decision node directs more weight to its left branch. Note that the arrow of the left branch is thicker than the arrow of the right branch; (c) Another example of the soft decision tree, with a small sample. The soft decision node directs more weight to its right branch.
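The weight routing described in this caption can be sketched as follows. This is a hedged illustration only: the sigmoid gate, the `pivot` and `softness` values and all function names are assumptions made for the example, since the caption states only that the split depends on the sample size. Each subtree is any callable that maps a feature vector to a score.

```python
import math


def soft_gate(height, pivot=80.0, softness=20.0):
    """Fraction of the sample weight sent to the left branch.

    Assumed form: a sigmoid of the sample height. Large samples send
    most of their weight left, small samples send most of it right.
    """
    return 1.0 / (1.0 + math.exp(-(height - pivot) / softness))


def soft_tree_score(height, features, left_subtree, right_subtree):
    """Blend the scores of both subtrees according to the soft gate."""
    g = soft_gate(height)
    return g * left_subtree(features) + (1.0 - g) * right_subtree(features)
```

A hard decision node corresponds to the limiting case in which the gate returns exactly 0 or 1, so the whole weight goes to a single child.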
Figure 6
Illustration of GPC. (a) A pedestrian may be bounded by the green boxes, but not by the red boxes; (b) The (h, y) pairs of the pedestrian windows in the Caltech training set. They can be bounded by two straight lines.
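The two-line bound in panel (b) suggests a simple pruning test, sketched below. The slope and intercept values are placeholders invented for this example, not values fitted to the Caltech data, and the function names are hypothetical. Here `h` is the window height and `y` is the vertical window coordinate plotted in the figure.

```python
def satisfies_gpc(h, y, lower=(0.5, 150.0), upper=(0.5, 190.0)):
    """True if (h, y) lies between the two bounding lines.

    Each line is given as (slope, intercept); the defaults are
    placeholder values chosen only to make the example runnable.
    """
    lo = lower[0] * h + lower[1]
    hi = upper[0] * h + upper[1]
    return lo <= y <= hi


def filter_windows(windows, **lines):
    """Keep only the (h, y) candidate windows that satisfy the constraint."""
    return [(h, y) for (h, y) in windows if satisfies_gpc(h, y, **lines)]
```

Windows rejected by this test never need to be classified, which is why such a constraint can reduce the number of candidate windows.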
Figure 7
The sparse grid detection strategy. We begin by evaluating only a sparse grid (x1,x2,x3,x4). Suppose P is a peak score window and its ROS is represented by the red dashed circle. Window x1 is in the ROS, thus it will pass k stages of the BDT cascade and every window in its 3×3 neighbourhood is triggered (yellow circles).
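The strategy in this caption can be sketched as follows. This is an illustration under assumptions rather than the authors' code: `passes_first_k` and `full_score` are hypothetical callables standing in for the first k cascade stages and the full classifier, and the grid `step` is a free parameter. Scoring the triggered windows with the full classifier is also an assumption of this sketch.

```python
def sparse_grid_detect(passes_first_k, full_score, n_rows, n_cols, step=2):
    """Evaluate a sparse grid, then score the 3x3 neighbourhood of each hit.

    Returns a dict mapping every triggered (row, col) position to its score.
    """
    triggered = set()
    for r in range(0, n_rows, step):
        for c in range(0, n_cols, step):
            if not passes_first_k(r, c):
                continue  # grid window rejected early; neighbours stay untouched
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        triggered.add((rr, cc))
    return {pos: full_score(*pos) for pos in triggered}
```

The peak window is still found as long as at least one grid window falls inside its ROS, while most positions far from any peak are never scored.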
Figure 8
Comparison with non-deep-learning methods on the KITTI dataset. Our method does not achieve the highest precision over the whole recall range, but based on AP, our method outperforms the other methods.
Figure 9
Comparison with the top methods and single-scale feature map-based methods on the Caltech dataset. Our method achieves the lowest miss rate over the whole FPPI range among all the non-deep-learning methods.
Figure 10
Evaluation results under conditions of small scale, atypical aspect ratio and partial occlusion. (a) Small scale (50 px ≤ h ≤ 80 px); (b) Atypical aspect ratio (|w/h − 0.41| ≥ 0.1); (c) Partial occlusion (0–35% occluded).
Figure 11
MR versus FPS on the Caltech Dataset.
