Sensors (Basel). 2018 Apr 2;18(4):1063.

doi: 10.3390/s18041063.

Delving Deep into Multiscale Pedestrian Detection via Single Scale Feature Maps

Xinchuan Fu¹, Rui Yu², Weinan Zhang³, Jie Wu⁴, Shihai Shao⁵

Affiliations

¹ National Key Laboratory of Science and Technology on Communications, University of Electronic Science and Technology of China, Chengdu 611731, China. 201311010310@std.uestc.edu.cn.
² Department of Computer Science, University College London, London WC1E 6BT, UK. r.yu@cs.ucl.ac.uk.
³ Department of Computer Science & Engineering, Shanghai Jiao Tong University, Shanghai 200240, China. wnzhang@sjtu.edu.cn.
⁴ Department of MOE Research Center for Software/Hardware Co-Design Engineering and Application, East China Normal University, Shanghai 200062, China. 52151500020@stu.ecnu.edu.cn.
⁵ National Key Laboratory of Science and Technology on Communications, University of Electronic Science and Technology of China, Chengdu 611731, China. ssh@uestc.edu.cn.

PMID: 29614807
PMCID: PMC5948919
DOI: 10.3390/s18041063

Xinchuan Fu et al. Sensors (Basel). 2018.

Abstract

The standard pipeline in pedestrian detection is to slide a pedestrian model over an image feature pyramid to detect pedestrians of different scales. In this pipeline, feature pyramid construction is time-consuming and becomes the bottleneck for fast detection. Recently, a method called multiresolution filtered channels (MRFC) was proposed, which uses only single-scale feature maps to achieve fast detection. However, MRFC has two shortcomings that limit its accuracy. One is that the receptive field correspondence across different scales is weak. The other is that the features used are not scale invariant. In this paper, two solutions are proposed to tackle these two shortcomings, respectively. Specifically, scale-aware pooling is proposed to achieve better receptive field correspondence, and a soft decision tree is proposed to relieve the scale variance problem. When coupled with an efficient sliding-window classification strategy, our detector achieves fast detection speed together with state-of-the-art accuracy.

Keywords: boosted decision tree; pedestrian detection; receptive field correspondence; scale invariance; soft decision tree.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Different multiscale detection strategies. (a) Dense image pyramid and single classifier; (b) Sparse image pyramid and single classifier; (c) Single image scale and sparse classifier pyramid; (d) Sparse image pyramid and sparse classifier pyramid; (e) Single image scale and single classifier.

**Figure 2**
Illustration of the problem of receptive field correspondence in the MRFC method. (a) The yellow circles represent the receptive fields of a feature. Note that for pedestrians of different scales, the area of the circle does not change, which is unreasonable; (b) Ideally, the receptive field of a feature is resized according to the scale of the pedestrian. Note that the areas differ for circles of different colors.

**Figure 3**
Illustration of the feature extraction process of our method. (a) Feature maps of different scales are divided into the same number of cells, whose size varies with the pedestrian size; (b) Features are extracted by average pooling over regions composed of one or more cells; (c) Top: pooling in feature gradient maps is equivalent to computing the difference of two shifted pooling regions, which has an effect similar to that of Non-Neighboring Features (NNF). Bottom: some discriminative features in DICs.
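The key idea in (a) and (b) is that a window is always split into the same number of cells, so a cell's pixel extent (and hence the pooled receptive field) grows with the pedestrian. The sketch below illustrates this on one channel of a single-scale feature map. It is a minimal illustration under stated assumptions, not the paper's implementation: the helper names `cell_bounds` and `region_mean`, the 6 x 3 grid and the rounding of cell edges to whole pixels are all hypothetical choices.

```python
import numpy as np

def cell_bounds(x0, y0, w, h, n_rows, n_cols):
    """Integer pixel edges of an n_rows x n_cols grid covering window (x0, y0, w, h).

    The number of cells is fixed, so each cell grows or shrinks with the
    window: a pedestrian twice as tall gets cells twice as tall.
    """
    ys = np.round(np.linspace(y0, y0 + h, n_rows + 1)).astype(int)
    xs = np.round(np.linspace(x0, x0 + w, n_cols + 1)).astype(int)
    return ys, xs

def region_mean(fmap, ys, xs, r0, c0, r1, c1):
    """Average-pool fmap over the block of cells rows [r0, r1), cols [c0, c1).

    Assumes every cell spans at least one pixel, i.e. the window is not
    smaller than the grid.
    """
    return float(fmap[ys[r0]:ys[r1], xs[c0]:xs[c1]].mean())

# Usage on a random single-channel feature map (illustrative data only).
rng = np.random.default_rng(0)
fmap = rng.random((240, 120))
for x0, y0, w, h in [(10, 20, 40, 100), (30, 10, 80, 200)]:
    # n_rows=6, n_cols=3 are hypothetical grid sizes, not the paper's settings.
    ys, xs = cell_bounds(x0, y0, w, h, n_rows=6, n_cols=3)
    # Same cell indices, different pixel extent: the pooled region scales with h.
    print(h, ys[1] - ys[0], region_mean(fmap, ys, xs, 0, 0, 2, 3))
```

Because the features are indexed by cell rather than by pixel offset, one classifier can read comparable features from windows of any size on the same single-scale map.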

**Figure 4**
Acceleration strategy for computing integral maps. (a) Naive approach: every feature map needs to be integrated; (b) Our method: only the original 10 feature maps need to be integrated.
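Integral maps (summed-area tables) make any rectangular average cost four lookups, independent of the box size. One reading consistent with Figure 3(c) is that pooled gradient features need no integral maps of their own, because the mean of a forward difference over a box equals the difference of two box means shifted by one pixel on the original map. The sketch below shows both under that interpretation; it is our illustration, not the paper's code, and the names `integral_maps`, `box_mean` and `pooled_dx` are hypothetical.

```python
import numpy as np

def integral_maps(channels):
    """Zero-padded summed-area tables for a (C, H, W) stack of feature maps.

    ii[c, y, x] is the sum of channels[c, :y, :x], so any box sum needs only
    four lookups, whatever the box size (and hence the pedestrian scale).
    """
    c, h, w = channels.shape
    ii = np.zeros((c, h + 1, w + 1))
    ii[:, 1:, 1:] = channels.cumsum(axis=1).cumsum(axis=2)
    return ii

def box_mean(ii, c, y0, x0, y1, x1):
    """Mean of channel c over rows [y0, y1) and columns [x0, x1)."""
    total = ii[c, y1, x1] - ii[c, y0, x1] - ii[c, y1, x0] + ii[c, y0, x0]
    return total / ((y1 - y0) * (x1 - x0))

def pooled_dx(ii, c, y0, x0, y1, x1):
    """Mean of the horizontal forward difference f[:, x+1] - f[:, x] over a box.

    Computed as the difference of two box means shifted by one pixel, so no
    separate integral map of the gradient is built (requires x1 < W).
    """
    return box_mean(ii, c, y0, x0 + 1, y1, x1 + 1) - box_mean(ii, c, y0, x0, y1, x1)

rng = np.random.default_rng(0)
channels = rng.random((10, 120, 160))   # 10 channels, the count given in Figure 4
ii = integral_maps(channels)            # integrated once, reused for every box

gx = channels[2, :, 1:] - channels[2, :, :-1]   # explicit gradient, only to check
print(np.isclose(pooled_dx(ii, 2, 5, 40, 105, 90), gx[5:105, 40:90].mean()))
```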

**Figure 5**
Comparison between hard decision trees and soft decision trees. The blue, yellow and green nodes denote hard decision nodes, soft decision nodes and leaf nodes, respectively. The red arrows denote the flow of sample weights. (a) The hard decision tree is composed of hard decision nodes and leaf nodes. For a given sample, a hard decision node directs all of its weight to one of its children; (b) The root node of the soft decision tree is a soft decision node, which directs the sample weight to both of its children according to the sample size. Given a large sample, the soft decision node directs more weight to its left branch; note that the arrow of the left branch is thicker than that of the right branch; (c) Another example of the soft decision tree, with a small sample. The soft decision node directs more weight to its right branch.
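The weight flow in Figure 5 can be sketched as a recursive evaluation in which a hard node passes the full weight to one child and a soft node splits it between both children by sample size. This is a minimal sketch under assumptions: the caption does not give the soft node's gating function, so a sigmoid of the log-height with a hypothetical pivot `h0` and softness `tau` is used here, and the node classes and leaf values are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Union

@dataclass
class Leaf:
    value: float

@dataclass
class HardNode:
    feature: int          # index into the feature vector
    threshold: float
    left: "Node"
    right: "Node"

@dataclass
class SoftNode:
    h0: float             # hypothetical pivot height in pixels
    tau: float            # hypothetical softness (smaller = closer to a hard split)
    left: "Node"          # branch favoured by large samples
    right: "Node"         # branch favoured by small samples

Node = Union[Leaf, HardNode, SoftNode]

def evaluate(node: Node, x: Sequence[float], height: float, weight: float = 1.0) -> float:
    """Sum of leaf values reached by the sample, each scaled by the weight arriving there."""
    if isinstance(node, Leaf):
        return weight * node.value
    if isinstance(node, HardNode):
        # Hard split: the whole weight follows exactly one child.
        child = node.left if x[node.feature] < node.threshold else node.right
        return evaluate(child, x, height, weight)
    # Soft split (assumed sigmoid of log-height): weight is shared by both children.
    z = (math.log(height) - math.log(node.h0)) / node.tau
    p_left = 1.0 / (1.0 + math.exp(-z))
    return (evaluate(node.left, x, height, weight * p_left)
            + evaluate(node.right, x, height, weight * (1.0 - p_left)))

tree = SoftNode(
    h0=80.0, tau=0.25,
    left=HardNode(0, 0.5, Leaf(1.0), Leaf(-1.0)),    # subtree for large pedestrians
    right=HardNode(1, 0.5, Leaf(0.8), Leaf(-0.6)),   # subtree for small pedestrians
)
x = [0.2, 0.9]
print(evaluate(tree, x, height=160.0))   # dominated by the left subtree
print(evaluate(tree, x, height=40.0))    # dominated by the right subtree
```

As `tau` shrinks, the soft node approaches a hard split on height; the weights reaching the leaves always sum to the input weight.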

**Figure 6**
Illustration of GPC. (a) A pedestrian may be bounded by the green boxes, but not by the red boxes; (b) The $(h, y)$ values of pedestrian windows in the Caltech training set, which can be bounded by two straight lines.
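Since the training windows' $(h, y)$ pairs fall between two straight lines, candidate windows outside that band can be discarded before classification. The sketch below is one hypothetical way to do this: it assumes the two lines are parallel, fits them by least squares and shifts the intercept by the extreme residuals. The caption does not specify the fitting rule, `y` stands for whichever vertical window coordinate Figure 6(b) plots, and the function names and training pairs are made up for illustration.

```python
import numpy as np

def fit_band(h, y):
    """Fit y ~ a*h + b by least squares, then shift the intercept down and up so
    the two parallel lines enclose every training window (assumed fitting rule)."""
    h = np.asarray(h, dtype=float)
    y = np.asarray(y, dtype=float)
    a, b = np.polyfit(h, y, deg=1)
    resid = y - (a * h + b)
    return (a, b + resid.min()), (a, b + resid.max())

def within_band(h, y, lower, upper):
    """True if (h, y) lies between the lines y = a*h + b given by lower and upper."""
    return lower[0] * h + lower[1] <= y <= upper[0] * h + upper[1]

def filter_windows(windows, lower, upper):
    """Keep candidate windows (x, y, w, h) whose (h, y) falls inside the band."""
    return [win for win in windows if within_band(win[3], win[1], lower, upper)]

# Made-up training pairs, for illustration only (not Caltech data).
train_h = [50, 60, 80, 100, 120, 150, 200]
train_y = [205, 200, 190, 182, 170, 158, 130]
lower, upper = fit_band(train_h, train_y)
candidates = [(300, 181, 41, 100), (300, 20, 41, 100)]
print(filter_windows(candidates, lower, upper))
```

The check is a pair of multiply-adds per window, so it can prune the search space cheaply before any features are pooled.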

**Figure 7**
The sparse grid detection strategy. We begin by evaluating only a sparse grid ($x_{1}, x_{2}, x_{3}, x_{4}$). Suppose P is a peak-score window and its ROS is represented by the red dashed circle. Window $x_{1}$ is in the ROS, so it passes $k$ stages of the BDT cascade and every window in its $3 \times 3$ neighbourhood is triggered (yellow circles).
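The strategy in Figure 7 can be sketched as a two-pass scan: score every `stride`-th window position, and whenever a grid window passes at least $k$ cascade stages, also score its $3 \times 3$ neighbourhood. This is a minimal sketch under assumptions: the cascade is a hypothetical callback returning `(stages_passed, score)`, the grid stride of 2 and the toy scoring function are illustrative, and a real BDT cascade would accumulate partial scores stage by stage rather than compare one score against fixed thresholds.

```python
from typing import Callable, Dict, Tuple

Pos = Tuple[int, int]
Cascade = Callable[[Pos], Tuple[int, float]]   # pos -> (stages_passed, score)

def sparse_grid_detect(height: int, width: int, cascade: Cascade,
                       k: int, n_stages: int, stride: int = 2):
    """Evaluate windows on a sparse grid; densify only around promising ones.

    A grid window that passes at least k cascade stages triggers its 3x3
    neighbourhood. Returns (detections, number of windows evaluated).
    """
    evaluated: Dict[Pos, Tuple[int, float]] = {}

    def run(pos: Pos) -> Tuple[int, float]:
        if pos not in evaluated:          # each window is scored at most once
            evaluated[pos] = cascade(pos)
        return evaluated[pos]

    for r in range(0, height, stride):
        for c in range(0, width, stride):
            passed, _ = run((r, c))
            if passed >= k:
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < height and 0 <= cc < width:
                            run((rr, cc))

    detections = {p: s for p, (n, s) in evaluated.items() if n == n_stages}
    return detections, len(evaluated)

# Hypothetical toy cascade with a single score peak at PEAK.
PEAK = (5, 7)
THRESHOLDS = [2.0, 5.0, 7.0, 8.5]      # hypothetical per-stage rejection thresholds

def toy_cascade(pos: Pos) -> Tuple[int, float]:
    score = 10.0 - 1.5 * (abs(pos[0] - PEAK[0]) + abs(pos[1] - PEAK[1]))
    passed = 0
    for t in THRESHOLDS:
        if score < t:
            break                       # rejected at this stage
        passed += 1
    return passed, score

dets, n_eval = sparse_grid_detect(20, 20, toy_cascade, k=2, n_stages=len(THRESHOLDS))
print(sorted(dets), n_eval, "of", 20 * 20)
```

In this toy run the peak at (5, 7) is not itself on the grid but is still found through the neighbourhood of nearby grid windows, while far fewer than the 400 dense positions are scored.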

**Figure 8**
Comparison with non-deep-learning methods on the KITTI dataset. Our method does not achieve the highest precision over the whole recall range, but it outperforms the other methods in terms of AP.

**Figure 9**
Comparison with the top methods and single-scale-feature-map-based methods on the Caltech dataset. Our method achieves the lowest miss rate over the whole FPPI range among all the non-deep-learning methods.

**Figure 10**
Evaluation results under conditions of small scale, atypical aspect ratio and partial occlusion. (a) Small scale (50 px $\leq h \leq$ 80 px); (b) Atypical aspect ratio ($| w / h - 0.41 | \geq 0.1$); (c) Partial occlusion (0–35% occluded).
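The three evaluation subsets above are simple predicates on a ground-truth box. A small sketch, assuming each annotation carries the full-box width and height plus the area of its visible part (the `GroundTruth` type and its `visible_area` field are hypothetical), and assuming the occluded fraction is one minus the visible share of the full box; the 0–35% range is applied literally as stated in the caption.

```python
from dataclasses import dataclass

@dataclass
class GroundTruth:
    w: float             # full-body box width (px)
    h: float             # full-body box height (px)
    visible_area: float  # area of the visible part (px^2); hypothetical field

def occluded_fraction(gt: GroundTruth) -> float:
    # Assumption: occlusion = share of the full box that is not visible.
    return 1.0 - gt.visible_area / (gt.w * gt.h)

def is_small_scale(gt: GroundTruth) -> bool:
    return 50 <= gt.h <= 80

def is_atypical_aspect(gt: GroundTruth) -> bool:
    return abs(gt.w / gt.h - 0.41) >= 0.1

def is_partially_occluded(gt: GroundTruth) -> bool:
    # Follows the caption's 0-35% range literally.
    return 0.0 <= occluded_fraction(gt) <= 0.35

samples = [
    GroundTruth(w=25, h=60, visible_area=25 * 60),          # small, typical, unoccluded
    GroundTruth(w=60, h=100, visible_area=0.8 * 60 * 100),  # atypical aspect, 20% occluded
    GroundTruth(w=41, h=100, visible_area=0.5 * 41 * 100),  # 50% occluded
]
for gt in samples:
    print(is_small_scale(gt), is_atypical_aspect(gt), is_partially_occluded(gt))
```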

**Figure 11**
Miss rate (MR) versus frames per second (FPS) on the Caltech dataset.
