Sensors (Basel). 2019 Apr 29;19(9):2013.
doi: 10.3390/s19092013.

Design and Analysis of a Lightweight Context Fusion CNN Scheme for Crowd Counting


Yang Yu et al. Sensors (Basel). 2019.

Abstract

Crowd counting, which is widely used in disaster management, traffic monitoring, and other fields of urban security, is a challenging task that is attracting increasing interest from researchers. For better accuracy, most methods have attempted to explicitly handle the scale variation that causes huge changes in object size. However, earlier methods based on convolutional neural networks (CNNs) have focused primarily on improving accuracy while ignoring the complexity of the model. This paper proposes a novel method based on a lightweight CNN for estimating crowd counts and generating density maps under resource constraints. The network is composed of three components: a basic feature extractor (BFE), a stacked à trous convolution module (SACM), and a context fusion module (CFM). The BFE encodes basic feature information with reduced spatial resolution for further refinement. Various pieces of contextual information are generated through a short pipeline in the SACM. To generate a context fusion density map, the CFM distills feature maps from the above components. The whole network is trained in an end-to-end fashion and uses a compression factor to restrict its size. Experiments on three highly challenging datasets demonstrate that the proposed method delivers attractive performance.
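Methods in this family regress a density map whose integral over the image equals the crowd count. As a rough illustration (not the paper's exact procedure, which may use geometry-adaptive kernels), ground truth can be built by placing a normalized Gaussian at each annotated head position; a minimal NumPy sketch:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 2-D Gaussian kernel (sums to 1, so each head adds 1 to the map)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def density_map(shape, head_points, size=15, sigma=4.0):
    """Place one normalized Gaussian at each annotated head position."""
    h, w = shape
    dmap = np.zeros((h, w), dtype=np.float64)
    half = size // 2
    kernel = gaussian_kernel(size, sigma)
    for (y, x) in head_points:
        y0, y1 = y - half, y + half + 1
        x0, x1 = x - half, x + half + 1
        # Crop the kernel at image borders; mass falling outside is dropped.
        ky0, kx0 = max(0, -y0), max(0, -x0)
        ky1 = size - max(0, y1 - h)
        kx1 = size - max(0, x1 - w)
        dmap[max(0, y0):min(h, y1), max(0, x0):min(w, x1)] += kernel[ky0:ky1, kx0:kx1]
    return dmap

# The estimated count is simply the sum of the density map.
heads = [(30, 40), (64, 64), (100, 90)]
dmap = density_map((128, 128), heads)
```

Because every kernel is normalized, `dmap.sum()` recovers the number of annotated heads when no kernel is clipped by the image border.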

Keywords: computer vision; convolutional neural networks; crowd counting; deep learning.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
The architecture of the basic unit in our backbone. (a) The layer organization, in which the unit is stacked repeatedly and a shortcut connection aids information transfer; (b) the scheme of the unit, which is composed of a 1×1 bottleneck layer and a 3×3 ordinary convolutional layer. The compression factor (α) and the number of filters of the unit (filters) jointly determine the number of filters in each layer.
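The caption's filter budget can be sketched numerically. The helper names below (`unit_channels`, `unit_params`) and the exact rounding rule are illustrative assumptions, not the paper's implementation; they only show how α shrinks the 1×1 bottleneck and thereby the unit's parameter count:

```python
def unit_channels(filters, alpha):
    """Hypothetical channel budget for one backbone unit: a 1x1 bottleneck
    compressed by alpha, followed by a 3x3 layer producing `filters` channels."""
    bottleneck = max(1, int(round(alpha * filters)))
    return {"1x1_bottleneck": bottleneck, "3x3_conv": filters}

def unit_params(in_channels, filters, alpha):
    """Weight count (ignoring biases/batch norm) of the unit's two convolutions."""
    ch = unit_channels(filters, alpha)
    p_bottleneck = in_channels * ch["1x1_bottleneck"] * 1 * 1
    p_conv = ch["1x1_bottleneck"] * ch["3x3_conv"] * 3 * 3
    return p_bottleneck + p_conv

# With alpha = 0.5, the 3x3 layer sees half as many input channels,
# roughly halving the parameters of the dominant 3x3 convolution.
small = unit_params(64, 64, 0.5)
full = unit_params(64, 64, 1.0)
```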
Figure 2
Overview of the proposed network for density estimation. The network extracts common feature maps with reduced spatial resolution using the basic feature extractor (BFE). To ensure a diversity of contextual features, the stacked à trous convolution module (SACM) enlarges the gap between the receptive field sizes of the feature maps. The context fusion module (CFM) distills different contextual information and fuses it to estimate the density map.
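The widening receptive-field gap that the SACM exploits follows from the standard receptive-field formula for stride-1 convolutions: each layer adds (kernel − 1) × dilation pixels. The dilation rates below are illustrative, not necessarily those used in the paper:

```python
def receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions.
    `layers` is a list of (kernel_size, dilation) pairs; each layer adds
    (kernel_size - 1) * dilation pixels to the field."""
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf

# Stacking 3x3 a trous convolutions with growing dilation widens the gap
# between receptive fields quickly, at no extra parameter cost versus
# ordinary 3x3 layers (same dilations: fields of 3, 7, 15 pixels).
stack = [(3, 1), (3, 2), (3, 4)]
fields = [receptive_field(stack[:i + 1]) for i in range(len(stack))]
```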
Figure 3
Analysis of the context fusion module (CFM). The outputs of the CFM are visualized, and the red bounding boxes mark the same spatial region in different maps. In the feature maps, colors from red to blue indicate responses from strong to weak. (a) Input image, (b) estimation result, (c) first output of the CFM, (d) second output of the CFM, and (e) third output of the CFM.
Figure 4
Density estimation results on ShanghaiTech Part A [8]. (a) Input images, (b) ground truth density maps, and (c) estimation results.
Figure 5
Density estimation results on ShanghaiTech Part B [8]. (a) Input images, (b) ground truth density maps, and (c) estimation results.
Figure 6
Density estimation results on the UCF_CC_50 dataset [3]. (a) Input images, (b) ground truth density maps, and (c) estimation results.
Figure 7
Density estimation results on the WorldExpo’10 dataset [14]. (a) Input images, (b) ground truth density maps, and (c) estimation results.

References

    1. Huang S., Li X., Zhang Z., Wu F., Gao S., Ji R., Han J. Body Structure Aware Deep Crowd Counting. IEEE Trans. Image Process. 2018;27:1049–1059. doi: 10.1109/TIP.2017.2740160.
    2. Hu Y., Chang H., Nian F., Wang Y., Li T. Dense Crowd Counting from Still Images with Convolutional Neural Networks. J. Vis. Commun. Image Represent. 2016;38:530–539. doi: 10.1016/j.jvcir.2016.03.021.
    3. Idrees H., Saleemi I., Seibert C., Shah M. Multi-source Multi-scale Counting in Extremely Dense Crowd Images; Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition; Portland, OR, USA. 23–28 June 2013; pp. 2547–2554.
    4. Sindagi V.A., Patel V.M. CNN-Based Cascaded Multi-Task Learning of High-Level Prior and Density Estimation for Crowd Counting; Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance; Lecce, Italy. 29 August–1 September 2017; pp. 1–6.
    5. Herath S., Harandi M., Porikli F. Going deeper into action recognition: A survey. Image Vis. Comput. 2017;60:4–21. doi: 10.1016/j.imavis.2017.01.010.
