Sensors (Basel). 2019 Apr 29;19(9):2013.
doi: 10.3390/s19092013.

Design and Analysis of a Lightweight Context Fusion CNN Scheme for Crowd Counting


Yang Yu et al. Sensors (Basel). 2019.

Abstract

Crowd counting, which is widely used in disaster management, traffic monitoring, and other fields of urban security, is a challenging task that is attracting increasing interest from researchers. For better accuracy, most methods have attempted to explicitly handle the scale variation that causes huge changes in object size. However, earlier methods based on convolutional neural networks (CNNs) have focused primarily on improving accuracy while ignoring the complexity of the model. This paper proposes a novel method based on a lightweight CNN for estimating crowd counts and generating density maps under resource constraints. The network is composed of three components: a basic feature extractor (BFE), a stacked à trous convolution module (SACM), and a context fusion module (CFM). The BFE encodes basic feature information with reduced spatial resolution for further refinement. Various pieces of contextual information are generated through a short pipeline in the SACM. To generate a context fusion density map, the CFM distills feature maps from the above components. The whole network is trained in an end-to-end fashion and uses a compression factor to restrict its size. Experiments on three highly challenging datasets demonstrate that the proposed method delivers attractive performance.
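Methods in this family regress a density map whose integral over the image equals the crowd count. As a rough illustration (not the paper's exact procedure, which may use geometry-adaptive kernels), ground truth can be built by placing a normalized Gaussian at each annotated head position; a minimal NumPy sketch:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 2-D Gaussian kernel (sums to 1, so each head adds 1 to the map)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def density_map(shape, head_points, size=15, sigma=4.0):
    """Place one normalized Gaussian at each annotated head position."""
    h, w = shape
    dmap = np.zeros((h, w), dtype=np.float64)
    half = size // 2
    kernel = gaussian_kernel(size, sigma)
    for (y, x) in head_points:
        y0, y1 = y - half, y + half + 1
        x0, x1 = x - half, x + half + 1
        # Crop the kernel at image borders; mass falling outside is dropped.
        ky0, kx0 = max(0, -y0), max(0, -x0)
        ky1 = size - max(0, y1 - h)
        kx1 = size - max(0, x1 - w)
        dmap[max(0, y0):min(h, y1), max(0, x0):min(w, x1)] += kernel[ky0:ky1, kx0:kx1]
    return dmap

# The estimated count is simply the sum of the density map.
heads = [(30, 40), (64, 64), (100, 90)]
dmap = density_map((128, 128), heads)
```

Because every kernel is normalized, `dmap.sum()` recovers the number of annotated heads when no kernel is clipped by the image border.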

Keywords: computer vision; convolutional neural networks; crowd counting; deep learning.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
The architecture of the basic unit in our backbone. (a) The layer organization, in which the unit is stacked repeatedly and a shortcut connection aids information transfer; (b) the scheme of the unit, which is composed of a 1×1 bottleneck layer and a 3×3 ordinary convolutional layer. The compression factor (α) and the number of filters of the unit (filters) jointly determine the number of filters in each layer.
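The caption's filter budget can be sketched numerically. The helper names below (`unit_channels`, `unit_params`) and the exact rounding rule are illustrative assumptions, not the paper's implementation; they only show how α shrinks the 1×1 bottleneck and thereby the unit's parameter count:

```python
def unit_channels(filters, alpha):
    """Hypothetical channel budget for one backbone unit: a 1x1 bottleneck
    compressed by alpha, followed by a 3x3 layer producing `filters` channels."""
    bottleneck = max(1, int(round(alpha * filters)))
    return {"1x1_bottleneck": bottleneck, "3x3_conv": filters}

def unit_params(in_channels, filters, alpha):
    """Weight count (ignoring biases/batch norm) of the unit's two convolutions."""
    ch = unit_channels(filters, alpha)
    p_bottleneck = in_channels * ch["1x1_bottleneck"] * 1 * 1
    p_conv = ch["1x1_bottleneck"] * ch["3x3_conv"] * 3 * 3
    return p_bottleneck + p_conv

# With alpha = 0.5, the 3x3 layer sees half as many input channels,
# roughly halving the parameters of the dominant 3x3 convolution.
small = unit_params(64, 64, 0.5)
full = unit_params(64, 64, 1.0)
```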
Figure 2
Overview of the proposed network for density estimation. The network extracts common feature maps with reduced spatial resolution using the basic feature extractor (BFE). To ensure a diversity of contextual features, the stacked à trous convolution module (SACM) enlarges the gap between the receptive field sizes of the feature maps. The context fusion module (CFM) distills different contextual information and fuses it to estimate the density map.
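The widening receptive-field gap that the SACM exploits follows from the standard receptive-field formula for stride-1 convolutions: each layer adds (kernel − 1) × dilation pixels. The dilation rates below are illustrative, not necessarily those used in the paper:

```python
def receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions.
    `layers` is a list of (kernel_size, dilation) pairs; each layer adds
    (kernel_size - 1) * dilation pixels to the field."""
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf

# Stacking 3x3 a trous convolutions with growing dilation widens the gap
# between receptive fields quickly, at no extra parameter cost versus
# ordinary 3x3 layers (same dilations: fields of 3, 7, 15 pixels).
stack = [(3, 1), (3, 2), (3, 4)]
fields = [receptive_field(stack[:i + 1]) for i in range(len(stack))]
```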
Figure 3
Analysis of the context fusion module (CFM). The outputs of the CFM are visualized, and the red bounding boxes mark the same spatial region in different maps. In the feature maps, colors from red to blue indicate responses from strong to weak. (a) Input image, (b) estimation result, (c) first output of the CFM, (d) second output of the CFM, and (e) third output of the CFM.
Figure 4
Density estimation results on ShanghaiTech Part A [8]. (a) Input images, (b) ground truth density maps, and (c) estimation results.
Figure 5
Density estimation results on ShanghaiTech Part B [8]. (a) Input images, (b) ground truth density maps, and (c) estimation results.
Figure 6
Density estimation results on the UCF_CC_50 dataset [3]. (a) Input images, (b) ground truth density maps, and (c) estimation results.
Figure 7
Density estimation results on the WorldExpo’10 dataset [14]. (a) Input images, (b) ground truth density maps, and (c) estimation results.

References

    1. Huang S., Li X., Zhang Z., Wu F., Gao S., Ji R., Han J. Body Structure Aware Deep Crowd Counting. IEEE Trans. Image Process. 2018;27:1049–1059. doi: 10.1109/TIP.2017.2740160.
    2. Hu Y., Chang H., Nian F., Wang Y., Li T. Dense Crowd Counting from Still Images with Convolutional Neural Networks. J. Vis. Commun. Image Represent. 2016;38:530–539. doi: 10.1016/j.jvcir.2016.03.021.
    3. Idrees H., Saleemi I., Seibert C., Shah M. Multi-source Multi-scale Counting in Extremely Dense Crowd Images; Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition; Portland, OR, USA. 23–28 June 2013; pp. 2547–2554.
    4. Sindagi V.A., Patel V.M. CNN-Based Cascaded Multi-Task Learning of High-Level Prior and Density Estimation for Crowd Counting; Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance; Lecce, Italy. 29 August–1 September 2017; pp. 1–6.
    5. Herath S., Harandi M., Porikli F. Going deeper into action recognition: A survey. Image Vis. Comput. 2017;60:4–21. doi: 10.1016/j.imavis.2017.01.010.
