Sci Rep. 2022 Jul 18;12(1):12229.
doi: 10.1038/s41598-022-16415-9.

Offset-decoupled deformable convolution for efficient crowd counting


Xin Zhong et al.

Abstract

Crowd counting is a challenging problem in computer vision, and one of its most critical difficulties is handling scale variation. CNN-based methods achieve better performance than other approaches; however, limited by fixed geometric structures, they cannot fully capture head-scale features. Deformable convolution, which augments convolution with learned sampling offsets, is widely used in image classification and pattern recognition because it successfully exploits spatial information. However, because the offset parameters are randomly generated at network initialization, the sampling points of a deformable convolution are stacked in disorder, weakening feature extraction. To address this ineffective learning of offsets and inefficient use of deformable convolution, an offset-decoupled deformable convolution (ODConv) is proposed in this paper. It fully captures information within the effective region of the sampling points, leading to better performance. In extensive experiments, our method achieves average MAEs of 62.3, 8.3, 91.9, and 159.3 on the ShanghaiTech A, ShanghaiTech B, UCF-QNRF, and UCF_CC_50 datasets, respectively, outperforming state-of-the-art methods and validating the effectiveness of the proposed ODConv.
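As the Figure 3 caption below describes, ODConv obtains the offsets as the product of a pre_offset map and a scale map rather than predicting them in one entangled tensor. The following is a minimal PyTorch sketch of that decoupling, assuming torchvision's DeformConv2d as the deformable operator; the layer names and the sigmoid bound on the scale map are illustrative assumptions, not the authors' released code.

# Minimal sketch of offset decoupling, assuming torchvision's DeformConv2d.
# The sigmoid bound on the scale map is an illustrative assumption: it keeps
# offset magnitudes small so sampling points do not scatter at initialization.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class ODConvSketch(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # Direction of each sampling-point shift: 2 channels (x, y) per
        # kernel position, predicted from the input feature.
        self.pre_offset = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        # Magnitude of each shift, bounded to (0, 1) by a sigmoid.
        self.scale = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, x):
        # Offsets are the product of the pre_offset map and the scale map,
        # as in Figure 3, instead of a single randomly initialized prediction.
        offset = self.pre_offset(x) * torch.sigmoid(self.scale(x))
        return self.deform(x, offset)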


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Visualization of density maps predicted from models trained with DConv. (a) is one of the input images from the ShanghaiTech B dataset, (b) is the ground truth, and (c) is the estimated density map, which shows less regular Gaussian blobs.
Figure 2
An illustration of a conventional DConv, in which the offsets are obtained directly from the input feature.
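For contrast with ODConv, a conventional DConv predicts the full offset tensor directly from the input feature with a single convolution. A minimal sketch, again assuming torchvision's DeformConv2d (the channel width 512 is an arbitrary placeholder, not a value from the paper):

import torch.nn as nn
from torchvision.ops import DeformConv2d

k = 3
# One convolution predicts all 2*k*k offset channels straight from the input,
# so direction and magnitude are entangled in a single randomly initialized map.
offset_conv = nn.Conv2d(512, 2 * k * k, k, padding=k // 2)
deform = DeformConv2d(512, 512, k, padding=k // 2)
# forward pass: y = deform(x, offset_conv(x))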
Figure 3
Illustration of our ODConv. The scale map and the pre_offset map are represented by blue and orange parallelograms, respectively. The offsets are obtained from the product of the pre_offset map and the scale map.
Figure 4
Conventional offset-based deformable convolution is presented in (a), and the learning process of offsets in offset-decoupled deformable convolution is illustrated in (b) and (c). Sampling points are represented by balls: typical convolution sampling points (a–i) are pink, and actual sampling points (A–I) are red. Offsets are indicated by dark red arrows.
Figure 5
The architecture of the ODConv network. The backbone of CSRNet is replaced with VGG16-BN, inserting a batch normalization layer after each dilated convolution. The last dilated convolution layer is then replaced with offset-decoupled deformable convolution, and the resulting network is defined as our ODConv.
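A hedged sketch of the backend this caption describes, reusing the ODConvSketch layer from the sketch after the abstract. The channel widths (512-512-512-256-128-64) follow the published CSRNet backend and are an assumption about this network's exact configuration, not a detail stated here:

import torch.nn as nn
# ODConvSketch: the illustrative layer defined in the sketch after the abstract.

def dilated_bn_block(in_ch, out_ch):
    # Dilated 3x3 convolution followed by the inserted batch normalization.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=2, dilation=2),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

backend = nn.Sequential(
    dilated_bn_block(512, 512),
    dilated_bn_block(512, 512),
    dilated_bn_block(512, 512),
    dilated_bn_block(512, 256),
    dilated_bn_block(256, 128),
    ODConvSketch(128, 64),  # last dilated conv replaced by the ODConv layer
    nn.Conv2d(64, 1, 1),    # 1x1 head producing the single-channel density map
)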
Figure 6
Visualization of an image from the ShanghaiTech B dataset. The first column shows one of the samples and its ground truth, denoted as (a) and (b), respectively. The density maps predicted by DConv and ODConv are shown in (c) and (e), and the visualizations of the offsets of DConv and ODConv are presented in (d) and (f).
Figure 7
Training curves of ODConv and DConv on UCF-QNRF. The solid orange line indicates training with Ls and Lp, and the dotted gray line indicates training without Ls and Lp.
Figure 8
Comparisons of DConv and our ODConv with different scale-map weights and decay rates.
Figure 9
Comparisons of ODConv and DConv on ResNet-50 and CSRNet are shown in (a) and (b), respectively, and a comparison of ODConv on CSRNet versus ResNet-50 is shown in (c). Significance levels are indicated by the crimson characters at the top of each panel.

