Multiscale regional calibration network for crowd counting

Jiamao Yu¹, Hexuan Hu²

Affiliations

¹ College of Computer Science and Software Engineering, Hohai University, Nanjing, 211100, China.
² College of Computer Science and Software Engineering, Hohai University, Nanjing, 211100, China. hexuan_hu@hhu.edu.cn.

PMID: 39843571
PMCID: PMC11754780
DOI: 10.1038/s41598-025-86247-w

Sci Rep. 2025 Jan 22;15(1):2866.

Abstract

Crowd counting aims to estimate the number, density, and distribution of people in an image. Although CNN-based crowd counting methods have been effective, head-scale variation and complex backgrounds remain two major challenges. We therefore propose a multiscale regional calibration network, MRCNet, to address them. For head-scale variation, we design a multiscale aware module that runs multiple dilated convolution branches in parallel to obtain multiscale receptive fields and cope with drastic changes in head size. For complex backgrounds, we design a regional calibration module that, once the attention map has been obtained, calibrates the attention weights of each region. Additionally, we improve the loss function by combining L2 loss and binary cross-entropy loss to improve MRCNet's results. Extensive experiments on three mainstream datasets demonstrate the robustness and competitiveness of our approach.
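
To make the idea of multiscale receptive fields concrete, the following is a minimal sketch of how the dilation rate changes the span covered by a single convolution kernel. It uses the standard relation k_eff = k + (k - 1)(d - 1). The 3x3 kernel size and the dilation rates 1, 2, and 3 are assumptions chosen for illustration; the abstract does not state the values used in MRCNet, and the function names are hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in pixels, covered by one dilated convolution kernel.

    Standard relation: k_eff = k + (k - 1) * (d - 1).
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# ASSUMPTION: illustrative dilation rates, one per parallel branch.
# These are not taken from the paper.
ASSUMED_DILATIONS = (1, 2, 3)


def branch_receptive_fields(kernel_size: int = 3, dilations=ASSUMED_DILATIONS):
    """Map each branch's dilation rate to its effective kernel span."""
    return {d: effective_kernel_size(kernel_size, d) for d in dilations}


if __name__ == "__main__":
    for d, span in branch_receptive_fields().items():
        print(f"dilation={d}: {span}x{span} effective kernel")
```

Under these assumed settings the three branches cover 3x3, 5x5, and 7x7 neighborhoods with the same number of weights per kernel, which is why parallel dilated branches are a common way to respond to heads of different sizes.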

Keywords: Crowd counting; Feature aggregation; Multiscale; Regional calibration.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

**Fig. 1**
Major challenges facing current crowd counting tasks. (a) Head-size variation. (b) Complex backgrounds.

**Fig. 2**
The overall structure of MRCNet. First, the images are fed into the Feature Extraction Module (FEM), which uses the first 13 layers of VGG-16 for initial feature extraction. The FEM outputs features at three network depths, which are passed through the Multiscale Aware Module (MAM) and Regional Calibration Module (RCM) for feature enhancement, yielding three levels of enhanced features. Finally, these are combined in the Feature Aggregation Module (FAM) to produce the predicted density map. An improved loss function is used to train the network.
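
The abstract describes the improved loss as a combination of L2 loss and binary cross-entropy loss. A minimal sketch of such a combined loss follows. Several details are assumptions, since this record does not give them: that the L2 term compares predicted and ground-truth density maps, that the cross-entropy term compares an attention map with a binary foreground mask, that both terms are averaged over pixels, and that they are balanced by a hypothetical weight `bce_weight`. Maps are flattened to plain lists to keep the example dependency-free.

```python
import math


def l2_loss(pred, target):
    """Mean squared error between two equally sized, flattened maps."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def bce_loss(prob, mask, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 labels."""
    total = 0.0
    for p, m in zip(prob, mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(m * math.log(p) + (1 - m) * math.log(1.0 - p))
    return total / len(prob)


def combined_loss(pred_density, gt_density, pred_attention, gt_mask,
                  bce_weight=1.0):
    """ASSUMED form: L2 on density maps plus weighted BCE on attention.

    `bce_weight` is a hypothetical balancing factor, not a value
    reported by the authors.
    """
    return (l2_loss(pred_density, gt_density)
            + bce_weight * bce_loss(pred_attention, gt_mask))


if __name__ == "__main__":
    loss = combined_loss(
        pred_density=[0.2, 0.0, 0.9],
        gt_density=[0.0, 0.0, 1.0],
        pred_attention=[0.7, 0.1, 0.95],
        gt_mask=[1, 0, 1],
    )
    print(f"combined loss: {loss:.4f}")
```

The intent of pairing the two terms, under these assumptions, is that the L2 term supervises the count-bearing density values while the cross-entropy term supervises where the network attends.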

**Fig. 3**
Visualization results on different datasets. From left to right, the three columns show the original image, the ground-truth density map, and the predicted density map.

**Fig. 4**
Visualization results for different combinations of MAM and RCM. (a) After adding the MAM, the network can adapt to drastic changes in head size. (b) In complex scenes, the network without RCM makes large errors, such as counting leaves as heads; adding RCM improves counting accuracy and effectively suppresses background interference.
