Improved STNNet, A benchmark for detection, tracking, and counting crowds using Drones

Mohd Nazeer¹, Kanhaiya Sharma², S Sathappan¹, Pulipati Srilatha³, Arshad Ahmad Khan Mohammed⁴

Affiliations

¹ Vidya Jyothi Institute of Technology, Hyderabad, 500075, India.
² Symbiosis Institute of Technology Pune, Symbiosis International (Deemed) University, Pune, 411021, India.
³ Department of Artificial Intelligence & Data Science, CBIT, Gandipet, Hyderabad, India.
⁴ GITAM Deemed to be University, Hyderabad, India.

PMID: 39071994
PMCID: PMC11278589
DOI: 10.1016/j.mex.2024.102820

MethodsX. 2024 Jun 25;13:102820. doi: 10.1016/j.mex.2024.102820. eCollection 2024 Dec.

Erratum in

Corrigendum to "Improved STNNet, A benchmark for detection, tracking, and counting crowds using Drones" [MethodsX 13 (2024) 1-7/102820].
Nazeer M, Sharma K, Sathappan S, Srilatha P, Mohammed AAK. MethodsX. 2024 Jul 25;13:102870. doi: 10.1016/j.mex.2024.102870. eCollection 2024 Dec. PMID: 39171193.

Abstract

Multi-object tracking in crowded scenes is a fundamental computer vision challenge, with applications ranging from surveillance systems to autonomous vehicles. Traditional tracking methods struggle to associate noisy object detections and to maintain consistent labels across frames, particularly in video surveillance for crowd control and public safety. This paper introduces the Improved Space-Time Neighbor-Aware Network (Improved STNNet), a framework for online Multi-Object Tracking (MOT) designed to address these challenges. Building on the STNNet architecture, the model incorporates deep reinforcement learning to refine decision-making. By framing online MOT as a Markov Decision Process (MDP), Improved STNNet learns a policy for data association that treats object birth/death and appearance/disappearance as state transitions within the MDP. In experiments on benchmark datasets, including the MOT Challenge, Improved STNNet outperforms existing methods in demanding, crowded scenarios. The study demonstrates the effectiveness of the approach and lays the groundwork for real-time video analysis in dynamic, crowded environments. We also use the dataset provided by STNNet for density map estimation, which forms the basis of our research.

- Develop an advanced framework for online Multi-Object Tracking (MOT) that addresses crowded-scene challenges, in particular object association and label consistency across frames.
- Explore integrating deep reinforcement learning into the MOT framework, framing the problem as an MDP to refine decision-making and to handle object birth or death and appearance or disappearance transitions.
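To make the MDP framing concrete, the sketch below models the lifetime of a single target as a small state machine. It is only an illustration under stated assumptions: the state names (`ACTIVE`, `TRACKED`, `LOST`, `INACTIVE`), the action names, the `max_lost` threshold, and the hand-written `toy_policy` are all hypothetical and are not taken from the paper, where the policy is learned with deep reinforcement learning rather than written by hand.

```python
from enum import Enum, auto


class TargetState(Enum):
    ACTIVE = auto()    # new detection, candidate for object birth
    TRACKED = auto()   # target is visible and associated
    LOST = auto()      # target has temporarily disappeared
    INACTIVE = auto()  # target is terminated (object death)


class Action(Enum):
    CONFIRM = auto()
    REJECT = auto()
    KEEP = auto()
    LOSE = auto()
    REASSOCIATE = auto()
    TERMINATE = auto()


# Hypothetical transition table: (state, action) -> next state.
TRANSITIONS = {
    (TargetState.ACTIVE, Action.CONFIRM): TargetState.TRACKED,
    (TargetState.ACTIVE, Action.REJECT): TargetState.INACTIVE,
    (TargetState.TRACKED, Action.KEEP): TargetState.TRACKED,
    (TargetState.TRACKED, Action.LOSE): TargetState.LOST,
    (TargetState.LOST, Action.KEEP): TargetState.LOST,
    (TargetState.LOST, Action.REASSOCIATE): TargetState.TRACKED,
    (TargetState.LOST, Action.TERMINATE): TargetState.INACTIVE,
}


def step(state, action):
    """Apply one action; reject pairs that are not in the table."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action.name} is not allowed in {state.name}") from None


def toy_policy(state, matched, frames_lost, max_lost):
    """Hand-written stand-in for a learned data-association policy."""
    if state is TargetState.ACTIVE:
        return Action.CONFIRM if matched else Action.REJECT
    if state is TargetState.TRACKED:
        return Action.KEEP if matched else Action.LOSE
    if state is TargetState.LOST:
        if matched:
            return Action.REASSOCIATE
        return Action.TERMINATE if frames_lost >= max_lost else Action.KEEP
    raise ValueError("an INACTIVE target takes no further actions")


def rollout(matches, max_lost=2):
    """Run one target through per-frame match flags; return visited states."""
    state, frames_lost = TargetState.ACTIVE, 0
    history = [state]
    for matched in matches:
        if state is TargetState.INACTIVE:
            break
        state = step(state, toy_policy(state, matched, frames_lost, max_lost))
        # Count consecutive frames spent in LOST; reset otherwise.
        frames_lost = frames_lost + 1 if state is TargetState.LOST else 0
        history.append(state)
    return history


if __name__ == "__main__":
    flags = [True, True, False, True, False, False, False]
    print([s.name for s in rollout(flags)])
```

Here each boolean in `matches` stands for whether the target was associated with a detection in that frame. A learned policy would replace `toy_policy` and would typically decide from appearance and motion features rather than from a single flag.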

Keywords: Crowd counting; Density estimation; Improved STNNet; Neural network; Surveillance; Tracking and localization.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract

Fig. 1
(a) The localization subnet. $(\hat{x}_i, \hat{y}_i)$ represents the estimated coordinates of the $i$-th target, and $I$ represents the input image.

Fig. 2
(a) The association subnet using [9]; (b) the neighboring context loss. Notably, the dashed modules in (a) are used only in the training phase. For clarity, only the terms from time $t-1$ to time $t$ in the neighboring context loss [9] are displayed. $(\hat{x}_i, \hat{y}_i)$ represents the estimated coordinates of the $i$-th target, and $I$ represents the input image.
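The abstract mentions density map estimation. One widely used way to turn point annotations (one coordinate pair per person) into a density map is to place a normalized Gaussian at each point, so that the map sums to the number of people. The sketch below illustrates that general technique only; the function name `density_map`, the fixed `sigma`, and the three-sigma kernel radius are assumptions and are not taken from the paper. It uses plain Python to stay self-contained; an array library would be the usual choice in practice.

```python
import math


def density_map(points, height, width, sigma=2.0):
    """Build a density map from (x, y) point annotations.

    Each in-bounds point adds a Gaussian blob whose weights are
    renormalized after clipping at the image border, so every
    point contributes exactly 1 to the total.
    """
    radius = int(3 * sigma)  # assumed kernel support: three sigma
    density = [[0.0] * width for _ in range(height)]
    for x, y in points:
        cx, cy = int(round(x)), int(round(y))
        if not (0 <= cx < width and 0 <= cy < height):
            continue  # skip annotations that fall outside the image
        weights = {}
        for r in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for c in range(max(0, cx - radius), min(width, cx + radius + 1)):
                d2 = (c - cx) ** 2 + (r - cy) ** 2
                weights[(r, c)] = math.exp(-d2 / (2.0 * sigma ** 2))
        total = sum(weights.values())
        for (r, c), w in weights.items():
            density[r][c] += w / total
    return density


def estimated_count(density):
    """The crowd count is the sum over all pixels of the density map."""
    return sum(sum(row) for row in density)


if __name__ == "__main__":
    heads = [(5, 5), (20, 10), (0, 0)]
    dmap = density_map(heads, height=16, width=32)
    print(round(estimated_count(dmap), 6))
```

Renormalizing after clipping is a design choice: it keeps the count exact for people near the image border, at the cost of slightly sharper blobs there.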

References

1. Li S., Hu Z., Zhao M., Sun Z. Cascade-guided multi-scale attention network for crowd counting. Signal Image Video Process. 2021;15:1663–1670.
2. Zhang Y., Zhou D., Chen S., Gao S., Ma Y. Single-image crowd counting via multi-column convolutional neural network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016; pp. 589–597.
3. Ma Z., Wei X., Hong X., Gong Y. Bayesian loss for crowd count estimation with point supervision. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019; pp. 6142–6151.
4. Wang Q., Gao J., Lin W., Li X. NWPU-Crowd: a large-scale benchmark for crowd counting and localization. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2020;43:2141–2149.
5. Xiong F., Shi X., Yeung D. Spatiotemporal modeling for crowd counting in videos. ICCV. 2017; pp. 5161–5169.
