MethodsX. 2024 Jun 25;13:102820.
doi: 10.1016/j.mex.2024.102820. eCollection 2024 Dec.

Improved STNNet, A benchmark for detection, tracking, and counting crowds using Drones

Mohd Nazeer et al. MethodsX. 2024.

Erratum in

Abstract

In computer vision, multi-object tracking in crowded scenes is a fundamental challenge with applications ranging from surveillance systems to autonomous vehicles. Traditional tracking methods have difficulty associating noisy object detections and maintaining consistent labels across frames, particularly in scenarios such as video surveillance for crowd control and public safety. This paper introduces the Improved Space-Time Neighbor-Aware Network (Improved STNNet), a framework for online Multi-Object Tracking (MOT) designed to address these challenges. Building on the original STNNet architecture, the enhanced model incorporates deep reinforcement learning to refine decision-making. By framing online MOT as a Markov Decision Process (MDP), Improved STNNet learns a policy for data association that handles object birth/death and appearance/disappearance as state transitions within the MDP. In experiments on benchmark datasets, including the MOT Challenge, Improved STNNet outperforms existing methods in demanding, crowded scenarios. The study demonstrates the effectiveness of the approach and lays the groundwork for real-time video analysis in dynamic, crowded environments. We also use the dataset provided with STNNet for density map estimation, which forms the basis of our research.
• Develop an advanced framework for online Multi-Object Tracking (MOT) to address crowded-scene challenges, in particular improving object association and label consistency across frames.
• Explore integrating deep reinforcement learning into the MOT framework, framing the problem as an MDP to refine decision-making and handle object birth or death and appearance or disappearance transitions.
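The MDP framing described in the abstract can be illustrated with a minimal sketch of a single track's lifecycle. Everything below is an assumption made for illustration: the abstract does not list the paper's states, actions, or rewards, so the four-state layout (active, tracked, lost, inactive) is borrowed from the general MDP-tracking literature, and the hand-written `toy_policy` with its `score_threshold` and `max_lost` parameters merely stands in for the policy that Improved STNNet would learn with deep reinforcement learning.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"      # new detection, not yet confirmed as a target
    TRACKED = "tracked"    # target is currently associated with a detection
    LOST = "lost"          # target temporarily has no matching detection
    INACTIVE = "inactive"  # terminal state: false alarm or ended track


class Action(Enum):
    CONFIRM = "confirm"      # birth: accept the detection as a new target
    REJECT = "reject"        # discard the detection as a false alarm
    KEEP = "keep"            # continue tracking
    LOSE = "lose"            # disappearance: no reliable association
    RECOVER = "recover"      # reappearance: re-associate a lost target
    STAY_LOST = "stay_lost"  # keep waiting for the target to reappear
    TERMINATE = "terminate"  # death: end the track


# Deterministic transition function of this illustrative MDP.
TRANSITIONS = {
    (State.ACTIVE, Action.CONFIRM): State.TRACKED,
    (State.ACTIVE, Action.REJECT): State.INACTIVE,
    (State.TRACKED, Action.KEEP): State.TRACKED,
    (State.TRACKED, Action.LOSE): State.LOST,
    (State.LOST, Action.RECOVER): State.TRACKED,
    (State.LOST, Action.STAY_LOST): State.LOST,
    (State.LOST, Action.TERMINATE): State.INACTIVE,
}


def step(state, action):
    """Apply one action; raise if the action is not allowed in this state."""
    if (state, action) not in TRANSITIONS:
        raise ValueError(f"{action} is not valid in state {state}")
    return TRANSITIONS[(state, action)]


def toy_policy(state, match_score, frames_lost, score_threshold=0.5, max_lost=3):
    """Hand-written stand-in for a learned policy (hypothetical thresholds)."""
    matched = match_score >= score_threshold
    if state is State.ACTIVE:
        return Action.CONFIRM if matched else Action.REJECT
    if state is State.TRACKED:
        return Action.KEEP if matched else Action.LOSE
    if state is State.LOST:
        if matched:
            return Action.RECOVER
        return Action.TERMINATE if frames_lost >= max_lost else Action.STAY_LOST
    raise ValueError("INACTIVE is terminal; no action is available")


def run_track(match_scores, max_lost=3):
    """Roll one track through per-frame association scores; return its states."""
    state = State.ACTIVE
    frames_lost = 0
    history = [state]
    for score in match_scores:
        if state is State.INACTIVE:
            break
        action = toy_policy(state, score, frames_lost, max_lost=max_lost)
        state = step(state, action)
        # Count consecutive frames spent in LOST; reset otherwise.
        frames_lost = frames_lost + 1 if state is State.LOST else 0
        history.append(state)
    return history
```

Calling `run_track([0.9, 0.8, 0.2, 0.1, 0.7])` walks a track through birth, a short disappearance, and a reappearance. In a learned setting, the thresholded rule in `toy_policy` would be replaced by a policy trained from rewards on association outcomes, while the state and transition structure could stay the same.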

Keywords: Crowd counting; Density estimation; Improved STNNet; Neural network; Surveillance; Tracking and localization.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig. 1
(a) The localization subnet. (x^i, y^i) denotes the estimated coordinates of the i-th target, and I denotes the input image.
Fig. 2
(a) The association subnet, following [9]. (b) The neighboring context loss. The dashed modules in (a) are used only in the training phase. For clarity, only the terms from time t − 1 to time t are shown in the neighboring context loss [9]. (x^i, y^i) denotes the estimated coordinates of the i-th target, and I denotes the input image.
Fig. 3
Methodology.

References

    1. Li S., Hu Z., Zhao M., Sun Z. Cascade-guided multi-scale attention network for crowd counting. Signal Image Video Process. 2021;15:1663–1670.
    2. Zhang Y., Zhou D., Chen S., Gao S., Ma Y. Single-image crowd counting via multi-column convolutional neural network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:589–597.
    3. Ma Z., Wei X., Hong X., Gong Y. Bayesian loss for crowd count estimation with point supervision. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019:6142–6151.
    4. Wang Q., Gao J., Lin W., Li X. NWPU-Crowd: a large-scale benchmark for crowd counting and localization. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2020;43:2141–2149.
    5. Xiong F., Shi X., Yeung D. Spatiotemporal modeling for crowd counting in videos. ICCV. 2017:5161–5169.
