J Real Time Image Process. 2023;20(1):5.
doi: 10.1007/s11554-023-01276-w. Epub 2023 Jan 30.

A new YOLO-based method for real-time crowd detection from video and performance analysis of YOLO models

Mehmet Şirin Gündüz et al. J Real Time Image Process. 2023.

Abstract

As the COVID-19 pandemic showed, physical distancing is one of the most important measures against viruses transmitted from person to person. According to the World Health Organization (WHO), the number of people in indoor spaces must be limited. The number of people an indoor space can hold depends on its size, so the size of the space should be measured and the maximum number of people calculated accordingly. Computers can help enforce the capacity rule in indoor spaces monitored by cameras. This study proposes a method that measures the size of a prespecified region in a video and counts the people in it in real time. The method has three steps: (1) predetermining the borders of a region in the video, (2) identifying and counting the people in that region, and (3) estimating the size of the region and finding the maximum number of people it can hold. The You Only Look Once (YOLO) object detection model was used for this purpose, with weights pre-trained on the Microsoft COCO dataset to identify and label persons. Several YOLO models were tested separately within the proposed method and their performance was analyzed. Mean average precision (mAP), frames per second (fps), and accuracy rate were measured for the detection of persons in the specified region. The YOLO v3 model achieved the highest accuracy rate and mAP (at both 0.50 and 0.75), while the YOLO v5s model achieved the highest fps among the non-Tiny models.
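
The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the area is estimated or how a person is assigned to the region, so the sketch assumes a polygonal region given in pixel coordinates, a uniform `metres_per_pixel` scale (reasonable only for a roughly overhead camera), a hypothetical `m2_per_person` spacing rule, and detections supplied as `(x_min, y_min, x_max, y_max)` boxes by any person detector. All function and parameter names are illustrative.

```python
from math import floor
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def polygon_area(vertices: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula (input units squared)."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def point_in_polygon(point: Point, vertices: List[Point]) -> bool:
    """Ray-casting test: count the edges crossed by a ray going right from the point."""
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_people_in_region(boxes: List[Box], region: List[Point]) -> int:
    """Count boxes whose bottom-centre point (assumed foot position) lies in the region."""
    count = 0
    for x_min, y_min, x_max, y_max in boxes:
        foot = ((x_min + x_max) / 2.0, y_max)
        if point_in_polygon(foot, region):
            count += 1
    return count


def max_capacity(area_m2: float, m2_per_person: float) -> int:
    """Largest whole number of people allowed under the assumed spacing rule."""
    return floor(area_m2 / m2_per_person)


# Hypothetical example values, not taken from the paper.
region = [(100, 100), (500, 100), (500, 400), (100, 400)]  # step 1: region borders
boxes = [
    (150, 150, 200, 300),
    (450, 200, 520, 380),
    (250, 120, 290, 250),
    (600, 100, 650, 300),  # foot point falls outside the region
]
metres_per_pixel = 0.01  # assumed uniform scale
m2_per_person = 5.0      # assumed spacing rule

people = count_people_in_region(boxes, region)              # step 2
area_m2 = polygon_area(region) * metres_per_pixel ** 2      # step 3
capacity = max_capacity(area_m2, m2_per_person)
print(f"people={people}, capacity={capacity}, over_capacity={people > capacity}")
```

Using the bottom-centre of each box as the person's position is a design choice made here: it approximates where the person stands on the floor, which tends to be more robust than the box centre when the region is drawn on the ground plane.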

Keywords: Area estimation; Deep learning; People counting; Person detection; Real-time video processing; YOLO.

Conflict of interest statement

Conflict of interest: The authors did not receive support from any organization for the submitted work. The authors declare they have no financial interests. The authors have no conflicts of interest to declare that are relevant to the content of this article.

Figures

Fig. 1
YOLO v4 architecture [15]
Fig. 2
Main structure of YOLO v4 architecture [28]
Fig. 3
Structure comparison between (a) standard DenseNet and (b) CSPDenseNet [29]
Fig. 4
Determination of a region as desired
Fig. 5
Area calculation algorithm for a specified region in a video
Fig. 6
The proposed method to estimate the number of people in a specified region
Fig. 7
Samples taken from the experimental environment: (a) 1024 × 768 pixels and (b) 800 × 600 pixels
Fig. 8
(a) Number of persons detected in the region < the max capacity. (b) Number of persons detected in the region > the max capacity. Note that the area at the top (a) is larger than the area at the bottom (b)

References

    1. WHO: Coronavirus disease (COVID-19) advice for the public (2020). Retrieved July 15, 2022, from: https://www.who.int/emergencies/diseases/novelcoronavirus-2019/advice-fo...
    2. Akhtar, N., & Mian, A.: Threat of adversarial attacks on deep learning in computer vision: a survey. In IEEE Access (Vol. 6, pp. 14410–14430). Institute of Electrical and Electronics Engineers Inc. (2018)
    3. Cho, K., van Merriënboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder–decoder approaches. Proceedings of SSST 2014 - 8th Workshop on Syntax, Semantics and Structure in Statistical Translation. (2014)
    4. Young, T., Hazarika, D., Poria, S., & Cambria, E.: Recent trends in deep learning based natural language processing [Review Article]. In IEEE Computational Intelligence Magazine (Vol. 13, Issue 3, pp. 55–75). Institute of Electrical and Electronics Engineers Inc. (2018)
    5. Bayat S, Işık G. Recognition of Aras bird species from their voices with deep learning methods. J. Inst. Sci. Technol. 2022;12(3):1250–1263. doi: 10.21597/jist.1124674.