Multimed Tools Appl. 2023;82(6):9243-9275.

doi: 10.1007/s11042-022-13644-y. Epub 2022 Aug 8.

Object detection using YOLO: challenges, architectural successors, datasets and applications

Tausif Diwan¹, G Anirudh², Jitendra V Tembhurne¹

Affiliations

¹ Department of Computer Science & Engineering, Indian Institute of Information Technology, Nagpur, India.
² Department of Data Science and Analytics, Central University of Rajasthan, Jaipur, Rajasthan, India.

PMID: 35968414
PMCID: PMC9358372
DOI: 10.1007/s11042-022-13644-y

Tausif Diwan et al. Multimed Tools Appl. 2023.

Abstract

Object detection is one of the predominant and challenging problems in computer vision. Over the past decade, with the rapid evolution of deep learning, researchers have experimented extensively and contributed to improving the performance of object detection and related tasks such as object classification, localization, and segmentation using underlying deep models. Broadly, object detectors fall into two categories: two-stage and single-stage detectors. Two-stage detectors rely mainly on a selective region-proposal strategy with a complex architecture, whereas single-stage detectors consider all spatial region proposals for possible object detection in one shot, using a relatively simpler architecture. The performance of any object detector is evaluated through detection accuracy and inference time. Generally, two-stage detectors achieve higher detection accuracy than single-stage detectors, while single-stage detectors have shorter inference times. Moreover, with the advent of YOLO (You Only Look Once) and its architectural successors, detection accuracy has improved significantly and is sometimes better than that of two-stage detectors. YOLOs are adopted in various applications mainly because of their faster inference rather than their detection accuracy. For example, the detection accuracies of YOLO and Fast-RCNN are 63.4 and 70, respectively, yet inference is around 300 times faster with YOLO. In this paper, we present a comprehensive review of single-stage object detectors, especially YOLOs, covering their regression formulation, architectural advancements, and performance statistics. Moreover, we summarize comparisons between two-stage and single-stage object detectors and among different versions of YOLO, applications based on two-stage detectors and on different versions of YOLO, and future research directions.

Keywords: Computer vision; Convolutional neural networks; Deep learning; Object detection; YOLO.

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022, Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Conflict of interest statement

Competing interests: We do not have any conflict of interest related to the manuscript.

Figures

**Fig. 1**
Classification, Localization, and Segmentation in Single and Multiple Objects image [13]

**Fig. 2**
Generic architecture of single stage object detectors [29]

**Fig. 3**
Year wise evolution of object detection algorithms

**Fig. 5**
Two stage object detectors (a) RCNN (b) Fast-RCNN (c) Faster-RCNN [44]

**Fig. 6**
Generic architecture of Convolutional neural networks [2]

**Fig. 7**
Convolutional Neural Network Layered Operations (http://cs231n.github.io/convolutional-networks/) (a) Conv-layer (b) Max-pooling layer (c) ReLU activation

**Fig. 9**
Network in Network architecture [39]

**Fig. 12**
Inception module of the GoogLeNet architecture [65]

**Fig. 13**
ResNet architecture demonstrating the skip connections [11]

**Fig. 14**
Dividing the image into grid cells and predictions corresponding to one grid cell

**Fig. 15**
Multiple bounding boxes and their overlapping with the ground truth (a) Multiple bounding boxes (b) high overlapping (c) low overlapping

**Fig. 16**
The effect of Non-Max Suppression in Object detection using YOLO

**Fig. 17**
Computational schematic of Intersection over Union (IoU)

**Fig. 18**
YOLO architecture for object detection and localization [56]

**Fig. 19**
Layer wise architectural operations in Darknet-19 framework [54]

**Fig. 20**
Skip connections in the ResNet Module [25]

**Fig. 21**
Architecture of YOLO (v3) [81]

**Fig. 22**
CSPNet vs DenseNet architecture utilized in YOLO-v4 [73] (a) CSPNet (b) DenseNet

References

1. Agarwal S, Terrail JO, Jurie F (2018) Recent advances in object detection in the age of deep convolutional neural networks. arXiv preprint arXiv:1809.03193. 10.48550/arXiv.1809.03193
2. Albelwi S, Mahmood A (2017) A framework for designing the architectures of deep convolutional neural networks. Entropy 19(6):242. 10.3390/e19060242
3. Bengio Y, Courville AC, Vincent P (2012) Unsupervised feature learning and deep learning: a review and new perspectives. CoRR, abs/1206.5538, 1(2665)
4. Bhattacharya S, Maddikunta PKR, Pham QV, Gadekallu TR, Chowdhary CL, Alazab M, Piran MJ (2021) Deep learning and medical image processing for coronavirus (COVID-19) pandemic: a survey. Sustain Cities Soc 65:102589. 10.1016/j.scs.2020.102589
5. Bochkovskiy A, Wang CY, Liao HY (2020) YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934
