Sensors (Basel). 2024 Sep 25;24(19):6209.
doi: 10.3390/s24196209.

SOD-YOLOv8-Enhancing YOLOv8 for Small Object Detection in Aerial Imagery and Traffic Scenes

Boshra Khalili et al. Sensors (Basel). 2024.

Abstract

Object detection, a crucial aspect of computer vision, plays a vital role in traffic management, emergency response, autonomous vehicles, and smart cities. Despite significant advances in object detection, detecting small objects in images captured by high-altitude cameras remains challenging due to factors such as object size, distance from the camera, varied shapes, and cluttered backgrounds. To address these challenges, we propose small object detection YOLOv8 (SOD-YOLOv8), a novel model specifically designed for scenarios involving numerous small objects. Inspired by efficient generalized feature pyramid networks (GFPNs), we enhance multi-path fusion within YOLOv8 to integrate features across different levels, preserving details from shallower layers and improving small object detection accuracy. Additionally, we introduce a fourth detection layer to effectively utilize high-resolution spatial information. The efficient multi-scale attention (EMA) mechanism in the C2f-EMA module further enhances feature extraction by redistributing weights and prioritizing relevant features. We introduce powerful-IoU (PIoU) as a replacement for CIoU, focusing on moderate-quality anchor boxes and adding a penalty based on differences between predicted and ground-truth bounding box corners. This approach simplifies calculations, speeds up convergence, and enhances detection accuracy. SOD-YOLOv8 significantly improves small object detection, surpassing widely used models across various metrics, without substantially increasing the computational cost or latency compared to YOLOv8s. Specifically, it increased recall from 40.1% to 43.9%, precision from 51.2% to 53.9%, mAP0.5 from 40.6% to 45.1%, and mAP0.5:0.95 from 24% to 26.6%. Furthermore, experiments conducted in dynamic real-world traffic scenes illustrated SOD-YOLOv8's significant enhancements across diverse environmental conditions, highlighting its reliability and effective object detection capabilities in challenging scenarios.
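
The gains quoted in the abstract can be restated as absolute (percentage-point) and relative improvements. The short sketch below only re-expresses the four before/after pairs given above; the dictionary name `REPORTED` and the helper `gains` are illustrative and not from the paper, and the baseline is assumed to be YOLOv8s, as the preceding sentence of the abstract suggests.

```python
# Before/after percentages exactly as quoted in the abstract.
# Assumption: "before" is the YOLOv8s baseline, "after" is SOD-YOLOv8.
REPORTED = {
    "recall": (40.1, 43.9),
    "precision": (51.2, 53.9),
    "mAP0.5": (40.6, 45.1),
    "mAP0.5:0.95": (24.0, 26.6),
}


def gains(before, after):
    """Return (absolute gain in points, relative gain in percent), rounded to 0.1."""
    absolute = round(after - before, 1)
    relative = round(100.0 * (after - before) / before, 1)
    return absolute, relative


if __name__ == "__main__":
    for name, (before, after) in REPORTED.items():
        absolute, relative = gains(before, after)
        print(f"{name}: +{absolute} points ({relative}% relative)")
```

Read this way, the largest absolute gain is in mAP0.5 (+4.5 points), and each reported metric improves by roughly 5% to 11% relative to its baseline value.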

Keywords: YOLOv8; attention mechanism; bounding box regression; feature pyramid network; small object detection.
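
The abstract describes the PIoU penalty only in words: a term based on the differences between predicted and ground-truth box corners. The following is a minimal sketch of that idea, not the authors' implementation. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)`. It also assumes the commonly cited form of the penalty, in which edge offsets are normalized by the ground-truth width and height and averaged, and the loss is `1 - IoU + 1 - exp(-P^2)`. The attention function that focuses on moderate-quality anchors is deliberately omitted, and all function names are hypothetical.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def corner_penalty(pred, gt):
    """Assumed penalty P: mean edge offset, normalized by the ground-truth size."""
    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]
    if w_gt <= 0 or h_gt <= 0:
        raise ValueError("ground-truth box must have positive width and height")
    dw1, dw2 = abs(pred[0] - gt[0]), abs(pred[2] - gt[2])  # left and right edges
    dh1, dh2 = abs(pred[1] - gt[1]), abs(pred[3] - gt[3])  # top and bottom edges
    return (dw1 / w_gt + dw2 / w_gt + dh1 / h_gt + dh2 / h_gt) / 4.0


def piou_like_loss(pred, gt):
    """IoU loss plus a bounded corner penalty (attention function omitted)."""
    p = corner_penalty(pred, gt)
    return (1.0 - iou(pred, gt)) + (1.0 - math.exp(-p ** 2))
```

Because the penalty depends only on the ground-truth size and the four edge offsets, it needs no enclosing-box or aspect-ratio terms, which is consistent with the abstract's claim that the approach simplifies calculations.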

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. The network structure of YOLOv8, including the following modules: (a) C2f; (b) Bottleneck; (c) Convolution (Conv); (d) Spatial Pyramid Pooling Fast (SPPF); and (e) Detection Layer.
Figure 2. Proposed improved YOLOv8 for small object detection, with the original YOLOv8 in gray and the improved modules highlighted.
Figure 3. Skip-layer links: (a) dense-link: concatenates features from all preceding layers; (b) log₂n-link: concatenates features from up to log₂(l) + 1 layers at each level.
Figure 4. Different feature pyramid network designs: (a) FPN uses a top-down strategy; (b) PANet enhances FPN with a bottom-up pathway; (c) BiFPN integrates cross-scale pathways bidirectionally; (d) GFPN includes a queen-fusion style pathway and skip-layer connections.
Figure 5. Enhanced and efficient GFPN structure.
Figure 6. Efficient multi-scale attention mechanism.
Figure 7. C2f-EMA.
Figure 8. Anchor box regression process guided by (a) the complete IoU-based loss function (CIoU) and (b) the penalty term in the powerful-IoU (PIoU) loss function without the attention function.
Figure 9. Information regarding the manual annotation process for objects in the VisDrone2019 dataset.
Figure 10. Training progress of YOLOv8s-GFPN-EMA, YOLOv8s-GFPN, and YOLOv8s, compared on (a) mAP0.5 and (b) precision.
Figure 11. (a) Confusion matrix of YOLOv8s; (b) confusion matrix of the proposed model.
Figure 12. Inference results for (a) YOLOv8s and (b) SOD-YOLOv8s across diverse scenarios, including distant and high-density objects as well as nighttime scenes, using the VisDrone2019 dataset.
Figure 13. The perspective captured by COSMOS cameras on the 12th floor of Columbia’s Mudd building overlooking the intersection [56].
Figure 14. Inference results for (a) YOLOv8s and (b) SOD-YOLOv8s across various scenarios, including distant and high-density objects as well as nighttime scenes, using the traffic scene dataset.
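
The two skip-layer schemes in the Figure 3 caption differ in how many earlier layers feed each layer. The sketch below lists, for a layer at depth `l`, which earlier depths would be concatenated under each scheme. The caption only states the count (up to log2(l) + 1 layers). The specific rule used here, taking layers at distances 1, 2, 4, ... back from `l`, is an assumption based on the usual GFPN-style definition, and both function names are hypothetical.

```python
def dense_link_inputs(l):
    """Dense-link: layer l concatenates features from every preceding layer."""
    return list(range(l))


def log2n_link_inputs(l):
    """log2n-link (assumed rule): layer l takes layers l - 2**n for n = 0, 1, ...

    This yields at most floor(log2(l)) + 1 inputs, matching the count in the caption.
    """
    inputs = []
    step = 1
    while l - step >= 0:
        inputs.append(l - step)
        step *= 2
    return inputs


if __name__ == "__main__":
    for depth in (1, 5, 8):
        print(depth, dense_link_inputs(depth), log2n_link_inputs(depth))
```

For a layer at depth 8, dense-link concatenates eight earlier layers, while the log2n-link rule sketched here uses four (depths 7, 6, 4, and 0), which illustrates why the sparser link reduces fusion cost as depth grows.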

References

    1. Chen X., Ma H., Wan J., Li B., Xia T. Multi-view 3D Object Detection Network for Autonomous Driving; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Honolulu, HI, USA. 21–26 July 2017; pp. 6526–6534.
    2. Alqarqaz M., Younes M.B., Qaddoura R. An Object Classification Approach for Autonomous Vehicles Using Machine Learning Techniques. World Electr. Veh. J. 2023;14:41. doi: 10.3390/wevj14020041.
    3. Lim Y., Tiang S.S., Lim W.H., Wong C.H., Mastaneh M., Chong K.S., Sun B. Object Detection in Autonomous Vehicles: A Performance Analysis; Proceedings of the International Conference on Mechatronics and Intelligent Robotics; Singapore. 22–23 August 2023; Singapore: Springer Nature; 2023. pp. 277–291.
    4. Feng J., Wang J., Qin R. Lightweight detection network for arbitrary-oriented vehicles in UAV imagery via precise positional information encoding and bidirectional feature fusion. Int. J. Remote Sens. 2023;44:4529–4558. doi: 10.1080/01431161.2023.2197129.
    5. Chuai Q., He X., Li Y. Improved Traffic Small Object Detection via Cross-Layer Feature Fusion and Channel Attention. Electronics. 2023;12:3421. doi: 10.3390/electronics12163421.
