Sensors (Basel). 2023 Dec 26;24(1):134.
doi: 10.3390/s24010134.

Small Target-YOLOv5: Enhancing the Algorithm for Small Object Detection in Drone Aerial Imagery Based on YOLOv5

Jiachen Zhou et al. Sensors (Basel).

Abstract

Object detection in drone aerial imagery is a long-standing research focus. Compared with standard images, aerial images have more intricate backgrounds, greater variation in object scale, and a higher proportion of small objects. Consequently, conventional object detection algorithms are often unsuitable for direct application in drone scenarios. To address these challenges, this study proposes a drone object detection model based on YOLOv5, named SMT-YOLOv5 (Small Target-YOLOv5). The enhancement strategy improves the feature fusion network by adding detection layers and implementing a weighted bidirectional feature pyramid network. In addition, the Combine Attention and Receptive Fields Block (CARFB) receptive field feature extraction module and the DyHead dynamic target detection head are introduced to broaden the receptive field, mitigate information loss, and enhance perceptual capabilities in the spatial, scale, and task domains. Experimental validation on the VisDrone2021 dataset confirms a significant improvement in the target detection accuracy of SMT-YOLOv5. Each improvement strategy yields effective results, and the average precision rises by 12.4 percentage points compared to the original method. Detection results for large, medium, and small targets improve by 6.9%, 9.5%, and 7.7%, respectively, compared to the original method. Applying the same improvement strategies to the low-complexity YOLOv8n yields SMT-YOLOv8n, which is comparable in complexity to SMT-YOLOv5s. Relative to SMT-YOLOv8n, SMT-YOLOv5s achieves a 2.5 percentage point increase in average precision. Furthermore, comparative experiments with other enhancement methods demonstrate the effectiveness of the improvement strategies.
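The abstract does not spell out how the weighted bidirectional feature pyramid network combines feature maps. As a minimal sketch, assuming the "fast normalized fusion" rule commonly associated with BiFPN-style networks (each input gets a learnable scalar weight, clipped to be non-negative and normalized by the sum of all weights), the fusion step might look like the following. The function name, the flat-list representation of a feature map, and the `eps` value are illustrative assumptions, not details taken from the paper.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature vectors with non-negative, normalized weights.

    features: list of feature maps, each flattened to a list of floats
              (hypothetical representation chosen for this sketch).
    weights:  one learnable scalar per feature map.
    eps:      small constant that keeps the denominator away from zero.
    """
    if not features or len(features) != len(weights):
        raise ValueError("one weight per input feature map is required")

    # ReLU-style clipping keeps every weight non-negative.
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped) + eps

    fused = [0.0] * len(features[0])
    for feature, w in zip(features, clipped):
        if len(feature) != len(fused):
            raise ValueError("all feature maps must have the same length")
        for i, value in enumerate(feature):
            # Each input contributes in proportion to its normalized weight.
            fused[i] += (w / total) * value
    return fused


# Two toy "feature maps" at the same pyramid level.
top_down = [1.0, 2.0, 3.0]
lateral = [3.0, 2.0, 1.0]

equal = fast_normalized_fusion([top_down, lateral], [1.0, 1.0])
clipped_out = fast_normalized_fusion([top_down, lateral], [1.0, -5.0])
```

With equal weights the result is close to the element-wise mean of the two inputs; with a negative weight the second input is clipped to zero contribution, so the output is close to the first input alone.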

Keywords: drone aerial imagery; dynamic object detection head; feature fusion network; receptive field feature extraction module; small objects.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1. SMT-YOLOv5.
Figure 2. PANet + FPN.
Figure 3. Improved network structure.
Figure 4. Architecture of CARFB.
Figure 5. Architecture of DyHead.
Figure 6. The visualized results of the attributes of the VisDrone dataset used in this paper. (a) The categories of the dataset. (b) The ratio of the height to the width of the bounding box to the original image.
Figure 7. PR-curve for YOLOv5s.
Figure 8. PR-curve for SMT-YOLOv5s.
Figure 9. YOLOv5s vs. SMT-YOLOv5: dense distribution detection. (a) Result of YOLOv5; (b) result of SMT-YOLOv5.
Figure 10. YOLOv5s vs. SMT-YOLOv5: complex background detection. (a) Result of YOLOv5; (b) result of SMT-YOLOv5.
Figure 11. YOLOv5s vs. SMT-YOLOv5: low illumination detection. (a) Result of YOLOv5; (b) result of SMT-YOLOv5.
Figure 12. YOLOv5s vs. SMT-YOLOv5: minuscule target detection. (a) Result of YOLOv5; (b) result of SMT-YOLOv5.
Figure 13. YOLOv5s vs. SMT-YOLOv5: heatmap comparison for tiny objects. (a) Original image; (b) YOLOv5s result; (c) SMT-YOLOv5s result.
