Sci Rep. 2025 May 9;15(1):16214. doi: 10.1038/s41598-025-00239-4.

An object detection model AAPW-YOLO for UAV remote sensing images based on adaptive convolution and reconstructed feature fusion


Yiming Wu et al. Sci Rep.

Abstract

In small object detection scenarios such as UAV aerial imagery and remote sensing, feature extraction is difficult primarily because of small object sizes, multi-scale variations, and background interference. To overcome these challenges, this paper presents AAPW-YOLO, a small object detection model based on adaptive convolution and reconstructed feature fusion. In AAPW-YOLO, we improve the standard convolution and the CSP Bottleneck with 2 Convolutions (C2f) structure in the You Only Look Once v8 (YOLOv8) backbone with Alterable Kernel Convolution (AKConv), which strengthens the network's ability to capture features across scales while considerably lowering the parameter count. We also introduce the Attentional Scale Sequence Fusion P2 (ASFP2) structure, which enhances the feature fusion mechanism of Attentional Scale Sequence Fusion YOLO (ASF-YOLO) and adds a P2 detection layer; this optimizes feature fusion in the YOLOv8 neck, improving the network's ability to capture both fine details and global contextual information while further reducing the model parameters. Finally, we adopt a gradient-enhancing strategy with the Wise Intersection over Union (Wise-IoU) loss function to balance the gradient contributions of anchor boxes of different qualities during training, thereby improving regression accuracy. Experimental results show that the proposed model reduces the parameter count by 30% and improves mAP@0.5 by 3.6% on the VisDrone2019 dataset, and on the DOTA v1.0 dataset reduces the parameter count by 30% with a 2.5% improvement in mAP@0.5. The proposed model thus achieves high recognition accuracy with fewer parameters, enhancing the robustness and generalization ability of the network.
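To make the AKConv idea in the abstract concrete: a minimal, hypothetical sketch of how initial sampling coordinates can be generated for an arbitrary number of kernel points n. The function name and the exact grid layout are illustrative assumptions, not the paper's implementation; in AKConv, learned offsets then deform these initial positions.

```python
import math

def initial_sampling_coords(n):
    """Lay out n sampling points on a row-major grid of width ceil(sqrt(n)),
    so any n (not only perfect squares such as 9 or 25) gets an initial
    sampling shape; learned offsets would then deform these positions."""
    width = math.ceil(math.sqrt(n))
    return [(i // width, i % width) for i in range(n)]
```

For n = 5 this yields [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)], one possible irregular initial shape for a kernel size of 5.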

Keywords: AKConv; Feature fusion mechanism; Small object detection; Wise-IoU; YOLOv8.
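As a concrete illustration of the Wise-IoU keyword above, here is a minimal sketch of the v1 formulation from the original Wise-IoU work, applied to axis-aligned boxes given as (x1, y1, x2, y2). The exact Wise-IoU version and hyper-parameters used in AAPW-YOLO may differ, and in real training the enclosing-box denominator is detached from the gradient, which plain Python cannot express.

```python
import math

def iou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2); returns intersection-over-union in [0, 1].
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def wiou_v1_loss(pred, target):
    # L_WIoU = R_WIoU * L_IoU: the IoU loss is weighted by a factor that
    # grows with the distance between box centers, normalized by the
    # squared diagonal of the smallest box enclosing both boxes.
    l_iou = 1.0 - iou(pred, target)
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_t, cy_t = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    # Width and height of the smallest enclosing box.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    r_wiou = math.exp(((cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2)
                      / (wg ** 2 + hg ** 2))
    return r_wiou * l_iou
```

For identical boxes the loss is 0; for a prediction whose center is offset from the target, the exponential factor inflates the IoU loss, which is what lets the loss balance gradient contributions across anchor boxes of different quality.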


Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. AAPW-YOLO model architecture diagram. The three improvements are C2f-AKConv, ASFP2, and Wise-IoU. All images are from the public datasets VisDrone2019 and DOTA v1.0.
Fig. 2. Four sampling shapes of AKConv with a convolution kernel size of 5.
Fig. 3. AKConv process diagram.
Fig. 4. Structure of the bottleneck module before and after improvement.
Fig. 5. ASFP2 structure diagram.
Fig. 6. SSFF module structure diagram.
Fig. 7. TFE module structure diagram.
Fig. 8. Schematic diagram of Wise-IoU.
Fig. 9. Validation results P-R curve.
Fig. 10. P-R curve of validation results.
Fig. 11. P-R curves for validation of different algorithms.
Fig. 12. Visualization of detection results on the VisDrone2019 test set. In AAPW-YOLO, red rectangular boxes with arrows mark small objects that the improved algorithm detects but the YOLOv8n baseline misses, demonstrating the gain in small object detection accuracy.
Fig. 13. Grad-CAM++ heatmap results. All images are from the public dataset VisDrone2019.
Fig. 14. Visualization of detection results on the DOTA v1.0 test set. In AAPW-YOLO, red rectangular boxes with arrows mark small objects that the improved algorithm detects but the YOLOv8n baseline misses, demonstrating the gain in small object detection accuracy.
Fig. 15. Grad-CAM++ heatmap results. All images are from the public dataset DOTA v1.0.


