PLoS One. 2024 Jun 3;19(6):e0298698.
doi: 10.1371/journal.pone.0298698. eCollection 2024.

ASG-YOLOv5: Improved YOLOv5 unmanned aerial vehicle remote sensing aerial images scenario for small object detection based on attention and spatial gating


Houwang Shi et al. PLoS One. 2024.

Abstract

With the rapid advance of drone technology, aerial images captured by unmanned aerial vehicles (UAVs) have spread into many industries. Because drones fly at variable speeds, the captured images suffer from shadowing, blurring, and occlusion; because they fly at varying altitudes, target scales change, making small targets difficult to detect and identify. To address these problems, this paper proposes an improved ASG-YOLOv5 model. First, it introduces a dynamic contextual attention (DCA) module, which uses feature scores to dynamically assign feature weights and outputs feature information through the channel dimension, improving the model's attention to small-target features and strengthening the network's ability to extract contextual information. Second, it designs a spatial gating filtering multi-directional weighted fusion (SGM) module, which applies spatial filtering and weighted bidirectional fusion at the multi-scale fusion stage to improve the representation of weak targets, reduce interference from redundant information, and better suit the detection of weak targets in UAV remote sensing aerial images. In addition, combining the Normalized Wasserstein Distance (NWD) with the CIoU regression loss, the similarity between regression boxes is measured by modeling each box as a Gaussian distribution; this smooths the positional differences of small targets and relieves their acute sensitivity to positional deviation, effectively improving the model's detection accuracy on small targets. The model is trained and tested on the VisDrone2021 and AI-TOD datasets, and the NWPU-RESISC dataset is used for visual detection validation. Experimental results show that ASG-YOLOv5 detects targets in UAV remote sensing aerial images more effectively, runs at 86 frames per second (FPS), which meets the requirement for real-time small-target detection, and adapts well to weak and small targets in aerial image datasets. ASG-YOLOv5 outperforms many existing detection methods, reaching 21.1% mAP, an improvement of 2.9% and 1.4% over the YOLOv5 model on the two datasets, respectively. The project is available at https://github.com/woaini-shw/asg-yolov5.git.
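For concreteness, the following is a minimal PyTorch sketch of an NWD-CIoU style regression similarity, using the standard NWD formulation that models each box as a 2D Gaussian. The blend weight alpha and the scale constant C are illustrative placeholders rather than values reported in the paper, and ciou_fn stands in for any standard CIoU implementation.

    import torch

    def nwd(box_a, box_b, C=12.8):
        # Model each (cx, cy, w, h) box as a 2D Gaussian N([cx, cy], diag(w^2/4, h^2/4)).
        # The squared 2-Wasserstein distance between two such Gaussians reduces to:
        w2 = ((box_a[..., 0] - box_b[..., 0]) ** 2
              + (box_a[..., 1] - box_b[..., 1]) ** 2
              + ((box_a[..., 2] - box_b[..., 2]) ** 2
                 + (box_a[..., 3] - box_b[..., 3]) ** 2) / 4.0)
        # Map the distance to a (0, 1] similarity; C is a dataset-dependent scale (assumed here).
        return torch.exp(-torch.sqrt(w2) / C)

    def nwd_ciou_loss(pred, target, ciou_fn, alpha=0.5):
        # Blend NWD with CIoU; alpha is the hyperparameter swept in Figs 12 and 13
        # (its exact values are not given in this abstract).
        return alpha * (1.0 - nwd(pred, target)) + (1.0 - alpha) * (1.0 - ciou_fn(pred, target))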


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Images taken by the UAV show different problems of weak target detection.
a) the problem of target occlusion in UAV-captured images; b) target recognition difficulties arising from images captured in different environments; c) blurring and exposure artifacts caused by the UAV moving at different speeds, which affect target detection; d) varying target scales and many dense target types in images taken by UAVs flying at different altitudes. The images in Fig 1 are attributed to and available from the NWPU-RESISC database (https://tensorflow.google.cn/datasets/catalog/resisc45).
Fig 2
Fig 2. Detailed module diagram of Spatial Pyramid Pooling - Fast (SPPF).
The CBS module denotes a convolution layer, a batch normalization (BN) layer, and the SiLU activation function.
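As a concrete reference for Fig 2, here is a compact PyTorch sketch of SPPF as used in YOLOv5, where three chained max-pooling layers reproduce the parallel pools of the original SPP and CBS (Conv-BN-SiLU) blocks sit before and after; the hidden channel width follows the usual YOLOv5 default and is an assumption here.

    import torch
    import torch.nn as nn

    class SPPF(nn.Module):
        def __init__(self, c_in, c_out, k=5):
            super().__init__()
            c_hidden = c_in // 2  # YOLOv5's default halving (assumed)
            self.cv1 = nn.Sequential(nn.Conv2d(c_in, c_hidden, 1, bias=False),
                                     nn.BatchNorm2d(c_hidden), nn.SiLU())  # CBS
            self.cv2 = nn.Sequential(nn.Conv2d(c_hidden * 4, c_out, 1, bias=False),
                                     nn.BatchNorm2d(c_out), nn.SiLU())     # CBS
            self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

        def forward(self, x):
            x = self.cv1(x)
            y1 = self.pool(x)   # chaining three k=5 pools emulates parallel k=5, 9, 13 pools
            y2 = self.pool(y1)
            y3 = self.pool(y2)
            return self.cv2(torch.cat([x, y1, y2, y3], dim=1))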
Fig 3
Fig 3. Overall structure of the ASG-YOLOv5 model and its modules.
(a) ASG-YOLOv5 model structure. (b) C3 module. (c) A block consisting of a C3 layer, BiFPN fusion, an upsampling operation, and a convolutional layer. (d) A block consisting of a convolutional layer, BiFPN fusion, and a C3 layer. The DCA module is added at the end of the backbone, which is the CSPDarknet53 of the base YOLOv5 model, and the new spatial gating filtering multi-directional weighted fusion module (SGM) is introduced into the neck, a PAFPN similar to that of the base YOLOv5 model.
Fig 4
Fig 4. Detailed diagram of dynamic contextual attention module (DCA).
The input feature x passes through a global contextual information extraction structure, containing a feature similarity score and an information bottleneck structure, to obtain global information. The global information is fused with the original input features, the weights of the different channels are recalibrated to adjust channel dependencies, and the resulting per-channel weights are multiplicatively fused with the input features.
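The caption reads like a global-context attention block; the sketch below is one plausible reading of it (similarity scores softmaxed over spatial positions pool a global context vector, an information-bottleneck transform recalibrates channel dependencies, and the resulting weights multiply the input). It illustrates the described mechanism and is not the authors' implementation; all names are hypothetical.

    import torch
    import torch.nn as nn

    class DCASketch(nn.Module):
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-position feature scores
            self.bottleneck = nn.Sequential(                    # information bottleneck
                nn.Conv2d(channels, channels // reduction, 1),
                nn.LayerNorm([channels // reduction, 1, 1]),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1))
            self.gate = nn.Sigmoid()

        def forward(self, x):
            b, c, h, w = x.shape
            # Dynamic weights: softmax of the score map over all spatial positions.
            attn = torch.softmax(self.score(x).view(b, 1, h * w), dim=-1)
            # Global context: score-weighted sum of features over all positions.
            ctx = torch.bmm(x.view(b, c, h * w), attn.transpose(1, 2)).view(b, c, 1, 1)
            # Recalibrate channel weights, then fuse multiplicatively with the input.
            return x * self.gate(self.bottleneck(ctx))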
Fig 5
Fig 5. Detailed schematic of the Squeeze-and-Excitation module (SE).
The input feature x is split by a residual connection. One branch is spatially compressed by global average pooling and passed through two fully connected layers that predict the importance of each channel; the channel weights are then normalized with a sigmoid activation and fused with the other branch of the input features.
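The SE block itself is standard (Hu et al., Squeeze-and-Excitation Networks), so a minimal PyTorch version matching the caption can be given with some confidence; the reduction ratio of 16 is the common default, assumed here.

    import torch.nn as nn

    class SEBlock(nn.Module):
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)              # squeeze: global average pooling
            self.fc = nn.Sequential(                         # excitation: two FC layers
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid())                                # normalize channel importances

        def forward(self, x):
            b, c, _, _ = x.shape
            w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
            return x * w                                     # reweight the other branch of x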
Fig 6
Fig 6. Structure of the Spatial Gating Filtering Multi-directional Weighted Fusion (SGM) module.
Simupsample denotes Content-Aware ReAssembly of FEatures (CARAFE); the CBS module consists of a Conv layer, a BN layer, and a SiLU layer; SG is the spatial gating unit; GAU denotes the Global Attention Upsample module; and BiFPN-Concat is the weighted feature fusion module.
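Of the listed components, the BiFPN-Concat weighted fusion is compact enough to sketch. Below is the fast normalized fusion from the BiFPN paper applied before concatenation; treat it as an assumption about this module's behavior rather than the repository's exact code.

    import torch
    import torch.nn as nn

    class BiFPNConcat(nn.Module):
        def __init__(self, n_inputs=2, eps=1e-4):
            super().__init__()
            self.w = nn.Parameter(torch.ones(n_inputs))  # learnable fusion weights
            self.eps = eps

        def forward(self, feats):
            # feats: list of n_inputs feature maps with matching spatial sizes.
            w = torch.relu(self.w)                       # keep weights non-negative
            w = w / (w.sum() + self.eps)                 # fast normalized fusion
            return torch.cat([wi * f for wi, f in zip(w, feats)], dim=1)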
Fig 7
Fig 7. Detailed schematic of the Global Attention Upsample (GAU) module.
Pi denotes the shallow feature layer, and Pi+1 denotes the deep feature layer.
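A hedged sketch of one GAU step follows, after the commonly used design from the Pyramid Attention Network, in which global channel attention computed from the deep layer Pi+1 gates the shallow layer Pi before the two are fused; channel counts and the 2x upsampling factor are assumptions.

    import torch.nn as nn

    class GAUSketch(nn.Module):
        def __init__(self, low_ch, high_ch):
            super().__init__()
            self.low_conv = nn.Conv2d(low_ch, high_ch, kernel_size=3, padding=1)
            self.gate = nn.Sequential(                 # global attention from the deep layer
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(high_ch, high_ch, kernel_size=1),
                nn.BatchNorm2d(high_ch),
                nn.Sigmoid())
            self.up = nn.Upsample(scale_factor=2, mode="nearest")

        def forward(self, p_low, p_high):
            low = self.low_conv(p_low)                 # shallow features Pi
            attn = self.gate(p_high)                   # channel weights from Pi+1
            return low * attn + self.up(p_high)        # gated shallow + upsampled deep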
Fig 8
Fig 8. Confusion matrix for the YOLOv5 model on the VisDrone2021 dataset.
It contains the ten categories of the VisDrone2021 dataset. In the matrix, "background FP" marks instances where the model failed to detect target objects of a non-background category, while "background FN" marks instances where the model falsely detected target objects that were not present.
Fig 9
Fig 9. Confusion matrix for the YOLOv7 model on the VisDrone2021 dataset.
Fig 10
Fig 10. Confusion matrix for the ASG-YOLOv5 model on the VisDrone2021 dataset.
Fig 11
Fig 11. Comparison of ASG-YOLOv5 and YOLOv5 visualizations for detecting various types of weak targets on the NWPU-RESISC dataset; targets of different categories are identified by differently colored boxes.
a. visualization results of the ASG-YOLOv5 model on UAV remote sensing images; b. visualization results of the YOLOv5 model on the same images. The images in Fig 11 are attributed to and available from the NWPU-RESISC database (https://tensorflow.google.cn/datasets/catalog/resisc45).
Fig 12
Fig 12. Comparison plots of loss curves for different hyperparameter values of the NWD-CIoU loss function on the VisDrone2021 dataset.
Fig 13
Fig 13. Comparison plots of loss curves for different hyperparameter values of the NWD-CIoU loss function on the AI-TOD dataset.
