Sensors (Basel). 2023 May 20;23(10):4925.
doi: 10.3390/s23104925.

Object Detection of Flexible Objects with Arbitrary Orientation Based on Rotation-Adaptive YOLOv5

Jiajun Wu et al. Sensors (Basel).

Abstract

Accurately detecting flexible objects with arbitrary orientation in monitoring images from power grid maintenance and inspection sites is challenging. These images exhibit a significant imbalance between foreground and background, which leads to low detection accuracy when general object detection algorithms use a horizontal bounding box (HBB) as the detector. Existing multi-oriented detection algorithms that use irregular polygons as the detector improve accuracy to some extent, but their accuracy is limited by boundary problems during training. This paper proposes a rotation-adaptive YOLOv5 (R_YOLOv5) with a rotated bounding box (RBB) to detect flexible objects with arbitrary orientation, addressing these issues and achieving high accuracy. First, a long-side representation method adds a degree of freedom (DOF) to the bounding boxes, enabling accurate detection of flexible objects with large spans, deformable shapes, and small foreground-to-background ratios. Second, the additional boundary problem induced by the proposed bounding box strategy is overcome by using classification discretization and symmetric function mapping methods. Finally, the loss function is optimized to ensure training convergence for the new bounding box. To meet various practical requirements, we propose four models with different scales based on YOLOv5, namely R_YOLOv5s, R_YOLOv5m, R_YOLOv5l, and R_YOLOv5x. Experimental results demonstrate that these four models achieve mean average precision (mAP) values of 0.712, 0.731, 0.736, and 0.745 on the DOTA-v1.5 dataset and 0.579, 0.629, 0.689, and 0.713 on our self-built FO dataset, exhibiting higher recognition accuracy and stronger generalization ability. Among them, R_YOLOv5x achieves an mAP that is about 6.84% higher than ReDet on the DOTA-v1.5 dataset and at least 2% higher than the original YOLOv5 model on the FO dataset.

Keywords: YOLOv5; flexible objects with arbitrary orientation; object detection; power grid maintenance and inspection site.
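
The abstract's "classification discretization and symmetric function mapping" can be illustrated with a minimal sketch: the angle is treated as one of 180 classes (matching the 180-dimensional angle output named in the Figure 5 caption), and the one-hot target is replaced by a Gaussian window (the mapping named in the Figure 6 caption) that wraps around the ends of the angle range. The function names, the `sigma` value, and the argmax decoding are illustrative assumptions, not the authors' implementation.

```python
import math

NUM_ANGLE_BINS = 180  # one class per degree (assumed bin width)


def gaussian_angle_label(angle_deg, sigma=2.0, num_bins=NUM_ANGLE_BINS):
    """Build a soft angle label peaked at angle_deg.

    Distances are measured circularly, so the first and last bins are
    neighbours. sigma is a hypothetical smoothing width.
    """
    target = int(round(angle_deg)) % num_bins
    label = []
    for k in range(num_bins):
        diff = abs(k - target)
        circular_dist = min(diff, num_bins - diff)  # wrap across the boundary
        label.append(math.exp(-(circular_dist ** 2) / (2.0 * sigma ** 2)))
    return label


def decode_angle(probs):
    """Pick the most probable angle bin (simple argmax decoding)."""
    return max(range(len(probs)), key=lambda k: probs[k])


label = gaussian_angle_label(179)
```

Because the distance is circular, a target at bin 179 gives bins 178 and 0 the same weight, so a prediction that lands just across the boundary is penalized no more than one just inside it.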

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Comparison of the actual scene detection result between HBB and RBB.
Figure 2. Two popular representations of rotated bounding boxes. (a) OpenCV representation. (b) Long-side representation.
Figure 3. Boundary problems with OpenCV and long-side representation. (a) Boundary problems of OpenCV representation. (b) Boundary problems of long-side representation.
Figure 4. Overall structure of the proposed R_YOLOv5 network.
Figure 5. The process of training rectangular box coordinates, category confidence, class probability (λ), and 180-dimensional category angle probability (θ).
Figure 6. Gaussian function mapping.
Figure 7. Inference process.
Figure 8. Visualization results of the R_YOLOv5 evaluation.
Figure 9. Comparison of PR curves between R_YOLOv5 and YOLOv5 on the FO dataset.
Figure 10. Comparison of detection results before and after the changes to YOLOv5. The blue rectangle on the left is the HBB detection result, and the green rectangle on the right is the RBB detection result. The improved RBB not only detects the seine accurately but also provides its orientation information.
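
The two box representations named in the Figure 2 caption can be related with a short conversion sketch. It assumes a common convention that the captions do not spell out: the OpenCV-style box is `(cx, cy, w, h, theta)` with `theta` in degrees in [-90, 0), and the long-side box stores the longer edge first with its angle in [-90, 90). The function names and the bin mapping are hypothetical.

```python
def opencv_to_long_side(cx, cy, w, h, theta_deg):
    """Convert an OpenCV-style rotated box to a long-side box.

    Assumed input: theta_deg in [-90, 0), measured for the edge of length w.
    Output angle refers to the longer edge and lies in [-90, 90).
    """
    if w >= h:
        long_side, short_side, angle = w, h, theta_deg
    else:
        # The longer edge is perpendicular to the edge theta was measured on.
        long_side, short_side, angle = h, w, theta_deg + 90.0
    angle = (angle + 90.0) % 180.0 - 90.0  # normalise into [-90, 90)
    return cx, cy, long_side, short_side, angle


def angle_to_bin(angle_deg, num_bins=180):
    """Map a long-side angle in [-90, 90) to a class index in [0, num_bins)."""
    return int(round(angle_deg + 90.0)) % num_bins


box = opencv_to_long_side(10.0, 20.0, 4.0, 8.0, -30.0)
```

Under these assumptions a box with `w=4`, `h=8`, `theta=-30` becomes a long-side box with edges `(8, 4)` and angle `60`, which maps to angle class 150.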
