Corner-Point and Foreground-Area IoU Loss: Better Localization of Small Objects in Bounding Box Regression

Delong Cai et al. Sensors (Basel). 2023 May 22;23(10):4961. doi: 10.3390/s23104961.

Abstract

Bounding box regression is a crucial step in object detection that directly affects the localization performance of the detected objects. In small object detection especially, a well-designed bounding box regression loss can significantly alleviate the problem of missed small objects. However, the broad family of Intersection over Union (IoU) losses, here called Broad IoU losses (BIoU losses), suffers from two major problems in bounding box regression: (i) BIoU losses cannot provide effective fitting information as the predicted box approaches the target box, resulting in slow convergence and inaccurate regression results; (ii) most localization loss functions do not fully utilize the spatial information of the target, namely its foreground area, during the fitting process. To overcome these issues, this paper proposes the Corner-point and Foreground-area IoU (CFIoU) loss. First, we replace the normalized center-point distance used in BIoU losses with the normalized corner-point distance between the two boxes, which effectively suppresses the degeneration of BIoU losses to the IoU loss when the two boxes are close. Second, we add adaptive target information to the loss function to provide richer target information for optimizing the bounding box regression process, especially in small object detection. Finally, we conducted simulation experiments on bounding box regression to validate our hypothesis. We also quantitatively compared current mainstream BIoU losses with the proposed CFIoU loss on the small object public datasets VisDrone2019 and SODA-D, using the latest anchor-based YOLOv5 and anchor-free YOLOv8 object detection algorithms. The experimental results show that YOLOv5s (+3.12% Recall, +2.73% mAP@0.5, and +1.91% mAP@0.5:0.95) and YOLOv8s (+1.72% Recall and +0.60% mAP@0.5), both incorporating the CFIoU loss, achieved the largest performance improvement on the VisDrone2019 test set. Similarly, YOLOv5s (+6% Recall, +13.08% mAP@0.5, and +14.29% mAP@0.5:0.95) and YOLOv8s (+3.36% Recall, +3.66% mAP@0.5, and +4.05% mAP@0.5:0.95), both incorporating the CFIoU loss, achieved the largest performance improvement on the SODA-D test set. These results indicate the effectiveness and superiority of the CFIoU loss for small object detection. Additionally, we ran comparative experiments fusing the CFIoU loss and the BIoU losses with the SSD algorithm, which is not proficient at small object detection. The SSD variant incorporating the CFIoU loss achieved the largest gains in AP (+5.59%) and AP75 (+5.37%), indicating that the CFIoU loss can also improve the performance of algorithms that are not specialized for small object detection.
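The abstract does not give the exact formula, but its two ideas can be sketched in code. Below is a minimal PyTorch sketch of a CFIoU-style loss: the corner-point penalty replaces the DIoU-style center-point penalty, and the foreground-area term is an assumed form of the "adaptive target information" described above, not the authors' published definition.

```python
import torch

def cfiou_loss(pred, target, eps=1e-7):
    """CFIoU-style loss for (N, 4) boxes given as (x1, y1, x2, y2).

    NOTE: the corner-distance normalization and the foreground-area
    term below are plausible readings of the abstract, not the
    authors' published formula.
    """
    # IoU of each predicted/target pair.
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest box enclosing both; its squared diagonal normalizes the
    # distances, as in DIoU-style center-point penalties.
    ex1 = torch.min(pred[:, 0], target[:, 0])
    ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2])
    ey2 = torch.max(pred[:, 3], target[:, 3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    # Corner-point penalty: squared distances between the matching
    # top-left and bottom-right corners. Unlike a single center-point
    # distance, this stays nonzero whenever the two boxes differ at
    # all, even when their centers coincide.
    d_tl = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d_br = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
    corner = (d_tl + d_br) / diag2

    # Foreground-area term (assumed form): the fraction of the
    # enclosing box not covered by the target's foreground area, which
    # grows when a prediction spills far beyond a small target.
    area_e = (ex2 - ex1) * (ey2 - ey1) + eps
    foreground = 1 - area_t / area_e

    return 1 - iou + corner + foreground
```

In this sketch both penalty terms vanish only when the two boxes coincide, so the loss keeps a nonzero gradient signal all the way to an exact fit, which is the behavior the abstract claims for the CFIoU loss.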

Keywords: bounding box regression; loss function; object detection; small object.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Three position relationships between the predicted box B and the target box GT during the regression process, along with their corresponding loss values for bounding box regression. (a–c) represent the three position relationships between the predicted box and the target box. The red value represents the BIoU loss value degenerating to the IoU loss value when the predicted box approaches the target box. The green value represents the CFIoU loss value. By comparing the results, it can be seen that the CFIoU loss does not suffer from the same degeneration issue as the BIoU losses.
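The degeneration in this caption can be checked numerically with the cfiou_loss sketch given after the abstract; the box coordinates below are invented for illustration and are not the figure's geometry.

```python
import torch

gt   = torch.tensor([[0.0, 0.0, 4.0, 4.0]])
pred = torch.tensor([[1.0, 1.0, 3.0, 3.0]])  # same center (2, 2) as gt

# The center distance is 0, so a DIoU-style penalty adds nothing and
# that loss degenerates to 1 - IoU = 1 - 4/16 = 0.75. The corner
# penalty stays positive: d_tl = d_br = 2, diag2 = 32, so the sketch
# returns 0.75 + 4/32 = 0.875 and still drives the fit.
print(cfiou_loss(pred, gt))  # tensor([0.8750])
```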
Figure 2
Left: The relative positions of predicted boxes A and B with respect to the target box GT. Right: The values of the BIoU losses and the CFIoU loss in cases (a,b). In (a), predicted box A is inside the target box GT, while in (b), predicted box B is outside the target box GT. Predicted box A fits the target better than predicted box B, so the loss function should impose a greater penalty on predicted box B to improve its regression. Unfortunately, the BIoU losses cannot distinguish the regression situations in (a,b), as their loss values are equal in these two cases. In contrast, the CFIoU loss assigns a larger loss value to predicted box B, which is the desired behavior.
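This tie-breaking behavior can also be reproduced with the sketch above. The figure's exact geometry is not given, so the boxes below are invented: A sits inside GT, B extends beyond it, and both share GT's center with IoU = 0.25. For these boxes, GIoU- and DIoU-style losses evaluate to 0.75 for both A and B, while the sketch's foreground-area term penalizes B more.

```python
import torch

gt = torch.tensor([[0.0, 0.0, 4.0, 4.0]])
a  = torch.tensor([[1.0, 1.0, 3.0, 3.0]])    # inside GT
b  = torch.tensor([[-2.0, -2.0, 6.0, 6.0]])  # extends beyond GT

# A: corner 0.125, foreground 0            -> 0.75 + 0.125 = 0.875
# B: corner 0.125, foreground 1 - 16/64    -> 0.75 + 0.875 = 1.625
print(cfiou_loss(a, gt))  # tensor([0.8750])
print(cfiou_loss(b, gt))  # tensor([1.6250])
```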
Figure 3
CFIoU Loss for Bounding Box Regression.
Figure 4
Visualization of the bounding box regression process. (a): The process where the predicted box fits to the target box in Figure 1a. (b): The process where the predicted box fits to the target box in Figure 1b. The red box represents the initial position of the predicted box, the blue box represents the position of the target box, and the green box represents the predicted box during the fitting process.
Figure 5
The process of minimizing various losses during the fitting of the two types of predicted boxes in Figure 2a,b to the target box.
Figure 6
Simulation experiment for bounding box regression: (a) 62,500 regression cases, generated by varying the distances, scales, and aspect ratios between the boxes; (b) total regression error.
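The setup in this figure can be approximated with a toy version of such a simulation. The sketch below is a reduced stand-in for the 62,500 cases: the anchor grid, target geometry, step size, and iteration budget are all illustrative, not the paper's configuration. It regresses each anchor onto a fixed target by gradient descent on a given loss and accumulates the remaining L1 error, which is the quantity panel (b) plots.

```python
import torch

def simulate(loss_fn, steps=200, lr=0.1):
    """Sum of final L1 errors after regressing a grid of anchors."""
    target = torch.tensor([[10.0, 10.0, 12.0, 12.0]])  # fixed 2x2 target
    total_err = 0.0
    for dx in (-3.0, -1.5, 0.0, 1.5, 3.0):      # center offsets
        for dy in (-3.0, -1.5, 0.0, 1.5, 3.0):
            for area in (1.0, 4.0, 9.0):        # scales
                for ar in (0.5, 1.0, 2.0):      # aspect ratios
                    w = (area * ar) ** 0.5
                    h = (area / ar) ** 0.5
                    cx, cy = 11.0 + dx, 11.0 + dy
                    box = torch.tensor([[cx - w / 2, cy - h / 2,
                                         cx + w / 2, cy + h / 2]],
                                       requires_grad=True)
                    # Plain gradient descent on the loss.
                    for _ in range(steps):
                        loss_fn(box, target).sum().backward()
                        with torch.no_grad():
                            box -= lr * box.grad
                        box.grad.zero_()
                    total_err += (box.detach() - target).abs().sum().item()
    return total_err

print(simulate(cfiou_loss))  # lower total error = better regression
```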
Figure 7
Detection examples using YOLOv5s trained on the VisDrone2019 dataset. Visualization samples are chosen from VisDrone2019-DET-test-challenge. (a,b): Left: 𝓛EIoU, right: 𝓛CFIoU.
Figure 8
Detection examples using YOLOv5s trained on the VisDrone2019 dataset. Visualization samples are chosen from VisDrone2019-DET-test-challenge. (a,b): Left: 𝓛IoU, right: 𝓛CFIoU.


References

    1. Ren S., He K., Girshick R., Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems 28, Proceedings of the Annual Conference on Neural Information Processing Systems 2015, Montreal, QC, Canada, 7–12 December 2015. Curran Associates, Inc.; Red Hook, NY, USA: 2015.
    2. Liu W., Anguelov D., Erhan D., Szegedy C., Reed S., Fu C.-Y., Berg A.C. SSD: Single shot multibox detector. In: Computer Vision–ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016, Part I. Springer; Berlin/Heidelberg, Germany: 2016; pp. 21–37.
    3. Ultralytics YOLOv5. Available online: https://github.com/ultralytics/yolov5 (accessed on 20 January 2023).
    4. Law H., Deng J. CornerNet: Detecting objects as paired keypoints. In: Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 734–750.
    5. Tian Z., Shen C., Chen H., He T. FCOS: Fully convolutional one-stage object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 20–26 October 2019; pp. 9627–9636.
