YOLO-S: A Lightweight and Accurate YOLO-like Network for Small Target Detection in Aerial Imagery

Alessandro Betti¹, Mauro Tucci²

Affiliations

¹ FlySight srl, via A. Lampredi 45, 57121 Livorno, Italy.
² Department of Energy, Systems, Territory and Construction Engineering, University of Pisa, 56122 Pisa, Italy.

PMID: 36850465
PMCID: PMC9962614
DOI: 10.3390/s23041865

Sensors (Basel). 2023 Feb 7;23(4):1865.

Abstract

Small target detection is still a challenging task, especially when fast and accurate solutions are needed for mobile or edge applications. In this work, we present YOLO-S, a simple, fast, and efficient network. It uses a small feature extractor, skip connections (via both bypass and concatenation), and a reshape-passthrough layer to promote feature reuse across the network and to combine low-level positional information with more meaningful high-level information. Performance is evaluated on AIRES, a novel dataset acquired in Europe, and on VEDAI, benchmarking the proposed YOLO-S architecture against four baselines. We also demonstrate that a transitional learning task over a combined dataset based on DOTAv2 and VEDAI can enhance the overall accuracy compared with more general features transferred from COCO data. YOLO-S is 25% to 50% faster than YOLOv3 and only 15-25% slower than Tiny-YOLOv3, while also outperforming YOLOv3 by 15% in terms of accuracy (mAP) on the VEDAI dataset. Simulations on the SARD dataset also demonstrate its suitability for search and rescue operations. In addition, YOLO-S has roughly 90% of Tiny-YOLOv3's parameters and half the FLOPs of YOLOv3, making deployment possible in low-power industrial applications.
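
The abstract does not spell out how the reshape-passthrough layer works. As a minimal sketch, assuming it behaves like a stride-2 space-to-depth rearrangement (in the spirit of the YOLOv2 passthrough layer) followed by channel-wise concatenation, the idea can be illustrated in NumPy. The function names `reshape_passthrough` and `fuse`, the stride, and the tensor shapes are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def reshape_passthrough(x: np.ndarray, stride: int = 2) -> np.ndarray:
    """Space-to-depth: (H, W, C) -> (H/stride, W/stride, C*stride*stride).

    Assumption: this mimics a passthrough/reorg layer; the exact layout used
    by YOLO-S is not stated in the abstract.
    """
    h, w, c = x.shape
    if h % stride or w % stride:
        raise ValueError("height and width must be divisible by stride")
    # Split each spatial axis into (blocks, offset within block).
    x = x.reshape(h // stride, stride, w // stride, stride, c)
    # Bring the two block axes first, then the in-block offsets and channels.
    x = x.transpose(0, 2, 1, 3, 4)
    # Fold the in-block offsets into the channel axis.
    return x.reshape(h // stride, w // stride, stride * stride * c)


def fuse(low_level: np.ndarray, high_level: np.ndarray, stride: int = 2) -> np.ndarray:
    """Concatenate a reshaped high-resolution map with a coarser map."""
    return np.concatenate([reshape_passthrough(low_level, stride), high_level], axis=-1)


if __name__ == "__main__":
    low = np.zeros((4, 4, 3))   # hypothetical fine-grained feature map
    high = np.zeros((2, 2, 8))  # hypothetical coarse feature map
    print(fuse(low, high).shape)
```

No activations are discarded in this rearrangement: spatial resolution is halved, but every value of the fine-grained map is kept in the channel dimension, which is one way low-level positional detail can reach the coarser detection features.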

Keywords: aerial imagery; computer vision; convolutional neural network; feature fusion; reshape pass-through layer; vehicle detection.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Some images of the AIRES dataset: (a,b) were collected in Italy, whereas (c,d) were collected in Norway. The vehicles are delimited by GT bounding boxes.

**Figure 2**
GT objects for (a) AIRES, (b) VEDAI and (c) SARD datasets. **Left**: histogram of small, medium and large objects grouped by class according to COCO convention [37]. **Right**: box plot with 25th, 50th and 75th percentiles of target area for each class as a percentage of the image size. The mean area is shown as a black circle.

**Figure 3**
Two-dimensional density plot of GT objects in the plane (GT width, GT height) for (a) AIRES, (b) VEDAI and (c) SARD datasets. GT size is normalized by image size. Image shape is 1920 × 1080 for (a,c) and 1024 × 1024 for (b).

**Figure 4**
Overview of the proposed networks: (a) YOLO-L; (b) YOLO-S; (c) Reshape–Passthrough layer. In the figure, $C = 80$ classes have been assumed (COCO dataset [37]).

**Figure 5**
Workflow of the proposed vehicle detection approach for experiment 1 on AIRES dataset.

**Figure 6**
Experiment 1 on AIRES. (a) YOLOv3; (b) Tiny-YOLOv3; (c) CNN by [3]; (d) YOLO-L; (e) YOLO-S. The following backgrounds are considered: (i) rural (Italy); (j–l) urban (Norway). Green (red) boxes denote true (false) positives, whereas the ground truth box is depicted in light blue if the detection is correct and in yellow if a false negative occurs.

**Figure 7**
Experiments 1 (odd columns) and 2 (even columns) on VEDAI dataset. Comparison of different CNNs: (a) YOLOv3; (b) Tiny-YOLOv3; (c) CNN by [3]; (d) YOLO-L; (e) YOLO-S.

**Figure 8**
Experiment on SARD dataset. Comparison of different CNNs: (a) YOLOv3; (b) Tiny-YOLOv3; (c) CNN by [3]; (d) YOLO-L; (e) YOLO-S, on five different images (i–m).

**Figure 9**
Performance summary of the different networks on AIRES, VEDAI and SARD datasets as a function of FPS. The marker size is proportional to the BFLOPs required by the network. For better readability, FPS is normalized by Tiny-YOLOv3’s speed [3]. The model denoted as M. Ju et al. can be found in [3].
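
The size grouping in Figure 2 follows the COCO convention, which buckets objects by pixel area: small below 32², medium from 32² up to 96², and large above that. The sketch below shows that bucketing together with the area-as-percentage-of-image measure from the same figure. The helper names and the example box are hypothetical, the 1920 × 1080 image shape is taken from the Figure 3 caption, and the handling of boxes that fall exactly on a threshold is an assumption.

```python
SMALL_MAX_AREA = 32 ** 2   # COCO threshold between small and medium, in pixels
MEDIUM_MAX_AREA = 96 ** 2  # COCO threshold between medium and large, in pixels


def size_category(box_w: float, box_h: float) -> str:
    """Classify a box as small, medium or large by its pixel area.

    Assumption: a box exactly on a threshold goes to the larger bucket.
    """
    area = box_w * box_h
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def area_percent(box_w: float, box_h: float, img_w: int, img_h: int) -> float:
    """Return the box area as a percentage of the image area."""
    return 100.0 * (box_w * box_h) / (img_w * img_h)


if __name__ == "__main__":
    # Hypothetical 30 x 20 pixel vehicle in a 1920 x 1080 frame.
    print(size_category(30, 20))
    print(round(area_percent(30, 20, 1920, 1080), 4))
```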

References

1. Qu T., Zhang Q., Sun S. Vehicle detection from high-resolution aerial images using spatial pyramid pooling-based deep convolutional neural networks. Multimed. Tools Appl. 2017;76:21651–21663. doi: 10.1007/s11042-016-4043-5.
2. Tang T., Zhou S., Deng Z., Zou H., Lei L. Vehicle Detection in Aerial Images Based on Region Convolutional Neural Networks and Hard Negative Example Mining. Sensors. 2017;17:336. doi: 10.3390/s17020336.
3. Ju M., Luo J., Zhang P., He M., Luo H. A Simple and Efficient Network for Small Target Detection. IEEE Access. 2019;7:85771–85781. doi: 10.1109/ACCESS.2019.2924960.
4. Sambolek S., Ivasic-Kos M. Automatic Person Detection in Search and Rescue Operations Using Deep CNN Detectors. IEEE Access. 2021;9:37905–37922. doi: 10.1109/ACCESS.2021.3063681.
5. Razakarivony S., Jurie F. Vehicle Detection in Aerial Imagery: A small target detection benchmark. J. Vis. Commun. Image Represent. 2016;34:187–203. doi: 10.1016/j.jvcir.2015.11.002.
