2023 Feb 7;23(4):1865.
doi: 10.3390/s23041865.

YOLO-S: A Lightweight and Accurate YOLO-like Network for Small Target Detection in Aerial Imagery


Alessandro Betti et al. Sensors (Basel). 2023.

Abstract

Small target detection is still a challenging task, especially when fast and accurate solutions are needed for mobile or edge applications. In this work, we present YOLO-S, a simple, fast, and efficient network. It uses a small feature extractor, skip connections (via both bypass and concatenation), and a reshape-passthrough layer to promote feature reuse across the network and to combine low-level positional information with more meaningful high-level information. Performance is evaluated on AIRES, a novel dataset acquired in Europe, and on VEDAI, benchmarking the proposed YOLO-S architecture against four baselines. We also demonstrate that a transitional learning task over a combined dataset based on DOTAv2 and VEDAI can improve overall accuracy compared with more general features transferred from COCO data. YOLO-S is 25% to 50% faster than YOLOv3 and only 15-25% slower than Tiny-YOLOv3, and it also outperforms YOLOv3 by 15% in accuracy (mAP) on the VEDAI dataset. Simulations on the SARD dataset also prove its suitability for search and rescue operations. In addition, YOLO-S has roughly 90% of Tiny-YOLOv3's parameters and half the FLOPs of YOLOv3, making deployment possible for low-power industrial applications.

Keywords: aerial imagery; computer vision; convolutional neural network; feature fusion; reshape pass-through layer; vehicle detection.
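The reshape-passthrough layer named in the abstract can be illustrated with a minimal NumPy sketch. It assumes the layer behaves like a space-to-depth rearrangement (as in the YOLOv2 passthrough layer): a fine feature map is folded so that its spatial size matches a coarser map, and the two are then concatenated along channels. The stride of 2, the channel ordering, the tensor shapes, and the function names `reshape_passthrough` and `fuse` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def reshape_passthrough(x: np.ndarray, stride: int = 2) -> np.ndarray:
    """Rearrange an (H, W, C) map into (H/stride, W/stride, C*stride**2).

    Assumed space-to-depth behaviour; no values are discarded, they are
    only moved from the spatial axes into the channel axis.
    """
    h, w, c = x.shape
    if h % stride or w % stride:
        raise ValueError("height and width must be divisible by stride")
    # Split each spatial axis into (coarse index, offset inside the block).
    x = x.reshape(h // stride, stride, w // stride, stride, c)
    # Bring the two in-block offsets next to the channel axis.
    x = x.transpose(0, 2, 1, 3, 4)
    # Fold the offsets into the channel dimension.
    return x.reshape(h // stride, w // stride, stride * stride * c)


def fuse(low_level: np.ndarray, high_level: np.ndarray, stride: int = 2) -> np.ndarray:
    """Concatenate a reshaped fine map with a coarser map along channels."""
    reshaped = reshape_passthrough(low_level, stride)
    if reshaped.shape[:2] != high_level.shape[:2]:
        raise ValueError("spatial sizes do not match after reshaping")
    return np.concatenate([reshaped, high_level], axis=-1)


# Hypothetical shapes, chosen only for illustration.
fine = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)
coarse = np.zeros((2, 2, 5), dtype=np.float32)
fused = fuse(fine, coarse)
```

Because the rearrangement only moves values, the fine-grained positional detail of the low-level map survives the reduction in spatial size, which is the motivation the abstract gives for combining it with high-level features.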

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Sample images from the AIRES dataset: (a,b) were collected in Italy and (c,d) in Norway. The vehicles are delimited by ground truth (GT) bounding boxes.
Figure 2
GT objects for (a) AIRES, (b) VEDAI and (c) SARD datasets. Left: histogram of small, medium and large objects grouped by class according to COCO convention [37]. Right: box plot with 25th, 50th and 75th percentiles of target area for each class as a percentage of the image size. The mean area is shown as a black circle.
Figure 3
Two-dimensional density plot of GT objects in the plane (GT width, GT height) for (a) AIRES, (b) VEDAI and (c) SARD datasets. GT size is normalized by image size. Image shape is 1920 × 1080 for (a,c) and 1024 × 1024 for (b).
Figure 4
Overview of the proposed networks: (a) YOLO-L; (b) YOLO-S; (c) Reshape–Passthrough layer. The figure assumes C = 80 classes (COCO dataset [37]).
Figure 5
Workflow of the proposed vehicle detection approach for experiment 1 on AIRES dataset.
Figure 6
Experiment 1 on AIRES. (a) YOLOv3; (b) Tiny-YOLOv3; (c) CNN by [3]; (d) YOLO-L; (e) YOLO-S. The following backgrounds are considered: (i) rural (Italy); (jl) urban (Norway). Green boxes denote true positives and red boxes false positives. The ground truth box is drawn in light blue when the detection is correct and in yellow when a false negative occurs.
Figure 7
Experiments 1 (odd columns) and 2 (even columns) on VEDAI dataset. Comparison of different CNNs: (a) YOLOv3; (b) Tiny-YOLOv3; (c) CNN by [3]; (d) YOLO-L; (e) YOLO-S.
Figure 8
Experiment on SARD dataset. Comparison of different CNNs: (a) YOLOv3; (b) Tiny-YOLOv3; (c) CNN by [3]; (d) YOLO-L; (e) YOLO-S, on five different images (im).
Figure 9
Performance summary of the different networks on the AIRES, VEDAI and SARD datasets as a function of FPS. The marker size is proportional to the BFLOPs required by the network. For better readability, FPS is normalized by Tiny-YOLOv3's speed. The model denoted as M. Ju et al. is described in [3].
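The size statistics described in the captions of Figures 2 and 3 can be reproduced with a short sketch. It assumes the usual COCO thresholds (small below 32 × 32 pixels, large from 96 × 96 pixels upward), uses the bounding-box area rather than a segmentation area, and treats the handling of values exactly on a threshold as a free choice. The function names are hypothetical. The image shapes in the usage lines are the ones stated in the Figure 3 caption; the box sizes are made up for illustration.

```python
def coco_size_bucket(box_w: float, box_h: float) -> str:
    """Classify a box as small, medium or large from its area in pixels."""
    area = box_w * box_h
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


def relative_area_percent(box_w: float, box_h: float, img_w: int, img_h: int) -> float:
    """Box area as a percentage of the image area (box-plot quantity in Figure 2)."""
    return 100.0 * (box_w * box_h) / (img_w * img_h)


def normalized_size(box_w: float, box_h: float, img_w: int, img_h: int) -> tuple:
    """Box width and height divided by image width and height (axes of Figure 3)."""
    return box_w / img_w, box_h / img_h


# Illustrative boxes only; these are not measurements from the datasets.
bucket = coco_size_bucket(20, 20)                    # 400 px^2, below 32^2
share = relative_area_percent(192, 108, 1920, 1080)  # 1920 x 1080 image
norm = normalized_size(512, 256, 1024, 1024)         # 1024 x 1024 image
```

Note that the bucket depends on absolute pixel area while the other two quantities are relative to the image, so the same object can fall into different buckets at different resolutions.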

References

    1. Qu T., Zhang Q., Sun S. Vehicle detection from high-resolution aerial images using spatial pyramid pooling-based deep convolutional neural networks. Multimed. Tools Appl. 2017;76:21651–21663. doi: 10.1007/s11042-016-4043-5.
    2. Tang T., Zhou S., Deng Z., Zou H., Lei L. Vehicle Detection in Aerial Images Based on Region Convolutional Neural Networks and Hard Negative Example Mining. Sensors. 2017;17:336. doi: 10.3390/s17020336.
    3. Ju M., Luo J., Zhang P., He M., Luo H. A Simple and Efficient Network for Small Target Detection. IEEE Access. 2019;7:85771–85781. doi: 10.1109/ACCESS.2019.2924960.
    4. Sambolek S., Ivasic-Kos M. Automatic Person Detection in Search and Rescue Operations Using Deep CNN Detectors. IEEE Access. 2021;9:37905–37922. doi: 10.1109/ACCESS.2021.3063681.
    5. Razakarivony S., Jurie F. Vehicle Detection in Aerial Imagery: A small target detection benchmark. J. Vis. Commun. Image Represent. 2016;34:187–203. doi: 10.1016/j.jvcir.2015.11.002.
