Sensors (Basel). 2019 Oct 7;19(19):4332. doi: 10.3390/s19194332.

Airborne Visual Detection and Tracking of Cooperative UAVs Exploiting Deep Learning

Roberto Opromolla et al. Sensors (Basel). 2019.

Abstract

The performance achievable by using Unmanned Aerial Vehicles (UAVs) for a large variety of civil and military applications, as well as the extent of applicable mission scenarios, can significantly benefit from the exploitation of formations of vehicles able to fly in a coordinated manner (swarms). In this respect, visual cameras represent a key instrument to enable coordination by giving each UAV the capability to visually monitor the other members of the formation. Hence, a related technological challenge is the development of robust solutions to detect and track cooperative targets through a sequence of frames. In this framework, this paper proposes an innovative deep-learning-based approach to carry out this task. Specifically, the You Only Look Once (YOLO) object detection system is integrated within an original processing architecture in which the machine-vision algorithms are aided by navigation hints available thanks to the cooperative nature of the formation. An experimental flight test campaign, involving formations of two multirotor UAVs, is conducted to collect a database of images suitable for assessing the performance of the proposed approach. Results demonstrate high accuracy and robustness against challenging conditions in terms of illumination, background and target-range variability.

Keywords: UAV swarms; YOLO; deep learning; machine vision; unmanned aerial vehicles; visual detection; visual tracking.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
State diagram of the proposed architecture. Two cases are possible for the state of the system, namely target detected (1) or not detected (0). This relatively simple architecture is justified by the cooperative nature of the assumed multi-UAV system.
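To make the two-state logic of Figure 1 concrete, the following Python sketch shows one way such a loop could be organized; the detector and tracker interfaces are hypothetical placeholders, not the authors' implementation.

    # Minimal sketch of the two-state logic in Figure 1 (hypothetical interfaces).
    # State 0: target not detected -> run the DL-based detector over search windows.
    # State 1: target detected     -> run the DL-based tracker near the last result.
    def process_sequence(frames, detector, tracker):
        state, last_bb = 0, None
        for frame in frames:
            if state == 0:
                last_bb = detector.detect(frame)         # global search-window scan
            else:
                last_bb = tracker.track(frame, last_bb)  # local search around last BB
            state = 1 if last_bb is not None else 0      # fall back to detection on loss
            yield state, last_bb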
Figure 2
Scheme summarizing the algorithmic strategy characterizing the proposed DL-based detector. The input parameters are listed within a black (dashed) rectangular box. The processing blocks are enclosed within black rectangular boxes. The final output is highlighted in red.
Figure 3
DL-based detector: example of search-window definition for a 752 × 480-pixel RGB image. du is set to 150 pixels (Nw = 5). The target UAV position is highlighted by a black box.
Figure 4
DL-based detector: example of search-window definition for a 752 × 480-pixel RGB image. The target UAV position is highlighted by a black box. (a) du is set to 150 pixels (Nw = 5). The target UAV projection on the image plane is cut by the border between the second and third search windows. (b) du is set to 100 pixels (Nw = 7). Only the third search window, which fully contains the target UAV, is highlighted for the sake of clarity.
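The window counts quoted in Figures 3, 4 and 13 (Nw = 5, 7 and 28) are consistent with tiling the 752 × 480-pixel frame with fixed-size square windows at stride du (and dv), where the count per axis is ceil((L − W)/d) + 1. The window side W = 224 pixels used below is an assumption that reproduces the reported counts; it is not stated in the captions.

    import math

    def windows_per_axis(length, win, stride):
        """Windows needed to cover `length` pixels with size-`win` windows at `stride`."""
        return math.ceil((length - win) / stride) + 1

    W = 224  # assumed window side; reproduces the counts reported in the captions
    print(windows_per_axis(752, W, 150))   # 5  (Figure 3, du = 150)
    print(windows_per_axis(752, W, 100))   # 7  (Figure 4b, du = 100)
    print(windows_per_axis(752, W, 100) *
          windows_per_axis(480, W, 100))   # 28 (Figure 13b, du = dv = 100)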
Figure 5
Example of best YOLO detection. IoU = 0.538. The reference BB, obtained using a supervised approach, is approximately centered at the geometric center of the target. The detected BB is the output of the DL-based detector.
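Figures 5 and 8 score detections by Intersection over Union (IoU) between the detected and reference bounding boxes. The standard computation for axis-aligned boxes in (x, y, w, h) form is sketched below.

    def iou(a, b):
        """Intersection over Union of two axis-aligned boxes given as (x, y, w, h)."""
        ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
        iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
        inter = ix * iy
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union > 0 else 0.0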
Figure 6
Main steps of the image processing approach to refine the detected BB. An image crop is obtained from the detected bounding box. The gradient operator is applied within this image portion and the gradient image is then binarized. Finally, the centroid of the set of pixels highlighted in the binarized image is computed and a refined BB is centered around this point.
Figure 7
Example of application of the BB refinement block. In this case, the factor c is 1. (a) Detected BB. (b) Result of gradient estimation. (c) Result of binarization and centroid calculation (highlighted by a red dot).
Figure 8
Result of the BB refinement algorithm. The IoU of the refined BB is 0.747.
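A possible OpenCV rendering of the refinement steps in Figures 6-8 follows. The Sobel gradient and Otsu binarization stand in for operators the captions leave unspecified, and the factor c is interpreted as a scale on the crop size; both are assumptions, and the refined BB keeps the detected size.

    import cv2
    import numpy as np

    def refine_bb(image, bb, c=1.0):
        """Recenter a detected BB (x, y, w, h) on the gradient centroid of its crop.

        `c` is read here as a scale factor on the crop size (an assumption);
        Otsu binarization stands in for the paper's unspecified threshold.
        """
        x, y, w, h = bb
        cw, ch = int(c * w), int(c * h)
        cx, cy = x + w // 2, y + h // 2
        x0, y0 = max(0, cx - cw // 2), max(0, cy - ch // 2)
        crop = cv2.cvtColor(image[y0:y0 + ch, x0:x0 + cw], cv2.COLOR_BGR2GRAY)

        # Gradient magnitude (Sobel), then binarization.
        gx = cv2.Sobel(crop, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(crop, cv2.CV_32F, 0, 1)
        mag = cv2.magnitude(gx, gy)
        mag8 = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
        _, binary = cv2.threshold(mag8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

        # Centroid of the highlighted pixels becomes the refined BB center.
        ys, xs = np.nonzero(binary)
        if xs.size == 0:
            return bb
        ux, uy = x0 + int(xs.mean()), y0 + int(ys.mean())
        return (ux - w // 2, uy - h // 2, w, h)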
Figure 9
Examples of prediction of the target UAV projection on the image plane (highlighted by a red dot) carried out by the DL-based tracker. The search area is drawn as a red square. The target UAV is enclosed in a black box. (a,b) Far range scenario (target range ≈ 116 m); du,tr = dv,tr = 150 pixels; prediction error ≈ 45 pixels. (c,d) Close range scenario (target range ≈ 20 m); du,tr = dv,tr = 300 pixels; prediction error ≈ 20 pixels.
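The predicted projection in Figure 9 comes from the navigation hints exchanged within the formation. A generic pinhole-camera sketch of such a prediction is given below; the intrinsics K and the availability of the target's relative position in the camera frame are assumptions, since the captions do not detail the navigation processing.

    import numpy as np

    def predict_projection(t_cam, K):
        """Project the target's relative position t_cam (camera frame, metres)
        onto the image plane with intrinsics K (assumed pinhole model)."""
        uvw = K @ np.asarray(t_cam, dtype=float)
        return uvw[:2] / uvw[2]  # predicted (u, v) in pixels

    def search_area(center, half_u, half_v, width=752, height=480):
        """Rectangular search area of half-sides (du_tr, dv_tr) clipped to the frame."""
        u, v = center
        return (max(0, int(u - half_u)), max(0, int(v - half_v)),
                min(width, int(u + half_u)), min(height, int(v + half_v)))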
Figure 10
UAVs exploited for the flight test campaign. (a) Tracker UAV for database A: customized Pelican by Ascending Technologies. (b) Target UAV for database A: customized X8+ by 3D Robotics. (c) Target and tracker UAV for database B: customized M100 by DJI.
Figure 11
Example of images from FT3-A. The target (i.e., the X8+ octocopter) occupies a few pixels, as highlighted by the zoom on the right side of each figure. (a,b) Target below the horizon. (c,d) Target above the horizon, hindered by clouds.
Figure 12
Example of images from FT1-B (a) and FT2-B (b).
Figure 13
DL-based detector performance as a function of τdet on FT3-A (381 frames). (a) Target UAV prediction enabled (du = 100 pixels, Nw = 7 search windows). (b) Target UAV prediction disabled (du = dv = 100 pixels, Nw = 28 search windows).
Figure 14
(a) Detection and tracking test on FT1-B (1330 images). Distribution of Smax as a function of the target-tracker relative distance. (b) Histogram of the target-tracker range for the 650 images selected from FT1-A and FT2-A.
Figure 15
Variation of the target-tracker relative distance (blue line) during FT1-B (1330 images). The DL-based detector and tracker are applied with τdet set to 0.20 and τtr to 0.075. The frames where the algorithmic architecture provides correct detections are highlighted with red (target inside the FOV) and green (target outside the FOV) stars.
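Figures 13 and 15 apply different confidence thresholds in the two operating modes (τdet = 0.20 when detecting, τtr = 0.075 when tracking). A plausible reading, sketched below, is that the lower tracking threshold is affordable because candidates are already confined to a small search area around the prediction; the candidate format is hypothetical.

    def accept(candidates, tracking):
        """Keep YOLO candidates above the mode-dependent confidence threshold.
        `candidates` is a list of (bb, confidence) pairs (hypothetical format)."""
        tau = 0.075 if tracking else 0.20  # tau_tr vs tau_det, as in Figure 15
        return [(bb, conf) for bb, conf in candidates if conf >= tau]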
