2024 May 8;14(1):10567.
doi: 10.1038/s41598-024-61016-3.

A lightweight network model designed for alligator gar detection

Xin Wang et al. Sci Rep.

Abstract

When advanced detection algorithms are used to monitor alligator gar in real time in wild waters, their efficiency is limited by turbid water, poor underwater lighting, and occlusion by other objects. To address this problem, we developed a lightweight real-time detection network model called ARD-Net, designed to reduce the amount of computation while obtaining more feature map patterns. We introduced a cross-domain grid matching strategy to accelerate network convergence, and combined the involution operator with a dual-channel attention mechanism to build a more lightweight feature extractor and a multi-scale detection reasoning network module that enhances the network's response to different semantics. Compared with the YOLOv5 baseline model, our method achieves equivalent detection accuracy with a smaller model and a detection speed increased by 1.48 times. Compared with the latest state-of-the-art (SOTA) method, YOLOv8, our method demonstrates clear advantages in both detection efficiency and model size, and has good real-time performance. Additionally, we created a dataset of alligator gar images for training.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1. Alligator gar target detection example.
Figure 2. Comparative chart of mean Average Precision (mAP) and frames per second (FPS) for various detection methods.
Figure 3. Schematic diagram of the ARD-Net network model structure.
Figure 4. Target box regression and cross-domain grid matching.
Algorithm 1. Involution in a PyTorch-like style.
Figure 5. A dual-channel attention mechanism (CBAM).
Figure 6. Structure of the multi-scale detection and reasoning network. CBAM is the dual-channel attention module, and Involution is the inner convolution layer. Numerical values such as 256 and 512 indicate the number of feature channels in each layer.
Figure 7. The feature extractor introduced in this study.
Figure 8. Schematic diagram of the minimum bounding box in GIoU loss.
Figure 9. Schematic diagram of the DIoU loss principle.
Figure 10. Examples before and after manual review and proofreading of samples.
Figure 11. A sample from our alligator gar target detection dataset.
Figure 12. Composition of the alligator gar dataset. (a) By living environment: approximately 34% of the images depict artificial breeding environments, while the remaining 66% were captured in natural, wild settings. (b) By number of alligator gar per image: 73% of the images feature a single alligator gar, whereas 27% show multiple individuals.
Figure 13. Trend of the mean values of several loss functions over 300 rounds of training of the method on the PASCAL VOC dataset.
Figure 14. Example of the effectiveness of the method described in this article for detecting alligator gar targets. Left: detection results of this method. Right: inference results of YOLOv5.
Figure 15. Precision–Recall (P–R) curve after 300 rounds of iterative training of the method on the PASCAL VOC dataset.
Figure 16. Examples of test results of this method on the PASCAL VOC dataset.
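
Figures 8 and 9 refer to the GIoU and DIoU losses, which this page shows only as schematics. The sketch below follows the standard published definitions of the two losses for axis-aligned boxes given as (x1, y1, x2, y2); it is not the authors' training code. The function names `giou_loss` and `diou_loss` and the helper `_overlap_terms` are hypothetical, and the boxes are assumed to have positive area.

```python
def _area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def _overlap_terms(a, b):
    """Return (intersection area, union area, smallest enclosing box)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = _area(a) + _area(b) - inter
    enclosing = (min(a[0], b[0]), min(a[1], b[1]),
                 max(a[2], b[2]), max(a[3], b[3]))
    return inter, union, enclosing


def giou_loss(a, b):
    """1 - GIoU, where GIoU = IoU - (|C| - |A u B|) / |C|
    and C is the minimum enclosing box (the box drawn in Figure 8)."""
    inter, union, enclosing = _overlap_terms(a, b)
    iou = inter / union
    c_area = _area(enclosing)
    return 1.0 - (iou - (c_area - union) / c_area)


def diou_loss(a, b):
    """1 - IoU + d^2 / c^2, where d is the distance between box centers
    and c is the diagonal length of the minimum enclosing box."""
    inter, union, enclosing = _overlap_terms(a, b)
    iou = inter / union
    d2 = (((a[0] + a[2]) - (b[0] + b[2])) / 2.0) ** 2 \
        + (((a[1] + a[3]) - (b[1] + b[3])) / 2.0) ** 2
    c2 = (enclosing[2] - enclosing[0]) ** 2 + (enclosing[3] - enclosing[1]) ** 2
    return 1.0 - iou + d2 / c2


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    target = (1.0, 1.0, 3.0, 3.0)
    print(giou_loss(predicted, target), diou_loss(predicted, target))
```

Unlike a plain IoU loss, both variants still give a useful signal when the boxes do not overlap: GIoU penalizes the empty part of the enclosing box, and DIoU penalizes the distance between the box centers.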
