PLoS One. 2025 Apr 1;20(4):e0320315.
doi: 10.1371/journal.pone.0320315. eCollection 2025.

A lightweight trichosanthes kirilowii maxim detection algorithm in complex mountain environments based on improved YOLOv7-tiny

Zhongjian Xie et al. PLoS One. 2025.

Abstract

Detecting Trichosanthes kirilowii Maxim (Cucurbitaceae) in complex mountain environments is critical for developing automated harvesting systems. However, brightness variation, inter-plant occlusion, and motion-induced blurring during harvesting operations make detection difficult, and existing detection algorithms suffer from excessive parameters and high computational cost. Accordingly, this study proposes a lightweight T. kirilowii detection algorithm for complex mountainous environments based on YOLOv7-tiny, named KPD-YOLOv7-GD. First, the multi-scale feature layer is improved to reduce the complexity of the model. Second, a lightweight convolutional module replaces the standard convolutions in the Efficient Long-range Aggregation Network (ELAN-A) module, and channel pruning is applied to further decrease the model's complexity. Finally, the Dynamic Head (DyHead) module, the Content-Aware ReAssembly of Features (CARAFE) module, and knowledge distillation are integrated to significantly improve the efficiency of feature extraction and the detection accuracy of the model. The experimental results showed that the mean average precision of the improved network KPD-YOLOv7-GD reached 93.2%. Benchmarked against mainstream single-stage algorithms (YOLOv3-tiny, YOLOv5s, YOLOv6s, YOLOv7-tiny, and YOLOv8), KPD-YOLOv7-GD demonstrated mean average precision improvements of 4.8%, 0.6%, 3.0%, 0.6%, and 0.2%, with corresponding model compression rates of 81.6%, 68.8%, 88.9%, 63.4%, and 27.4%, respectively. Compared with similar studies, KPD-YOLOv7-GD exhibits lower complexity and higher recognition speed and accuracy, making it more suitable for resource-constrained T. kirilowii harvesting robots.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Images of T. kirilowii under various lighting conditions in different mountainous terrains.
(a) Normal light; (b) backlight; (c) leaf obstruction; (d) clusters.
Fig 2. Data augmentation methods.
(a) Normal light; (b) random brightness; (c) random Gaussian blurring.
Fig 3. Structure of the YOLOv7-tiny network.
The backbone network comprises layers of CBS convolution, ELAN-A, and MPConv convolution. Although the ELAN-A layer enhances the network’s feature extraction speed and simplifies the backbone structure of YOLOv7, the parameter and computational overhead remain considerable, making it unable to meet the demands of lightweight applications and small-scale devices.
Fig 4. KPD-YOLOv7-GD network structure diagram.
Fig 5. The width-to-height ratio of mountain T. kirilowii fruits within the sample images.
Therefore, this experiment proposes adding a small object detection layer to enhance the base algorithm. The specific implementation method is as follows. First, within the existing detection layers, an additional detection head of size 160 × 160 is introduced to achieve a 4 × 4 pixel receptive field, enabling the capture of finer details for smaller objects. Next, a set of feature layers composed of CBS convolutional modules, upsampling layers, Concat concatenation layers, and an ELAN-A layer is added to the original model, inserted between the 17th and 18th layers. As a result, the original algorithm's three detection scales are extended to four: 80 × 80, 40 × 40, 20 × 20, and the newly added 160 × 160, as shown in the red dashed area in Fig. 3. Finally, an experiment is conducted to determine the most suitable detection scale for T. kirilowii. The improvement reduces model parameters and computational burden while minimizing the loss of shallow information for small objects. Consequently, it improves the recognition of small T. kirilowii objects under occlusion.
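The relationship between the four detection scales and the input image can be sketched as follows. This is a minimal illustration, assuming a 640 × 640 input (the size implied by an 80 × 80 grid at stride 8) and strides of 4, 8, 16, and 32; the function name `head_grid_sizes` is hypothetical and not taken from the paper.

```python
def head_grid_sizes(input_size=640, strides=(4, 8, 16, 32)):
    """Map each detection-head stride to its output grid size.

    Assumption: a square input whose side is divisible by every stride.
    Each grid cell at stride s covers an s x s patch of input pixels,
    so the stride-4 head yields the 160 x 160 grid with 4 x 4 pixel cells.
    """
    return {stride: input_size // stride for stride in strides}


if __name__ == "__main__":
    for stride, grid in head_grid_sizes().items():
        print(f"stride {stride:>2}: {grid} x {grid} grid")
```

Under these assumptions, the stride-4 head is the one that produces the newly added 160 × 160 scale, while strides 8, 16, and 32 give the original 80 × 80, 40 × 40, and 20 × 20 scales.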
Fig 6. General idea of DSConv.
Introducing DSConv and GSConv to reconstruct the Efficient Long-range Aggregation Network modules at different positions may lead to variations in the model's overall performance. Comparing the parameter count and computational complexity of regular convolution and GSConv reveals that GSConv can reduce redundant feature maps. When the computational complexity of the GSConv module is denoted FLOPs1 and that of the conventional convolution module is denoted FLOPs2, the relationship between their computational complexities can be expressed as shown in Equation (1).
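Equation (1) is not reproduced on this page, so the following is only a rough sketch of how such a comparison might be computed, not the authors' formula. It assumes the commonly described GSConv layout: a dense convolution producing half of the output channels, followed by a depthwise convolution over that half, with the two halves concatenated and shuffled. Operation counts are multiply-accumulates without bias, and the function names and the `dw_k` parameter are illustrative.

```python
def conv_flops(h, w, k, c_in, c_out):
    """Multiply-accumulates of a standard k x k convolution on an h x w output map."""
    return h * w * k * k * c_in * c_out


def gsconv_flops(h, w, k, c_in, c_out, dw_k=None):
    """Approximate multiply-accumulates of a GSConv-style block (assumed layout).

    dense:     k x k convolution from c_in to c_out // 2 channels
    depthwise: dw_k x dw_k convolution applied per channel to those c_out // 2 maps
    Concatenation and channel shuffle are treated as free.
    """
    half = c_out // 2
    dw_k = k if dw_k is None else dw_k
    dense = h * w * k * k * c_in * half
    depthwise = h * w * dw_k * dw_k * half
    return dense + depthwise


if __name__ == "__main__":
    flops1 = gsconv_flops(40, 40, 3, 64, 128)  # GSConv-style block
    flops2 = conv_flops(40, 40, 3, 64, 128)    # conventional convolution
    print(f"FLOPs1 / FLOPs2 = {flops1 / flops2:.4f}")
```

With equal kernel sizes and an even number of output channels, the ratio in this sketch reduces to (C_in + 1) / (2 · C_in), which approaches one half as the number of input channels grows.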
Fig 7. Structure of GSConv network.
Fig 8. The overall framework of CARAFE.
Fig 9. DyHead Structure.
Fig 10. Channel pruning process.
After channel pruning, the algorithm model experiences a significant reduction in parameters and computational load, yet it may incur a decrease in accuracy. To compensate for the decrease in accuracy after pruning, this experiment adopts knowledge distillation techniques to enhance the recognition accuracy of the model algorithm. Knowledge distillation is a model compression technique that enhances the performance and accuracy of a smaller model (student model) by transferring knowledge from a larger model (teacher model). Due to the strong generalization and robustness of the YOLOv7 model, it will be used as the teacher model in this experiment. KPD-YOLOv7-GD serves as the student model, as illustrated in Fig.11.
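The text above does not state which distillation loss is used. As one common possibility, the sketch below shows a temperature-softened soft-target loss for classification logits, in which the student is penalized by the KL divergence from the teacher's softened distribution. The function names and the default temperature are assumptions for illustration; detection models often add feature-level or box-level terms that are not shown here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared.

    Assumption: a generic soft-target loss, not necessarily the one
    used to train KPD-YOLOv7-GD.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    student = [2.5, 1.5, 0.5]
    print(distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge; in practice it would be combined with the ordinary detection loss through a weighting factor.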
Fig 11. Improved model distillation structure.
Fig 12. Training results of different detection scales.
(a) Comparison of the mean average precision values; (b) comparison of loss values.
Fig 13. Comparative experiments with different pruning rates.
Fig 14. Comparison experiment of accuracy improvement.
Fig 15. Performance comparison of mainstream lightweight object detection algorithms.
Fig 16. Comparative visualization of feature maps optimized by algorithms at different levels.
