Sensors (Basel). 2023 Apr 7;23(8):3810.

doi: 10.3390/s23083810.

Recognition and Counting of Apples in a Dynamic State Using a 3D Camera and Deep Learning Algorithms for Robotic Harvesting Systems

R M Rasika D Abeyrathna¹ ², Victor Massaki Nakaguchi¹, Arkar Minn¹ ³, Tofael Ahamed⁴

Affiliations

¹ Graduate School of Science and Technology, University of Tsukuba, Tennodai 1-1-1, Tsukuba 305-8577, Japan.
² Department of Agricultural Engineering, University of Peradeniya, Kandy 20400, Sri Lanka.
³ Department of Agricultural Engineering, Yezin Agricultural University, Nay Phi Taw 150501, Myanmar.
⁴ Faculty of Life and Environmental Sciences, University of Tsukuba, Tennodai 1-1-1, Tsukuba 305-8577, Japan.

PMID: 37112151
PMCID: PMC10145955
DOI: 10.3390/s23083810

R M Rasika D Abeyrathna et al. Sensors (Basel). 2023.

Abstract

Recognizing apples and estimating their 3D positions from a robotic platform on a moving vehicle during harvesting remains challenging. Fruit clusters, branches, foliage, low resolution, and varying illumination are unavoidable and cause errors under different environmental conditions. This research therefore aimed to develop a recognition system based on training datasets from an augmented, complex apple orchard. The recognition system was evaluated using deep learning algorithms built on convolutional neural networks (CNNs). The dynamic accuracy of these modern artificial neural networks in providing 3D coordinates for deploying robotic arms was investigated at different forward-moving speeds of an experimental vehicle to compare recognition and tracking localization accuracy. A RealSense D455 RGB-D camera was selected to acquire the 3D coordinates of each detected and counted apple attached to artificial trees, which were placed in the field to form a specially designed structure for ease of robotic harvesting. The state-of-the-art YOLO (You Only Look Once) models YOLOv4, YOLOv5, and YOLOv7, together with EfficientDet, were used for object detection. The Deep SORT algorithm was employed to track and count detected apples at perpendicular (90°), 15°, and 30° camera orientations. The 3D coordinates of each tracked apple were obtained when the on-board camera on the vehicle passed the reference line, which was set in the middle of the image frame. To optimize harvesting, the accuracy of the 3D coordinates was compared across three forward-moving speeds (0.052 m s⁻¹, 0.069 m s⁻¹, and 0.098 m s⁻¹) and three camera angles (15°, 30°, and 90°). The mean average precision (mAP@0.5) values of YOLOv4, YOLOv5, YOLOv7, and EfficientDet were 0.84, 0.86, 0.905, and 0.775, respectively. The lowest root mean square error (RMSE), 1.54 cm, was obtained for apples detected by EfficientDet at a 15° orientation and a speed of 0.098 m s⁻¹. In counting apples, YOLOv5 and YOLOv7 detected more apples in outdoor dynamic conditions, achieving a counting accuracy of 86.6%. We concluded that the EfficientDet deep learning algorithm at a 15° orientation can be employed to obtain 3D coordinates for further robotic arm development for harvesting apples in a specially designed orchard.
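
The localization step described in the abstract (reading a 3D coordinate for each tracked apple at a reference line in the middle of the frame) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes an ideal pinhole camera without lens distortion, and the `Intrinsics` values, the `LineCounter` class, and the track format (an ID with a box-centre pixel and a depth in metres) are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics in pixels (assumed, not taken from the paper)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def deproject(u, v, depth_m, k):
    """Map pixel (u, v) with depth Z in metres to camera-frame (X, Y, Z)."""
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


class LineCounter:
    """Count each track once, when its centre crosses a vertical line."""

    def __init__(self, line_x, intrinsics):
        self.line_x = line_x
        self.k = intrinsics
        self.last_u = {}    # track_id -> previous centre x (pixels)
        self.located = {}   # track_id -> (X, Y, Z) recorded at the crossing

    def update(self, track_id, u, v, depth_m):
        prev = self.last_u.get(track_id)
        self.last_u[track_id] = u
        if prev is None or track_id in self.located:
            return None  # first sighting, or already counted
        # Opposite signs (or a zero) mean the centre reached or passed the line.
        crossed = prev != u and (prev - self.line_x) * (u - self.line_x) <= 0
        if crossed:
            self.located[track_id] = deproject(u, v, depth_m, self.k)
            return self.located[track_id]
        return None

    @property
    def count(self):
        return len(self.located)


# Hypothetical demo: a 1280x720 frame with the reference line at x = 640.
K = Intrinsics(fx=640.0, fy=640.0, cx=640.0, cy=360.0)
counter = LineCounter(line_x=640.0, intrinsics=K)
for u in (700.0, 660.0, 630.0):        # track 1 drifts across the line
    counter.update(track_id=1, u=u, v=360.0, depth_m=1.0)
for u in (900.0, 880.0):               # track 2 never reaches the line
    counter.update(track_id=2, u=u, v=360.0, depth_m=1.2)
print(counter.count, counter.located)
```

In practice the track IDs would come from a tracker such as Deep SORT and the depth from the aligned depth stream; both are replaced here by hand-written values.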

Keywords: Deep SORT; YOLO; dynamic accuracy; fruit detection; localization.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure A1**
Difference between the real distance and the measured distance at three different speeds and three different angles. (a–c): YOLOv7 Deep SORT; (d–f): YOLOv5 Deep SORT; (g–i): YOLOv4 Deep SORT; (j–l): EfficientDet Deep SORT. D1, D2, and D3 are the differences between the measured depth values and the depth values detected by the 3D camera at speeds of 0.052 m s⁻¹, 0.069 m s⁻¹, and 0.098 m s⁻¹, respectively.

**Figure 1**
(a) Conventional tree structure with complex distribution of branches and (b) real conventional apple orchard conditions (Aomori Prefectural Industrial Research Center, Research institute in Kuroishi, Aomori Prefecture, Japan).

**Figure 2**
(a) V-shaped tree architecture, (b) tall spindle tree architecture, and (c) V-shaped and spindle architectures in recent orchard practices designed with automated systems in mind.

**Figure 3**
Dataset preparation and augmentation: (a) original RGB image, (b) cropped image from original image, (c) image rotated 180°, (d) image rotated 90° clockwise, (e) image rotated 90° counterclockwise, and (f) all images when converted to grayscale.

**Figure 4**
Data augmentation process and training nets for dataset preparation for training and counting using YOLOv4, YOLOv5, YOLOv7, EfficientDet, and Deep SORT.

**Figure 5**
Apple detection based on bounding boxes for the YOLO-based CNN structure.

**Figure 6**
Camera and image coordinates for individual apple localization.

**Figure 7**
Pixel coordinates vs. image coordinates for individual apple localization.

**Figure 8**
Outdoor data collection procedure at different forward speeds and orientation angles for the recognition of apples and distance information using a 3D camera.

**Figure 9**
(a) A 3D camera at 90° orientation, (b) RGB stream corresponding to the 90° orientation, and (c) a depth map.

**Figure 10**
(a) A 3D camera at 15° orientation, (b) RGB stream corresponding to the 15° orientation, and (c) a depth stream.

**Figure 11**
(a) A 3D camera at 30° orientation, (b) RGB stream corresponding to the 30° camera orientation angle, and (c) a depth stream.

**Figure 12**
Outdoor experiments using a four-wheel tractor and artificial trees to create a specially designed orchard.

**Figure 13**
Architecture of the Deep SORT algorithm for tracking detected apples.

**Figure 14**
Apple coordinates and counting with the YOLOv5 and Deep SORT algorithms using a vertical ROI line.

**Figure 15**
Training performance of datasets using the YOLOv4 algorithm.

**Figure 16**
Training performance of datasets using the YOLOv5 algorithm.

**Figure 17**
Training performance of datasets using the YOLOv7 algorithm.

**Figure 18**
Training performance of datasets using EfficientDet (a) precision curve (mAP@0.5) and (b) total loss curve.

**Figure 19**
Summary of RMSE values for different orientation angles and deep learning algorithms.

**Figure 20**
Number of apples detected out of 30 for different orientation angles and deep learning algorithms.
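
The two summary metrics behind Figures 19 and 20, RMSE of the depth estimates and counting accuracy, reduce to simple arithmetic. The following Python sketch shows one plausible way to compute them. The function names and the sample numbers are hypothetical and are not data from the study.

```python
import math


def rmse(measured, detected):
    """Root mean square error between paired depth values (same units)."""
    if len(measured) != len(detected) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - d) ** 2 for m, d in zip(measured, detected)]
    return math.sqrt(sum(squared) / len(squared))


def counting_accuracy(counted, total):
    """Share of apples counted, as a percentage of the apples present."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * counted / total


# Hypothetical depths in cm: tape-measured versus camera-detected.
measured_cm = [100.0, 102.0, 98.0]
detected_cm = [101.0, 100.0, 99.0]
print(rmse(measured_cm, detected_cm))
print(counting_accuracy(15, 30))
```

RMSE keeps the units of its inputs, so depths given in centimetres yield an error in centimetres, as reported in the abstract.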
