Sensors (Basel). 2023 Apr 7;23(8):3810.
doi: 10.3390/s23083810.

Recognition and Counting of Apples in a Dynamic State Using a 3D Camera and Deep Learning Algorithms for Robotic Harvesting Systems

R M Rasika D Abeyrathna et al. Sensors (Basel).

Abstract

Recognizing apples and estimating their 3D positions from a robotic platform on a moving vehicle during harvesting remain challenging. Fruit clusters, branches, foliage, low resolution, and varying illumination are unavoidable and cause errors under different environmental conditions. This research therefore aimed to develop a recognition system based on training datasets from an augmented, complex apple orchard. The recognition system was evaluated using deep learning algorithms built on convolutional neural networks (CNNs). The dynamic accuracy of modern artificial neural networks in providing 3D coordinates for deploying robotic arms was investigated at different forward speeds of an experimental vehicle to compare recognition and tracking localization accuracy. A RealSense D455 RGB-D camera was selected to acquire the 3D coordinates of each detected and counted apple. The apples were attached to artificial trees placed in the field, forming a specially designed structure intended to ease robotic harvesting. The 3D camera was used with the state-of-the-art YOLO (You Only Look Once) models YOLOv4, YOLOv5, and YOLOv7, and with EfficientDet, for object detection. The Deep SORT algorithm was employed to track and count detected apples at perpendicular (90°), 15°, and 30° camera orientations. The 3D coordinates of each tracked apple were obtained when the on-board camera on the vehicle passed the reference line, which was set in the middle of the image frame. To optimize harvesting, the accuracy of the 3D coordinates was compared across three forward speeds (0.052 m/s, 0.069 m/s, and 0.098 m/s) and three camera angles (15°, 30°, and 90°). The mean average precision (mAP@0.5) values of YOLOv4, YOLOv5, YOLOv7, and EfficientDet were 0.84, 0.86, 0.905, and 0.775, respectively. The lowest root mean square error (RMSE) was 1.54 cm, obtained for apples detected by EfficientDet at a 15° orientation and a speed of 0.098 m/s. In terms of counting apples, YOLOv5 and YOLOv7 detected more apples in outdoor dynamic conditions, achieving a counting accuracy of 86.6%. We concluded that the EfficientDet deep learning algorithm at a 15° orientation can be employed to obtain 3D coordinates for further robotic arm development for harvesting apples in a specially designed orchard.
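
The RMSE reported above compares reference distances with the distances returned by the 3D camera. A minimal sketch of that metric follows, assuming paired measurements in centimetres; the function name and the sample values are illustrative and are not taken from the study.

```python
import math


def rmse(reference_cm, estimated_cm):
    """Root mean square error between paired distance measurements (cm)."""
    if len(reference_cm) != len(estimated_cm) or not reference_cm:
        raise ValueError("inputs must be non-empty and of equal length")
    # Squared difference for each (reference, estimate) pair.
    squared = [(r - e) ** 2 for r, e in zip(reference_cm, estimated_cm)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical distances for three apples, NOT data from the paper.
    measured_by_hand = [85.0, 90.5, 78.2]
    reported_by_camera = [86.1, 89.0, 79.9]
    print(f"RMSE: {rmse(measured_by_hand, reported_by_camera):.2f} cm")
```

A lower value means the camera depths agree more closely with the reference measurements for that combination of detector, speed, and camera angle.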

Keywords: Deep SORT; YOLO; dynamic accuracy; fruit detection; localization.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure A1
Difference between the real distance and the measured distance at three different speeds and three different angles. (a–c): YOLOv7 Deep SORT; (d–f): YOLOv5 Deep SORT; (g–i): YOLOv4 Deep SORT; (j–l): EfficientDet Deep SORT. D1, D2, and D3 are the differences between the measured depth values and the depth values detected by the 3D camera at speeds of 0.052 m/s, 0.069 m/s, and 0.098 m/s, respectively.
Figure 1
(a) Conventional tree structure with complex distribution of branches and (b) real conventional apple orchard conditions (Aomori Prefectural Industrial Research Center, Research institute in Kuroishi, Aomori Prefecture, Japan).
Figure 2
(a) V-shaped tree architecture, (b) tall spindle tree architecture, and (c) V-shaped and spindle architectures in recent orchard practices designed with automated systems in mind.
Figure 3
Dataset preparation and augmentation: (a) original RGB image, (b) cropped image from original image, (c) image rotated 180°, (d) image rotated 90° clockwise, (e) image rotated 90° counterclockwise, and (f) all images when converted to grayscale.
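
The augmentation steps listed in this caption (crop, 180° rotation, 90° rotations in both directions, grayscale conversion) can be sketched in plain Python on an image stored as a nested list of RGB tuples. This is only an illustration: the helper names are hypothetical, and the grayscale weights (ITU-R BT.601 luma) are an assumption, since the caption does not say how the conversion was done.

```python
def crop(img, top, left, height, width):
    """Return the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def rotate_180(img):
    """Rotate by 180 degrees: reverse the row order and each row."""
    return [row[::-1] for row in img[::-1]]


def rotate_90_cw(img):
    """Rotate 90 degrees clockwise: flip vertically, then transpose."""
    return [list(col) for col in zip(*img[::-1])]


def rotate_90_ccw(img):
    """Rotate 90 degrees counterclockwise: transpose, then flip vertically."""
    return [list(col) for col in zip(*img)][::-1]


def to_grayscale(img):
    """Convert RGB tuples to one intensity per pixel (assumed BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in img]


def augment(img):
    """Return the rotated variants plus grayscale versions of every variant."""
    colour = [img, rotate_180(img), rotate_90_cw(img), rotate_90_ccw(img)]
    return colour + [to_grayscale(variant) for variant in colour]
```

When images are rotated, their bounding-box annotations have to be transformed in the same way; that step is omitted here.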
Figure 4
Data augmentation and network training workflow: dataset preparation for training and counting using YOLOv4, YOLOv5, YOLOv7, EfficientDet, and Deep SORT.
Figure 5
Apple detection based on bounding boxes for the YOLO-based CNN structure.
Figure 6
Camera and image coordinates for individual apple localization.
Figure 7
Pixel coordinates vs. image coordinates for individual apple localization.
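
Figures 6 and 7 relate pixel, image, and camera coordinates. One common way to make that mapping concrete is the pinhole camera model, sketched below. It is an assumption that the study used this exact formulation; the intrinsic values are made up for illustration, and lens distortion is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics in pixels (hypothetical values, not the D455's)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def deproject(u, v, depth_m, k):
    """Map pixel (u, v) with depth Z (metres) to camera coordinates (X, Y, Z).

    Uses X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy, with no distortion.
    """
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


if __name__ == "__main__":
    k = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
    # Centre of a detected bounding box and the depth read at that pixel.
    print(deproject(620, 240, 2.0, k))
```

In practice the intrinsics would be read from the camera's calibration, and the depth would be sampled from the depth frame aligned to the RGB stream.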
Figure 8
Outdoor data collection procedure at different forward speeds and orientation angles for the recognition of apples and distance information using a 3D camera.
Figure 9
(a) A 3D camera at 90° orientation, (b) RGB stream corresponding to the 90° orientation, and (c) a depth map.
Figure 10
(a) A 3D camera at 15° orientation, (b) RGB stream corresponding to the 15° orientation, and (c) a depth stream.
Figure 11
(a) A 3D camera at 30° orientation, (b) RGB stream corresponding to the 30° camera orientation angle, and (c) a depth stream.
Figure 12
Outdoor experiments using a four-wheel tractor and artificial trees to create a specially designed orchard.
Figure 13
Architecture of the Deep SORT algorithm for tracking detected apples.
Figure 14
Apple coordinates and counting with the YOLOv5 and Deep SORT algorithms using a vertical ROI line.
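
Counting with a vertical ROI line can be sketched as follows, assuming the tracker supplies a stable ID and a bounding-box centre for each apple in every frame. The data layout and function name are hypothetical and do not reproduce the authors' implementation or the Deep SORT API.

```python
def count_line_crossings(frames, line_x):
    """Return the set of track IDs whose centre crossed the vertical line.

    frames: iterable of dicts mapping track_id -> bounding-box centre x (pixels),
            one dict per video frame, in time order.
    line_x: x position of the vertical ROI line (pixels).
    """
    last_x = {}      # most recent centre x seen for each track
    counted = set()  # IDs already counted, so each apple is counted once
    for detections in frames:
        for track_id, x in detections.items():
            prev = last_x.get(track_id)
            if prev is not None and track_id not in counted:
                # Crossing in either direction between consecutive sightings.
                if prev < line_x <= x or prev > line_x >= x:
                    counted.add(track_id)
            last_x[track_id] = x
    return counted


if __name__ == "__main__":
    # Hypothetical tracks for a 640-pixel-wide frame with the line at x = 320.
    frames = [
        {1: 300, 2: 100},
        {1: 315, 2: 120},
        {1: 330, 2: 140, 3: 400},
        {1: 345, 3: 380},
        {3: 310},
    ]
    print(len(count_line_crossings(frames, line_x=320)))
```

Counting by track ID rather than by raw detections avoids counting the same apple again in later frames, which is the reason for pairing a detector with a tracker.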
Figure 15
Training performance of datasets using the YOLOv4 algorithm.
Figure 16
Training performance of datasets using the YOLOv5 algorithm.
Figure 17
Training performance of datasets using the YOLOv7 algorithm.
Figure 18
Training performance of datasets using EfficientDet: (a) precision curve (mAP@0.5) and (b) total loss curve.
Figure 19
Summary of RMSE values for different orientation angles and deep learning algorithms.
Figure 20
Number of apples detected out of 30 for different orientation angles and deep learning algorithms.
