Sensors (Basel). 2022 May 31;22(11):4187. doi: 10.3390/s22114187.

Pear Recognition in an Orchard from 3D Stereo Camera Datasets to Develop a Fruit Picking Mechanism Using Mask R-CNN


Siyu Pan et al.

Abstract

In orchard fruit picking systems for pears, the challenge is to identify the full shape of the soft fruit so that robotic or automatic picking systems can avoid injuring it. Advances in computer vision have made it possible to train for different shapes and sizes of fruit using deep learning algorithms. In this research, a fruit recognition method for robotic systems was developed to identify pears in a complex orchard environment, using a 3D stereo camera combined with Mask Region-based Convolutional Neural Network (Mask R-CNN) deep learning to obtain targets. The experiment used 9054 RGBA images (3018 original images and 6036 augmented images), divided into training, validation, and testing sets at a ratio of 6:3:1. The dataset was collected under two lighting conditions, high light (9–10 a.m.) and low light (6–7 p.m.) JST (Tokyo time), in August 2021 (summertime). All images were taken with a 3D stereo camera offering PERFORMANCE, QUALITY, and ULTRA capture modes; the PERFORMANCE mode was used to build the datasets, with the left camera generating depth images and the right camera generating the original images. This research also compared the performance of two R-CNN variants, Mask R-CNN and Faster R-CNN, whose mean Average Precisions (mAP) were compared on the same datasets with the same split ratio. Mask R-CNN was trained for 80 epochs of 500 steps each, and Faster R-CNN was trained for 40,000 steps. For pear recognition, Mask R-CNN achieved mAPs of 95.22% on the validation set and 99.45% on the testing set, whereas Faster R-CNN achieved 87.9% on the validation set and 87.52% on the testing set. On the same dataset, the two models performed differently on clustered pears versus individual pears: Mask R-CNN outperformed Faster R-CNN when pears were densely clustered in the complex orchard. Therefore, the 3D stereo camera-based dataset combined with the Mask R-CNN vision algorithm detected individual pears within clusters in a complex orchard environment with high accuracy.
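
As a rough sketch of the dataset preparation described above, the 6:3:1 split over the 9054 images could be implemented as follows; the file-name pattern and fixed seed are illustrative assumptions, not details from the paper.

    import random

    # Hypothetical 6:3:1 train/validation/test split of the 9054-image dataset
    # (3018 originals + 6036 augmented); file names are invented placeholders.
    random.seed(42)  # fixed seed so the split is reproducible
    images = [f"pear_{i:04d}.png" for i in range(9054)]
    random.shuffle(images)

    n = len(images)
    n_train = int(n * 6 / 10)            # 60% -> 5432 images
    n_val = int(n * 3 / 10)              # 30% -> 2716 images
    train = images[:n_train]
    val = images[n_train:n_train + n_val]
    test = images[n_train + n_val:]      # remaining ~10% -> 906 images

    print(len(train), len(val), len(test))  # 5432 2716 906
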

Keywords: 3D stereo camera; Mask R-CNN; pear detection.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Aerial view of the orchards used for data collection at the Tsukuba-Plant Innovation Research Center (T-PIRC), University of Tsukuba, Tsukuba, Ibaraki. (a) Satellite view of T-PIRC; (b) view of the pear orchard in T-PIRC.
Figure 2
Different segmentation approaches in pear detection using 3D camera datasets: (a) original image; (b) semantic segmentation; (c) object detection; and (d) instance segmentation.
Figure 3
Mask R-CNN structure for pear quantity recognition in orchards from 3D camera datasets. The original images enter the backbone network for feature selection and screening to produce the feature maps. The RPN then separates foreground from background, ROI-Align standardizes the proposals to a fixed size, and the head network finally generates the classes, boxes, and masks for pear detection.
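
As an illustration of this backbone, RPN, ROI-Align, and head flow, the sketch below runs torchvision's off-the-shelf Mask R-CNN, which uses a ResNet-50 + FPN backbone rather than the ResNet101 + FPN of this paper; it is an analogous pipeline, not the authors' implementation.

    import torch
    import torchvision

    # Pretrained Mask R-CNN (ResNet-50 + FPN backbone) as a stand-in for the
    # ResNet101 + FPN network described in Figure 3.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    image = torch.rand(3, 800, 800)      # placeholder for an RGB orchard image
    with torch.no_grad():
        out = model([image])[0]          # backbone -> RPN -> ROI-Align -> heads

    # The heads return exactly the outputs named in the caption:
    # class labels, bounding boxes, and per-instance masks.
    print(out["labels"].shape, out["boxes"].shape, out["masks"].shape)
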
Figure 4
The inner structure of ResNet101, taking the second layer (C2) as an example: (a) conv block and (b) identity block. Images passing through ResNet change their channel count. The conv block is the first stage of each layer, and identity blocks and conv blocks are combined to form ResNet.
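
The two block types in Figure 4 can be sketched as below, in a simplified two-convolution form rather than the 1x1/3x3/1x1 bottleneck layout ResNet101 actually uses; the channel sizes are illustrative.

    import torch
    import torch.nn as nn

    class IdentityBlock(nn.Module):
        # Shortcut is the input itself; the channel count is unchanged.
        def __init__(self, ch):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
                nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
            )

        def forward(self, x):
            return torch.relu(self.body(x) + x)  # identity shortcut

    class ConvBlock(nn.Module):
        # A 1x1 conv on the shortcut lets the block change the channel count,
        # which is why it appears as the first stage of each layer.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
                nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
            )
            self.shortcut = nn.Conv2d(in_ch, out_ch, 1)

        def forward(self, x):
            return torch.relu(self.body(x) + self.shortcut(x))  # projected shortcut
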
Figure 5
ResNet101 + FPN for pear quantity recognition.
Figure 6
RPN in Mask R-CNN for extracting region proposals from the original pear images.
Figure 7
IoU generated by comparing anchor boxes with ground-truth boxes: if IoU > 0.7, label = 1 (positive); if IoU < 0.3, label = −1 (negative); otherwise, label = 0.
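
The labeling rule in Figure 7 is simple to state in code; the boxes below are illustrative (x1, y1, x2, y2) coordinates, not data from the paper.

    def iou(a, b):
        # Intersection-over-union of two (x1, y1, x2, y2) boxes.
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    def label_anchor(anchor, gt_box):
        # Thresholds from Figure 7: >0.7 positive, <0.3 negative, else ignored.
        v = iou(anchor, gt_box)
        if v > 0.7:
            return 1
        if v < 0.3:
            return -1
        return 0

    print(label_anchor((0, 0, 10, 10), (0, 1, 10, 11)))    # IoU ~= 0.82 -> 1
    print(label_anchor((0, 0, 10, 10), (20, 20, 30, 30)))  # IoU = 0.0  -> -1
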
Figure 8
Bilinear interpolation in ROI-Align, used to obtain fixed-size feature maps for pear recognition. P is the pixel coordinate that ROI-Align obtains by bilinear interpolation; Q11, Q12, Q22, and Q21 are the coordinates of the four known pixel points around P.
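
The interpolation at P can be written directly from the four neighbors; the coordinates and values below are invented for illustration, with Qij placed at (xi, yj).

    def bilinear(p, q11, q21, q12, q22):
        # Each q is ((x, y), value): q11=(x1,y1), q21=(x2,y1), q12=(x1,y2), q22=(x2,y2).
        x, y = p
        (x1, y1), f11 = q11
        (x2, _), f21 = q21
        (_, y2), f12 = q12
        _, f22 = q22
        # Interpolate along x at y1 and at y2, then along y between the two results.
        f_y1 = (f11 * (x2 - x) + f21 * (x - x1)) / (x2 - x1)
        f_y2 = (f12 * (x2 - x) + f22 * (x - x1)) / (x2 - x1)
        return (f_y1 * (y2 - y) + f_y2 * (y - y1)) / (y2 - y1)

    # P at the center of a unit cell gets the average of the four corner values.
    val = bilinear((0.5, 0.5), ((0, 0), 1.0), ((1, 0), 3.0), ((0, 1), 2.0), ((1, 1), 4.0))
    print(val)  # 2.5
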
Figure 9
Flow diagram of how each fixed-size feature map after ROI-Align produces boxes, classes, and masks for each pear.
Figure 10
Pear prediction for determining FP, TN, TP, and FN using Mask R-CNN: (a) original image; (b) cv_mask input image before testing; and (c) mask image after testing.
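
Given TP/FP/FN counts gathered this way, precision and recall follow directly; the counts here are made-up placeholders, not results from the paper.

    # Precision/recall from confusion counts as in Figure 10 (placeholder values).
    tp, fp, fn = 90, 5, 10

    precision = tp / (tp + fp)  # share of predicted pears that are real pears
    recall = tp / (tp + fn)     # share of real pears that were detected
    print(f"precision={precision:.3f}, recall={recall:.3f}")  # 0.947, 0.900
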
Figure 11
Mask R-CNN loss results from training and validation losses: (a) total loss; (b) Mask R-CNN head bounding box loss; (c) Mask R-CNN head class loss; (d) Mask R-CNN mask loss; (e) RPN bounding box loss; and (f) RPN class loss.
Figure 12
(a) Precision-recall curve of Faster R-CNN at learning rate = 0.001 on the testing set; (b) precision-recall curve of Mask R-CNN at learning rate = 0.001 on the testing set.
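
An AP value is the area under such a precision-recall curve; with a single class (pear), the mAP reported above equals this per-class AP. The detections below are invented to show the mechanics.

    import numpy as np

    # Invented detections, sorted by confidence: 1 = true positive, 0 = false positive.
    correct = np.array([1, 1, 0, 1, 0])
    n_gt = 4  # assumed number of ground-truth pears

    tp_cum = np.cumsum(correct)
    precision = tp_cum / np.arange(1, len(correct) + 1)
    recall = tp_cum / n_gt

    # Make precision monotonically decreasing, then integrate over recall.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    ap = np.sum(np.diff(np.concatenate(([0.0], recall))) * precision)
    print(f"AP = {ap:.4f}")  # 0.6875
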
Figure 13
Results of Mask R-CNN and Faster R-CNN in different situations. Recognition of (a–c) separated pears in low light; (d–f) aggregated pears in low light; (g–i) separated pears in strong light; and (j–l) aggregated pears in strong light. (a,d,g,j) Original image; (b,e,h,k) testing image in Mask R-CNN; and (c,f,i,l) testing image in Faster R-CNN.
Figure 14
Results of Mask R-CNN and Faster R-CNN at different rotation angles. Recognition of (a–c) separated pears in low light; (d–f) aggregated pears in low light; (g–i) separated pears in strong light; and (j–l) aggregated pears in strong light. (a,d,g,j) Original image; (b,e,h,k) testing image in Mask R-CNN; and (c,f,i,l) testing image in Faster R-CNN.
