Sensors. 2024 Apr 30;24(9):2889. doi: 10.3390/s24092889.

FusionVision: A Comprehensive Approach of 3D Object Reconstruction and Segmentation from RGB-D Cameras Using YOLO and Fast Segment Anything

Safouane El Ghazouali et al. Sensors (Basel).

Abstract

In the realm of computer vision, integrating advanced techniques into the pre-processing of RGB-D camera inputs poses a significant challenge, given the inherent complexities arising from diverse environmental conditions and varying object appearances. This paper therefore introduces FusionVision, an exhaustive pipeline adapted for the robust 3D segmentation of objects in RGB-D imagery. Traditional computer vision systems, designed mainly for RGB cameras, struggle to simultaneously capture precise object boundaries and achieve high-precision object detection on depth maps. To address this challenge, FusionVision merges state-of-the-art object detection techniques with advanced instance segmentation methods. Integrating these components enables a holistic interpretation of RGB-D data, unifying the analysis of information from both the color (RGB) and depth (D) channels and facilitating the extraction of comprehensive and accurate object information, which in turn improves downstream tasks such as object 6D pose estimation, Simultaneous Localization and Mapping (SLAM), and accurate 3D dataset extraction. The proposed FusionVision pipeline employs YOLO to identify objects within the RGB image domain. Subsequently, FastSAM, an innovative semantic segmentation model, is applied to delineate object boundaries, yielding refined segmentation masks. The synergy between these components and their integration into 3D scene understanding ensures a cohesive fusion of object detection and segmentation, enhancing overall precision in 3D object segmentation.
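The detect-then-segment fusion described above can be sketched in plain Python: a YOLO-style bounding box restricts a FastSAM-style binary mask to the detected region, yielding a per-object mask. The `restrict_mask_to_box` helper and the toy 4x4 mask below are illustrative assumptions, not the authors' implementation (which runs the actual YOLO and FastSAM models on RGB frames).

```python
def restrict_mask_to_box(mask, box):
    """Zero out mask pixels outside a detection box.

    mask: 2D list of 0/1 values (a stand-in for a FastSAM output).
    box:  (x1, y1, x2, y2) pixel coordinates from a YOLO detection,
          with (x2, y2) exclusive.
    Returns a new mask kept only where it overlaps the box.
    """
    x1, y1, x2, y2 = box
    return [
        [v if (x1 <= x < x2 and y1 <= y < y2) else 0
         for x, v in enumerate(row)]
        for y, row in enumerate(mask)
    ]

# Toy 4x4 mask: one object upper-left, a spurious blob lower-right.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
# A detection box covering only the upper-left 2x2 region
# suppresses the blob outside it.
refined = restrict_mask_to_box(mask, (0, 0, 2, 2))
```

In the full pipeline this per-object mask would then be aligned with the depth frame to isolate the object's points in 3D.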

Keywords: 3D localization; 3D object detection; 3D reconstruction; RGB-D; SAM; point-cloud.


Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Example of RGB-D camera scene capture and 3D reconstruction. (a) 3D reconstruction from the RGB-D depth channel. (b) RGB stream captured by the RGB sensor. (c) Visual estimation of depth with the JET colormap (closer objects are shown in green; distant ones appear as dark blue regions).
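The depth-channel reconstruction in panel (a) rests on the standard pinhole back-projection of each valid depth pixel into a 3D point. A minimal sketch follows; the intrinsics `fx`, `fy`, `cx`, `cy` and the 2x2 depth map are illustrative values, not those of any specific RGB-D camera.

```python
def backproject(depth, fx, fy, cx, cy, scale=0.001):
    """Back-project a depth image into a 3D point cloud (pinhole model).

    depth: 2D list of raw depth readings (e.g. millimetres); 0 = no return.
    fx, fy: focal lengths in pixels; cx, cy: principal point.
    scale: multiplier converting raw units to metres.
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d == 0:               # skip invalid pixels
                continue
            z = d * scale            # depth in metres
            x = (u - cx) * z / fx    # pinhole model, x axis
            y = (v - cy) * z / fy    # pinhole model, y axis
            points.append((x, y, z))
    return points

# Toy 2x2 depth map: 1000 mm everywhere except one invalid pixel.
cloud = backproject([[1000, 1000], [0, 1000]],
                    fx=600.0, fy=600.0, cx=0.5, cy=0.5)
```

RGB-D SDKs typically expose an equivalent deprojection routine; the sketch only shows the geometry behind it.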
Figure 2
Complex YOLO framework for 3D object reconstruction and localization [47].
Figure 3
Proposed FusionVision pipeline for real-time 3D object segmentation and localization using fused YOLO and FastSAM applied on RGB-D sensor.
Figure 4
Visual representation of RGB camera alignment with the depth sensor.
Figure 5
Example of acquired images for YOLO training: the top two images are originals; the bottom two are augmented images.
Figure 6
YOLO training curves: (a) bbox loss, (b) cls loss, (c) precision and recall, and (d) mAP50 and mAP50-95.
Figure 7
Visuals of the YOLO detection, FastSAM mask extraction, and binary mask estimation: (a) using the pre-trained YOLO model; (b) using the custom trained YOLO model.
Figure 8
Overall evaluation metrics of FastSAM applied to YOLO-extracted bounding boxes, compared against ground-truth annotations. Blue points denote metric values; black segments denote standard deviations.
Figure 9
Example of FastSAM misestimation of the segmentation mask: (a) original image, (b) ground truth annotation mask, and (c) FastSAM estimated mask.
Figure 10
Three-dimensional object reconstruction from the aligned FastSAM mask: (a) raw point cloud and (b) point cloud post-processed by voxel downsampling and statistical denoising. The left images visualize the YOLO detection, FastSAM mask extraction, and binary mask estimation at specific positions of the physical objects within the frame.
Figure 11
Post-processing impact on 3D object reconstruction: (a) raw point clouds, (b) downsampled point clouds, and (c) downsampled and denoised point clouds.
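The two post-processing stages compared in Figures 10 and 11 can be sketched in plain Python. `voxel_downsample` and `remove_statistical_outliers` below are simplified stand-ins for the equivalent point-cloud library routines (e.g. Open3D's), shown on toy data; parameter values are illustrative.

```python
import math
from collections import defaultdict

def voxel_downsample(points, voxel):
    """Replace all points falling in the same voxel by their centroid."""
    bins = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel) for c in p)
        bins[key].append(p)
    return [tuple(sum(c) / len(ps) for c in zip(*ps))
            for ps in bins.values()]

def remove_statistical_outliers(points, k=3, std_ratio=1.0):
    """Drop points whose mean distance to their k nearest neighbours
    exceeds the global mean by std_ratio standard deviations."""
    mean_d = []
    for p in points:
        ds = sorted(math.dist(p, q) for q in points if q is not p)[:k]
        mean_d.append(sum(ds) / len(ds))
    mu = sum(mean_d) / len(mean_d)
    sigma = math.sqrt(sum((d - mu) ** 2 for d in mean_d) / len(mean_d))
    thresh = mu + std_ratio * sigma
    return [p for p, d in zip(points, mean_d) if d <= thresh]

# Toy cloud: a tight cluster near the origin plus one stray point.
raw = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.0, 0.01, 0.0),
       (0.01, 0.01, 0.0), (5.0, 5.0, 5.0)]
down = voxel_downsample(raw, 0.05)            # cluster collapses to one centroid
clean = remove_statistical_outliers(raw, k=3)  # drops the stray point
```

The brute-force neighbour search is O(n^2) and only meant to show the statistic; real implementations use a k-d tree.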

References

    1. Liu M. Robotic Online Path Planning on Point Cloud. IEEE Trans. Cybern. 2016;46:1217–1228. doi: 10.1109/TCYB.2015.2430526.
    2. Ding Z., Sun Y., Xu S., Pan Y., Peng Y., Mao Z. Recent Advances and Perspectives in Deep Learning Techniques for 3D Point Cloud Data Processing. Robotics. 2023;12:100. doi: 10.3390/robotics12040100.
    3. Krawczyk D., Sitnik R. Segmentation of 3D Point Cloud Data Representing Full Human Body Geometry: A Review. Pattern Recognit. 2023;139:109444. doi: 10.1016/j.patcog.2023.109444.
    4. Wu F., Qian Y., Zheng H., Zhang Y., Zheng X. A Novel Neighbor Aggregation Function for Medical Point Cloud Analysis; Proceedings of the Computer Graphics International Conference; Shanghai, China, 28 August–1 September 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 301–312.
    5. Xie X., Wei H., Yang Y. Real-Time LiDAR Point-Cloud Moving Object Segmentation for Autonomous Driving. Sensors. 2023;23:547. doi: 10.3390/s23010547.
