Inter-modality feature prediction through multimodal fusion for 3D shape defect detection
- PMID: 40921124
- DOI: 10.1016/j.neunet.2025.108057
Abstract
3D shape defect detection plays an important role in autonomous industrial inspection. However, accurate anomaly detection remains challenging because of the complexity of multimodal sensor data, especially when both color and structural information are required. In this work, we propose a lightweight inter-modality feature prediction framework that uses fused multimodal features from RGB, depth, and point cloud inputs for efficient 3D shape defect detection. The framework consists of three main components: 1) modality-specific pre-trained feature extractor networks; 2) a multi-level Adaptive Dual-Modal Gated Fusion (ADMGF) module that combines the RGB and depth features to obtain rich spatial and contextual information; and 3) a lightweight inter-modal feature prediction network that uses the fused RGB-depth features to predict the corresponding point cloud features and vice versa, forming a bidirectional learning mechanism over tri-modal inputs. Our model eliminates the need for large memory banks or pixel-level reconstructions. Comprehensive experiments on the MVTec3D-AD and Eyecandies datasets show significant performance improvements over state-of-the-art methods.
Keywords: Anomaly detection; Cross-attention; Industrial automation; Inter-modality representation learning; Multi-level feature fusion.
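The abstract does not give the equations behind ADMGF or the prediction network, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a generic sigmoid-gated blend of one RGB and one depth feature vector, linear predictors in both directions, and an anomaly score defined as the summed cosine distance between predicted and observed features. The function names, weight matrices (`W_gate`, `W_f2p`, `W_p2f`), and the scoring rule are all hypothetical choices made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def gated_fusion(rgb, depth, W_gate, b_gate):
    # Assumed generic gate (not the paper's ADMGF):
    #   g = sigmoid(W_gate @ [rgb; depth] + b_gate)
    #   fused = g * rgb + (1 - g) * depth
    pre = matvec(W_gate, rgb + depth)  # list concatenation forms [rgb; depth]
    gate = [sigmoid(z + b) for z, b in zip(pre, b_gate)]
    return [g * r + (1.0 - g) * d for g, r, d in zip(gate, rgb, depth)]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat an all-zero vector as maximally dissimilar
    return 1.0 - dot / (na * nb)

def anomaly_score(fused, pc, W_f2p, W_p2f):
    # Bidirectional prediction with hypothetical linear predictors:
    # fused RGB-depth -> point cloud features, and point cloud -> fused.
    pred_pc = matvec(W_f2p, fused)
    pred_fused = matvec(W_p2f, pc)
    # Assumed scoring rule: large disagreement suggests a defect.
    return cosine_distance(pred_pc, pc) + cosine_distance(pred_fused, fused)

if __name__ == "__main__":
    rgb, depth = [1.0, 0.0], [0.0, 1.0]
    W_gate = [[0.0] * 4 for _ in range(2)]  # zero weights give a gate of 0.5
    fused = gated_fusion(rgb, depth, W_gate, [0.0, 0.0])
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(fused, anomaly_score(fused, [0.5, -0.5], identity, identity))
```

In a sketch like this, the predictors would be fitted on defect-free samples only, so that features from defective regions are predicted poorly and receive a higher score. Because scoring needs only the two small predictors, no memory bank of training features has to be stored, which is in line with the abstract's claim.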
Copyright © 2025 Elsevier Ltd. All rights reserved.
Conflict of interest statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.