Inter-modality feature prediction through multimodal fusion for 3D shape defect detection

Mujtaba Asad¹, Waqar Azeem², Hafiz Tayyab Mustafa³, Yuming Fang⁴, Jie Yang⁵, Yifan Zuo⁶, Wei Liu⁷

Affiliations

¹ School of Automation and Intelligent Sensing, Shanghai Jiao Tong University, Shanghai, 200240, China; Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, China. Electronic address: asadmujtaba@sjtu.edu.cn.
² Department of Software Engineering, Lahore Garrison University, Lahore, 54810, Pakistan. Electronic address: waqar.azeem@lgu.edu.pk.
³ School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004, China. Electronic address: mustafa.tayyab@zjnu.edu.cn.
⁴ School of Computing and Artificial Intelligence, Jiangxi University of Finance and Economics, Nanchang, 330032, Jiangxi, China. Electronic address: fa0001ng@e.ntu.edu.sg.
⁵ School of Automation and Intelligent Sensing, Shanghai Jiao Tong University, Shanghai, 200240, China; Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, China; Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200240, China. Electronic address: jieyang@sjtu.edu.cn.
⁶ School of Computing and Artificial Intelligence, Jiangxi University of Finance and Economics, Nanchang, 330032, Jiangxi, China. Electronic address: kenny0410@126.com.
⁷ School of Automation and Intelligent Sensing, Shanghai Jiao Tong University, Shanghai, 200240, China; Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, China; Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, 200240, China. Electronic address: weiliucv@sjtu.edu.cn.

PMID: 40921124
DOI: 10.1016/j.neunet.2025.108057

Mujtaba Asad et al. Neural Netw. 2025 Sep 7:193:108057. doi: 10.1016/j.neunet.2025.108057. Online ahead of print.

Abstract

3D shape defect detection plays an important role in autonomous industrial inspection. However, accurate detection of anomalies remains challenging due to the complexity of multimodal sensor data, especially when both color and structural information are required. In this work, we propose a lightweight inter-modality feature prediction framework that uses fused multimodal features from RGB, depth, and point cloud inputs for efficient 3D shape defect detection. The proposed framework consists of three key components: (1) modality-specific pre-trained feature extractor networks; (2) a multi-level Adaptive Dual-Modal Gated Fusion (ADMGF) module that combines the RGB and depth features to obtain rich spatial and contextual information; and (3) a lightweight inter-modal feature prediction network that uses the fused RGB-depth features to predict the corresponding point cloud features and vice versa, forming a bidirectional learning mechanism over tri-modal inputs. Our model eliminates the need for large memory banks or pixel-level reconstructions. Comprehensive experiments on the MVTec3D-AD and Eyecandies datasets show significant performance improvements over state-of-the-art methods.

Keywords: Anomaly detection; Cross-attention; Industrial automation; Inter-modality representation learning; Multi-level feature fusion.
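
The gated fusion and bidirectional feature prediction described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify the internals of ADMGF, the prediction network, or the scoring rule, so everything below is an assumption made for illustration: the sigmoid gate over concatenated features, the single linear maps standing in for the prediction networks, the cosine-distance score, and all names (`gated_fusion`, `anomaly_scores`, `w_gate`, `w_2d_to_3d`, `w_3d_to_2d`) are hypothetical and are not the authors' implementation. Random arrays stand in for pre-extracted per-patch features.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(rgb, depth, w_gate, b_gate):
    """Blend per-patch RGB and depth features with an element-wise gate.

    Assumed form (not taken from the paper): the gate is a sigmoid over a
    linear map of the concatenated features, and the output is a convex
    combination of the two modalities.
    """
    gate = sigmoid(np.concatenate([rgb, depth], axis=-1) @ w_gate + b_gate)
    return gate * rgb + (1.0 - gate) * depth


def cosine_distance(a, b, eps=1e-8):
    """Row-wise cosine distance between two (N, D) feature arrays."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return 1.0 - num / den


def anomaly_scores(fused, pc, w_2d_to_3d, w_3d_to_2d):
    """Score each patch by how poorly each modality predicts the other.

    Linear maps stand in for the prediction networks (an assumption).
    Large disagreement in either direction raises the score.
    """
    pred_pc = fused @ w_2d_to_3d      # fused RGB-depth -> point cloud features
    pred_fused = pc @ w_3d_to_2d      # point cloud -> fused RGB-depth features
    return cosine_distance(pred_pc, pc) + cosine_distance(pred_fused, fused)


# Toy inputs: 4 patches with 8-dimensional features per modality.
rng = np.random.default_rng(0)
n, d = 4, 8
rgb = rng.normal(size=(n, d))
depth = rng.normal(size=(n, d))
pc = rng.normal(size=(n, d))
w_gate = rng.normal(size=(2 * d, d)) * 0.1
b_gate = np.zeros(d)

fused = gated_fusion(rgb, depth, w_gate, b_gate)
scores = anomaly_scores(fused, pc, np.eye(d), np.eye(d))
```

In a setup like this, the prediction maps would be fitted on defect-free samples only, so that patches whose cross-modal predictions disagree at test time receive high scores. Because the score is computed from feature predictions, no memory bank of training features and no pixel-level reconstruction is involved, which matches the property the abstract highlights.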

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
