Improved object detection method for unmanned driving based on Transformers
- PMID: 38752022
- PMCID: PMC11094364
- DOI: 10.3389/fnbot.2024.1342126
Abstract
Object detection is a core technology in the perception module of unmanned driving systems, widely used to detect vehicles, pedestrians, traffic signs, and other objects. However, existing object detection methods still face three challenges in complex unmanned driving scenarios: unsatisfactory performance in multi-scale object detection, inadequate accuracy in detecting small objects, and false positives and missed detections in densely occluded environments. Therefore, this study proposes an improved Transformer-based object detection method for unmanned driving to address these challenges. First, a multi-scale Transformer feature extraction method integrated with channel attention enhances the network's ability to extract features across different scales. Second, a training method incorporating query denoising with Gaussian decay improves the network's ability to learn representations of small objects. Third, a hybrid matching method combining the Optimal Transport and Hungarian algorithms matches predictions to ground truth, providing the network with more informative positive-sample features. Experimental evaluations on datasets including KITTI demonstrate that the proposed method achieves a mean Average Precision (mAP) 3% higher than that of existing methods.
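As a rough illustration of the first two components, the sketch below (PyTorch) shows a squeeze-and-excitation style channel attention block and a Gaussian-profile noise-decay schedule of the kind used in query denoising. All names, the module design, and the decay schedule are hypothetical assumptions; the paper's exact formulations are not given in this abstract.

```python
import math
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: a common design,
    not necessarily the exact module used in the paper."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global spatial average per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                    # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                         # reweight feature channels

def gaussian_decay_sigma(step: int, total_steps: int, sigma0: float = 0.4) -> float:
    """Hypothetical noise scale for query denoising that decays with a
    Gaussian profile over training (the paper's schedule may differ)."""
    return sigma0 * math.exp(-0.5 * (step / total_steps) ** 2)

# Noisy ground-truth boxes as denoising queries (cx, cy, w, h in [0, 1]):
boxes = torch.rand(8, 4)
noisy = boxes + gaussian_decay_sigma(step=500, total_steps=10_000) * torch.randn_like(boxes)
```

For the third component, the Hungarian half of the proposed hybrid matcher corresponds to a standard minimum-cost bipartite assignment, which SciPy's `linear_sum_assignment` solves directly; the Optimal Transport half (e.g., a Sinkhorn-style solver for one-to-many positives) is omitted here. The cost matrix below is a toy example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy cost matrix: rows are predicted queries, columns are ground-truth boxes
# (in practice the cost combines classification and box-regression terms).
cost = np.array([[0.9, 0.1, 0.5],
                 [0.4, 0.8, 0.2]])
rows, cols = linear_sum_assignment(cost)  # minimum-cost one-to-one matching
print([(int(r), int(c)) for r, c in zip(rows, cols)])  # -> [(0, 1), (1, 2)]
```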
Keywords: Transformer; feature extraction; object detection; optimal transport; query denoising.
Copyright © 2024 Zhao, Peng, Wang, Li, Pan, Su and Liu.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.