Improved object detection method for unmanned driving based on Transformers
- PMID: 38752022
- PMCID: PMC11094364
- DOI: 10.3389/fnbot.2024.1342126
Abstract
Object detection is the core technology of the unmanned-driving perception module, widely used to detect vehicles, pedestrians, traffic signs, and other objects. However, existing object detection methods still face three challenges in complex unmanned driving scenarios: unsatisfactory performance in multi-scale object detection, inadequate accuracy on small objects, and false positives and missed detections in densely occluded scenes. This study therefore proposes an improved Transformer-based object detection method for unmanned driving that addresses these challenges. First, a multi-scale Transformer feature extraction method integrated with channel attention enhances the network's ability to extract features across different scales. Second, a training method incorporating query denoising with Gaussian decay improves the network's learning of small-object representations. Third, a hybrid matching method combining the Optimal Transport and Hungarian algorithms matches predictions to ground truth, supplying the network with more informative positive-sample features. Experiments on datasets including KITTI show that the proposed method achieves a mean Average Precision (mAP) 3% higher than that of existing methods.
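As a minimal sketch of the Hungarian half of the matching step the abstract describes: the Hungarian algorithm assigns each ground-truth box to at most one prediction by minimizing a pairwise cost matrix. The L1 box cost used here is illustrative only; the paper's full cost (and its Optimal Transport component, which allows one ground truth to match several predictions) also involves classification and IoU terms not shown.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def hungarian_match(pred_boxes, gt_boxes):
    """One-to-one matching of predictions to ground-truth boxes.

    pred_boxes, gt_boxes: float arrays of shape (N, 4) and (M, 4),
    each row a (cx, cy, w, h) box. Returns a list of
    (prediction_index, ground_truth_index) pairs.
    """
    # cost[i, j] = L1 distance between prediction i and ground truth j
    # (a stand-in for a full matching cost with class/IoU terms).
    cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(axis=-1)
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return list(zip(pred_idx.tolist(), gt_idx.tolist()))
```

Predictions left unmatched by this step would be treated as background during training; the hybrid scheme in the paper augments such one-to-one assignments with additional positive samples from the Optimal Transport matching.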
Keywords: Transformer; feature extraction; object detection; optimal transport; query denoising.
Copyright © 2024 Zhao, Peng, Wang, Li, Pan, Su and Liu.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Similar articles
- RE-YOLOv5: Enhancing Occluded Road Object Detection via Visual Receptive Field Improvements. Sensors (Basel). 2025 Apr 17;25(8):2518. doi: 10.3390/s25082518. PMID: 40285209. Free PMC article.
- SRE-YOLOv8: An Improved UAV Object Detection Model Utilizing Swin Transformer and RE-FPN. Sensors (Basel). 2024 Jun 17;24(12):3918. doi: 10.3390/s24123918. PMID: 38931702. Free PMC article.
- Multi-Task Environmental Perception Methods for Autonomous Driving. Sensors (Basel). 2024 Aug 28;24(17):5552. doi: 10.3390/s24175552. PMID: 39275463. Free PMC article.
- S2*-ODM: Dual-Stage Improved PointPillar Feature-Based 3D Object Detection Method for Autonomous Driving. Sensors (Basel). 2025 Mar 4;25(5):1581. doi: 10.3390/s25051581. PMID: 40096446. Free PMC article.
- Drone-DETR: Efficient Small Object Detection for Remote Sensing Image Using Enhanced RT-DETR Model. Sensors (Basel). 2024 Aug 24;24(17):5496. doi: 10.3390/s24175496. PMID: 39275406. Free PMC article.
Cited by
- Context-Aware Enhanced Feature Refinement for small object detection with Deformable DETR. Front Neurorobot. 2025 Jun 10;19:1588565. doi: 10.3389/fnbot.2025.1588565. eCollection 2025. PMID: 40557308. Free PMC article.