Sensors (Basel). 2022 Nov 9;22(22):8651. doi: 10.3390/s22228651.

V2ReID: Vision-Outlooker-Based Vehicle Re-Identification


Yan Qian et al. Sensors (Basel). 2022.

Abstract

With the growth of large camera networks around us, it is becoming increasingly difficult to identify vehicles manually. Computer vision enables us to automate this task. More specifically, vehicle re-identification (ReID) aims to identify vehicles across a camera network with non-overlapping views. Captured vehicle images can undergo strong variations in appearance due to illumination, pose, or viewpoint. Furthermore, because inter-class differences are small and intra-class differences are large (different vehicles can look very similar, while the same vehicle can look very different across views), feature learning is often enhanced with non-visual cues, such as the topology of the camera network and temporal information. These cues, however, are not always available and can be resource intensive for the model. Following the success of Transformer baselines in ReID, we propose, for the first time, an outlook-attention-based vehicle ReID framework that uses the Vision Outlooker as its backbone and is able to encode finer-grained features. We show that, without embedding any additional side information and using only visual cues, we achieve 80.31% mAP and 97.13% R-1 on the VeRi-776 dataset. Besides documenting our research, this paper also aims to provide a comprehensive walkthrough of vehicle ReID, offering a starting point for individuals and organisations, since it is difficult to navigate the myriad of complex research in this field.

Keywords: Vision Outlooker; explainable AI; secure AI; smart cities; vehicle re-identification.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. (Top row) Large intra-class differences, i.e., the same vehicle looking different from distinct perspectives; (bottom row) small inter-class differences, i.e., different vehicles looking very similar.
Figure 2. Categories of vehicle ReID methods. Dashed boxes represent methods that are not detailed.
Figure 3. (Left) Scaled dot-product attention; (right) multi-head attention [39].
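
For readers new to attention, the computation in Figure 3 can be summarised in a few lines of PyTorch. This is a minimal sketch with illustrative tensor shapes, not code from the paper:

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v):
        """softmax(Q K^T / sqrt(d_k)) V for batched inputs."""
        d_k = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (..., L_q, L_k)
        return F.softmax(scores, dim=-1) @ v           # (..., L_q, d_v)

    # Multi-head attention splits the model dimension into h heads,
    # runs the attention above per head, and concatenates the results.
    q = k = v = torch.randn(2, 8, 196, 64)  # (batch, heads, tokens, head_dim)
    print(scaled_dot_product_attention(q, k, v).shape)  # [2, 8, 196, 64]
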
Figure 4. ViT overview [40]: (left) an image is split into patches, each patch is linearly embedded and fed into the Transformer Encoder; (right) the building blocks of the Transformer Encoder.
Figure 5. Non-overlapping patches (left) vs. overlapping patches (right).
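
In common ViT/VOLO implementations, the patch embedding of Figures 4 and 5 is realised as a strided convolution; the sketch below uses illustrative sizes. Setting the stride equal to the kernel size yields non-overlapping patches, while a smaller stride yields overlapping ones:

    import torch
    import torch.nn as nn

    # stride == kernel_size -> non-overlapping patches (Figure 5, left);
    # stride <  kernel_size -> overlapping patches (Figure 5, right).
    non_overlap = nn.Conv2d(3, 768, kernel_size=16, stride=16)
    overlap = nn.Conv2d(3, 768, kernel_size=16, stride=12)

    x = torch.randn(1, 3, 224, 224)  # dummy input image
    print(non_overlap(x).shape)  # torch.Size([1, 768, 14, 14]) -> 196 tokens
    print(overlap(x).shape)      # torch.Size([1, 768, 18, 18]) -> 324 tokens
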
Figure 6. Detailed illustration of outlook attention, from [10].
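
As a complement to Figure 6, a simplified, stride-1 PyTorch sketch of outlook attention, loosely following the reference VOLO implementation [10]; module names and dimensions are illustrative:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class OutlookAttention(nn.Module):
        """Simplified (stride-1) outlook attention, after VOLO [10]."""

        def __init__(self, dim, num_heads=6, kernel_size=3):
            super().__init__()
            self.h, self.k = num_heads, kernel_size
            self.scale = (dim // num_heads) ** -0.5
            self.v = nn.Linear(dim, dim)
            # The k*k x k*k attention weights for every location are
            # *predicted* by a linear layer instead of via Q.K^T products.
            self.attn = nn.Linear(dim, num_heads * kernel_size ** 4)
            self.proj = nn.Linear(dim, dim)
            self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2)

        def forward(self, x):                      # x: (B, H, W, C)
            B, H, W, C = x.shape
            v = self.v(x).permute(0, 3, 1, 2)      # (B, C, H, W)
            # Gather the k x k neighbourhood of every spatial location.
            v = self.unfold(v).reshape(B, self.h, C // self.h, self.k ** 2, H * W)
            v = v.permute(0, 1, 4, 3, 2)           # (B, h, HW, k*k, C/h)
            # One k*k x k*k attention matrix per location and head.
            a = self.attn(x).reshape(B, H * W, self.h, self.k ** 2, self.k ** 2)
            a = (a.permute(0, 2, 1, 3, 4) * self.scale).softmax(dim=-1)
            out = (a @ v).permute(0, 1, 4, 3, 2)   # (B, h, C/h, k*k, HW)
            out = out.reshape(B, C * self.k ** 2, H * W)
            # Fold the overlapping windows back onto the H x W grid.
            out = F.fold(out, (H, W), self.k, padding=self.k // 2)
            return self.proj(out.permute(0, 2, 3, 1))

    layer = OutlookAttention(dim=192, num_heads=6)
    print(layer(torch.randn(2, 28, 28, 192)).shape)  # [2, 28, 28, 192]
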
Figure 7. The pipeline of the standard baseline (right) and the proposed BNNeck [98].
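
A minimal sketch of the BNNeck idea in Figure 7, assuming the design of [98]: the triplet loss operates on the feature before batch normalization, while the ID loss operates on logits computed after it:

    import torch
    import torch.nn as nn

    class BNNeck(nn.Module):
        """BNNeck head, after [98]: triplet loss on the pre-BN feature,
        ID (cross-entropy) loss on the post-BN logits, which eases the
        conflict between the two objectives in one embedding space."""

        def __init__(self, feat_dim, num_classes):
            super().__init__()
            self.bn = nn.BatchNorm1d(feat_dim)
            self.bn.bias.requires_grad_(False)    # no shift, as in [98]
            self.classifier = nn.Linear(feat_dim, num_classes, bias=False)

        def forward(self, feat):
            feat_bn = self.bn(feat)        # inference feature (retrieval)
            logits = self.classifier(feat_bn)
            return feat, feat_bn, logits   # feat -> triplet, logits -> ID
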
Figure 8. Illustration of V2ReID using Vision Outlooker as the backbone. The numbers denote the steps: from splitting the input image into fixed-size patches, to feeding the patches into VOLO, to classifying the input image.
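
As an illustration of this pipeline (not the paper's released code), a sketch assuming the timm library's VOLO port; the model name 'volo_d3_224' and the feature dimension are assumptions:

    import timm
    import torch

    backbone = timm.create_model("volo_d3_224", pretrained=False, num_classes=0)

    img = torch.randn(1, 3, 224, 224)  # (1) input image, split into patches
    feat = backbone(img)               # (2) VOLO encodes a global descriptor
    print(feat.shape)                  # e.g. torch.Size([1, 512])

    # (3) at train time, feat passes through a head (e.g. the BNNeck above)
    # and is optimised with ID/triplet/center losses; at test time, ranking
    # compares distances between query and gallery descriptors.
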
Figure 9. Example of rank lists for queries and the returned ranked gallery samples. Green denotes a correct match and red an incorrect match. For all rank lists, the CMC is 1, while the APs are 1, 1, and 0.7.
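
To make the CMC/AP distinction in Figure 9 concrete, a short Python sketch of average precision over a binary rank list; the example lists are illustrative, not the ones in the figure:

    def average_precision(ranked_hits):
        """AP over a binary ranked list (1 = correct match)."""
        hits, precisions = 0, []
        for i, hit in enumerate(ranked_hits, start=1):
            if hit:
                hits += 1
                precisions.append(hits / i)  # precision at each correct match
        return sum(precisions) / len(precisions)

    # Rank-1 (CMC@1) is 1 whenever the first result is correct, yet AP
    # still penalises correct matches that sit low in the list:
    print(average_precision([1, 1, 1]))        # 1.0
    print(average_precision([1, 0, 0, 0, 1]))  # (1/1 + 2/5) / 2 = 0.7
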
Figure 10. Samples of data augmentation methods: input (left), resizing (a), horizontal flipping (b), padding (c), random cropping and resizing (d), normalizing (e), and random erasing (f).
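
A sketch of such a pipeline using torchvision transforms; the sizes, padding, and probabilities below are illustrative values rather than the paper's exact hyper-parameters:

    from torchvision import transforms

    train_transforms = transforms.Compose([
        transforms.Resize((224, 224)),                    # (a) resizing
        transforms.RandomHorizontalFlip(p=0.5),           # (b) horizontal flip
        transforms.Pad(10),                               # (c) padding
        transforms.RandomCrop((224, 224)),                # (d) random crop
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],  # (e) normalizing
                             std=[0.229, 0.224, 0.225]),
        transforms.RandomErasing(p=0.5),                  # (f) random erasing
    ])
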
Figure 11. The mAP score (%) and training loss per epoch using different loss functions and learning rates: L_ID with LR = 1.0 × 10⁻³ (blue); L_ID, L_tri with LR = 2.0 × 10⁻³ (yellow); L_ID, L_tri, L_cen with LR = 2.0 × 10⁻³ (green); L_ID, L_tri, L_cen with BNNeck and LR = 2.0 × 10⁻³ (red); and BNNeck with LR = 1.0 × 10⁻³ (purple).
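
For orientation, a hedged sketch of the combined objective behind Figure 11, L = L_ID + L_tri + beta * L_cen; the weighting, margin, and feature dimension are illustrative, and the center loss is a minimal version of the usual formulation:

    import torch
    import torch.nn as nn

    class CenterLoss(nn.Module):
        """Pull each feature towards a learnable per-class center."""
        def __init__(self, num_classes, feat_dim):
            super().__init__()
            self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

        def forward(self, feats, labels):
            return ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()

    id_loss = nn.CrossEntropyLoss()
    tri_loss = nn.TripletMarginLoss(margin=0.3)
    cen_loss = CenterLoss(num_classes=576, feat_dim=512)  # 576 train IDs in VeRi-776

    def total_loss(logits, labels, anchor, positive, negative, beta=5e-4):
        return (id_loss(logits, labels)
                + tri_loss(anchor, positive, negative)
                + beta * cen_loss(anchor, labels))
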
Figure 12. The mAP and R-1 scores (%) for different learning rate values using L_ID, L_tri, L_cen and the BNNeck.
Figure 13. The mAP scores and training loss per epoch for different VOLO variants using the BNNeck and a base learning rate of 0.0150. The bottom figure shows how the learning rate decays per epoch using cosine annealing.
Figure 14. The mAP (%) per epoch when training VOLO-D3 using the three losses and the BNNeck, with different learning rates: 0.015000 (orange), 0.0150001 (blue), 0.015001 (green), and 0.015010 (red).
Figure 15. Visualization of the learning rate decay using cosine annealing with a base learning rate of 3.0 × 10⁻⁴, based on (a) the initial number of epochs, (b) the number of maximum restarts, (c) a warm-up of 70 epochs using a prefix, and (d) the k-decay rate from [105].
Figure 16. The mAP, loss, and learning rate per epoch when training D3 using the three losses and the BNNeck. The learning rate is linearly warmed up over different numbers of epochs until reaching LR_base = 0.0150, then decayed using cosine annealing for 300 epochs.
Figure 17. The mAP, loss, and learning rate per epoch when training D3 using the three losses and the BNNeck. The learning rate is linearly warmed up over 10 epochs until reaching LR_base = 0.0150, then decayed using cosine annealing for 300 epochs with different restart values (140, 150, and 190) and decay rates (0.1 and 0.8).
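
The warm-up-plus-cosine schedule of Figures 16 and 17 can be written down compactly. The sketch below omits restarts and uses the stated base LR of 0.0150 with a 10-epoch warm-up; the epoch counts are those quoted in the captions:

    import math

    def lr_at(epoch, base_lr=0.0150, warmup_epochs=10, total_epochs=300):
        """Linear warm-up to base_lr, then cosine annealing towards 0."""
        if epoch < warmup_epochs:
            return base_lr * (epoch + 1) / warmup_epochs
        progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
        return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

    for e in (0, 9, 10, 150, 299):
        print(e, round(lr_at(e), 6))  # ramps up to 0.015, then decays to ~0
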
Figure 18. Visualization of five different predicted matches, shown in order from the top-10 ranking list. Given a query (yellow), the model either retrieves a match (green) or a non-match (red).

References

    1. Zhang J., Wang F.Y., Wang K., Lin W.H., Xu X., Chen C. Data-driven intelligent transportation systems: A survey. IEEE Trans. Intell. Transp. Syst. 2011;12:1624–1639. doi: 10.1109/TITS.2011.2158001.
    2. Zheng Y., Capra L., Wolfson O., Yang H. Urban computing: Concepts, methodologies, and applications. ACM Trans. Intell. Syst. Technol. 2014;5:1–55. doi: 10.1145/2629592.
    3. Liu X., Liu W., Ma H., Fu H. Large-scale vehicle re-identification in urban surveillance videos. Proceedings of the 2016 IEEE International Conference on Multimedia and Expo (ICME); Seattle, WA, USA, 11–15 July 2016; pp. 1–6.
    4. Liu W., Zhang Y., Tang S., Tang J., Hong R., Li J. Accurate estimation of human body orientation from RGB-D sensors. IEEE Trans. Cybern. 2013;43:1442–1452. doi: 10.1109/TCYB.2013.2272636.
    5. Deng J., Hao Y., Khokhar M.S., Kumar R., Cai J., Kumar J., Aftab M.U. Trends in vehicle re-identification past, present, and future: A comprehensive review. Mathematics. 2021;9:3162.
