Sensors (Basel). 2022 Nov 9;22(22):8651. doi: 10.3390/s22228651.

V2ReID: Vision-Outlooker-Based Vehicle Re-Identification


Yan Qian et al. Sensors (Basel). 2022.

Abstract

With the growth of large camera networks around us, it is becoming increasingly difficult to identify vehicles manually. Computer vision enables us to automate this task. More specifically, vehicle re-identification (ReID) aims to identify vehicles across a camera network with non-overlapping views. Captured vehicle images can undergo strong variations in appearance due to illumination, pose, or viewpoint. Furthermore, because inter-class differences are small and intra-class differences are large (different vehicles can look very similar, while the same vehicle can look very different across views), feature learning is often enhanced with non-visual cues, such as the topology of the camera network and temporal information. These cues, however, are not always available and can be resource intensive for the model. Following the success of Transformer baselines in ReID, we propose, for the first time, an outlook-attention-based vehicle ReID framework that uses the Vision Outlooker as its backbone and is able to encode finer-grained features. We show that, without embedding any additional side information and using only visual cues, we achieve 80.31% mAP and 97.13% R-1 on the VeRi-776 dataset. Besides documenting our research, this paper also aims to provide a comprehensive walkthrough of vehicle ReID, offering a starting point for individuals and organisations, since it is difficult to navigate the myriad of complex research in this field.

Keywords: Vision Outlooker; explainable AI; secure AI; smart cities; vehicle re-identification.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. (Top row) Large intra-class differences, i.e., the same vehicle looking different from distinct perspectives; (bottom row) small inter-class differences, i.e., different vehicles looking very similar.
Figure 2. Categories of vehicle ReID methods. Dashed boxes represent methods that are not detailed.
Figure 3. (Left) Scaled dot-product attention; (right) multi-head attention [39].
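
For readers new to attention, the computation in Figure 3 can be summarised in a few lines of PyTorch. This is a minimal sketch with illustrative tensor shapes, not code from the paper:

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v):
        """softmax(Q K^T / sqrt(d_k)) V for batched inputs."""
        d_k = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (..., L_q, L_k)
        return F.softmax(scores, dim=-1) @ v           # (..., L_q, d_v)

    # Multi-head attention splits the model dimension into h heads,
    # runs the attention above per head, and concatenates the results.
    q = k = v = torch.randn(2, 8, 196, 64)  # (batch, heads, tokens, head_dim)
    print(scaled_dot_product_attention(q, k, v).shape)  # [2, 8, 196, 64]
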
Figure 4. ViT overview [40]: (left) an image is split into patches, each patch is linearly embedded and fed into the Transformer Encoder; (right) the building blocks of the Transformer Encoder.
Figure 5. Non-overlapping patches (left) vs. overlapping patches (right).
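
In common ViT/VOLO implementations, the patch embedding of Figures 4 and 5 is realised as a strided convolution; the sketch below uses illustrative sizes. Setting the stride equal to the kernel size yields non-overlapping patches, while a smaller stride yields overlapping ones:

    import torch
    import torch.nn as nn

    # stride == kernel_size -> non-overlapping patches (Figure 5, left);
    # stride <  kernel_size -> overlapping patches (Figure 5, right).
    non_overlap = nn.Conv2d(3, 768, kernel_size=16, stride=16)
    overlap = nn.Conv2d(3, 768, kernel_size=16, stride=12)

    x = torch.randn(1, 3, 224, 224)  # dummy input image
    print(non_overlap(x).shape)  # torch.Size([1, 768, 14, 14]) -> 196 tokens
    print(overlap(x).shape)      # torch.Size([1, 768, 18, 18]) -> 324 tokens
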
Figure 6. Detailed illustration of outlook attention, from [10].
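
As a complement to Figure 6, a simplified, stride-1 PyTorch sketch of outlook attention, loosely following the reference VOLO implementation [10]; module names and dimensions are illustrative:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class OutlookAttention(nn.Module):
        """Simplified (stride-1) outlook attention, after VOLO [10]."""

        def __init__(self, dim, num_heads=6, kernel_size=3):
            super().__init__()
            self.h, self.k = num_heads, kernel_size
            self.scale = (dim // num_heads) ** -0.5
            self.v = nn.Linear(dim, dim)
            # The k*k x k*k attention weights for every location are
            # *predicted* by a linear layer instead of via Q.K^T products.
            self.attn = nn.Linear(dim, num_heads * kernel_size ** 4)
            self.proj = nn.Linear(dim, dim)
            self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2)

        def forward(self, x):                      # x: (B, H, W, C)
            B, H, W, C = x.shape
            v = self.v(x).permute(0, 3, 1, 2)      # (B, C, H, W)
            # Gather the k x k neighbourhood of every spatial location.
            v = self.unfold(v).reshape(B, self.h, C // self.h, self.k ** 2, H * W)
            v = v.permute(0, 1, 4, 3, 2)           # (B, h, HW, k*k, C/h)
            # One k*k x k*k attention matrix per location and head.
            a = self.attn(x).reshape(B, H * W, self.h, self.k ** 2, self.k ** 2)
            a = (a.permute(0, 2, 1, 3, 4) * self.scale).softmax(dim=-1)
            out = (a @ v).permute(0, 1, 4, 3, 2)   # (B, h, C/h, k*k, HW)
            out = out.reshape(B, C * self.k ** 2, H * W)
            # Fold the overlapping windows back onto the H x W grid.
            out = F.fold(out, (H, W), self.k, padding=self.k // 2)
            return self.proj(out.permute(0, 2, 3, 1))

    layer = OutlookAttention(dim=192, num_heads=6)
    print(layer(torch.randn(2, 28, 28, 192)).shape)  # [2, 28, 28, 192]
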
Figure 7. The pipeline of the standard baseline (right) and the proposed BNNeck [98].
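
A minimal sketch of the BNNeck idea in Figure 7, assuming the design of [98]: the triplet loss operates on the feature before batch normalization, while the ID loss operates on logits computed after it:

    import torch
    import torch.nn as nn

    class BNNeck(nn.Module):
        """BNNeck head, after [98]: triplet loss on the pre-BN feature,
        ID (cross-entropy) loss on the post-BN logits, which eases the
        conflict between the two objectives in one embedding space."""

        def __init__(self, feat_dim, num_classes):
            super().__init__()
            self.bn = nn.BatchNorm1d(feat_dim)
            self.bn.bias.requires_grad_(False)    # no shift, as in [98]
            self.classifier = nn.Linear(feat_dim, num_classes, bias=False)

        def forward(self, feat):
            feat_bn = self.bn(feat)        # inference feature (retrieval)
            logits = self.classifier(feat_bn)
            return feat, feat_bn, logits   # feat -> triplet, logits -> ID
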
Figure 8. Illustration of V2ReID using Vision Outlooker as the backbone. The numbers denote the steps: from splitting the input image into fixed-size patches, to feeding the patches into VOLO, to classifying the input image.
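
As an illustration of this pipeline (not the paper's released code), a sketch assuming the timm library's VOLO port; the model name 'volo_d3_224' and the feature dimension are assumptions:

    import timm
    import torch

    backbone = timm.create_model("volo_d3_224", pretrained=False, num_classes=0)

    img = torch.randn(1, 3, 224, 224)  # (1) input image, split into patches
    feat = backbone(img)               # (2) VOLO encodes a global descriptor
    print(feat.shape)                  # e.g. torch.Size([1, 512])

    # (3) at train time, feat passes through a head (e.g. the BNNeck above)
    # and is optimised with ID/triplet/center losses; at test time, ranking
    # compares distances between query and gallery descriptors.
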
Figure 9. Example of rank lists for queries and the returned ranked gallery samples. Green denotes a correct match and red an incorrect match. For all rank lists, the CMC is 1, while the APs are 1, 1, and 0.7.
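
To make the CMC/AP distinction in Figure 9 concrete, a short Python sketch of average precision over a binary rank list; the example lists are illustrative, not the ones in the figure:

    def average_precision(ranked_hits):
        """AP over a binary ranked list (1 = correct match)."""
        hits, precisions = 0, []
        for i, hit in enumerate(ranked_hits, start=1):
            if hit:
                hits += 1
                precisions.append(hits / i)  # precision at each correct match
        return sum(precisions) / len(precisions)

    # Rank-1 (CMC@1) is 1 whenever the first result is correct, yet AP
    # still penalises correct matches that sit low in the list:
    print(average_precision([1, 1, 1]))        # 1.0
    print(average_precision([1, 0, 0, 0, 1]))  # (1/1 + 2/5) / 2 = 0.7
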
Figure 10. Samples of data augmentation methods: input (left), resizing (a), horizontal flipping (b), padding (c), random cropping and resizing (d), normalizing (e), and random erasing (f).
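
A sketch of such a pipeline using torchvision transforms; the sizes, padding, and probabilities below are illustrative values rather than the paper's exact hyper-parameters:

    from torchvision import transforms

    train_transforms = transforms.Compose([
        transforms.Resize((224, 224)),                    # (a) resizing
        transforms.RandomHorizontalFlip(p=0.5),           # (b) horizontal flip
        transforms.Pad(10),                               # (c) padding
        transforms.RandomCrop((224, 224)),                # (d) random crop
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],  # (e) normalizing
                             std=[0.229, 0.224, 0.225]),
        transforms.RandomErasing(p=0.5),                  # (f) random erasing
    ])
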
Figure 11. The mAP score (%) and training loss per epoch using different loss functions and learning rates: L_ID with LR = 1.0 × 10⁻³ (blue); L_ID, L_tri with LR = 2.0 × 10⁻³ (yellow); L_ID, L_tri, L_cen with LR = 2.0 × 10⁻³ (green); L_ID, L_tri, L_cen with BNNeck and LR = 2.0 × 10⁻³ (red); and BNNeck with LR = 1.0 × 10⁻³ (purple).
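
For orientation, a hedged sketch of the combined objective behind Figure 11, L = L_ID + L_tri + beta * L_cen; the weighting, margin, and feature dimension are illustrative, and the center loss is a minimal version of the usual formulation:

    import torch
    import torch.nn as nn

    class CenterLoss(nn.Module):
        """Pull each feature towards a learnable per-class center."""
        def __init__(self, num_classes, feat_dim):
            super().__init__()
            self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

        def forward(self, feats, labels):
            return ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()

    id_loss = nn.CrossEntropyLoss()
    tri_loss = nn.TripletMarginLoss(margin=0.3)
    cen_loss = CenterLoss(num_classes=576, feat_dim=512)  # 576 train IDs in VeRi-776

    def total_loss(logits, labels, anchor, positive, negative, beta=5e-4):
        return (id_loss(logits, labels)
                + tri_loss(anchor, positive, negative)
                + beta * cen_loss(anchor, labels))
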
Figure 12. The mAP and R-1 scores (%) for different learning rate values using L_ID, L_tri, L_cen and the BNNeck.
Figure 13. The mAP scores and training loss per epoch for different VOLO variants using the BNNeck and a base learning rate of 0.0150. The bottom figure shows how the learning rate decays per epoch using cosine annealing.
Figure 14. The mAP (%) per epoch when training VOLO-D3 using the three losses and the BNNeck, with different learning rates: 0.015000 (orange), 0.0150001 (blue), 0.015001 (green), and 0.015010 (red).
Figure 15. Visualization of the learning rate decay using cosine annealing with a base learning rate of 3.0 × 10⁻⁴, based on (a) the initial number of epochs, (b) the number of maximum restarts, (c) a warm-up of 70 epochs using a prefix, and (d) the k-decay rate from [105].
Figure 16. The mAP, loss, and learning rate per epoch when training D3 using the three losses and the BNNeck. The learning rate is linearly warmed up over different numbers of epochs until reaching LR_base = 0.0150, then decayed using cosine annealing for 300 epochs.
Figure 17. The mAP, loss, and learning rate per epoch when training D3 using the three losses and the BNNeck. The learning rate is linearly warmed up over 10 epochs until reaching LR_base = 0.0150, then decayed using cosine annealing for 300 epochs with different restart values (140, 150, and 190) and decay rates (0.1 and 0.8).
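
The warm-up-plus-cosine schedule of Figures 16 and 17 can be written down compactly. The sketch below omits restarts and uses the stated base LR of 0.0150 with a 10-epoch warm-up; the epoch counts are those quoted in the captions:

    import math

    def lr_at(epoch, base_lr=0.0150, warmup_epochs=10, total_epochs=300):
        """Linear warm-up to base_lr, then cosine annealing towards 0."""
        if epoch < warmup_epochs:
            return base_lr * (epoch + 1) / warmup_epochs
        progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
        return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

    for e in (0, 9, 10, 150, 299):
        print(e, round(lr_at(e), 6))  # ramps up to 0.015, then decays to ~0
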
Figure 18. Visualization of five different predicted matches, shown in order from the top-10 ranking list. Given a query (yellow), the model either retrieves a match (green) or a non-match (red).

References

    1. Zhang J., Wang F.Y., Wang K., Lin W.H., Xu X., Chen C. Data-driven intelligent transportation systems: A survey. IEEE Trans. Intell. Transp. Syst. 2011;12:1624–1639. doi: 10.1109/TITS.2011.2158001.
    2. Zheng Y., Capra L., Wolfson O., Yang H. Urban computing: Concepts, methodologies, and applications. ACM Trans. Intell. Syst. Technol. 2014;5:1–55. doi: 10.1145/2629592.
    3. Liu X., Liu W., Ma H., Fu H. Large-scale vehicle re-identification in urban surveillance videos. Proceedings of the 2016 IEEE International Conference on Multimedia and Expo (ICME); Seattle, WA, USA, 11–15 July 2016; pp. 1–6.
    4. Liu W., Zhang Y., Tang S., Tang J., Hong R., Li J. Accurate estimation of human body orientation from RGB-D sensors. IEEE Trans. Cybern. 2013;43:1442–1452. doi: 10.1109/TCYB.2013.2272636.
    5. Deng J., Hao Y., Khokhar M.S., Kumar R., Cai J., Kumar J., Aftab M.U. Trends in vehicle re-identification past, present, and future: A comprehensive review. Mathematics. 2021;9:3162.
