SR-DSFF and FENet-ReID: A Two-Stage Approach for Cross Resolution Person Re-Identification

Zongzong Wu et al. Comput Intell Neurosci. 2022 Jul 5;2022:4398727. doi: 10.1155/2022/4398727. eCollection 2022.
Abstract

In real-life scenarios, the accuracy of person re-identification (Re-ID) is limited by camera hardware and by changes in image resolution caused by factors such as camera focusing errors. This problem is known as cross-resolution person Re-ID. In this paper, we improve the recognition accuracy of cross-resolution person Re-ID by enhancing both the image enhancement network and the feature extraction network. Specifically, we treat cross-resolution person Re-ID as a two-stage task. The first stage is the image enhancement stage, for which we propose a Super-Resolution Dual-Stream Feature Fusion sub-network, named SR-DSFF, consisting of an SR module and a DSFF module. The SR module recovers the resolution of the low-resolution (LR) images; the DSFF module then extracts feature maps from the LR and super-resolution (SR) images through two streams and fuses them with learned weights. At the end of SR-DSFF, a transposed convolution maps the fused feature maps back into images. The second stage is the feature acquisition stage, for which we design a global-local feature extraction network guided by human pose estimation, named FENet-ReID. FENet-ReID obtains the final features through multistage feature extraction and multiscale feature fusion for the Re-ID task. The two stages complement each other, giving the final pedestrian feature representation an advantage in identification accuracy. Experimental results show that our method improves significantly over several state-of-the-art methods.
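To make the two-stage description above concrete, the following PyTorch sketch wires up stand-in versions of stage one: an SR module (feature extractor plus upscale), a DSFF module that fuses the LR-stream and SR-stream feature maps with a learned weight, and the transposed convolution that turns the fused features back into images. The layer choices, channel sizes, and fusion rule are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SRModule(nn.Module):
    """Stand-in SR module: feature extractor followed by an x2 upscale."""
    def __init__(self, ch=64):
        super().__init__()
        self.extract = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.upscale = nn.Sequential(
            nn.Conv2d(ch, 3 * 4, 3, padding=1), nn.PixelShuffle(2))

    def forward(self, lr):
        return self.upscale(self.extract(lr))         # SR image

class DSFF(nn.Module):
    """Stand-in dual-stream fusion with a learned per-channel weight."""
    def __init__(self, ch=64):
        super().__init__()
        self.enc_lr = nn.Conv2d(3, ch, 3, padding=1)   # LR stream
        self.enc_sr = nn.Conv2d(3, ch, 3, padding=1)   # SR stream
        self.weight = nn.Parameter(torch.zeros(1, ch, 1, 1))
        self.to_image = nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1)

    def forward(self, lr, sr):
        f_lr = self.enc_lr(lr)
        f_sr = self.enc_sr(F.interpolate(sr, size=lr.shape[-2:]))  # align sizes
        w = torch.sigmoid(self.weight)
        fused = w * f_lr + (1.0 - w) * f_sr            # learned weighted fusion
        return self.to_image(fused)                    # fused features -> image

sr_net, dsff = SRModule(), DSFF()
lr_batch = torch.randn(2, 3, 64, 32)                   # low-resolution queries
enhanced = dsff(lr_batch, sr_net(lr_batch))            # input to FENet-ReID (stage 2)
print(enhanced.shape)                                  # torch.Size([2, 3, 128, 64])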


Conflict of interest statement

The authors declare that they have no conflicts of interest.

Figures

Figure 1
The network consists of the SR-DSFF sub-network and FENet-ReID. The query images first enter the SR-DSFF, and the SR images are output through the feature extractor and the upscale module in the SR module. The feature maps of the query images and the SR images are then jointly learned and fused through the DSFF module, and the resulting images are passed into FENet-ReID through a transposed convolution. FENet-ReID extracts the global and local features of the images obtained from the SR-DSFF and fuses them to obtain the final feature maps. Finally, a fully connected (FC) layer is applied to the final feature maps to predict the ID labels of pedestrians. Our network is trained in two stages: (1) update the SR module with the SR loss ℒrec (equation (6)); and (2) jointly train the DSFF and the FENet-ReID with the total loss ℒTOTAL (equation (12)). These two stages are represented by yellow and black arrows in the figure, respectively.
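The two training stages in the caption can be summarized, under assumptions, as the loop structure below. Equations (6) and (12) are not reproduced in this excerpt, so an L1 reconstruction loss and a cross-entropy ID loss stand in for ℒrec and ℒTOTAL; loader is a hypothetical iterator over (LR image, HR image, person ID) triples, and fenet_reid is assumed to return ID logits.

import torch
import torch.nn.functional as F

def train_stage1(sr_module, loader, epochs=1, lr=1e-4):
    """Stage 1 (yellow arrows): update only the SR module."""
    opt = torch.optim.Adam(sr_module.parameters(), lr=lr)
    for _ in range(epochs):
        for lr_img, hr_img, _ in loader:
            loss_rec = F.l1_loss(sr_module(lr_img), hr_img)    # stand-in for eq. (6)
            opt.zero_grad(); loss_rec.backward(); opt.step()

def train_stage2(sr_module, dsff, fenet_reid, loader, epochs=1, lr=3e-4):
    """Stage 2 (black arrows): jointly train DSFF and FENet-ReID."""
    params = list(dsff.parameters()) + list(fenet_reid.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for lr_img, _, pid in loader:
            with torch.no_grad():
                sr_img = sr_module(lr_img)                      # SR module kept frozen
            logits = fenet_reid(dsff(lr_img, sr_img))           # ID predictions
            loss_total = F.cross_entropy(logits, pid)           # stand-in for eq. (12)
            opt.zero_grad(); loss_total.backward(); opt.step()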
Figure 2
Performance of our SR module on the Market-1501 dataset. The improvement is evident when compared with the LR images.
Figure 3
We add spatial attention and channel attention to the last ResNet-101 block. The lower right corner of the figure shows a detailed attention diagram, using the branch FESL as an example; the branch FESS has the same structure.
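The caption does not spell out the attention formulation, so the sketch below uses a common channel-plus-spatial attention design (channel weights from pooled descriptors through a shared MLP, spatial weights from a 7x7 convolution over pooled channel maps) as an assumed stand-in for the block appended after the last ResNet-101 stage.

import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Assumed channel + spatial attention appended to the backbone output."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: reweight channels using pooled statistics.
        chan = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) + self.mlp(x.amax(dim=(2, 3))))
        x = x * chan.view(b, c, 1, 1)
        # Spatial attention: reweight locations using cross-channel statistics.
        pooled = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(pooled))

feat = torch.randn(2, 2048, 16, 8)                  # last ResNet-101 stage output
print(ChannelSpatialAttention(2048)(feat).shape)    # torch.Size([2, 2048, 16, 8])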
Figure 4
Flowchart of FENet-ReID. The full image and the three human key-region images are processed by FE-C1 and FE-C2, respectively, and the resulting 256-dimensional features are fused by three fusion units in the FFM.
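The wiring in Figure 4 can be sketched as follows. Only the overall structure (one 256-dimensional global feature, three 256-dimensional key-region features, three fusion units in the FFM) follows the caption; the small convolutional encoders standing in for FE-C1 and FE-C2 and the concatenate-and-project fusion units are illustrative assumptions.

import torch
import torch.nn as nn

def encoder(out_dim=256):
    """Stand-in for FE-C1 / FE-C2: small conv stack pooled to a 256-d feature."""
    return nn.Sequential(
        nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, out_dim))

class FFM(nn.Module):
    """Three fusion units, each merging the global feature with one key region."""
    def __init__(self, dim=256):
        super().__init__()
        self.units = nn.ModuleList([nn.Linear(2 * dim, dim) for _ in range(3)])

    def forward(self, global_feat, region_feats):
        fused = [unit(torch.cat([global_feat, r], dim=1))
                 for unit, r in zip(self.units, region_feats)]
        return torch.cat(fused, dim=1)              # final pedestrian descriptor

fe_c1, fe_c2, ffm = encoder(), encoder(), FFM()
full = torch.randn(2, 3, 256, 128)                   # full pedestrian image
regions = [torch.randn(2, 3, 96, 128) for _ in range(3)]  # pose-guided key regions
final = ffm(fe_c1(full), [fe_c2(r) for r in regions])
print(final.shape)                                   # torch.Size([2, 768])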
