Comparative Study of Markerless Vision-Based Gait Analyses for Person Re-Identification

Jaerock Kwon et al.
Sensors (Basel). 2021 Dec 8;21(24):8208. doi: 10.3390/s21248208
Abstract

Model-based gait analysis of the kinematic characteristics of the human body has been used to identify individuals. To extract gait features, spatiotemporal changes of 3D anatomical landmarks of the human body are preferable. Without special laboratory settings, 2D images are easily acquired by monocular video cameras in real-world settings. The 2D locations of key joints are estimated by a 2D pose estimator, and the 3D joint positions can then be estimated from the 2D image sequences of human gait. Yet, it remains challenging to obtain the exact gait features of a person because of viewpoint variance and the occlusion of body parts in 2D images. In this study, we conducted a comparative study of two approaches to viewpoint-invariant person re-identification using gait patterns: feature-based and spatiotemporal-based. The first method identifies an individual from gait features extracted from time-series 3D joint positions. The second method uses a Siamese Long Short-Term Memory (LSTM) network that classifies an individual directly from the 3D spatiotemporal changes of key joint positions over a gait cycle, without extracting gait features. To validate and compare these two methods, we conducted experiments on two open datasets, MARS and CASIA-A. The results show that the Siamese LSTM outperforms the gait feature-based approaches by 20% on the MARS dataset and by 55% on the CASIA-A dataset, which suggests that feature-based gait analysis using current 2D and 3D pose estimators is still premature. As future work, we suggest developing large-scale human gait datasets and designing accurate 2D and 3D joint position estimators tailored to gait patterns. We expect that this comparative study and the suggested future work can contribute to rehabilitation research, forensic gait analysis, and the early detection of neurological disorders.
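As a rough illustration of the second approach, the following is a minimal sketch of a Siamese LSTM that compares two sequences of estimated 3D joint positions. This is not the authors' implementation; the joint count, hidden size, and the absolute-difference sigmoid head are assumptions made for the example.

```python
# Minimal sketch of a Siamese LSTM for gait-sequence similarity (PyTorch).
# Assumptions (not from the paper): 17 joints with 3D coordinates, batch-first
# tensors of shape (batch, frames, 17 * 3), and a head that scores two
# sequences with a sigmoid over the absolute difference of their embeddings.
import torch
import torch.nn as nn


class SiameseLSTM(nn.Module):
    def __init__(self, num_joints=17, hidden_size=128):
        super().__init__()
        # Shared encoder: both sequences pass through the same LSTM weights.
        self.encoder = nn.LSTM(
            input_size=num_joints * 3,
            hidden_size=hidden_size,
            num_layers=1,
            batch_first=True,
        )
        # Similarity head: sigmoid over the absolute difference of embeddings.
        self.head = nn.Sequential(nn.Linear(hidden_size, 1), nn.Sigmoid())

    def embed(self, seq):
        # seq: (batch, frames, num_joints * 3) -> last hidden state as embedding
        _, (h_n, _) = self.encoder(seq)
        return h_n[-1]                      # (batch, hidden_size)

    def forward(self, seq_a, seq_b):
        emb_a, emb_b = self.embed(seq_a), self.embed(seq_b)
        # Probability that the two gait sequences belong to the same person.
        return self.head(torch.abs(emb_a - emb_b)).squeeze(-1)


# Example: two batches of 30-frame gait cycles from 17 estimated 3D joints.
model = SiameseLSTM()
a = torch.randn(4, 30, 17 * 3)
b = torch.randn(4, 30, 17 * 3)
same_person_prob = model(a, b)              # shape: (4,)
```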

Keywords: gait; gait analysis; machine learning; markerless; motion capture; person re-identification; siamese neural networks; vision-based.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
System overview. The proposed system consists of a 3D pose estimator from 2D video input (top) and two sub-sections: a gait feature extractor and classifier, and a time-series-based Recurrent Neural Network (RNN) classifier (bottom). (a) 2D input video, (b) 2D pose estimator, (c) extracted 2D joint points with skeletal data of the human pose, (d) 3D pose estimator, (e) 3D joint points, (f) processed 3D joint points, (g) gait feature extractor, (h) gait feature sets, (i) classifier based on gait features, (j) classified individual, (k) RNN trainer using spatiotemporal joint data, (l) trained RNN model, (m) classifier using the RNN model, and (n) identified individual.
Figure 2
Pedestrian examples from the Viewpoint Invariant Pedestrian Recognition (VIPeR) dataset. Adapted from [22]. Each column shows the same person, but colors appear different under different lighting conditions, as in the second and last columns. Appearance-based re-identification only works under the assumption that a person seen in one camera appears in another camera within a short period of time and under similar lighting conditions.
Figure 3
Skeletal model difference. (a) COCO dataset parts index numbers. (b) H3.6M dataset parts index numbers.
Figure 4
The coordinate system difference between (a) OpenPose, where joint positions are labeled as 2D locations, and (b) SYEB, where joint positions are in 3D and the depth information is inferred by the embedded 3D joint position estimator.
Figure 5
Human gait cycle. (a) Initial contact, (b) Loading response, (c) Mid-stance, (d) Terminal stance, (e) Pre-swing, (f) Initial swing, (g) Mid-swing, (h) Terminal swing. The stance phase spans (a) to (e); the swing phase spans (f) to (h). Initial contact and terminal swing are the same event under different names.
Figure 6
Gait cycle extraction from a sample of dt(a), the distance between the two ankles. One gait cycle (green solid arrows) is identified from the changes in this distance. Data smoothing is applied to remove subtle fluctuations in the ankle distances. The blue line is the raw data; the red line is the smoothed data. The x-axis shows image frame numbers; the y-axis shows the 3D distance between the two joints.
Figure 7
An example of data smoothing using discrete linear convolution.
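A minimal sketch of how the smoothing by discrete linear convolution (Figure 7) and the ankle-distance-based gait-cycle extraction (Figure 6) could be implemented. The box kernel, window size, and peak-based boundary detection are assumptions for the example, not details taken from the paper.

```python
# Minimal sketch of ankle-distance smoothing and gait-cycle extraction.
# Assumptions (not from the paper): 3D ankle positions as numpy arrays of
# shape (frames, 3); a normalized box kernel for the discrete linear
# convolution; cycle boundaries taken at peaks of the smoothed distance.
import numpy as np
from scipy.signal import find_peaks


def smooth(signal, window=5):
    # Discrete linear convolution with a normalized box kernel.
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")


def gait_cycle_boundaries(left_ankle, right_ankle, window=5):
    # Per-frame 3D distance between the two ankles.
    dist = np.linalg.norm(left_ankle - right_ankle, axis=1)
    smoothed = smooth(dist, window)
    # Peaks of the ankle distance mark successive steps; every second peak
    # closes one full gait cycle for the same leg.
    peaks, _ = find_peaks(smoothed)
    return dist, smoothed, peaks


# Example with synthetic ankle trajectories (100 frames).
t = np.linspace(0, 4 * np.pi, 100)
left = np.stack([np.sin(t), np.zeros_like(t), np.zeros_like(t)], axis=1)
right = np.stack([-np.sin(t), np.zeros_like(t), np.zeros_like(t)], axis=1)
_, _, boundaries = gait_cycle_boundaries(left, right)
```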
Figure 8
Angle features. (a) Hip extension angle, (b) Knee flexion angle, (c) Leg inclination angle, (d) Lateral shoulder drop angle, (e) Trunk side bending angle, (f) Lateral pelvic drop angle, (g) Rearfoot eversion angle.
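A minimal sketch of how one such angle feature, for example the knee flexion angle, could be computed from three estimated 3D joint positions. The joint triplet and the angle convention here are illustrative assumptions rather than the paper's definitions.

```python
# Minimal sketch of one angle feature from estimated 3D joints.
# Assumption (not from the paper): the knee flexion angle is approximated as
# the angle at the knee between the thigh (knee->hip) and shank (knee->ankle)
# vectors; each joint is a numpy array of shape (3,).
import numpy as np


def angle_at(joint, neighbor_a, neighbor_b):
    # Angle (degrees) at `joint` formed by the two neighboring joints.
    u = neighbor_a - joint
    v = neighbor_b - joint
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


# Example: knee flexion angle from hypothetical hip, knee, and ankle positions.
hip = np.array([0.0, 1.0, 0.0])
knee = np.array([0.0, 0.5, 0.05])
ankle = np.array([0.0, 0.0, 0.0])
knee_flexion = angle_at(knee, hip, ankle)
```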
Figure 9
Siamese Neural Network with two identical subnetworks. (a) Input, (b) Convolutional Neural Network layers, (c) Fully connected (sigmoid) layer, (d) Absolute difference, (e) Fully connected (sigmoid) layer.
Figure 10
Siamese recurrent architectures for learning sentence similarity. Adapted from [47].
Figure 11
A sample tracklet of MARS. Adapted from [48]. Each row is labeled as the same identity.
Figure 12
An example of CASIA Dataset A [4]. The images show each person from three different angles. (a) Parallel to the image plane, (b) 90 degrees, (c) 45 degrees.
Figure 13
Gait example from the CASIA-A dataset. (a) Initial contact, (b) Loading response, (c) Mid-stance, (d) Terminal stance, (e) Pre-swing, (f) Initial swing, (g) Mid-swing, (h) Terminal swing. The stance phase spans (a) to (e); the swing phase spans (f) to (h).
Figure 14
An example of 2D and 3D key joint estimation. (a) Input image. (b) The 2D estimation result from the input image. (c) The 3D estimation from the 2D key joints.
Figure 15
Examples of the 3D estimation. (a) MARS dataset. (b) CASIA-A dataset. (c) A sample from the authors.
Figure 16
Training graphs for the Siamese-LSTM network with CASIA-A. The graphs show that training was successful, reaching around 95% accuracy and a loss of 0.04 after 10 epochs.
Figure 17
Classification accuracy of the classifiers on the MARS and CASIA-A datasets.
Figure 18
t-SNE feature maps of MARS. Each color and marker shape indicates a different feature.
Figure 19
t-SNE feature maps of CASIA-A. Each color and marker shape indicates a different feature.
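For reference, a minimal sketch of producing such a t-SNE projection with scikit-learn. The feature dimensionality, sample counts, and t-SNE settings here are placeholders, not the values used in the paper.

```python
# Minimal sketch of the kind of t-SNE projection used for the feature maps.
# Assumptions (not from the paper): gait feature vectors stacked into a
# (samples, features) array and scikit-learn's TSNE with default perplexity.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(120, 16))        # placeholder gait feature vectors
labels = rng.integers(0, 6, size=120)        # placeholder identity labels

# Project the high-dimensional gait features to 2D for visualization.
embedded = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)
# `embedded` (120, 2) can then be plotted as a scatter colored by `labels`,
# analogous to the t-SNE feature maps shown for MARS and CASIA-A.
```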
Figure 20
A 90° example from CASIA-A. The data name is lsl-90_1-006.png. The image size is 352×240 and the human subject size is 30×78.
Figure 21
Tests with scaled-up images show better 3D pose estimation. (a) The original image, 352×240; the 3D estimation result shows some deformation. (b) The image scaled to 704×480; no severe deformation remains, yet it is not enough to extract gait features. (c) The image scaled to 1126×768; the human subject is roughly 100×250, which is large enough as input to the 2D and 3D pose estimators.
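A minimal sketch of the kind of frame upscaling described above, using OpenCV. The interpolation method is an assumption, since the choice used in the study is not stated here.

```python
# Minimal sketch of upscaling a low-resolution frame before pose estimation.
# Assumption (not from the paper): OpenCV with cubic interpolation; the
# filename is the CASIA-A sample named in Figure 20.
import cv2

frame = cv2.imread("lsl-90_1-006.png")           # 352x240 CASIA-A frame
if frame is not None:
    h, w = frame.shape[:2]
    # Scale the whole frame 2x so the ~30x78 subject becomes large enough
    # as input to the 2D/3D pose estimators.
    upscaled = cv2.resize(frame, (w * 2, h * 2), interpolation=cv2.INTER_CUBIC)
```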

References

    1. Nixon M.S., Bouchrika I., Arbab-Zavar B., Carter J.N. On use of biometrics in forensics: Gait and ear; Proceedings of the 2010 18th European Signal Processing Conference; Aalborg, Denmark. 23–27 August 2010; pp. 1655–1659.
    2. Liu Z., Zhang Z., Wu Q., Wang Y. Enhancing person re-identification by integrating gait biometric. Neurocomputing. 2015;168:1144–1156. doi: 10.1016/j.neucom.2015.05.008.
    3. Kale A., Rajagopalan A.N., Cuntoor N., Krüger V. Gait-based Recognition of Humans Using Continuous HMMs; Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition; Washington, DC, USA. 21 May 2002; pp. 321–326.
    4. Wang L., Tan T., Ning H., Hu W. Silhouette Analysis-Based Gait Recognition for Human Identification. IEEE Trans. Pattern Anal. Mach. Intell. 2003;25:1505–1518. doi: 10.1109/TPAMI.2003.1251144.
    5. Larsen P.K., Simonsen E.B., Lynnerup N. Gait Analysis in Forensic Medicine. J. Forensic Sci. 2008;53:1149–1153. doi: 10.1111/j.1556-4029.2008.00807.x.
