Walking fingerprinting

Lily Koffman¹, Ciprian Crainiceanu¹, Andrew Leroux²

Affiliations

¹ Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.
² Department of Biostatistics & Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA.

PMID: 39552748
PMCID: PMC11561731
DOI: 10.1093/jrsssc/qlae033

Lily Koffman et al. J R Stat Soc Ser C Appl Stat. 2024 Jul 29;73(5):1221-1241. doi: 10.1093/jrsssc/qlae033. eCollection 2024 Nov.

Abstract

We consider the problem of predicting an individual's identity from accelerometry data collected during walking. In a previous paper, we transformed the accelerometry time series into an image by constructing the joint distribution of the acceleration and lagged acceleration for a vector of lags. Predictors derived by partitioning this image into grid cells were used in logistic regression to predict individuals' identities. Here, we (a) implement machine learning methods for prediction using the grid cell-derived predictors; (b) derive inferential methods to screen for the most predictive grid cells while adjusting for correlation and multiple comparisons; and (c) develop a novel multivariate functional regression model that avoids partitioning the predictor space. Prediction methods are compared on two open-source accelerometry data sets collected from: (a) 32 individuals walking on a 1.06 km path; and (b) 153 study participants performing six repetitions of walking on a 20 m path on two occasions at least 1 week apart. In the 32-individual study, all methods achieve at least 95% rank-1 accuracy, while in the 153-individual study, accuracy varies from 41% to 98%, depending on the method and prediction task. The methods also provide insights into why some individuals are easier to predict than others.
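The idea of feeding grid-cell predictors into logistic regression to predict identity can be illustrated with a small one-vs-rest sketch in pure Python. It assumes each row of `X` holds the grid-cell predictors for one second of walking and that one binary model is fit per individual; the one-vs-rest setup, the plain batch gradient descent, the learning rate, the epoch count and the helper names (`fit_logistic`, `one_vs_rest`, `predict_identity`) are illustrative assumptions, not the authors' implementation.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit a binary logistic regression by batch gradient descent.

    Returns a weight list whose last entry is the intercept.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            # zip(w, xi) stops after d terms, so the intercept is added separately.
            z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
            err = sigmoid(z) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[-1] += err
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def one_vs_rest(X, labels, **kwargs):
    # One binary model per individual: that individual's seconds vs everyone else's.
    models = {}
    for c in sorted(set(labels)):
        y = [1.0 if lab == c else 0.0 for lab in labels]
        models[c] = fit_logistic(X, y, **kwargs)
    return models


def predict_identity(models, x):
    # Score x under every individual's model and return the best-scoring identity.
    scores = {
        c: sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, x)))
        for c, w in models.items()
    }
    return max(scores, key=scores.get)
```

For made-up two-cell features such as `X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]` with labels `["A", "A", "B", "B"]`, `predict_identity(one_vs_rest(X, labels), [0.85, 0.15])` returns `"A"`. A real analysis would more likely use a tested library fit with regularization, since grid-cell predictors are numerous and correlated.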

Keywords: accelerometry; biometrics; functional data.

© The Royal Statistical Society 2024. All rights reserved.


Conflict of interest statement

Conflicts of interest: Ciprian Crainiceanu is consulting for Bayer and Johnson and Johnson on methods development for wearable and implantable technologies. The details of these contracts are disclosed through the Johns Hopkins University eDisclose system. The research presented here is not related to and was not supported by this consulting work.

Figures

**Figure 1.**
Eight 3-s intervals from different study participants. The data are a single time series obtained as the sum of squares of the observed accelerations along the three axes. The left panels identify the study participants, whereas the right panels do not. The questions are (a) is any of the individuals shown in the right panels also among those whose data are displayed in the left panels? and (b) if so, which ones? The answer to the riddle is provided at the end of the discussion and in the acknowledgements.

**Figure 2.**
Subset of the empirical joint distribution of acceleration and lagged acceleration for Subject 19 in the IU data. The pairs $\{v_{ij}(s - u), v_{ij}(s)\}$ are plotted for $u = 1, 15, 30, 45$ centiseconds (columns) and $j = 1, 2$ s (rows).
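The lagged pairs plotted in Figure 2 can be sketched as follows. This assumes a single walking series sampled at 100 Hz, so that a lag of $u$ centiseconds equals $u$ samples, and that pairs are formed only from samples inside the same 1-s window; both the windowing rule and the helper names `second_window` and `lagged_pairs` are hypothetical choices made for illustration.

```python
FS = 100  # assumed sampling rate: one sample per centisecond


def second_window(v, j, fs=FS):
    # Samples belonging to second j (1-based, as in the caption).
    return v[(j - 1) * fs : j * fs]


def lagged_pairs(window, lags):
    # For each lag u, collect the points (v(s - u), v(s)) whose two samples
    # both fall inside the window.
    return {
        u: [(window[s - u], window[s]) for s in range(u, len(window))]
        for u in lags
    }
```

For example, `lagged_pairs(second_window(v, 2), [1, 15, 30, 45])` would give the four scatter plots of the second row for a series `v`. With a 100-sample window, lag $u$ yields $100 - u$ points, so longer lags contribute fewer pairs under this windowing assumption.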

**Figure 3.**
Predictor extraction for Subject 19. The values of $X_{ijc}$ for Subject 19 in the IU data are shown for $u = 1, 15, 30, 45$ (columns) and $j = 1, 2$ (rows). The white number in each grid cell is the value of $X_{ijc}$ for that cell. For example, $X_{i = 19,\, j = 2,\, c = [0.75, 1.00) \times [0.75, 1.00)} = 61$ is shown in the bottom-left corner of the plot. Only a subset of the grid cells is shown, as the other grid cells have no observations in these two seconds.
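Counting the lagged pairs that fall in each square grid cell gives predictors of the form $X_{ijc}$. The sketch below assumes cells of width 0.25 anchored at 0 (consistent with the $[0.75, 1.00)$ cell in the caption) and keys each count by lag and the two cell indices; the cell width, the key layout and the names `CELL`, `grid_counts` and `cell_bounds` are assumptions for illustration only.

```python
import math
from collections import Counter

CELL = 0.25  # assumed grid-cell width, in the units of the acceleration series


def grid_counts(window, lags, cell=CELL):
    """Count lagged pairs per grid cell.

    Keys are (u, a, b): lag u, cell index a for v(s - u), cell index b for v(s).
    Cell index k covers the half-open interval [k * cell, (k + 1) * cell).
    """
    counts = Counter()
    for u in lags:
        for s in range(u, len(window)):
            a = math.floor(window[s - u] / cell)
            b = math.floor(window[s] / cell)
            counts[(u, a, b)] += 1
    return counts


def cell_bounds(k, cell=CELL):
    # Interval covered by cell index k along either axis.
    return (k * cell, (k + 1) * cell)
```

Under these assumptions, `grid_counts(window, [u])[(u, 3, 3)]` is the number of points in the cell $[0.75, 1.00) \times [0.75, 1.00)$ at lag $u$. Using half-open intervals means each point lands in exactly one cell, matching the interval notation in the caption.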

**Figure 4.**
Classification metrics as a function of the number of seconds in the testing data. First row: rank-1 accuracy; second row: rank-5 accuracy. Each column corresponds to a different data set and prediction task. The lines show how the accuracy of each model changes as the number of seconds averaged over in the testing data increases.
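Rank-$k$ accuracy after averaging over test seconds can be sketched as below. It assumes each model outputs, for every test second, a probability for every candidate identity, and that these probabilities are averaged over the first $n$ seconds before ranking; the averaging rule, the input layout and the name `rank_k_accuracy` are assumptions rather than the paper's exact procedure.

```python
def rank_k_accuracy(probs_by_subject, k, n_seconds):
    """Fraction of test subjects whose true identity is among the top k.

    probs_by_subject maps each true identity to a list with one dict per test
    second; each dict maps candidate identities to predicted probabilities.
    """
    hits = 0
    for true_id, per_second in probs_by_subject.items():
        used = per_second[:n_seconds]  # average over the first n seconds only
        avg = {c: sum(p[c] for p in used) / len(used) for c in used[0]}
        ranked = sorted(avg, key=avg.get, reverse=True)
        if true_id in ranked[:k]:
            hits += 1
    return hits / len(probs_by_subject)
```

Rank-1 accuracy corresponds to `k=1` and rank-5 accuracy to `k=5`; sweeping `n_seconds` traces one line of the figure for one model. Averaging probabilities smooths out individual seconds that happen to resemble another participant, which is one plausible reason accuracy rises with more seconds.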

**Figure 5.**
Significant grid cells from image partitioning, Subject 143, ZJU Session 1. (a) Unadjusted: grid cells that are significant in distinguishing Subject 143 from the other subjects in the ZJU S1 task. (b) Correlation and multiplicity adjusted: grid cells that are significant after adjusting for correlation and multiplicity.
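The contrast between unadjusted and adjusted screening can be illustrated with a simpler stand-in. The paper's adjustment accounts for both correlation and multiplicity; because that procedure is not described here, the sketch below swaps in the Holm step-down procedure, which controls the family-wise error rate for multiple comparisons only and ignores correlation between grid cells. It assumes one p-value per grid cell is already available; `holm_significant` is a hypothetical name.

```python
def holm_significant(pvalues, alpha=0.05):
    """Return the grid cells rejected by Holm's step-down procedure.

    pvalues maps a grid-cell key to its unadjusted p-value.
    Note: this adjusts for multiplicity only, not for correlation.
    """
    m = len(pvalues)
    ordered = sorted(pvalues.items(), key=lambda kv: kv[1])
    rejected = set()
    for rank, (cell, p) in enumerate(ordered):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if p <= alpha / (m - rank):
            rejected.add(cell)
        else:
            break  # stop at the first non-rejection
    return rejected
```

With p-values `{"c1": 0.001, "c2": 0.02, "c3": 0.03, "c4": 0.5}`, three cells pass an unadjusted 0.05 threshold but only `"c1"` survives Holm, mirroring how panel (b) retains fewer cells than panel (a). Methods that exploit positive correlation between neighbouring cells are typically less conservative than Holm.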

**Figure 6.**
Comparison of data from well and poorly predicted subjects. Panel (a) shows a subset of the joint distribution for subject 5 (left) and subject 79 (right) in session 1 (top row) and session 2 (bottom row). The images are similar between sessions, and these individuals were correctly identified in session 2 from their session 1 data. Panel (b) shows the same subset of the joint distribution for subject 3 (left) and subject 136 (right) in session 1 (top row) and session 2 (bottom row). The images do not look similar, and hence these individuals were not correctly identified in the ZJU S1S2 task.

