Appl AI Lett. 2022 Apr;3(2):e63. doi: 10.1002/ail2.63. Epub 2022 Mar 23.

Generative model-enhanced human motion prediction


Anthony Bourached et al. Appl AI Lett. 2022 Apr.

Abstract

The task of predicting human motion is complicated by the natural heterogeneity and compositionality of actions, necessitating robustness to distributional shifts as far as out-of-distribution (OoD). Here, we formulate a new OoD benchmark based on the Human3.6M and Carnegie Mellon University (CMU) motion capture datasets, and introduce a hybrid framework for hardening discriminative architectures to OoD failure by augmenting them with a generative model. When applied to current state-of-the-art discriminative models, we show that the proposed approach improves OoD robustness without sacrificing in-distribution performance, and can theoretically facilitate model interpretability. We suggest human motion predictors ought to be constructed with OoD challenges in mind, and provide an extensible general framework for hardening diverse discriminative architectures to extreme distributional shift. The code is available at: https://github.com/bouracha/OoDMotion.
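The hybrid framework described in the abstract combines a discriminative prediction objective with a generative (VAE) objective. A minimal sketch of such a combined loss is shown below in NumPy; the function names, the simple MSE terms, and the beta weighting are illustrative assumptions, not the authors' implementation (see the linked repository for that):

```python
import numpy as np

def kl_diag_gaussian(mu, log_var):
    # KL divergence between a diagonal Gaussian N(mu, sigma^2)
    # and the standard normal prior N(0, I)
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def hybrid_loss(pred, target, recon, inputs, mu, log_var, beta=1.0):
    # Discriminative term: error of the future-motion prediction
    pred_loss = np.mean((pred - target) ** 2)
    # Generative (VAE) terms: reconstruction of the observed inputs
    # plus the KL regulariser on the latent code
    recon_loss = np.mean((recon - inputs) ** 2)
    return pred_loss + beta * (recon_loss + kl_diag_gaussian(mu, log_var))
```

The generative terms regularise the predictor's representation, which is the mechanism the paper credits for improved robustness under distributional shift.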

Keywords: deep learning; generative models; human motion prediction; variational autoencoders.


Figures

FIGURE 1
Graph convolutional network (GCN) architecture with variational autoencoder (VAE) branch. Here, n_z = 16 is the number of latent variables per joint.
FIGURE A1
(A) Distribution of short-term training instances for actions in H3.6M. (B) Distribution of training instances for actions in CMU.
FIGURE A2
Confusion matrix for a multiclass classifier for action labels. In each case, we use the same input convention x_k = (x_{k,1}, ..., x_{k,N}, x_{k,N+1}, ..., x_{k,N+T}), where x_{k,n} = x_{k,N} for n ≥ N, so that the input to the classifier is 48 × 20 = 960 dimensional. The classifier has four fully connected layers. Layer 1: input dimensions × 1024; layer 2: 1024 × 512; layer 3: 512 × 128; layer 4: 128 × 15 (or 128 × 8 for CMU), where the final layer uses a softmax to predict the class label. Cross-entropy loss is used for training, with ReLU activations and a dropout probability of 0.5. We used a batch size of 2048 and a learning rate of 0.00001. H3.6M dataset. N = 10, T = 10. Number of discrete cosine transformation (DCT) coefficients = 20 (lossless transformation).
FIGURE A3
Confusion matrix for a multiclass classifier for action labels. In each case, we use the same input convention x_k = (x_{k,1}, ..., x_{k,N}, x_{k,N+1}, ..., x_{k,N+T}), where x_{k,n} = x_{k,N} for n ≥ N, so that the input to the classifier is 48 × 20 = 960 dimensional. The classifier has four fully connected layers. Layer 1: input dimensions × 1024; layer 2: 1024 × 512; layer 3: 512 × 128; layer 4: 128 × 15 (or 128 × 8 for CMU), where the final layer uses a softmax to predict the class label. Cross-entropy loss is used for training, with ReLU activations and a dropout probability of 0.5. We used a batch size of 2048 and a learning rate of 0.00001. H3.6M dataset. N = 50, T = 10. Number of discrete cosine transformation (DCT) coefficients = 20, where the 40 highest-frequency DCT coefficients are culled.
FIGURE A4
Confusion matrix for a multiclass classifier for action labels. In each case, we use the same input convention x_k = (x_{k,1}, ..., x_{k,N}, x_{k,N+1}, ..., x_{k,N+T}), where x_{k,n} = x_{k,N} for n ≥ N, so that the input to the classifier is 48 × 20 = 960 dimensional. The classifier has four fully connected layers. Layer 1: input dimensions × 1024; layer 2: 1024 × 512; layer 3: 512 × 128; layer 4: 128 × 15 (or 128 × 8 for CMU), where the final layer uses a softmax to predict the class label. Cross-entropy loss is used for training, with ReLU activations and a dropout probability of 0.5. We used a batch size of 2048 and a learning rate of 0.00001. CMU dataset. N = 10, T = 25. Number of discrete cosine transformation (DCT) coefficients = 35 (lossless transformation).
FIGURE A5
Latent embedding of the trained model on the H3.6M and CMU datasets, each independently projected into 2D with UMAP (default hyperparameters), from 384 dimensions for H3.6M and 512 dimensions for CMU. (A) H3.6M: all actions, opacity = 0.1. (B) H3.6M: all actions in blue (opacity = 0.1), walking in red (opacity = 1). (C) CMU: all actions in blue, opacity = 0.1.
FIGURE A6
Network architecture with discriminative and variational autoencoder (VAE) branches.
FIGURE A7
Graph convolutional layer (GCL) and a residual graph convolutional block (GCB).

References

    1. Geertsema EE, Thijs RD, Gutter T, et al. Automated video‐based detection of nocturnal convulsive seizures in a residential care setting. Epilepsia. 2018;59:53‐60. - PubMed
    1. Kakar M, Nyström H, Aarup LR, Nøttrup TJ, Olsen DR. Respiratory motion prediction by using the adaptive neuro fuzzy inference system (anfis). Phys Med Biol. 2005;50(19):4721‐4728. - PubMed
    1. Chang C‐Y, Lange B, Zhang M, et al. Towards pervasive physical rehabilitation using microsoft kinect. 2012 6th International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth) and Workshops. IEEE; 2012:159‐162. https://ieeexplore.ieee.org/abstract/document/6240377
    1. Webster D, Celik O. Systematic review of kinect applications in elderly care and stroke rehabilitation. J Neuroeng Rehabil. 2014;11(1):108. - PMC - PubMed
    1. Gui L‐Y, Zhang K, Wang Y‐X, Liang X, Moura JM, Veloso M. Teaching robots to predict human motion. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2018:562‐567. https://ieeexplore.ieee.org/abstract/document/8594452

LinkOut - more resources