Appl AI Lett. 2022 Apr;3(2):e63. doi: 10.1002/ail2.63. Epub 2022 Mar 23.

Generative model-enhanced human motion prediction


Anthony Bourached et al. Appl AI Lett. 2022 Apr.

Abstract

The task of predicting human motion is complicated by the natural heterogeneity and compositionality of actions, necessitating robustness to distributional shifts as far as out-of-distribution (OoD). Here, we formulate a new OoD benchmark based on the Human3.6M and Carnegie Mellon University (CMU) motion capture datasets, and introduce a hybrid framework for hardening discriminative architectures to OoD failure by augmenting them with a generative model. When applied to current state-of-the-art discriminative models, we show that the proposed approach improves OoD robustness without sacrificing in-distribution performance, and can theoretically facilitate model interpretability. We suggest human motion predictors ought to be constructed with OoD challenges in mind, and provide an extensible general framework for hardening diverse discriminative architectures to extreme distributional shift. The code is available at: https://github.com/bouracha/OoDMotion.
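The hybrid framework described in the abstract combines a discriminative prediction objective with a generative (VAE) objective. A minimal sketch of such a combined loss is shown below in NumPy; the function names, the simple MSE terms, and the beta weighting are illustrative assumptions, not the authors' implementation (see the linked repository for that):

```python
import numpy as np

def kl_diag_gaussian(mu, log_var):
    # KL divergence between a diagonal Gaussian N(mu, sigma^2)
    # and the standard normal prior N(0, I)
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def hybrid_loss(pred, target, recon, inputs, mu, log_var, beta=1.0):
    # Discriminative term: error of the future-motion prediction
    pred_loss = np.mean((pred - target) ** 2)
    # Generative (VAE) terms: reconstruction of the observed inputs
    # plus the KL regulariser on the latent code
    recon_loss = np.mean((recon - inputs) ** 2)
    return pred_loss + beta * (recon_loss + kl_diag_gaussian(mu, log_var))
```

The generative terms regularise the predictor's representation, which is the mechanism the paper credits for improved robustness under distributional shift.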

Keywords: deep learning; generative models; human motion prediction; variational autoencoders.


Figures

FIGURE 1
Graph convolutional network (GCN) architecture with variational autoencoder (VAE) branch. Here, n_z = 16 is the number of latent variables per joint.
FIGURE A1
(A) Distribution of short-term training instances for actions in H3.6M. (B) Distribution of training instances for actions in CMU.
FIGURE A2
Confusion matrix for a multiclass classifier for action labels. In each case, we use the same input convention x_k = (x_{k,1}, ..., x_{k,N}, x_{k,N+1}, ..., x_{k,N+T}), where x_{k,n} = x_{k,N} for n ≥ N, so that the input to the classifier is 48 × 20 = 960 dimensional. The classifier has four fully connected layers. Layer 1: input dimensions × 1024; layer 2: 1024 × 512; layer 3: 512 × 128; layer 4: 128 × 15 (or 128 × 8 for CMU), where the final layer uses a softmax to predict the class label. Cross-entropy loss is used for training, with ReLU activations and a dropout probability of 0.5. We used a batch size of 2048 and a learning rate of 0.00001. H3.6M dataset. N = 10, T = 10. Number of discrete cosine transformation (DCT) coefficients = 20 (lossless transformation).
FIGURE A3
Confusion matrix for a multiclass classifier for action labels. In each case, we use the same input convention x_k = (x_{k,1}, ..., x_{k,N}, x_{k,N+1}, ..., x_{k,N+T}), where x_{k,n} = x_{k,N} for n ≥ N, so that the input to the classifier is 48 × 20 = 960 dimensional. The classifier has four fully connected layers. Layer 1: input dimensions × 1024; layer 2: 1024 × 512; layer 3: 512 × 128; layer 4: 128 × 15 (or 128 × 8 for CMU), where the final layer uses a softmax to predict the class label. Cross-entropy loss is used for training, with ReLU activations and a dropout probability of 0.5. We used a batch size of 2048 and a learning rate of 0.00001. H3.6M dataset. N = 50, T = 10. Number of discrete cosine transformation (DCT) coefficients = 20, where the 40 highest-frequency DCT coefficients are culled.
FIGURE A4
Confusion matrix for a multiclass classifier for action labels. In each case, we use the same input convention x_k = (x_{k,1}, ..., x_{k,N}, x_{k,N+1}, ..., x_{k,N+T}), where x_{k,n} = x_{k,N} for n ≥ N, so that the input to the classifier is 48 × 20 = 960 dimensional. The classifier has four fully connected layers. Layer 1: input dimensions × 1024; layer 2: 1024 × 512; layer 3: 512 × 128; layer 4: 128 × 15 (or 128 × 8 for CMU), where the final layer uses a softmax to predict the class label. Cross-entropy loss is used for training, with ReLU activations and a dropout probability of 0.5. We used a batch size of 2048 and a learning rate of 0.00001. CMU dataset. N = 10, T = 25. Number of discrete cosine transformation (DCT) coefficients = 35 (lossless transformation).
FIGURE A5
Latent embedding of the trained model on the H3.6M and CMU datasets, each independently projected into 2D with UMAP (default hyperparameters), from 384 dimensions for H3.6M and 512 dimensions for CMU. (A) H3.6M: all actions, opacity = 0.1. (B) H3.6M: all actions in blue (opacity = 0.1), walking in red (opacity = 1). (C) CMU: all actions in blue, opacity = 0.1.
FIGURE A6
Network architecture with discriminative and variational autoencoder (VAE) branches.
FIGURE A7
Graph convolutional layer (GCL) and a residual graph convolutional block (GCB).

References

    1. Geertsema EE, Thijs RD, Gutter T, et al. Automated video‐based detection of nocturnal convulsive seizures in a residential care setting. Epilepsia. 2018;59:53‐60. - PubMed
    1. Kakar M, Nyström H, Aarup LR, Nøttrup TJ, Olsen DR. Respiratory motion prediction by using the adaptive neuro fuzzy inference system (anfis). Phys Med Biol. 2005;50(19):4721‐4728. - PubMed
    1. Chang C‐Y, Lange B, Zhang M, et al. Towards pervasive physical rehabilitation using microsoft kinect. 2012 6th International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth) and Workshops. IEEE; 2012:159‐162. https://ieeexplore.ieee.org/abstract/document/6240377
    1. Webster D, Celik O. Systematic review of kinect applications in elderly care and stroke rehabilitation. J Neuroeng Rehabil. 2014;11(1):108. - PMC - PubMed
    1. Gui L‐Y, Zhang K, Wang Y‐X, Liang X, Moura JM, Veloso M. Teaching robots to predict human motion. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2018:562‐567. https://ieeexplore.ieee.org/abstract/document/8594452

LinkOut - more resources