Sensors (Basel). 2023 Jan 12;23(2):876. doi: 10.3390/s23020876.

Future Pose Prediction from 3D Human Skeleton Sequence with Surrounding Situation


Tomohiro Fujita et al.

Abstract

Human pose prediction is vital for robot applications such as human-robot interaction and autonomous robot control. Recent prediction methods often use deep learning and predict future poses from a 3D human skeleton sequence. However, even when the starting motions of two 3D human skeleton sequences are very similar, the subsequent poses can vary widely, which makes it difficult to predict future poses from a given skeleton sequence alone. Meanwhile, careful observation of human motion shows that it is often affected by objects or other people around the target person. We therefore consider the presence of surrounding objects to be an important cue for prediction. This paper proposes a method for predicting a future skeleton sequence by incorporating the surrounding situation into the prediction model. The proposed method uses a feature extracted from an image around the target person as the surrounding information. Evaluations on publicly available datasets confirmed that the proposed method improves prediction accuracy for object-related and human-related motions.

Keywords: 3D skeleton sequence; pose prediction; surrounding information.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Example of motions whose starting motions are very similar but whose future motions differ. In such cases, prediction may be difficult, or the deep learning model may not train well; using only the 3D human skeleton sequence of the starting motion is not enough.
Figure 2
The proposed surrounding-aware future skeleton prediction framework. The skeleton feature S is modified using the surrounding image feature i extracted from the image around the target person.
Figure 3
The structure of the IAA module. The skeleton feature extractor is a graph convolution layer of any existing GCN-based prediction model. A surrounding image feature i is used to calculate the attention matrix A for modifying the original skeleton feature S.
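As a rough illustration of the mechanism this caption describes, here is a minimal sketch in PyTorch. The layer widths, joint count, and the exact form of A (here a row-normalized joint-by-joint matrix produced by a small MLP) are assumptions for illustration, not the paper's exact architecture.

import torch
import torch.nn as nn

class IAA(nn.Module):
    # Sketch of an image-feature attention adjustment: a small MLP maps
    # the surrounding image feature i to an attention matrix A, which
    # then modifies the skeleton feature S (cf. Figure 3).
    def __init__(self, img_dim=512, num_joints=25):
        super().__init__()
        self.num_joints = num_joints
        # Two FC layers, echoing the "2 FC layers" variant in Figure 7;
        # the hidden width (128) is an assumption.
        self.fc = nn.Sequential(
            nn.Linear(img_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_joints * num_joints),
        )

    def forward(self, S, i):
        # S: (batch, num_joints, feat_dim) skeleton feature from a GCN layer
        # i: (batch, img_dim) surrounding image feature
        A = self.fc(i).view(-1, self.num_joints, self.num_joints)
        A = torch.softmax(A, dim=-1)   # row-normalized attention matrix
        return torch.bmm(A, S)         # modified skeleton feature

# Dummy shapes for illustration only.
S = torch.randn(2, 25, 64)
i = torch.randn(2, 512)
print(IAA()(S, i).shape)  # torch.Size([2, 25, 64])

Per the Figure 4 caption, such a module is inserted into an existing GCN-based model at one of three points: the start of feature extraction (P1), the end of feature extraction (P2), or the final prediction-generation stage (P3).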
Figure 4
The three locations where the IAA module is inserted into any existing GCN-based model. P1 is the initial stage of feature extraction, P2 is the end of feature extraction, and P3 is the final stage of generating the prediction.
Figure 5
Examples of 3D skeletons and RGB images in the NTU RGB+D 120 dataset. The upper row shows “sit down” and the lower row shows “squat down”. The 3D human skeletons are on the left and the RGB images are on the right.
Figure 6
Examples of 3D skeletons and RGB images in the PKU-MMD dataset. The upper row shows “sit down” and the lower row shows “pick up”. The 3D human skeletons are on the left and the RGB images are on the right.
Figure 7
Differences in the class-wise average MPJPE between MSR-GCN and the MSR-GCN with IAA models. In this figure, (a) is MSR-GCN minus MSR-GCN with IAA P1, (b) is MSR-GCN with IAA P2 minus MSR-GCN with IAA P1, and (c) is MSR-GCN with IAA P3 minus MSR-GCN with IAA P1. MSR-GCN with IAA P1 (2 FC layers), MSR-GCN with IAA P2 (3 FC layers), and MSR-GCN with IAA P3 (2 FC layers) are compared.
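For reference, MPJPE (mean per-joint position error) is the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over joints and frames. A minimal NumPy sketch, with array shapes assumed for illustration:

import numpy as np

def mpjpe(pred, gt):
    # Mean per-joint position error: average Euclidean distance between
    # predicted and ground-truth joints.
    # pred, gt: arrays of shape (frames, joints, 3).
    return np.linalg.norm(pred - gt, axis=-1).mean()

pred = np.zeros((30, 25, 3))
gt = np.ones((30, 25, 3))
print(mpjpe(pred, gt))  # sqrt(3) ≈ 1.732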
Figure 8
An example of a predicted result (blue) and the ground truth (red) for the “sit down” action. The predicted skeletons of MSR-GCN with IAA (2 FC layers) are closer to the ground truth when the surrounding image is used.
Figure 9
A prediction example at the 30th predicted frame (blue) with the ground truth (red) for “sit down”. In this example, the existing method mispredicted the motion as “jump up”, but our proposed method predicted it correctly. For object-related motions, the future skeleton is predicted well by our proposed IAA.
Figure 10
A prediction example at the 30th predicted frame (blue) with the ground truth (red) for “squat down”. For non-object-related motions, the future skeleton is also predicted well by our proposed IAA.
