Towards Single Camera Human 3D-Kinematics

Marian Bittner et al.

Sensors (Basel). 2022 Dec 28;23(1):341. doi: 10.3390/s23010341.

Abstract

Markerless estimation of 3D kinematics has great potential for clinically diagnosing and monitoring movement disorders without referral to expensive motion-capture labs; however, current approaches are limited by performing multiple decoupled steps to estimate a person's kinematics from video. Most current techniques work in a multi-step approach, first detecting the pose of the body and then fitting a musculoskeletal model to the data for accurate kinematic estimation. Errors in the training data of the pose-detection algorithms, errors in model scaling, as well as the requirement of multiple cameras limit the use of these techniques in a clinical setting. Our goal is to pave the way toward fast, easily applicable, and accurate 3D kinematic estimation. To this end, we propose a novel approach for direct 3D human kinematic estimation (D3KE) from videos using deep neural networks. Our experiments demonstrate that the proposed end-to-end training is robust and outperforms kinematic estimation pipelines based on 2D and 3D markerless motion capture in terms of joint-angle error by a large margin (35%, from 5.44 to 3.54 degrees). We show that D3KE is superior to the multi-step approach and can run at video framerate. This technology shows potential for clinical analysis from mobile devices in the future.

Keywords: 3D-kinematic estimation; 3D-kinematics; OpenSim; markerless motion capture; musculoskeletal modelling; pose estimation.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Overview of the proposed direct 3D human kinematics estimation (D3KE). Instead of using the common 'multi-step' approach of predicting pose, fitting it to a model, and estimating kinematics, our D3KE directly estimates the kinematics. Errors in earlier steps of the multi-step approach propagate to later steps; in contrast, our method can correct for errors occurring anywhere between input and output.
Figure 2
Taking a single-view video as input, D3KE consists of one convolutional neural network and one sequential network. Per frame, the convolutional network outputs joint angles and the scales of individual bones in a skeletal model (scale factors). Joint angles and scale factors are additionally converted to a pose through the skeletal-model kinematics (SM) layer. The series of per-frame estimates is then fed into a sequential network to smooth the estimates and reduce artifacts when one limb occludes another in the view of the camera (self-occlusion).
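To make the two-stage structure concrete, below is a minimal PyTorch-style sketch of the pipeline the caption describes: a per-frame convolutional network with separate heads for joint angles and bone-scale factors, followed by a sequential network that refines the per-frame estimates over time. All class names, layer choices, and dimensions (n_angles, n_scales, the GRU) are illustrative assumptions; the excerpt does not specify the actual architectures.

```python
# Minimal PyTorch-style sketch of the two-stage D3KE pipeline described above.
# Class names, layer choices, and dimensions are illustrative assumptions;
# the excerpt does not specify the actual backbone or sequential network.
import torch
import torch.nn as nn

class PerFrameNet(nn.Module):
    """Convolutional network: one video frame -> joint angles + bone scales."""
    def __init__(self, n_angles=40, n_scales=22):
        super().__init__()
        self.backbone = nn.Sequential(          # stand-in for a CNN backbone
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.angle_head = nn.Linear(16, n_angles)   # joint angles
        self.scale_head = nn.Linear(16, n_scales)   # bone scale factors

    def forward(self, frame):                   # frame: (batch, 3, H, W)
        feat = self.backbone(frame)
        return self.angle_head(feat), self.scale_head(feat)

class SequentialNet(nn.Module):
    """Temporal network: smooths per-frame angle estimates over a clip."""
    def __init__(self, n_angles=40, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(n_angles, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_angles)

    def forward(self, angle_seq):               # (batch, time, n_angles)
        h, _ = self.rnn(angle_seq)
        return self.out(h)                      # refined angles per frame
```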
Figure 3
Our skeletal-model layer uses an internal representation of a skeletal model to convert the predicted joint angles and scale factors into the positions of individual markers on segments of the skeletal model. This allows our method to be supervised during training not only on errors (losses) in the estimated joint angles but also on errors in the resulting pose. On the right, we show the additional error computed between the estimate (gray) and the ground truth (blue). This auxiliary estimation of the pose as 3D marker positions helps to constrain the estimation of joint angles, as small changes in proximal joints can have a large effect on markers at more distal positions.
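A hedged sketch of the training objective this implies: the joint-angle loss is augmented with an auxiliary loss on the 3D marker positions produced by the SM layer, so angle errors that push distal markers far from the ground truth are penalized accordingly. The function forward_kinematics and the weight w_pose are hypothetical stand-ins; the excerpt does not give the exact loss formulation.

```python
# Sketch of the combined supervision implied by the SM layer: a joint-angle
# loss plus an auxiliary loss on marker positions obtained through forward
# kinematics. `forward_kinematics` and `w_pose` are hypothetical stand-ins.
import torch

def d3ke_loss(pred_angles, pred_scales, gt_angles, gt_markers,
              forward_kinematics, w_pose=1.0):
    angle_loss = torch.mean(torch.abs(pred_angles - gt_angles))
    pred_markers = forward_kinematics(pred_angles, pred_scales)  # (..., M, 3)
    pose_loss = torch.mean(torch.norm(pred_markers - gt_markers, dim=-1))
    return angle_loss + w_pose * pose_loss
```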
Figure 4
Mean absolute error for predicted joint angles per joint, movement and subset. Across these groups, D3KE shows less variation compared to the CMS. The low variations indicate that D3KE is suitable for use on different participants and movements.
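For illustration, per-joint and per-movement mean absolute error (MAE) of the kind plotted in Figure 4 could be computed as in the following sketch; array names, shapes, and the grouping variable are assumptions, not the authors' evaluation code.

```python
# Illustrative computation of per-joint and per-movement MAE; array names,
# shapes, and the grouping variable are assumptions, not the authors' code.
import numpy as np

def grouped_mae(pred, gt, movement_ids):
    """pred, gt: (n_frames, n_joints) joint angles in degrees;
    movement_ids: (n_frames,) integer movement label per frame."""
    mae_per_joint = np.mean(np.abs(pred - gt), axis=0)          # (n_joints,)
    mae_per_movement = {
        m: float(np.mean(np.abs(pred[movement_ids == m] - gt[movement_ids == m])))
        for m in np.unique(movement_ids)
    }
    return mae_per_joint, mae_per_movement
```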
Figure 5
Qualitative results of D3KE on a 'sitting-down' movement from the BML-Movi dataset. The top row shows selected frames throughout the movement. The middle row shows poses of the ground-truth skeletal model (cyan) and of the skeletal model based on D3KE's estimate (white) throughout the movement. The bottom row shows the flexion/extension of the left knee throughout the movement, with blue being the predicted angle and orange the ground truth.
Figure 6
Qualitative comparison of predicted left-knee angles for a 'sitting-down' movement from the BML-Movi dataset. OpenPose shows by far the noisiest estimates, while the smoothing of the MediaPipe estimates is clearly visible; our proposed method and the implemented CMS work best, probably due to the constraint imposed by the additional markers.

