Sensors (Basel). 2023 Dec 29;24(1):206.
doi: 10.3390/s24010206.

Feasibility of 3D Body Tracking from Monocular 2D Video Feeds in Musculoskeletal Telerehabilitation

Carolina Clemente et al. Sensors (Basel).

Abstract

Musculoskeletal conditions affect millions of people globally; however, conventional treatments pose challenges concerning price, accessibility, and convenience. Many telerehabilitation solutions offer an engaging alternative but rely on complex hardware for body tracking. This work explores the feasibility of a model for 3D Human Pose Estimation (HPE) from monocular 2D videos (MediaPipe Pose) in a physiotherapy context by comparing its performance to ground truth measurements. MediaPipe Pose was investigated in eight exercises typically performed in musculoskeletal physiotherapy sessions, with the Range of Motion (ROM) of the human joints as the evaluated parameter. The model performed best for the shoulder abduction, shoulder press, elbow flexion, and squat exercises. Results showed a mean absolute percentage error (MAPE) ranging between 14.9% and 25.0%, a Pearson's coefficient ranging between 0.963 and 0.996, and a cosine similarity ranging between 0.987 and 0.999. Some exercises (e.g., seated knee extension and shoulder flexion) posed challenges due to unusual poses, occlusions, and depth ambiguities, possibly related to a lack of training data. This study demonstrates the potential of HPE from monocular 2D videos as a markerless, affordable, and accessible solution for musculoskeletal telerehabilitation. Future work should focus on exploring variations of 3D HPE models trained on physiotherapy-related datasets, such as the Fit3D dataset, and on post-processing techniques to enhance the model's performance.

Keywords: 2D camera; 3D Human Pose Estimation; MediaPipe Pose; ROM; deep learning; monocular; musculoskeletal; telerehabilitation; videos.
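
The three agreement metrics reported in the abstract (MAPE, Pearson's coefficient, and cosine similarity) can be illustrated with a minimal sketch. It uses the standard textbook definitions; whether the study computes them per sample or per repetition is not stated here, and the function names and ROM values below are hypothetical, chosen only for illustration.

```python
import math


def mape(truth, pred):
    """Mean absolute percentage error in percent (truth values must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(truth, pred)) / len(truth)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def cosine_similarity(x, y):
    """Cosine of the angle between two sequences treated as vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Hypothetical ROM values in degrees (not data from the study):
# the prediction underestimates every ground truth value by 10%.
qualisys_rom = [90.0, 120.0, 150.0]
mediapipe_rom = [81.0, 108.0, 135.0]

print(mape(qualisys_rom, mediapipe_rom))
print(pearson(qualisys_rom, mediapipe_rom))
print(cosine_similarity(qualisys_rom, mediapipe_rom))
```

In this toy case the prediction is a scaled copy of the ground truth, so the correlation and cosine similarity are both essentially 1 even though the MAPE is about 10%. This shows why high Pearson and cosine values can coexist with a sizeable percentage error.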

Conflict of interest statement

The authors declare no conflicts of interest. The sponsors had no role in the design, execution, interpretation, or writing of the study.

Figures

Figure 1
Example of skeletal human body representation: 33 landmarks of MediaPipe Pose, where the right-side landmarks are represented in blue, the left-side landmarks in orange, and the nose landmark in white.
Figure 2
Classification of 2D camera-based models for Human Pose Estimation (HPE).
Figure 3
Eight exercises selected for the experimental study: Shoulder Flexion/Extension (SF), Shoulder Abduction/Adduction (SA), Elbow Flexion/Extension (EF), Shoulder Press (SP), Hip Abduction/Adduction (HA), Squat (SQ), March (MCH), and Seated Knee Flexion/Extension (SKF). Shoulder press and squat exercises are illustrated by a sequence of two representative images of the movement.
Figure 4
Experimental setup for the data acquisition, showing some of the Qualisys cameras, two 2D cameras, and the relative position between the subject and the two 2D cameras.
Figure 5
Anatomical location of the six Qualisys MoCap markers.
Figure 6
The 3D Cartesian coordinate system of Qualisys (in orange) and its spatial relation with respect to the participant position during data acquisition.
Figure 7
Comparison of the normal vectors of the anatomical planes (in black) with the Qualisys coordinate system (in orange).
Figure 8
Relation between the participant position and the Cartesian coordinate system of the MediaPipe Pose model for three camera orientations: (a) camera plane parallel to participant frontal plane; (b) camera plane rotated around the Y-axis relative to participant frontal plane; and (c) camera plane rotated around the X-axis relative to participant frontal plane. The camera 2D coordinate system is represented by the X’-axis and Y’-axis, which are parallel to the X-axis and Y-axis of the algorithm coordinate system, respectively.
Figure 9
The virtual 3D coordinate system of MediaPipe Pose coincident with the normal vectors of the anatomical planes. The origin is the midpoint between the hips. The X-axis is the sagittal plane normal, the Y-axis is the transverse plane normal, and the Z-axis is the frontal plane normal. The four points (representing the shoulders and hips) are used to define the virtual 3D coordinate system.
Figure 10
Representation of virtual 3D coordinate system definition: (1) Z-axis or frontal plane normal; (2) Y-axis or transverse plane normal; and (3) X-axis or sagittal plane normal.
Figure 11
Comparison of the normal vectors of the anatomical planes (in black) with the MediaPipe Pose virtual coordinate system (in blue).
Figure 12
Amplitude calculation between the projected body segment vector and a reference direction.
Figure 13
Data alignment between the Qualisys ground truth amplitudes (in orange) and MediaPipe Pose predicted amplitudes (in blue).
Figure 14
Example of Qualisys ground truth (in orange) and MediaPipe Pose predicted (in blue) amplitudes for Subject 1 performing SA exercise and SKF exercise. (a,b) show the raw amplitude before the alignment procedure, and (c,d) the aligned amplitude data, before segmenting the sample to extract the exercise repetitions.
Figure 15
Relation between Qualisys and MediaPipe Pose motion amplitudes for (a) SA exercise and (b) SKF exercise. Each color represents a different subject, and the yellow line is the linear regression that best fits the amplitude data for the exercise; the coefficient of determination (R²) and the linear regression equation (slope and intercept) are also shown, where y and x are the Qualisys and MediaPipe Pose amplitudes, respectively.
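
The captions of Figures 9, 10, and 12 describe a body-fixed coordinate system built from the shoulders and hips, and an amplitude measured between a projected body segment vector and a reference direction. The following is a minimal sketch of one way such a construction could look, not the authors' implementation. The axis sign conventions, the use of the hip-to-hip vector as the lateral direction, the choice of the downward trunk direction as reference, and all function names and landmark values are assumptions made for illustration.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def scale(v, s):
    return tuple(x * s for x in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def midpoint(a, b):
    return tuple((x + y) / 2 for x, y in zip(a, b))


def body_frame(l_shoulder, r_shoulder, l_hip, r_hip):
    """Build an orthonormal trunk frame; the origin is the midpoint between the hips."""
    origin = midpoint(l_hip, r_hip)
    up = sub(midpoint(l_shoulder, r_shoulder), origin)  # trunk long axis
    lateral = sub(l_hip, r_hip)                         # assumed lateral direction
    z_axis = normalize(cross(lateral, up))              # (1) frontal plane normal
    y_axis = normalize(up)                              # (2) transverse plane normal
    x_axis = cross(y_axis, z_axis)                      # (3) sagittal plane normal
    return origin, x_axis, y_axis, z_axis


def projected_angle_deg(segment, plane_normal, reference):
    """Angle between a segment and a reference, both projected onto a plane."""
    seg_p = sub(segment, scale(plane_normal, dot(segment, plane_normal)))
    ref_p = sub(reference, scale(plane_normal, dot(reference, plane_normal)))
    cos_angle = dot(normalize(seg_p), normalize(ref_p))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


# Hypothetical landmark positions (arbitrary units), subject standing upright.
l_shoulder, r_shoulder = (0.2, 0.5, 0.0), (-0.2, 0.5, 0.0)
l_hip, r_hip = (0.1, 0.0, 0.0), (-0.1, 0.0, 0.0)
l_elbow = (0.5, 0.5, 0.1)  # arm raised sideways, slightly forward

origin, x_axis, y_axis, z_axis = body_frame(l_shoulder, r_shoulder, l_hip, r_hip)
upper_arm = sub(l_elbow, l_shoulder)
# Abduction-like amplitude: project onto the frontal plane, measure from "arm down".
abduction = projected_angle_deg(upper_arm, z_axis, scale(y_axis, -1.0))
print(abduction)
```

Projecting onto the anatomical plane before measuring the angle discards the out-of-plane component of the segment (here the slight forward offset of the elbow), so the amplitude reflects only motion within the plane of the exercise.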
