Sensors (Basel). 2023 Jan 31;23(3):1555. doi: 10.3390/s23031555.

PA-Tran: Learning to Estimate 3D Hand Pose with Partial Annotation


Tianze Yu et al. Sensors (Basel).

Abstract

This paper tackles a novel and challenging problem: 3D hand pose estimation (HPE) from a single RGB image under partial annotation. Most HPE methods ignore the fact that keypoints can be only partially visible (e.g., under occlusion). In contrast, we propose a deep-learning framework, PA-Tran, that jointly estimates keypoint status and 3D hand pose from a single RGB image with two dependent branches. The regression branch consists of a Transformer encoder trained to predict a set of target keypoints given an input set of status, position, and visual-feature embeddings from a convolutional neural network (CNN); the classification branch adopts a CNN for estimating keypoint status. One key idea of PA-Tran is a selective mask training (SMT) objective that uses a binary encoding scheme to mark each keypoint as observed or unobserved during training. By explicitly encoding the label status (observed/unobserved), PA-Tran can efficiently handle the condition in which only partial annotation is available. Investigating annotation percentages ranging from 50% to 100%, we show that training with partial annotation is more efficient (e.g., achieving the best PA-MPJPE of 6.0 when using about 85% of the annotations). Moreover, we provide two new datasets: APDM-Hand, a synthetic hand dataset with APDM sensor accessories designed for a specific hand task, and PD-APDM-Hand, a real hand dataset collected from Parkinson's Disease (PD) patients with partial annotation. The proposed PA-Tran achieves higher estimation accuracy when evaluated on both proposed datasets and a more general hand dataset.
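The masking idea behind SMT can be sketched as a loss that only scores keypoints whose binary status flag marks them as observed. This is a minimal illustrative sketch, not the paper's exact formulation: the function name, array shapes, and the use of plain Euclidean per-joint error are all assumptions.

```python
import numpy as np

def smt_masked_loss(pred, target, status):
    """Mean per-joint position error over observed keypoints only.

    pred, target: (K, 3) arrays of 3D joint coordinates.
    status: (K,) binary array, 1 = annotated/observed, 0 = unobserved.

    Unobserved joints are masked out of the objective, so a partially
    annotated sample never penalizes the network for missing labels.
    """
    status = status.astype(bool)
    if not status.any():
        return 0.0  # nothing annotated in this sample
    per_joint = np.linalg.norm(pred - target, axis=-1)  # Euclidean error per joint
    return float(per_joint[status].mean())
```

For example, with a uniform offset of (1, 1, 1) on every joint but only the first two joints observed, the loss equals sqrt(3) regardless of the unobserved joints' errors.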

Keywords: 3D hand pose estimation; PD (Parkinson’s disease) hand dataset; partial annotation; single RGB image; synthetic dataset; transformer.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Biological characteristics of the human hand skeleton: (a) Illustration of the DoF of the hand; (b) Indices of the hand joints.
Figure 2
Overview of the proposed PA-Tran framework. Given an input image I, we extract image features using a convolutional neural network. The features are then passed into two separate branches: the regression branch reg(·) and the classification branch cla(·). cla(·) generates the status embedding for reg(·) and the masks for SMT to learn the interaction between labels. The structures of reg(·) and cla(·) are detailed in Section 3.2.
Figure 3
The structure of the reg(·) branch. The input is the concatenation of feature embedding, position embedding, and status embedding. Sequential transformer blocks are adopted to reduce the dimension of the hidden embedding progressively. The final output is the coordinates of the keypoints.
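The token construction described in this caption can be sketched as follows. The joint count, embedding widths, and the plain linear maps standing in for the paper's transformer blocks are all illustrative assumptions; only the overall shape of the pipeline (concatenate per-joint embeddings, progressively reduce the hidden dimension, emit 3D coordinates) mirrors the figure.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 21                       # hand joints (illustrative count)
D_F, D_P, D_S = 32, 8, 8     # feature / position / status widths (assumed)

# Per-joint embeddings produced upstream (stand-ins for the CNN outputs).
feat_emb = rng.normal(size=(K, D_F))
pos_emb = rng.normal(size=(K, D_P))
status_emb = rng.normal(size=(K, D_S))

# Input tokens: concatenation along the channel axis, one token per joint.
tokens = np.concatenate([feat_emb, pos_emb, status_emb], axis=-1)  # (21, 48)

# Progressive dimension reduction, sketched with plain linear maps in
# place of transformer blocks; the final width is 3 for (x, y, z).
dims = [tokens.shape[-1], 24, 12, 3]
h = tokens
for d_in, d_out in zip(dims[:-1], dims[1:]):
    W = rng.normal(size=(d_in, d_out)) / np.sqrt(d_in)
    h = h @ W
    if d_out != 3:
        h = np.maximum(h, 0.0)  # nonlinearity between reduction stages

keypoints = h  # (21, 3) predicted coordinates, one 3D point per joint
```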
Figure 4
Examples of finger-tapping animation frames with motion blur.
Figure 5
Examples of hand-movement animation frames with motion blur.
Figure 6
Examples of APDM-Hand images from different views and backgrounds.
Figure 7
Examples of PD-APDM-Hand, collected from real Parkinson’s Disease patients while taking the UPDRS test.
Figure 8
Qualitative results on the APDM-Hand dataset: (a) Ground truth; (b) METRO; (c) PA-Tran.
Figure 9
Qualitative results on the PD-APDM-Hand dataset: (a) PD subject 1; (b) PD subject 2.
Figure 10
Hand pose estimation with motion blur.

