Sensors (Basel). 2023 Jan 31;23(3):1555. doi: 10.3390/s23031555.

PA-Tran: Learning to Estimate 3D Hand Pose with Partial Annotation


Tianze Yu et al. Sensors (Basel).

Abstract

This paper tackles a novel and challenging problem: 3D hand pose estimation (HPE) from a single RGB image under partial annotation. Most HPE methods ignore the fact that keypoints can be only partially visible (e.g., under occlusion). In contrast, we propose a deep-learning framework, PA-Tran, that jointly estimates keypoint status and 3D hand pose from a single RGB image with two dependent branches. The regression branch consists of a Transformer encoder trained to predict a set of target keypoints given an input set of status, position, and visual-feature embeddings from a convolutional neural network (CNN); the classification branch adopts a CNN for estimating keypoint status. One key idea of PA-Tran is a selective mask training (SMT) objective that uses a binary encoding scheme to mark each keypoint as observed or unobserved during training. By explicitly encoding the label status (observed/unobserved), PA-Tran can efficiently handle the condition in which only partial annotation is available. Investigating annotation percentages ranging from 50% to 100%, we show that training with partial annotation is more efficient (e.g., achieving the best PA-MPJPE of 6.0 when using about 85% of the annotations). Moreover, we provide two new datasets: APDM-Hand, a synthetic hand dataset with APDM sensor accessories designed for a specific hand task, and PD-APDM-Hand, a real hand dataset collected from Parkinson's Disease (PD) patients with partial annotation. The proposed PA-Tran achieves higher estimation accuracy when evaluated on both proposed datasets and a more general hand dataset.
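The masking idea behind SMT can be sketched as a loss that only scores keypoints whose binary status flag marks them as observed. This is a minimal illustrative sketch, not the paper's exact formulation: the function name, array shapes, and the use of plain Euclidean per-joint error are all assumptions.

```python
import numpy as np

def smt_masked_loss(pred, target, status):
    """Mean per-joint position error over observed keypoints only.

    pred, target: (K, 3) arrays of 3D joint coordinates.
    status: (K,) binary array, 1 = annotated/observed, 0 = unobserved.

    Unobserved joints are masked out of the objective, so a partially
    annotated sample never penalizes the network for missing labels.
    """
    status = status.astype(bool)
    if not status.any():
        return 0.0  # nothing annotated in this sample
    per_joint = np.linalg.norm(pred - target, axis=-1)  # Euclidean error per joint
    return float(per_joint[status].mean())
```

For example, with a uniform offset of (1, 1, 1) on every joint but only the first two joints observed, the loss equals sqrt(3) regardless of the unobserved joints' errors.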

Keywords: 3D hand pose estimation; PD (Parkinson’s disease) hand dataset; partial annotation; single RGB image; synthetic dataset; transformer.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Biological characteristics of the human hand skeleton: (a) Illustration of the DoF of the hand; (b) Indices of the hand joints.
Figure 2
Overview of the proposed PA-Tran framework. Given an input image I, we extract image features using a convolutional neural network. The features are then passed into two separate branches: the regression branch reg(·) and the classification branch cla(·). cla(·) generates the status embedding for reg(·) and the masks for SMT to learn the interaction between labels. The structures of reg(·) and cla(·) are detailed in Section 3.2.
Figure 3
The structure of the reg(·) branch. The input is the concatenation of feature embedding, position embedding, and status embedding. Sequential transformer blocks are adopted to reduce the dimension of the hidden embedding progressively. The final output is the coordinates of the keypoints.
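The token construction described in this caption can be sketched as follows. The joint count, embedding widths, and the plain linear maps standing in for the paper's transformer blocks are all illustrative assumptions; only the overall shape of the pipeline (concatenate per-joint embeddings, progressively reduce the hidden dimension, emit 3D coordinates) mirrors the figure.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 21                       # hand joints (illustrative count)
D_F, D_P, D_S = 32, 8, 8     # feature / position / status widths (assumed)

# Per-joint embeddings produced upstream (stand-ins for the CNN outputs).
feat_emb = rng.normal(size=(K, D_F))
pos_emb = rng.normal(size=(K, D_P))
status_emb = rng.normal(size=(K, D_S))

# Input tokens: concatenation along the channel axis, one token per joint.
tokens = np.concatenate([feat_emb, pos_emb, status_emb], axis=-1)  # (21, 48)

# Progressive dimension reduction, sketched with plain linear maps in
# place of transformer blocks; the final width is 3 for (x, y, z).
dims = [tokens.shape[-1], 24, 12, 3]
h = tokens
for d_in, d_out in zip(dims[:-1], dims[1:]):
    W = rng.normal(size=(d_in, d_out)) / np.sqrt(d_in)
    h = h @ W
    if d_out != 3:
        h = np.maximum(h, 0.0)  # nonlinearity between reduction stages

keypoints = h  # (21, 3) predicted coordinates, one 3D point per joint
```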
Figure 4
Examples of finger-tapping animation frames with motion blur.
Figure 5
Examples of hand-movement animation frames with motion blur.
Figure 6
Examples of APDM-Hand images from different views and backgrounds.
Figure 7
Examples of PD-APDM-Hand, collected from real Parkinson’s Disease patients while taking the UPDRS test.
Figure 8
Qualitative results on the APDM-Hand dataset: (a) Ground truth; (b) METRO; (c) PA-Tran.
Figure 9
Qualitative results on the PD-APDM-Hand dataset: (a) PD subject 1; (b) PD subject 2.
Figure 10
Hand pose estimation with motion blur.

