PA-Tran: Learning to Estimate 3D Hand Pose with Partial Annotation
- PMID: 36772595
- PMCID: PMC9919574
- DOI: 10.3390/s23031555
Abstract
This paper tackles a novel and challenging problem: 3D hand pose estimation (HPE) from a single RGB image using partial annotation. Most HPE methods ignore the fact that keypoints may be only partially visible (e.g., under occlusion). In contrast, we propose a deep-learning framework, PA-Tran, that jointly estimates keypoint status and 3D hand pose from a single RGB image with two dependent branches. The regression branch consists of a Transformer encoder trained to predict a set of target keypoints, given an input set of status, position, and visual-feature embeddings from a convolutional neural network (CNN); the classification branch adopts a CNN for estimating keypoint status. One key idea of PA-Tran is a selective mask training (SMT) objective that uses a binary encoding scheme to represent the status of each keypoint as observed or unobserved during training. By explicitly encoding the label status (observed/unobserved), PA-Tran can efficiently handle the condition in which only partial annotation is available. Investigating annotation percentages ranging from 50% to 100%, we show that training with partial annotation is more efficient (e.g., achieving the best PA-MPJPE of 6.0 when using about 85% of the annotations). Moreover, we provide two new datasets: APDM-Hand, a synthetic hand dataset with APDM sensor accessories designed for a specific hand task, and PD-APDM-Hand, a real hand dataset collected from Parkinson's Disease (PD) patients with partial annotation. PA-Tran achieves higher estimation accuracy when evaluated on both proposed datasets and on a more general hand dataset.
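To make the two-branch design and the SMT objective concrete, the following is a minimal PyTorch sketch based only on the abstract's description. All layer sizes, module names (PATranSketch, smt_loss), and the hard status-thresholding step are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PATranSketch(nn.Module):
    """Sketch of the two dependent branches described in the abstract.
    Layer sizes and names are assumptions for illustration only."""

    def __init__(self, num_keypoints=21, feat_dim=256):
        super().__init__()
        # Shared CNN backbone producing a global visual feature.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Classification branch: per-keypoint status (observed/unobserved).
        self.status_head = nn.Linear(feat_dim, num_keypoints)
        # Learned position embedding, one per keypoint token.
        self.pos_embed = nn.Parameter(torch.zeros(num_keypoints, feat_dim))
        # Binary status embedding: 0 = unobserved, 1 = observed.
        self.status_embed = nn.Embedding(2, feat_dim)
        # Regression branch: Transformer encoder over keypoint tokens.
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.pose_head = nn.Linear(feat_dim, 3)  # 3D coordinates per keypoint

    def forward(self, img):
        feat = self.backbone(img).flatten(1)        # (B, feat_dim)
        status_logits = self.status_head(feat)      # (B, K)
        # Hard observed/unobserved code; during training one might instead
        # feed ground-truth status, since thresholding is non-differentiable.
        status = (status_logits > 0).long()
        # Keypoint tokens = visual + position + status embeddings.
        tokens = (feat.unsqueeze(1)
                  + self.pos_embed.unsqueeze(0)
                  + self.status_embed(status))      # (B, K, feat_dim)
        pose = self.pose_head(self.encoder(tokens)) # (B, K, 3)
        return pose, status_logits

def smt_loss(pred_pose, gt_pose, annotated_mask):
    """Selective-mask-style objective (our reading of SMT): supervise the
    pose regression only on keypoints that carry annotations."""
    per_kp = ((pred_pose - gt_pose) ** 2).sum(-1)   # (B, K) squared error
    masked = per_kp * annotated_mask                # zero out unannotated joints
    return masked.sum() / annotated_mask.sum().clamp(min=1)
```

Under this reading, partial annotation is handled by the binary mask alone: unannotated keypoints contribute no gradient to the regression branch, while their status code still conditions the Transformer's attention over the full keypoint set.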
Keywords: 3D hand pose estimation; PD (Parkinson’s disease) hand dataset; partial annotation; single RGB image; synthetic dataset; transformer.
Conflict of interest statement
The authors declare no conflict of interest.
