SOCA-PRNet: Spatially Oriented Attention-Infused Structured-Feature-Enabled PoseResNet for 2D Human Pose Estimation

Ali Zakir¹, Sartaj Ahmed Salman¹, Hiroki Takahashi¹,²

Affiliations

¹ Department of Informatics, Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo 182-8585, Japan.
² Artificial Intelligence Exploration Research Center/Meta-Networking Research Center, The University of Electro-Communications, Tokyo 182-8585, Japan.

PMID: 38202972
PMCID: PMC10780779
DOI: 10.3390/s24010110

Sensors (Basel). 2023 Dec 25;24(1):110.

Abstract

In recent years, 2D human pose estimation (HPE) has become an integral part of advanced computer vision (CV) applications, particularly for understanding human behavior. Despite challenges such as occlusion, unfavorable lighting, and motion blur, advances in deep learning have significantly improved 2D HPE performance by enabling automatic feature learning from data and better model generalization. Because 2D HPE must accurately identify and classify human body joints, further optimization is imperative. In response, we introduce the Spatially Oriented Attention-Infused Structured-Feature-Enabled PoseResNet (SOCA-PRNet) for enhanced 2D HPE. The model incorporates a novel element, Spatially Oriented Attention (SOCA), designed to enhance accuracy without significantly increasing the parameter count. Building on ResNet34 and integrating Global Context Blocks (GCBs), SOCA-PRNet captures detailed human poses. Empirical evaluations demonstrate that our model outperforms existing state-of-the-art approaches, achieving a Percentage of Correct Keypoints at a 0.5 (50%) threshold (PCKh@0.5) of 90.877 and a Mean Precision (Mean@0.1) score of 41.137. These results underscore the potential of SOCA-PRNet in real-world applications such as robotics, gaming, and human-computer interaction, where precise and efficient 2D HPE is paramount.

Keywords: 2D human pose estimation; CV; Global Context Blocks; SOCA-PRNet.
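
For readers unfamiliar with the PCKh metric reported in the abstract, the following is a minimal sketch of how such a score is commonly computed: a predicted joint counts as correct when its distance to the ground truth is at most a fraction `alpha` of the person's head size. The function name `pckh`, its parameters, and the toy coordinates are hypothetical and are not taken from the paper; reading Mean@0.1 as the same measure at `alpha = 0.1` is likewise an assumption.

```python
import math

def pckh(pred, gt, head_sizes, visible=None, alpha=0.5):
    """Percentage of joints whose error is <= alpha * head size.

    pred, gt: one list of (x, y) joints per person.
    head_sizes: one head-segment length per person (assumed given).
    visible: optional per-person lists of booleans; False joints are skipped.
    """
    correct = 0
    total = 0
    for i, (p_joints, g_joints) in enumerate(zip(pred, gt)):
        for j, ((px, py), (gx, gy)) in enumerate(zip(p_joints, g_joints)):
            if visible is not None and not visible[i][j]:
                continue  # unlabeled joints do not count toward the score
            total += 1
            if math.hypot(px - gx, py - gy) <= alpha * head_sizes[i]:
                correct += 1
    return 100.0 * correct / total if total else 0.0

# Toy example (made-up numbers): one person, four joints, head size 10 px.
pred = [[(10.0, 10.0), (20.0, 20.0), (30.0, 30.0), (40.0, 40.0)]]
gt = [[(10.0, 12.0), (20.0, 27.0), (30.0, 30.0), (42.0, 43.0)]]
head = [10.0]

score_05 = pckh(pred, gt, head, alpha=0.5)  # threshold of 5 px
score_01 = pckh(pred, gt, head, alpha=0.1)  # threshold of 1 px
print(score_05, score_01)
```

In this toy case three of the four joints fall within 5 px and only one within 1 px, which illustrates why a score at the tighter 0.1 threshold is typically far lower than the score at 0.5.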

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Detailed architecture of our proposed SOCA-PRNet model for 2D HPE.

**Figure 2**
Detailed architecture of the simple baseline model for 2D HPE [6].

**Figure 3**
Modified ResNet with deconvolution module.

**Figure 4**
Visualization of the SOCA module’s feature integration and of the mechanisms for generating and distributing the weights W.

**Figure 5**
Comparative visualization of HardSwish and ReLU activation functions.

**Figure 6**
Visual analysis of 2D HPE models in terms of accuracy and parameter count.

**Figure 7**
Graphical comparison of the proposed model and the simple baseline models. (a) PCKh@0.5 results for the proposed model and the simple baseline models. (b) Mean and Mean@0.1 results for the proposed model and the simple baseline models.

**Figure 8**
Qualitative pose estimation results on MPII, including cases with viewpoint change, occlusion, and self-occlusion.

References

1. Bertasius G., Feichtenhofer C., Tran D., Shi J., Torresani L. Learning temporal pose estimation from sparsely-labeled videos. Adv. Neural Inf. Process. Syst. 2019;32.
2. Chen H., Feng R., Wu S., Xu H., Zhou F., Liu Z. 2D Human pose estimation: A survey. Multimed. Syst. 2023;29:3115–3138. doi: 10.1007/s00530-022-01019-0.
3. Sapp B., Toshev A., Taskar B. Cascaded models for articulated pose estimation; Proceedings of the Computer Vision–ECCV 2010: 11th European Conference on Computer Vision; Heraklion, Crete, Greece. 5–11 September 2010; Berlin/Heidelberg, Germany: Springer; 2010. pp. 406–420.
4. Wang F., Li Y. Beyond physical connections: Tree models in human pose estimation; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Portland, OR, USA. 23–28 June 2013; pp. 596–603.
5. Cao Z., Simon T., Wei S.E., Sheikh Y. Realtime multi-person 2d pose estimation using part affinity fields; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Honolulu, HI, USA. 21–26 July 2017; pp. 7291–7299.
