Sensors (Basel). 2023 Feb 10;23(4):2005. doi: 10.3390/s23042005.

Weakly Supervised 2D Pose Adaptation and Body Part Segmentation for Concealed Object Detection

Lawrence Amadi et al. Sensors (Basel).

Abstract

Weakly supervised pose estimation can be used to assist unsupervised body part segmentation and concealed item detection. The accuracy of pose estimation is essential for precise body part segmentation and accurate concealed item detection. In this paper, we show how poses obtained from an RGB-pretrained 2D pose detector can be adapted to the backscatter image domain. The 2D poses are refined using RANSAC bundle adjustment to minimize the projection loss in 3D. Furthermore, we show how 2D poses can be optimized using a newly proposed 3D-to-2D pose correction network that is weakly supervised with pose prior regularizers and multi-view pose and posture consistency losses. The optimized 2D poses are used to segment human body parts. We then train a body-part-aware anomaly detection network to detect foreign (concealed threat) objects on the segmented body parts. Our work is applied to the TSA passenger screening dataset, which contains millimeter wave scan images of airport travelers annotated with only binary labels indicating whether a foreign object is concealed on a body part. Our proposed approach significantly improves on the detection accuracy of existing work on TSA 2D backscatter images, achieving state-of-the-art performance of a 97% F1-score and 0.0559 log-loss on the TSA-PSD test set, along with a 74% reduction in 2D pose error.
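For reference, the two reported detection metrics, F1-score and binary log-loss, can be computed from per-body-part binary labels and predicted probabilities as in this minimal NumPy sketch (the function names are illustrative, not taken from the paper's code):

```python
import numpy as np

def log_loss(y_true, p, eps=1e-15):
    # Mean binary cross-entropy; probabilities are clipped to avoid log(0).
    p = np.clip(np.asarray(p, float), eps, 1 - eps)
    y = np.asarray(y_true, float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def f1_score(y_true, y_pred):
    # F1 = 2*TP / (2*TP + FP + FN) over binary per-zone decisions.
    y, yh = np.asarray(y_true, bool), np.asarray(y_pred, bool)
    tp = np.sum(y & yh)
    fp = np.sum(~y & yh)
    fn = np.sum(y & ~yh)
    return float(2 * tp / (2 * tp + fp + fn))
```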

Keywords: 2D pose correction; anomaly detection; body part recognition; body segmentation; domain adaptation; object detection; pose refinement; threat localization.


Conflict of interest statement

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figures

Figure 1
Example of our refined 2D pose (2nd) and unsupervised body segmentation (4th) on MWS images compared to SOTA HRNet 2D pose (1st) [2] and DensePose body segmentation (3rd) [1].
Figure 2
The 17 body zones of interest as indicated by the TSA Passenger Screening Dataset.
Figure 3
An example .aps scan with 16 frames and corresponding binary GT labels for the 17 body zones that indicate the presence of a concealed item in the zone or body part. Notice the threat objects on the right lower thigh (zone 11) and left calf (zone 14) visible in frames 10–16 and 7–10, respectively.
Figure 4
From left to right: example image renderings of the .a3d, .a3daps (frame 4 of 64), and .aps (frame 2 of 16) TSA dataset file formats for the same subject.
Figure 5
Our keypoint selection algorithm applied to the right wrist. (A) depicts the right wrist confidence map. (B) is the histogram of the confidence map, where the red lines show the multi-Otsu thresholds used to segment (A) into the blue, orange, and brown layers in (C). (D) shows that our algorithm selects the correct right wrist position (green circle) instead of the most confident location (red circle).
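The multi-Otsu layering described in this caption can be sketched as follows. This is a minimal illustration (a brute-force two-threshold Otsu, then the centroid of the top-confidence layer), not the authors' exact selection rule:

```python
import numpy as np

def multi_otsu_thresholds(values, bins=64):
    # Brute-force two-threshold (three-class) Otsu: choose the pair of
    # thresholds that maximizes between-class variance of the histogram.
    hist, edges = np.histogram(values, bins=bins)
    mids = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()
    best, best_score = (1, 2), -1.0
    for i in range(1, bins - 1):
        for j in range(i + 1, bins):
            score = 0.0
            for lo, hi in ((0, i), (i, j), (j, bins)):
                w = p[lo:hi].sum()
                if w > 0:
                    mu = (p[lo:hi] * mids[lo:hi]).sum() / w
                    score += w * mu * mu  # maximizing sum of w_k * mu_k^2
            if score > best_score:
                best_score, best = score, (i, j)
    return mids[best[0]], mids[best[1]]

def select_keypoint(conf_map):
    # Centroid of the top-confidence Otsu layer, rather than the raw argmax.
    _, t2 = multi_otsu_thresholds(conf_map.ravel())
    ys, xs = np.nonzero(conf_map >= t2)
    return int(round(ys.mean())), int(round(xs.mean()))
```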
Figure 6
Visualization of intermediate stages of limb segmentation for the right elbow. The black line in (A) links the right elbow and shoulder keypoints. The frame is rotated to vertically align the pillar keypoints in (B). (C) shows the computed pixel intensity curve (red line) and the fitted polynomial (white line). (D) shows the estimated bounding polygon. (F) shows examples of (shifted, zoomed, and rotated) cropped image augmentations (overlaid with the ROI mask) generated from (E).
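The vertical-alignment step in panel (B) amounts to a 2D rotation about one pillar keypoint so that the other lands on the vertical axis through it. A minimal geometry sketch of that step (not the paper's implementation):

```python
import numpy as np

def vertical_align(points, a, b):
    # Rotate points (x, y) about keypoint a so that keypoint b ends up on the
    # vertical axis through a, mirroring the frame-alignment step.
    a = np.asarray(a, float)
    pts = np.asarray(points, float)
    dx, dy = np.asarray(b, float) - a
    theta = np.arctan2(dx, dy)           # angle of the a->b axis from vertical
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])      # standard 2D rotation matrix
    return (pts - a) @ R.T + a
```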
Figure 7
Visualization of torso segmentation for the right abdomen. Vertices of the black quadrilateral in (A) are the pelvis, neck, right shoulder, and hip keypoints. The top edge of the black quadrilateral in (B) is shifted downwards, resulting in the blue polygon. The green polygon in (C) captures the ROI of the right abdomen. (E) shows examples of (shifted, zoomed, and rotated) randomly generated cropped image augmentation from (D), overlaid with the ROI mask.
Figure 8
An instance of RaadNet, our two-phase, dual-pipeline (indicated by the blue and red lines) anomaly detection network that takes as input cropped images of a body part, their ROI masks, and RCVs, and outputs the probability that a concealed item is in either of the cropped images. We use n = 12 cropped images per body part. h and w = 80 are the height and width of the cropped images. *h, *w = 10, p = 3, m = 4. Each sub-sequence of images p is passed through the same sub-network (enclosed in the large rectangle). Residual CNN blocks (in dark blue) contain two 3D convolution layers with kernel = 3 and f filters. Convolutions are accompanied by batch normalization and ReLU activation. The fully connected (FCN) block (in light blue) contains five dense layers of sizes 128, 64, 64, 16, and 1 followed by a sigmoid activation. ⊗ and ⊕ are element-wise multiplication and addition operations. Notice that a deeper network can be created by increasing the number of phases.
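The dense head described in this caption (layers of sizes 128, 64, 64, 16, 1 with a sigmoid output) can be sketched shape-wise in NumPy. Weights here are random placeholders, so this illustrates only the layer sizes and activations, not trained behavior:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fcn_head(features, seed=0):
    # Shape sketch of the FCN block: dense layers 128, 64, 64, 16, 1,
    # ReLU between hidden layers, sigmoid on the single output unit.
    rng = np.random.default_rng(seed)
    sizes = [features.shape[-1], 128, 64, 64, 16, 1]
    x = features
    for k, (fi, fo) in enumerate(zip(sizes[:-1], sizes[1:])):
        x = x @ (rng.standard_normal((fi, fo)) / np.sqrt(fi))
        if k < len(sizes) - 2:
            x = np.maximum(x, 0.0)       # ReLU on hidden layers
    return sigmoid(x)                    # probability of a concealed item
```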
Figure 9
An illustration (with one training example) of our weakly supervised scheme for encoding a more correct 3D pose given sub-optimal 2D poses. Each training sample contains m types of 2D pose inputs. q0 is the primary sequence of 2D poses for which we optimize the pose prior regularizers and reprojected 2D loss. qi (i ≠ 0) are matching sequences of 2D poses of the same scan taken from other viewpoints. These 2D poses are fed to the weakly supervised pipeline to estimate their 3D poses (p0*, p1*, …, p(m−1)*) and the 3D-to-2D scale factor of the primary sequence (s0). The encoded 3D pose, p0*, and the corresponding scale factor are combined to project a more correct 2D pose and minimize the reprojected 2D loss. The reprojected 2D pose is the output of the network. Pose prior regularizers [5] are also enforced on p0*. In addition, we minimize the multi-view pose and posture losses between p0* and p1*, …, p(m−1)* [4]. This weakly supervised framework is used to retrain VPose [46] without 2D or 3D pose ground-truth annotations. In our experiments, we set m = 4.
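The reprojected 2D loss described here can be sketched as scaling the encoded 3D pose back to the image plane and comparing it with the input 2D pose. The orthographic (drop-depth) projection and the mean-squared-error form are illustrative assumptions, not details confirmed by the caption:

```python
import numpy as np

def reprojected_2d_loss(p3d, scale, q2d):
    # Project the encoded 3D pose (J x 3) to 2D with the learned scale factor
    # s0 by dropping depth, then compare with the input 2D pose (J x 2).
    proj = scale * np.asarray(p3d, float)[:, :2]
    return float(np.mean((proj - np.asarray(q2d, float)) ** 2))
```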
Figure 10
Top-ranked TSA-PSD algorithms on the Kaggle Leaderboard. (★) indicates algorithms reported to have used 3D image files. Our ensemble RaadNet ranks 7th, placing in the proprietary category. This makes our method the only published, comprehensive work that ranks in the top eight algorithms for anomaly detection on human body parts in the TSA dataset.

References

    1. Güler R.A., Trigeorgis G., Antonakos E., Snape P., Zafeiriou S., Kokkinos I. DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild; Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Honolulu, HI, USA, 21–26 July 2017; pp. 2614–2623.
    2. Sun K., Xiao B., Liu D., Wang J. Deep High-Resolution Representation Learning for Human Pose Estimation. arXiv 2019, arXiv:1902.09212.
    3. Amadi L., Agam G. 2D-Pose Based Human Body Segmentation for Weakly-Supervised Concealed Object Detection in Backscatter Millimeter-Wave Images; Proceedings of the 26th International Conference of Pattern Recognition Systems (T-CAP @ ICPR 2022); Montreal, QC, Canada, 21–25 August 2022.
    4. Amadi L., Agam G. Multi-view Posture Analysis for Semi-Supervised 3D Monocular Pose Estimation; Proceedings of the CVPR; Vancouver, BC, Canada, 18–22 June 2023.
    5. Amadi L., Agam G. Boosting the Performance of Weakly-Supervised 3D Human Pose Estimators with Pose Prior Regularizers; Proceedings of the ICIP; Bordeaux, France, 16–19 October 2022.