Sensors (Basel). 2022 Jan 25;22(3):925.
doi: 10.3390/s22030925.

sSfS: Segmented Shape from Silhouette Reconstruction of the Human Body

Wiktor Krajnik et al. Sensors (Basel).

Abstract

Three-dimensional (3D) shape estimation of the human body has a growing number of applications in medicine, anthropometry, special effects, and many other fields. Therefore, the demand for the high-quality acquisition of a complete and accurate body model is increasing. In this paper, a short survey of current state-of-the-art solutions is provided. One of the most commonly used approaches is the Shape-from-Silhouette (SfS) method, which is capable of reconstructing dynamic, challenging-to-capture objects. This paper proposes a novel approach that extends the conventional voxel-based SfS method with silhouette segmentation: segmented Shape from Silhouette (sSfS). It allows the 3D reconstruction of body segments separately, which yields significantly better human body shape estimation, especially in concave areas. For validation, a dataset representing the human body in 20 complex poses was created and assessed with quality metrics against a ground-truth photogrammetric reconstruction. The number of invalid reconstruction voxels for the sSfS method was 1.7 times lower than for the state-of-the-art SfS approach, and the root-mean-square (RMS) error of the distance to the reference surface was 1.22 times lower.
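The voxel-based SfS idea at the core of the paper can be sketched as a silhouette-consistency vote over voxel centers: a voxel survives carving only if it projects inside the subject's silhouette in (nearly) all views. The sketch below is a minimal illustration under assumed inputs (calibrated 3×4 projection matrices and binary silhouette masks); the function and parameter names are ours, not the paper's, and `min_votes` merely hints at the voting threshold the authors discuss.

```python
import numpy as np

def carve_visual_hull(voxel_centers, cameras, silhouettes, min_votes=None):
    """Conservative voxel carving: keep a voxel only if its center
    projects inside the silhouette in at least `min_votes` views
    (all views by default).

    voxel_centers: (N, 3) world-space points.
    cameras: list of 3x4 projection matrices P = K [R | t].
    silhouettes: list of binary (H, W) masks, nonzero = subject.
    """
    n_views = len(cameras)
    if min_votes is None:
        min_votes = n_views
    n = len(voxel_centers)
    votes = np.zeros(n, dtype=int)
    homog = np.hstack([voxel_centers, np.ones((n, 1))])  # homogeneous coords
    for P, mask in zip(cameras, silhouettes):
        proj = homog @ P.T                               # (N, 3) homogeneous pixels
        px = np.round(proj[:, :2] / proj[:, 2:3]).astype(int)
        h, w = mask.shape
        inside = (px[:, 0] >= 0) & (px[:, 0] < w) & (px[:, 1] >= 0) & (px[:, 1] < h)
        hit = np.zeros(n, dtype=bool)
        hit[inside] = mask[px[inside, 1], px[inside, 0]] > 0
        votes += hit
    return voxel_centers[votes >= min_votes]
```

In practice the paper applies this coarse-to-fine (a coarse grid over the whole capture volume, then a fine grid over the subject's bounding volume, see Figure 3); the carving core is the same at both resolutions.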

Keywords: 3D reconstruction; Shape from Silhouette; computer vision; human body segmentation; multi-view images; pose estimation; visual hull; volumetric methods.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure A1
Visualization of the SfS/sSfS reconstructions for data samples #7–#20.
Figure A2
Visualization of the P2S distances for SfS/sSfS reconstructions for data samples #7–#20.
Figure 1
Visualized measurement scene with the 34-camera distribution and the SfM point cloud of a sample measured subject.
Figure 2
Steps involved in the conventional SfS reconstruction method of an object: (a) the volume of the whole reconstruction system, estimated with the cameras’ positions; (b) input RGB images; (c) silhouette images of the subject; (d) approximate subject volume from the transition step (see Figure 3 for details); (e) final visual hull estimation with the target voxel size.
Figure 3
Estimation of the subject’s volume: (a) the initial voxel grid with voxel size D0 in the volume of the system cameras; (b) the voxel grid with voxel size D1 of the subject’s volume.
Figure 4
Example of Omnimatte [61] silhouette estimation performance with voxel projections onto the image (data sample #9): (a) input RGB image; (b) silhouette image with magnified silhouette quality values on its edge; (c) voxel center projections onto the silhouette. The projections in the image are marked with pixel-sized dots. The blue pixels represent the silhouette, and the red represent the background.
Figure 5
Proposed sSfS reconstruction approach flowchart. The steps added to the conventional SfS approach are highlighted in pink: (a) conventional visual hull estimation with silhouette images of an entire subject’s body (see Figure 3 for details); (b) results of the estimation of human joint positions on the 2D color images with a CNN-based Human Pose pre-trained model [65]; (c) retrieval of the 3D joint positions by casting the rays leading from each camera center’s 3D position to each joint and calculating the best intersections; (d) segmentation of the subject’s coarse 3D visual hull voxel reconstruction; (e) silhouette image segmentation by projecting the segment points onto the silhouettes (see Figure 6 for details); (f) estimation of the 3D volume of each body segment; (g) SfS reconstruction results for each body part separately and merged as a whole sSfS body model.
Figure 6
The estimation of the body segment silhouettes and their volumetric reconstructions. (a) Segment from the initial coarse volumetric reconstruction step with a 4 mm voxel size. (b) Projection of the voxels from the segment reconstruction onto the camera images for all the system’s cameras. The pixels where the projection was performed are changed to a value of 255. (c) Input silhouettes of the subject’s entire body from a set used to estimate an initial visual hull reconstruction. (d) Body segment detection for all silhouette images. The projection from step (b) is used as a mask for the segment on the silhouette image. (e) Detection of erroneous silhouettes by calculating the ratio of the segment projection pixels to the uncertain silhouette pixels, with values in the range [1, 254].
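Steps (b), (d), and (e) of this figure can be sketched as a masking operation on the full-body silhouette: the segment's projected voxel pixels select the segment region, and a view is rejected when too many of those pixels fall on uncertain edge values (1–254). This is an illustrative sketch only; the function name and the rejection threshold are our assumptions, not values from the paper.

```python
import numpy as np

def segment_silhouette(full_sil, seg_pixels, uncertain_ratio_max=0.5):
    """Mask a full-body silhouette down to one body segment.

    full_sil: (H, W) uint8 silhouette, 0 = background, 255 = subject,
              intermediate values = uncertain edge pixels.
    seg_pixels: (K, 2) integer (row, col) projections of the segment's voxels.
    Returns the masked segment silhouette, or None if the view is judged
    erroneous (too many uncertain pixels under the segment projection).
    """
    proj_mask = np.zeros(full_sil.shape, dtype=bool)
    proj_mask[seg_pixels[:, 0], seg_pixels[:, 1]] = True
    # Uncertain edge pixels carry values in [1, 254]; a large fraction of
    # them under the projection marks the silhouette as unreliable.
    uncertain = (full_sil >= 1) & (full_sil <= 254)
    n_proj = proj_mask.sum()
    if n_proj == 0 or uncertain[proj_mask].sum() / n_proj > uncertain_ratio_max:
        return None
    return np.where(proj_mask, full_sil, 0).astype(np.uint8)
```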
Figure 7
Surface reconstruction results comparison for selected dataset samples; every column ID number corresponds to a data sample from Table A1: (a) SfM; (b) SfS; (c) sSfS.
Figure 8
The erroneous voxel counts for SfS and sSfS reconstructions of the validation dataset.
Figure 9
Visualization of P2S distances for SfS and sSfS results: (a) SfM; (b) SfS with P2S error map; (c) sSfS with P2S error map. Each data sample in the column corresponds to a data sample from Table A1; the numbers at the top of each column relate to the data sample ID.
Figure 10
P2S error histograms: (a) overlapping SfS and sSfS histograms for dataset sample #6. The x-axis represents the histogram bin ranges with a bin of 1 mm. The last bin contains the sum of all samples with a distance >40 mm; (b) RMS of the P2S distances for SfS and sSfS reconstructions for the entire validation dataset.
Figure 11
Comparison of the results of three different SfS reconstruction approaches and sSfS for data sample #9: (a) SfS for a high voting threshold for all views; (b) SfS for a high voting threshold for selected views; (c) SfS for a low voting threshold for all views; (d) sSfS.
Figure 12
Examples of erroneous silhouette areas: (a) torso, left leg, and head; (b) feet.

References

    1. Gipsman A., Rauschert L., Daneshvar M., Knott P. Evaluating the Reproducibility of Motion Analysis Scanning of the Spine during Walking. Adv. Med. 2014;2014:721829. doi: 10.1155/2014/721829.
    2. Betsch M., Wild M., Johnstone B., Jungbluth P., Hakimi M., Kühlmann B., Rapp W. Evaluation of a Novel Spine and Surface Topography System for Dynamic Spinal Curvature Analysis during Gait. PLoS ONE. 2013;8:e70581. doi: 10.1371/journal.pone.0070581.
    3. Pons-Moll G., Romero J., Mahmood N., Black M.J. Dyna: A model of dynamic human shape in motion. ACM Trans. Graph. 2015;34:1–14. doi: 10.1145/2766993.
    4. Zhang C., Pujades S., Black M., Pons-Moll G. Detailed, Accurate, Human Shape Estimation from Clothed 3D Scan Sequences; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Honolulu, HI, USA. 21–26 July 2017.
    5. Available online: https://www.microsoft.com/en-us/mixed-reality/capture-studios (accessed on 1 December 2021).
