IEEE Trans Pattern Anal Mach Intell. 2024 Feb;46(2):1305-1318.
doi: 10.1109/TPAMI.2023.3334948. Epub 2024 Jan 8.

Inequality-Constrained 3D Morphable Face Model Fitting

Evangelos Sariyanidi et al. IEEE Trans Pattern Anal Mach Intell. 2024 Feb.

Abstract

3D morphable model (3DMM) fitting on 2D data is traditionally done via unconstrained optimization with regularization terms to ensure that the result is a plausible face shape and is consistent with a set of 2D landmarks. This paper presents inequality-constrained 3DMM fitting as the first alternative to regularization in optimization-based 3DMM fitting. Inequality constraints on the 3DMM's shape coefficients ensure face-like shapes without modifying the objective function for smoothness, thus allowing for more flexibility to capture person-specific shape details. Moreover, inequality constraints on landmarks increase robustness in a way that does not require per-image tuning. We show that the proposed method stands out with its ability to estimate person-specific face shapes by jointly fitting a 3DMM to multiple frames of a person. Further, when used with a robust objective function, namely gradient correlation, the method can work "in-the-wild" even with a 3DMM constructed from controlled data. Lastly, we show how to implement the method efficiently using the log-barrier method. To our knowledge, we present the first 3DMM fitting framework that requires no learning yet is accurate, robust, and efficient. The absence of learning enables a generic solution that allows flexibility in the input image size, interchangeable morphable models, and incorporation of a camera matrix.

Figures

Fig. 1:
(a) Inequality constraints on the 3DMM shape coefficients. Each row depicts how the face shape changes with one component of the 3DMM’s shape basis. The produced shape becomes implausible as the magnitude of a basis coefficient becomes too large. Our inequality constraints enforce basis coefficients to stay in the feasible (i.e., green shaded) region. (b) Inequality constraints for landmarks. For each landmark, we learn a feasible region (i.e., rectangle with edges ϵx, ϵy) and prohibit solutions where the landmark falls outside this region.
Fig. 2:
How 3D reconstruction consistency changes with number of frames used. Each column shows the average of 20 3D shape renderings for 2 subjects. When reconstruction is done with F = 1 frame, there is inconsistency between different reconstructions of the same person and the average is blurry. Reconstructions become more consistent and person-specific as F increases.
Fig. 3:
Inequality constraints in action. Each row shows the objective function for 3DMM fitting; the function is reduced to the line containing the unconstrained and inequality-constrained solutions, θu and θ3DI. Green rectangles show the feasible region that satisfies inequalities. Top row: inequality constraints prevent convergence to the local minimum with a shape that is too intricate. Bottom row: the local minimum θu that leads to a poor shape is ruled out by inequality constraints.
Fig. 4:
Distributions of the spatial (i.e., horizontal or vertical) discrepancy between the estimated and actual location of landmarks (lower lip, nose, left eye and right eye), computed on a set of synthesized images. The numbers in each panel are the kurtosis (k) and skewness (s) coefficients of the corresponding distribution.
Fig. 5:
Log-barrier method. Solid lines depict the bounds of an inequality constraint; the intersection of all constraints is the polygon. Dashed curves show how the method approximates the polygon using the logarithm of inequality constraints [see (5)]. Approximation quality improves as t increases through t0, t1, ..., tT. Thus, the log-barrier method transforms an inequality-constrained problem into a series of unconstrained problems, the solution of which (i.e., θ̂*) gradually approaches that of the original problem, θ*.
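The idea in this caption can be sketched on a toy problem. The following is a minimal illustration, not the paper's implementation: it minimizes a hypothetical 1D objective f(x) = (x - 3)^2 subject to -1 < x < 1 by solving a sequence of unconstrained problems t*f(x) - log(upper - x) - log(x - lower) with Newton steps, increasing t each round. The function name `log_barrier_1d`, the growth factor `mu`, and the iteration counts are assumptions chosen for the example.

```python
def log_barrier_1d(grad, hess, lower, upper, x0,
                   t0=1.0, mu=10.0, outer_iters=6, newton_iters=50):
    """Minimize a smooth convex 1D objective subject to lower < x < upper.

    grad and hess return the first and second derivative of the objective.
    Returns the final iterate and the list of (t, x) pairs, one per value of t.
    """
    x, t = x0, t0
    path = []
    for _ in range(outer_iters):
        for _ in range(newton_iters):
            # Derivatives of t*f(x) - log(upper - x) - log(x - lower).
            g = t * grad(x) + 1.0 / (upper - x) - 1.0 / (x - lower)
            h = t * hess(x) + 1.0 / (upper - x) ** 2 + 1.0 / (x - lower) ** 2
            step = -g / h
            if abs(step) < 1e-12:
                break
            # Halve the step until the new point is strictly feasible.
            alpha = 1.0
            while not (lower < x + alpha * step < upper):
                alpha *= 0.5
            x += alpha * step
        path.append((t, x))
        t *= mu  # a larger t makes the barrier a sharper approximation
    return x, path


# Toy problem (hypothetical): the unconstrained minimum is x = 3,
# so the constrained solution sits at the upper bound x = 1.
x_star, path = log_barrier_1d(lambda x: 2.0 * (x - 3.0), lambda x: 2.0,
                              lower=-1.0, upper=1.0, x0=0.0)
for t, x in path:
    print(f"t = {t:>8.0f}   x = {x:.7f}")
```

Each printed row corresponds to one unconstrained subproblem; the iterates stay strictly inside the bounds and move toward the constrained solution as t grows.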
Fig. 6:
Illustration of how the neighbors of a pixel at (x,y) are located within the image vector I that contains a non-rectangular region. M[i] is a binary 2D array that indicates rendered pixels, and C[i] is its cumulative sum; both arrays follow a column-major representation. Suppose that the pixel at (x,y) in the 2D image space corresponds to the kth entry in I, where k=C[i]=C[y+(x-1)H]. The pixels to the left and right of (x,y) are located at the entries kl=C[i-H] and kr=C[i+H] of I.
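The indexing scheme described in this caption can be sketched as follows. This is an illustrative reading of the caption only: the mask values are made up, the helper names `build_index` and `neighbors` are hypothetical, and a padding entry at index 0 is an assumption added so that the arrays can use the caption's 1-based indices.

```python
def build_index(mask):
    """Flatten an H x W binary mask column-major and take its cumulative sum.

    Index 0 of both returned lists is padding, so M[i] and C[i] use the
    1-based index i = y + (x - 1) * H from the caption.
    """
    H, W = len(mask), len(mask[0])
    M = [0]
    for x in range(W):          # columns first: column-major order
        for y in range(H):
            M.append(mask[y][x])
    C = [0] * len(M)
    for i in range(1, len(M)):
        C[i] = C[i - 1] + M[i]
    return M, C, H


def neighbors(M, C, H, x, y):
    """Return (k, k_left, k_right): 1-based positions in the image vector I.

    An entry is None when that pixel is outside the image or not rendered.
    """
    def entry(j):
        return C[j] if 1 <= j < len(M) and M[j] else None

    i = y + (x - 1) * H
    return entry(i), entry(i - H), entry(i + H)


# Hypothetical 3 x 3 mask with a plus-shaped rendered region.
mask = [[0, 1, 0],
        [1, 1, 1],
        [0, 1, 0]]
M, C, H = build_index(mask)
print(neighbors(M, C, H, x=2, y=2))  # (3, 1, 5)
print(neighbors(M, C, H, x=2, y=1))  # (2, None, None)
```

With this layout, I holds only the five rendered pixels, and the centre pixel is its third entry while its left and right neighbors are the first and fifth.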
Fig. 7:
Cumulative Error Distribution (CED) for compared methods, showing the percentage of subjects whose (mean) face reconstruction error is below the listed threshold values. Numbers in legend indicate each method’s median reconstruction error on the dataset.
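A CED curve of this kind can be computed as in the sketch below. The per-subject errors and thresholds are invented for illustration and are not results from the paper; the function name is hypothetical, and counting errors equal to a threshold as "below" it is an assumption.

```python
import statistics


def cumulative_error_distribution(errors, thresholds):
    """Percentage of subjects whose error is at or below each threshold."""
    n = len(errors)
    return [100.0 * sum(e <= t for e in errors) / n for t in thresholds]


# Hypothetical per-subject mean reconstruction errors (arbitrary units).
errors = [1.2, 1.8, 2.1, 2.5, 3.4]
thresholds = [1.0, 2.0, 3.0, 4.0]

ced = cumulative_error_distribution(errors, thresholds)
median_error = statistics.median(errors)
print(ced)           # [0.0, 40.0, 80.0, 100.0]
print(median_error)  # 2.1
```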
Fig. 8:
Average geometric error against the number of frames used per reconstruction, shown on three datasets.
Fig. 9:
Qualitative results depicting 3D shape identity for compared methods.
Fig. 10:
(a) Within- and between-subject effect size (WBES) against number of frames per reconstruction for the BU4DFE dataset; higher WBES indicates better capacity to distinguish between subjects. (b) Distributions of within- and between-subject distances for best methods in (a), namely Deep3DFace and Open3DI, for various numbers of frames per reconstruction, F.
Fig. 11:
(a) Within- and between-subject effect size (WBES) for YT Faces; higher WBES is better. (b) Some faces in the dataset are unnaturally squeezed by distortion.
Fig. 12:
Qualitative results on the YT Faces dataset, depicting the input frame, reconstructed face and estimated identity shape. Further qualitative results are in Supp. Fig. G.5.
Fig. 13:
Cumulative error distribution (CED) of Open3DI for 2D landmark detection compared to widely used landmark detection methods.
Fig. 14:
Open3DI’s performance on the AFLW3D-2000 dataset. For each test image, we show the landmark points as estimated by Open3DI, the dense mesh on the input image, and estimated (neutral) identity shape.
Fig. 15:
(a) The landmark detection error of our 2D-FAN implementation, measured on a synthesized dataset in terms of median NME (normalized mean error) vs. pose (i.e., yaw angle) variation. (b) Median (normalized) dense 3D reconstruction error vs. pose variation for Open3DI vs. GC-L2; normalization is performed by dividing by the inter-ocular distance.
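As a rough illustration of an error normalized by inter-ocular distance, the sketch below computes a normalized mean error for a handful of 2D points. The coordinates are made up, the function name is hypothetical, and using the distance between two given eye positions as the normalizer for landmark NME is an assumption; the caption states this normalization explicitly only for the dense 3D error in (b).

```python
import math


def normalized_mean_error(pred, truth, left_eye, right_eye):
    """Mean Euclidean point error divided by the inter-ocular distance."""
    inter_ocular = math.dist(left_eye, right_eye)
    per_point = [math.dist(p, q) for p, q in zip(pred, truth)]
    return sum(per_point) / len(per_point) / inter_ocular


# Hypothetical ground-truth and estimated landmark coordinates (pixels).
truth = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
pred = [(0.0, 1.0), (10.0, 0.0), (5.0, 3.0)]

# Point errors are 1, 0 and 2 pixels; the eyes are 10 pixels apart.
nme = normalized_mean_error(pred, truth, left_eye=truth[0], right_eye=truth[1])
print(round(nme, 6))  # 0.1
```

Dividing by a face-scale quantity makes errors comparable across images in which the face occupies different numbers of pixels.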
Fig. 16:
(a) Processing time vs. number of frames per reconstruction for Open3DI and INORig on an NVIDIA GTX 1080 GPU. (b) Processing time (per iteration) for the first-order approximation of Hessian needed by Open3DI.
Fig. 17:
Limitations of Open3DI illustrated over reconstructions of two (synthesized) subjects. While Open3DI captures the facial features (eyes, brows, mouth) relatively well, regions with low texture variation (cheeks, jaw) are captured with visibly less accuracy.
