J Med Internet Res. 2024 Mar 13:26:e42904.
doi: 10.2196/42904.

Validation of 3 Computer-Aided Facial Phenotyping Tools (DeepGestalt, GestaltMatcher, and D-Score): Comparative Diagnostic Accuracy Study

Alisa Maria Vittoria Reiter et al. J Med Internet Res.

Abstract

Background: While characteristic facial features provide important clues for finding the correct diagnosis in genetic syndromes, valid assessment can be challenging. The next-generation phenotyping algorithm DeepGestalt analyzes patient images and provides syndrome suggestions. GestaltMatcher matches patient images with similar facial features. The new D-Score provides a score for the degree of facial dysmorphism.

Objective: We aimed to test state-of-the-art facial phenotyping tools by benchmarking GestaltMatcher and D-Score and comparing them to DeepGestalt.

Methods: Using a retrospective sample of 4796 images of patients with 486 different genetic syndromes (London Medical Database, GestaltMatcher Database, and literature images) and 323 inconspicuous control images, we determined the clinical use of D-Score, GestaltMatcher, and DeepGestalt, evaluating sensitivity; specificity; accuracy; the number of supported diagnoses; and potential biases such as age, sex, and ethnicity.
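As an illustration of the binary metrics named in the Methods, the following is a minimal sketch of how sensitivity, specificity, and accuracy can be derived from labeled predictions. The names `BinaryCounts` and `count_outcomes` are hypothetical and the labels are toy data; this is not the authors' evaluation code.

```python
from dataclasses import dataclass


@dataclass
class BinaryCounts:
    """Confusion-matrix counts for a patient (1) vs control (0) decision."""
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0


def count_outcomes(labels, predictions):
    """Tally outcomes; labels and predictions are sequences of 0/1 (hypothetical helper)."""
    counts = BinaryCounts()
    for truth, predicted in zip(labels, predictions):
        if truth == 1 and predicted == 1:
            counts.tp += 1
        elif truth == 0 and predicted == 1:
            counts.fp += 1
        elif truth == 0 and predicted == 0:
            counts.tn += 1
        else:
            counts.fn += 1
    return counts


def sensitivity(c):
    """Share of true patient images that were flagged as patients."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """Share of control images that were correctly left unflagged."""
    return c.tn / (c.tn + c.fp)


def accuracy(c):
    """Share of all images that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Toy data only: three patient images followed by two control images.
toy_labels = [1, 1, 1, 0, 0]
toy_predictions = [1, 1, 0, 0, 1]
toy_counts = count_outcomes(toy_labels, toy_predictions)
```

With the toy data, 2 of 3 patient images are flagged and 1 of 2 controls is correctly left unflagged, so the three functions return 2/3, 1/2, and 3/5, respectively.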

Results: DeepGestalt suggested 340 distinct syndromes and GestaltMatcher suggested 1128 syndromes. The top-30 sensitivity was higher for DeepGestalt (88%, SD 18%) than for GestaltMatcher (76%, SD 26%). DeepGestalt generally assigned lower scores but provided higher scores for patient images than for inconspicuous control images, thus allowing the 2 cohorts to be separated with an area under the receiver operating characteristic curve (AUROC) of 0.73. GestaltMatcher could not separate the 2 classes (AUROC 0.55). Trained for this purpose, D-Score achieved the highest discriminatory power (AUROC 0.86). D-Score's levels increased with the age of the depicted individuals. Male individuals yielded higher D-scores than female individuals. Ethnicity did not appear to influence D-scores.
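The two headline measures in the Results can be sketched as follows, under stated assumptions. The AUROC is computed here as the probability that a randomly chosen patient image scores higher than a randomly chosen control image (ties count as one half), which is one standard way to obtain it. Top-k sensitivity is computed per syndrome and then averaged across syndromes, mirroring the "averaged by syndrome" wording in the Figure 2 caption. The function names and the toy inputs are hypothetical, and the authors' actual pipeline may differ.

```python
from collections import defaultdict
from statistics import mean


def auroc(patient_scores, control_scores):
    """AUROC as the fraction of (patient, control) pairs ranked correctly.

    A tie between a patient score and a control score counts as half a win.
    """
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


def top_k_sensitivity_by_syndrome(cases, k=30):
    """Return {syndrome: share of its images whose true syndrome is in the top k}.

    `cases` is an iterable of (true_syndrome, ranked_suggestions) pairs,
    where ranked_suggestions is ordered from best to worst match.
    """
    hits = defaultdict(list)
    for true_syndrome, ranked in cases:
        hits[true_syndrome].append(1.0 if true_syndrome in ranked[:k] else 0.0)
    return {syndrome: mean(values) for syndrome, values in hits.items()}


# Toy data only; the syndrome labels "A", "B", "C" are placeholders.
toy_auroc = auroc([0.9, 0.8, 0.4], [0.5, 0.4, 0.1])
toy_cases = [
    ("A", ["A", "B"]),
    ("A", ["B", "C"]),
    ("B", ["C", "B"]),
]
toy_top2 = top_k_sensitivity_by_syndrome(toy_cases, k=2)
toy_mean_top2 = mean(toy_top2.values())
```

Averaging per syndrome first keeps syndromes with many images from dominating the summary figure. An AUROC near 0.5, as reported for GestaltMatcher, corresponds to patient and control scores that are ranked no better than chance.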

Conclusions: If used with caution, algorithms such as D-Score could help clinicians with constrained resources or limited experience in syndromology to decide whether a patient needs further genetic evaluation. Algorithms such as DeepGestalt could support diagnosing rather common genetic syndromes with facial abnormalities, whereas algorithms such as GestaltMatcher could suggest rare diagnoses that are unknown to the clinician in patients with a characteristic, dysmorphic face.

Keywords: D-Score; DeepGestalt; Face2Gene; GestaltMatcher; diagnostic accuracy; facial phenotyping; facial recognition; genetic syndrome; genetics; machine learning; medical genetics.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1
Workflow of analyses. GMDB: GestaltMatcher Database; LMD: London Medical Database. *Binary classifier uses only 1 score.
Figure 2
Comparison of DeepGestalt and GestaltMatcher: (A) Venn diagram showing the number of syndromes supported by DeepGestalt (purple) and GestaltMatcher (grey). (B) Histogram of top-30 scores yielded by DeepGestalt and GestaltMatcher in patient and control images. Sensitivities of (C) DeepGestalt and (D) GestaltMatcher for syndromes with at least 5 analyzed images among all included patient images, with mean sensitivity averaged by syndrome. Linear regressions of false positive rates of (E) DeepGestalt's and (F) GestaltMatcher's top-30 results in matched control (y-axis) and patient (x-axis) images of the literature data set. (A high-resolution image is available in Multimedia Appendix 2.) FPR: false positive rate; h: healthy control; s: patient with syndrome.
Figure 3
Accuracy of DeepGestalt, GestaltMatcher, and D-Score. (A) Distributions of the scores yielded by D-Score (D, turquoise), GestaltMatcher (GM, grey), and DeepGestalt (DG, purple) in the different test sets. (B) Receiver operating characteristic (ROC) curves of D-Score (D, turquoise), GestaltMatcher (GM, grey), and DeepGestalt (DG, purple). (A high-resolution version is available in Multimedia Appendix 6.) AUC: area under the curve; GMDB: GestaltMatcher Database; LMD: London Medical Database; ROC: receiver operating characteristic.
