Validation of 3 Computer-Aided Facial Phenotyping Tools (DeepGestalt, GestaltMatcher, and D-Score): Comparative Diagnostic Accuracy Study
- PMID: 38477981
- PMCID: PMC10973953
- DOI: 10.2196/42904
Abstract
Background: While characteristic facial features provide important clues for finding the correct diagnosis in genetic syndromes, valid assessment can be challenging. The next-generation phenotyping algorithm DeepGestalt analyzes patient images and provides syndrome suggestions. GestaltMatcher matches patient images with similar facial features. The new D-Score provides a score for the degree of facial dysmorphism.
Objective: We aimed to test state-of-the-art facial phenotyping tools by benchmarking GestaltMatcher and D-Score and comparing them to DeepGestalt.
Methods: Using a retrospective sample of 4796 images of patients with 486 different genetic syndromes (London Medical Database, GestaltMatcher Database, and literature images) and 323 inconspicuous control images, we determined the clinical use of D-Score, GestaltMatcher, and DeepGestalt, evaluating sensitivity; specificity; accuracy; the number of supported diagnoses; and potential biases such as age, sex, and ethnicity.
Results: DeepGestalt suggested 340 distinct syndromes and GestaltMatcher suggested 1128 syndromes. The top-30 sensitivity was higher for DeepGestalt (88%, SD 18%) than for GestaltMatcher (76%, SD 26%). DeepGestalt generally assigned lower scores but provided higher scores for patient images than for inconspicuous control images, thus allowing the 2 cohorts to be separated with an area under the receiver operating characteristic curve (AUROC) of 0.73. GestaltMatcher could not separate the 2 classes (AUROC 0.55). Trained for this purpose, D-Score achieved the highest discriminatory power (AUROC 0.86). D-Score's levels increased with the age of the depicted individuals. Male individuals yielded higher D-scores than female individuals. Ethnicity did not appear to influence D-scores.
Conclusions: If used with caution, algorithms such as D-Score could help clinicians with constrained resources or limited experience in syndromology to decide whether a patient needs further genetic evaluation. Algorithms such as DeepGestalt could support diagnosing rather common genetic syndromes with facial abnormalities, whereas algorithms such as GestaltMatcher could suggest rare diagnoses that are unknown to the clinician in patients with a characteristic, dysmorphic face.
Keywords: D-Score; DeepGestalt; Face2Gene; GestaltMatcher; diagnostic accuracy; facial phenotyping; facial recognition; genetic syndrome; genetics; machine learning; medical genetics.
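The Results above report two kinds of metric: an AUROC for separating patient images from control images, and a top-30 sensitivity for syndrome suggestions. The following is a minimal Python sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names, the toy scores, and the pooled (per-image) definition of top-k sensitivity are assumptions made for illustration; the abstract does not state how sensitivity was aggregated across syndromes.

```python
from typing import Sequence


def auroc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """Pairwise AUROC: the probability that a randomly chosen positive
    (e.g. patient image) scores higher than a randomly chosen negative
    (e.g. control image). Ties count as half a win."""
    if len(positive_scores) == 0 or len(negative_scores) == 0:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def top_k_sensitivity(
    ranked_suggestions: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    k: int = 30,
) -> float:
    """Fraction of cases whose true diagnosis appears among the first k
    suggestions. Pooled over all cases (an assumption, see lead-in)."""
    if len(ranked_suggestions) != len(true_labels) or len(true_labels) == 0:
        raise ValueError("need one ranked list per true label, and at least one case")
    hits = sum(
        1 for ranked, truth in zip(ranked_suggestions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical toy data, not taken from the study.
patient_scores = [0.9, 0.7, 0.4]
control_scores = [0.5, 0.2]
print(auroc(patient_scores, control_scores))  # 5 of 6 pairs ranked correctly

suggestions = [["A", "B", "C"], ["B", "C", "A"]]
truths = ["B", "A"]
print(top_k_sensitivity(suggestions, truths, k=2))
```

Under this reading, an AUROC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, which is the scale on which the reported values of 0.55, 0.73, and 0.86 can be compared.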
©Alisa Maria Vittoria Reiter, Jean Tori Pantel, Magdalena Danyel, Denise Horn, Claus-Eric Ott, Martin Atta Mensah. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 13.03.2024.
Conflict of interest statement
Conflicts of Interest: None declared.