[Preprint]. 2024 Jun 10:rs.3.rs-4438861.
doi: 10.21203/rs.3.rs-4438861/v1.

GestaltMatcher Database - A global reference for facial phenotypic variability in rare human diseases

Hellen Lesmann  1   2 Alexander Hustinx  2 Shahida Moosa  3 Hannah Klinkhammer  2   4 Elaine Marchi  5 Pilar Caro  6 Ibrahim M Abdelrazek  7 Jean Tori Pantel  8   9 Merle Ten Hagen  2 Meow-Keong Thong  10 Rifhan Azwani Binti Mazlan  10 Sok Kun Tae  10 Tom Kamphans  11 Wolfgang Meiswinkel  11 Jing-Mei Li  2 Behnam Javanmardi  2 Alexej Knaus  2 Annette Uwineza  12 Cordula Knopp  13 Tinatin Tkemaladze  14   15 Miriam Elbracht  13 Larissa Mattern  13 Rami Abou Jamra  16 Clara Velmans  17 Vincent Strehlow  16 Maureen Jacob  18 Angela Peron  19   20 Cristina Dias  21   22   23   24 Beatriz Carvalho Nunes  25 Thainá Vilella  25 Isabel Furquim Pinheiro  26 Chong Ae Kim  26 Maria Isabel Melaragno  25 Hannah Weiland  2 Sophia Kaptain  2 Karolina Chwiałkowska  27   28 Miroslaw Kwasniewski  28   27 Ramy Saad  22   29 Sarah Wiethoff  30 Himanshu Goel  31 Clara Tang  32 Anna Hau  33 Tahsin Stefan Barakat  34 Przemysław Panek  35 Amira Nabil  7 Julia Suh  13 Frederik Braun  36 Israel Gomy  37 Luisa Averdunk  38 Ekanem Ekure  39 Gaber Bergant  40 Borut Peterlin  41 Claudio Graziano  42 Nagwa Gaboon  43   44 Moisés Fiesco-Roa  45   46 Alessandro Mauro Spinelli  47 Nina-Maria Wilpert  48   49   50 Prasit Phowthongkum  51   52 Nergis Güzel  13 Tobias B Haack  53 Rana Bitar  54   55 Andreas Tzschach  56 Agusti Rodriguez-Palmero  57 Theresa Brunet  18 Sabine Rudnik-Schöneborn  58 Silvina Noemi Contreras-Capetillo  59 Ava Oberlack  18 Carole Samango-Sprouse  60   61   62 Teresa Sadeghin  63 Margaret Olaya  63 Konrad Platzer  16 Artem Borovikov  64 Franziska Schnabel  16 Lara Heuft  16 Vera Herrmann  16 Renske Oegema  65 Nour Elkhateeb  66 Sheetal Kumar  1 Katalin Komlosi  56 Khoushoua Mohamed  7 Silvia Kalantari  67 Fabio Sirchia  67   68 Antonio F Martinez-Monseny  69 Matthias Höller  56 Louiza Toutouna  56 Amal Mohamed  7 Amaia Lasa-Aranzasti  70   71 John A Sayer  72   73 Nadja Ehmke  74 Magdalena Danyel  74 Henrike Sczakiel  74 Sarina Schwartzmann  74 Felix Boschann  74 Max Zhao  74 
Ronja Adam  74 Lara Einicke  74 Denise Horn  74 Kee Seang Chew  75 Choy Chen Kam  75 Miray Karakoyun  76 Ben Pode-Shakked  77   78 Aviva Eliyahu  79   80   81 Rachel Rock  82   83 Teresa Carrion  84 Odelia Chorin  85 Yuri A Zarate  86   87 Marcelo Martinez Conti  88 Mert Karakaya  17 Moon Ley Tung  89   90 Bharatendu Chandra  89   90 Arjan Bouman  34 Aime Lumaka  91 Naveed Wasif  92   93 Marwan Shinawi  94 Patrick R Blackburn  95 Tianyun Wang  96   97   98 Tim Niehues  99 Axel Schmidt  1 Regina Rita Roth  100 Dagmar Wieczorek  100 Ping Hu  101 Rebekah L Waikel  101 Suzanna E Ledgister Hanchard  101 Gehad Elmakkawy  7 Sylvia Safwat  7 Frédéric Ebstein  102   103 Elke Krüger  104 Sébastien Küry  102   103 Stéphane Bézieau  102   103 Annabelle Arlt  2 Eric Olinger  105 Felix Marbach  6 Dong Li  106 Lucie Dupuis  107 Roberto Mendoza-Londono  107 Sofia Douzgou Houge  108 Denisa Weis  109 Brian Hon-Yin Chung  110   111 Christopher C Y Mak  111 Hülya Kayserili  112 Nursel Elcioglu  113 Ayca Aykut  114 Peli Özlem Şimşek-Kiper  115 Nina Bögershausen  116 Bernd Wollnik  116   117   118 Heidi Beate Bentzen  119   120 Ingo Kurth  13 Christian Netzer  17 Aleksandra Jezela-Stanek  35 Koen Devriendt  121 Karen W Gripp  122 Martin Mücke  8   9 Alain Verloes  123 Christian P Schaaf  6 Christoffer Nellåker  124 Benjamin D Solomon  101 Markus M Nöthen  1 Ebtesam Abdalla  7 Gholson J Lyon  125   126   127 Peter M Krawitz  2 Tzung-Chien Hsieh  2

Hellen Lesmann et al. Res Sq.

Abstract

The most important factor that complicates the work of dysmorphologists is the significant phenotypic variability of the human face. Next-Generation Phenotyping (NGP) tools that assist clinicians in recognizing characteristic syndromic patterns are particularly challenged when confronted with patients from populations that differ from their training data. We therefore systematically analyzed the impact of genetic ancestry on facial dysmorphism. To do so, we established the GestaltMatcher Database (GMDB) as a reference dataset for medical images of patients with rare genetic disorders from around the world. We collected 10,980 frontal facial images - more than a quarter previously unpublished - from 8,346 patients, representing 581 rare disorders. Although the predominant ancestry is still European (67%), data from underrepresented populations have been increased considerably via global collaborations (19% Asian and 7% African). This includes previously unpublished reports for more than 40% of the African patients. The NGP analysis on this diverse dataset revealed characteristic performance differences depending on the composition of training and test sets corresponding to genetic relatedness. For clinical use of NGP, incorporating non-European patients resulted in a profound enhancement of GestaltMatcher performance: the top-5 accuracy increased by 11.29%. Importantly, this improvement in delineating the correct disorder from a facial portrait was achieved without decreasing the performance on European patients. By design, GMDB complies with the FAIR principles by rendering the curated medical data findable, accessible, interoperable, and reusable. This means GMDB can also serve as data for training and benchmarking. In summary, our study of facial dysmorphism in a global sample revealed considerable cross-ancestral phenotypic variability that confounds NGP and should be counteracted by international efforts to increase data diversity. GMDB will serve as a vital reference database for clinicians and a transparent training set for advancing NGP technology.

Figures

Figure 1:
a) Birth rate distribution worldwide. The size of each country is scaled in accordance with its birth rate. The map indicates countries from which unpublished images were obtained (source: https://worldmapper.org/faq/, modified). b) Distribution of ancestry groups in the GestaltMatcher Database. Patients without ancestral information (16%) were categorized as Unknown. The breakdown of ancestries among patients with known ancestry is as follows: European 67%, Asian 19%, African 7%, and Others 7%.
Figure 2: GestaltMatcher Database (GMDB) Architecture and Dataflow.
a) Retrospective data are collected from the literature and annotated by data curators or are uploaded by collaborating attending clinicians. Patients can also upload images of their own cases, incorporate prospective data, and view their own data at any time. b) The data (multimodal image data, including portrait images as well as magnetic resonance imaging, X-ray, fundoscopy, and extremity images) are stored in the GMDB (MySQL database) together with the relevant meta information (such as sex, age, ancestry, molecular, and phenotypic information). c) Registered users can view and search the FAIR data in the GMDB Gallery. The patient image can also be analyzed using the Next-Generation Phenotyping tool GestaltMatcher within the Research Platform. In addition, once their application has been approved by the Advisory Board, external computer scientists can use the GMDB-FAIR data set to train models for their own projects.
Figure 3: An example case presentation of a FAIR case with a Digital Object Identifier (DOI).
a) FAIR cases in the GestaltMatcher Database (GMDB) are displayed to GMDB users via the data sheet. Each FAIR case can also be assigned a DOI in order to render it a citable micro-publication. This micro-publication contains the image data and metadata, including demographic, molecular, and phenotype information. The dynamic nature of the GMDB case report enables longitudinal image data storage even after initial publication, which is not possible in conventional journals. b) After uploading, case reports can be viewed and searched by other users in the Gallery view. c) The image data can also be used for inter-cohort comparisons of the gestalt scores within the research platform.
Figure 4: Overview of the GestaltMatcher Database (GMDB)-FAIR dataset.
a) Sex distribution. Number of images shown in brackets. b) Distribution of patient age in years. c) Left: Two-dimensional representation of phenotypic similarities between patients, as calculated on the basis of Human Phenotype Ontology (HPO) terms via Uniform Manifold Approximation and Projection (UMAP). HPO terms were annotated for 4,474 individuals in the GMDB, and expert clinicians defined twelve distinct HPO-defined symptom groups. Based on the annotated HPO terms, each case was assigned to one or more HPO-defined symptom groups. All OMIM diseases were included, using their HPO annotations (gray background dots) as a reference. GMDB cases are color-coded according to their most pronounced HPO-defined symptom group, i.e., the group that includes the majority of their HPO terms. The dataset is dominated by two major clusters (facial dysmorphism in yellow and neurodevelopmental in blue) but shows cases from across the complete disease landscape. Right: Heatmap of the proportion of GMDB individuals within the HPO-defined symptom group on the X-axis who are also assigned to the HPO-defined symptom group on the Y-axis. Notably, facial dysmorphism is present in at least 70% of the cases of each HPO-defined symptom group. d) Proportion of the unpublished and published images in each ancestry group. e) Proportion of the unpublished and published images in each sub-ancestry group.
Figure 5: Performance of ancestry analysis.
a) Top-1 and top-5 accuracy of GestaltMatcher’s disorder classification per ancestral group, where blue denotes the EU-only subset and yellow the diverse subset. Each wide, darker bar and each thin, lighter bar indicate the top-1 and top-5 accuracy per ancestral group, respectively. The horizontal dashed and dotted lines indicate the overall top-1 and top-5 accuracy averaged over all ancestral groups, respectively. The ancestry groups on the x-axis are ordered by the standard deviation of the top-1 accuracies across the 5-fold experiment. b) Top-1 accuracy of GestaltMatcher when different proportions of non-European patients are included in the gallery. The x-axis shows the proportion of non-European data included in the gallery, and the y-axis shows the top-1 accuracy. The colored region along the line indicates the standard deviation.
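
For readers unfamiliar with the metric, the per-group top-k accuracy reported in Figure 5 can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name, the placeholder disorder labels, and the toy data are all invented for illustration. It assumes only the standard definition, in which a case counts as correct when its true disorder appears among the first k ranked predictions, and it averages over groups as the caption describes.

```python
from collections import defaultdict


def top_k_accuracy_by_group(ranked_predictions, true_labels, groups, k):
    """Fraction of cases per group whose true label is among the first k predictions.

    ranked_predictions: one list per case, ordered from most to least likely.
    true_labels: the correct label for each case.
    groups: the group (e.g. ancestry group) each case belongs to.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ranked, truth, group in zip(ranked_predictions, true_labels, groups):
        totals[group] += 1
        if truth in ranked[:k]:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Toy data (hypothetical, not taken from GMDB): four cases, all with the
# same true disorder, split across two illustrative groups.
ranked = [
    ["disorder_A", "disorder_B", "disorder_C"],
    ["disorder_B", "disorder_A", "disorder_C"],
    ["disorder_C", "disorder_B", "disorder_A"],
    ["disorder_B", "disorder_A", "disorder_C"],
]
truth = ["disorder_A"] * 4
groups = ["group_1", "group_1", "group_2", "group_2"]

top1 = top_k_accuracy_by_group(ranked, truth, groups, k=1)
top2 = top_k_accuracy_by_group(ranked, truth, groups, k=2)

# Overall accuracy averaged over groups, so each group weighs equally
# regardless of how many cases it contributes.
overall_top1 = sum(top1.values()) / len(top1)

print(top1)          # {'group_1': 0.5, 'group_2': 0.0}
print(top2)          # {'group_1': 1.0, 'group_2': 0.5}
print(overall_top1)  # 0.25
```

Averaging over groups rather than over cases keeps a large group from dominating the overall figure, which matters when one ancestry group makes up most of the dataset.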
