eLife. 2014 Jun 24;3:e02020.

doi: 10.7554/eLife.02020.

Diagnostically relevant facial gestalt information from ordinary photos

Quentin Ferry¹, Julia Steinberg², Caleb Webber³, David R FitzPatrick⁴, Chris P Ponting³, Andrew Zisserman⁵, Christoffer Nellåker⁶

Affiliations

¹ Department of Engineering Science, University of Oxford, Oxford, United Kingdom; Medical Research Council Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom.
² Medical Research Council Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom; The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.
³ Medical Research Council Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom.
⁴ Medical Research Council Human Genetics Unit, Institute of Genetics and Molecular Medicine, Edinburgh, United Kingdom.
⁵ Department of Engineering Science, University of Oxford, Oxford, United Kingdom; az@robots.ox.ac.uk.
⁶ Medical Research Council Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom; christoffer.nellaker@dpag.ox.ac.uk.

PMID: 24963138
PMCID: PMC4067075
DOI: 10.7554/eLife.02020

Quentin Ferry et al. Elife. 2014.

Abstract

Craniofacial characteristics are highly informative for clinical geneticists when diagnosing genetic diseases. As a first step towards the high-throughput diagnosis of ultra-rare developmental diseases we introduce an automatic approach that implements recent developments in computer vision. This algorithm extracts phenotypic information from ordinary non-clinical photographs and, using machine learning, models human facial dysmorphisms in a multidimensional 'Clinical Face Phenotype Space'. The space locates patients in the context of known syndromes and thereby facilitates the generation of diagnostic hypotheses. Consequently, the approach will aid clinicians by greatly narrowing (by 27.6-fold) the search space of potential diagnoses for patients with suspected developmental disorders. Furthermore, this Clinical Face Phenotype Space allows the clustering of patients by phenotype even when no known syndrome diagnosis exists, thereby aiding disease identification. We demonstrate that this approach provides a novel method for inferring causative genetic variants from clinical sequencing data through functional genetic pathway comparisons.

DOI: http://dx.doi.org/10.7554/eLife.02020.001

Keywords: clinical genetics; computational biology; computer vision; phenotyping.

Conflict of interest statement

CPP: Senior editor, eLife.

The other authors declare that no competing interests exist.

Figures

**Figure 1. Overview of the computational approach and average faces of syndromes.**
(A) A photo is automatically analyzed to detect faces, and feature points are placed using computer vision algorithms. Facial feature annotation points delineate the supra-orbital ridge (8 points), the eyes (mid points of the eyelids and eye canthi, 8 points), nose (nasion, tip, ala, subnasale and outer nares, 7 points), mouth (vermilion border lateral and vertical midpoints, 6 points) and the jaw (zygoma mandibular border, gonion, mental protuberance and chin midpoint, 7 points). Shape and Appearance feature vectors are then extracted based on feature points and these determine the photo's location in Clinical Face Phenotype Space (further details on feature points in Figure 1—figure supplement 1). This location is then analyzed in the context of existing points in Clinical Face Phenotype Space to extract phenotype similarities and diagnosis hypotheses (further details on Clinical Face Phenotype Space with simulation examples in Figure 1—figure supplement 2). (B) Average faces of syndromes in the database constructed using AAM models (‘Materials and methods’) and the number of individuals that each average face represents. See online version of this manuscript for animated morphing images that show facial features differing between controls and syndromes (Figure 2). **DOI:** http://dx.doi.org/10.7554/eLife.02020.003
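
As a quick check on the annotation scheme in panel (A), the per-region point counts can be tallied. The sketch below is illustrative only: the region names and counts come from the caption, while `LANDMARK_COUNTS`, `total_landmarks`, `shape_vector` and the flat (x, y) layout are hypothetical names and choices, not the authors' feature extraction.

```python
# Landmark counts per facial region, as listed in the Figure 1 caption.
LANDMARK_COUNTS = {
    "supra-orbital ridge": 8,
    "eyes": 8,
    "nose": 7,
    "mouth": 6,
    "jaw": 7,
}


def total_landmarks(counts):
    """Total number of annotated feature points across all regions."""
    return sum(counts.values())


def shape_vector(points):
    """Flatten (x, y) landmark pairs into one vector (assumed layout)."""
    return [coord for point in points for coord in point]


n_points = total_landmarks(LANDMARK_COUNTS)  # 8 + 8 + 7 + 6 + 7 = 36
# Placeholder coordinates; real landmarks would come from a face detector.
dummy_points = [(float(i), float(i) * 0.5) for i in range(n_points)]
vector = shape_vector(dummy_points)  # two coordinates per landmark
```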

**Figure 1—figure supplement 2. Phenotypic vs spurious feature variation in Clinical Face Phenotype Space using simulated faces.**
Simulated 3D faces were used to visualize the influence of spurious variation in raw feature space and Clinical Face Phenotype Space. (A) 100 faces with controlled phenotype, lighting, and rotation variation were rendered. (B) Visualization of a population of simulated faces in the first two Multi-Dimensional Scaling (MDS) modes. Face clustering in raw feature space and Clinical Face Phenotype Space colored by lighting, rotation, and face phenotype, respectively. In the raw feature space lighting is the dominant clustering factor; in Clinical Face Phenotype Space phenotype underlies the primary clustering. (C) The first 16 modes of PCA decomposition of the raw feature vectors and in the Clinical Face Phenotype Space colored by lighting and rotation of the simulated faces. In the raw feature space, lighting and rotation variation are encoded in the 2nd and 1st modes, respectively, indicating that clustering is dominated by spurious variation. In the Clinical Face Phenotype Space, lighting is represented in the 9th mode, whereas rotation is no longer represented in the first 16 modes. This shows that the Clinical Face Phenotype Space transformation reduces the influence of spurious variation on clustering of phenotypes. **DOI:** http://dx.doi.org/10.7554/eLife.02020.005

**Figure 2—figure supplement 1. Distortion graphs representing the characteristic deformation of syndrome faces relative to the average control face.**
Each line shows whether the distance between two feature points is extended or contracted compared with the control face. White: the distance is similar to controls; blue: shorter in patients relative to controls; red: extended in patients relative to controls. **DOI:** http://dx.doi.org/10.7554/eLife.02020.009
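
The color coding in these distortion graphs can be sketched as a comparison of pairwise landmark distances. This is a minimal illustration under stated assumptions: the function names, the 5% `tolerance` and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every pair of landmarks, keyed by index pair."""
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }


def distortion_colors(syndrome_points, control_points, tolerance=0.05):
    """Label each landmark pair white, blue (shorter) or red (extended).

    `tolerance` is an assumed relative threshold, not a value from the paper.
    """
    syndrome = pairwise_distances(syndrome_points)
    control = pairwise_distances(control_points)
    colors = {}
    for pair, control_distance in control.items():
        ratio = syndrome[pair] / control_distance
        if ratio < 1 - tolerance:
            colors[pair] = "blue"
        elif ratio > 1 + tolerance:
            colors[pair] = "red"
        else:
            colors[pair] = "white"
    return colors


# Toy average faces with three landmarks each (made-up coordinates).
control_face = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
syndrome_face = [(0.0, 0.0), (5.0, 0.0), (2.0, 3.0)]
colors = distortion_colors(syndrome_face, control_face)
```

Moving only the second landmark outward lengthens every distance that involves it and leaves the remaining pair unchanged.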

**Figure 3. Clinical Face Phenotype Space enhances the separation of different dysmorphic syndromes.**
The graph shows a two dimensional representation of the full Clinical Face Phenotype Space, with links to the 10 nearest neighbors of each photo (circle) and photos placed with force-directed graphing. The Clustering Improvement Factor (CIF, fold better clustering than random expectation) estimate for each of the syndromes is shown along the periphery. **DOI:** http://dx.doi.org/10.7554/eLife.02020.010

**Figure 4. Clinical Face Phenotype Space is generalizable to dysmorphic syndromes that are absent from a training set.**
(A) Clustering Improvement Factor (CIF) estimates are plotted vs the number of individuals per syndrome grouping in the Gorlin collection or patients with similar genetic variant diagnoses. As expected, the stochastic variance in CIF is inversely proportional to the number of individuals available for sampling. The median CIF across all groups is 27.6-fold over what is expected by clustering syndromes randomly. That is to say, the CIF of a randomly placed set is 1. The maximum CIF is fixed by the total number of images in the database and by the cardinality of a syndrome set: the theoretical maximal CIF upper bound is plotted as a red dotted line. The minimum and maximum CIFs, for Cutis laxa syndrome and Otodental syndrome, were 1.0 and 700.0, respectively. (B) Average probabilistic classification accuracies of each individual face placed in Clinical Face Phenotype Space (class prioritization by 20 nearest neighbors weighted by prevalence in the database). The 8 initial syndromes used to train Clinical Face Phenotype Space are shown in color. For syndromes with fewer than 50 examples, accuracies were averaged across all syndromes binned by data set size (i.e., the average accuracy is shown for syndromes with 2–5, 6–10, 11–25, and 26–50 images in the database, Supplementary file 1). Classification accuracies increase in proportion to the number of individuals with the syndrome present in the database. Accuracies using support vector machines with binary and forced choice classifications are shown in Figure 4—figure supplement 1 and Figure 4—figure supplement 2. A simulation example of probabilistic querying of Clinical Face Phenotype Space is shown in Figure 4—figure supplement 4. **DOI:** http://dx.doi.org/10.7554/eLife.02020.011

**Figure 4—figure supplement 1. SVM binary classification accuracies among the 8 syndromes in Table 1.**
SVM classifier accuracies when tuned for equal false positive and false negative error rates. **DOI:** http://dx.doi.org/10.7554/eLife.02020.012

**Figure 4—figure supplement 2. SVM forced choice classification accuracies among the 8 syndromes in Table 1.**
**DOI:** http://dx.doi.org/10.7554/eLife.02020.013

**Figure 4—figure supplement 3. Simulated example illustrating the Clustering Improvement Factor.**
A random scattering of 100 points in 2 dimensions is used as a background set (black circles with white fill). The 20 red plus symbols (within the red shaded area) are a random set of points lying within the same limits as the background set and have a CIF of 0.9. This is the actual degree of clustering of the red points with respect to the expectation of clustering them with 95% confidence (E(r) = 5.6). The filled green circles (within the green shaded area) are the red points shifted by +0.5 units in each dimension and have a CIF of 2.7. The black points (within the gray shaded area) are the red plus symbol positions scaled by 0.5 and then shifted by +1.5 units in dimension 1. The black points are non-overlapping with the background and represent the maximal CIF (of 5.6) in this example. **DOI:** http://dx.doi.org/10.7554/eLife.02020.014
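
The idea behind this simulation can be sketched with a simplified, rank-based stand-in for the CIF. The caption does not give the estimator, so everything here is an assumption: `cif_proxy` divides the rank at which a same-class neighbor would first appear under random placement by the mean observed rank. It omits the 95% confidence adjustment mentioned above, so it is not expected to reproduce the 0.9, 2.7 or 5.6 values in the caption.

```python
import math


def nearest_same_class_rank(index, points, labels):
    """Rank (1 = closest) of the nearest neighbor sharing the query's label."""
    others = sorted(
        (j for j in range(len(points)) if j != index),
        key=lambda j: math.dist(points[index], points[j]),
    )
    for rank, j in enumerate(others, start=1):
        if labels[j] == labels[index]:
            return rank
    raise ValueError("a class needs at least two members")


def cif_proxy(points, labels, target):
    """Expected rank under random placement divided by mean observed rank.

    Assumed proxy for the Clustering Improvement Factor, not the paper's
    estimator.
    """
    members = [i for i, label in enumerate(labels) if label == target]
    observed = sum(
        nearest_same_class_rank(i, points, labels) for i in members
    ) / len(members)
    # Mean position of the first same-class point in a random ordering.
    expected = len(points) / len(members)
    return expected / observed


# Deterministic toy data: a 10 x 10 background grid plus 20 isolated points.
background = [(i / 9, j / 9) for i in range(10) for j in range(10)]
isolated = [(5.0 + 0.01 * k, 0.0) for k in range(20)]
points = background + isolated
labels = ["background"] * len(background) + ["syndrome"] * len(isolated)

cif_isolated = cif_proxy(points, labels, "syndrome")  # 6.0 for this toy set
```

Because the isolated group does not overlap the background, every member's nearest neighbor is from its own group, and the proxy reaches its ceiling of 120 / 20 = 6. This mirrors the way the non-overlapping black points in the caption reach the maximal CIF.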

**Figure 4—figure supplement 4. Simulated example of probabilistic querying of Clinical Face Phenotype Space.**
(A) Visualization of a population of simulated faces in the first two Multi-Dimensional Scaling (MDS) modes. 7 classes of points (simulated 'syndrome groups') are shown with different distributions and variances. A central 'query' face is indicated by the boxed cross. The 20 nearest neighbors of the query are encircled with a black border. (B) Inset bar graph shows diagnosis hypothesis ranked by class priority. The class priority ranking weights the dispersion and prevalence (spread and number) of a class in the Clinical Face Phenotype Space with the nearest neighbors to assign the most probable diagnosis hypotheses. In the example, the ranked diagnosis estimates of the query point would be class 7, then class 6, and thirdly class 4. The scatter plot shows the individual similarity p0p1 estimates, reflecting their relative closeness in the space as compared to local neighborhood, for the 20 nearest neighbors of the query. The first nearest neighbor is estimated to be 2.6-fold closer to the query than the average based on the local density of neighbors. The dotted line indicates the average relative distance between points among the 20 nearest neighbors. (C) Inset bar graph shows the number of neighbors of the query per class. A scatterplot of dispersion vs cardinality, i.e. relative spread of points and what proportion of the total number of points belong to that class in the simulated space. Plots (B) and (C) allow objective assessment of the distribution of points shown in (A), and aid the interpretation of classification confidence. **DOI:** http://dx.doi.org/10.7554/eLife.02020.015
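
A rough sketch of prevalence-weighted nearest-neighbor ranking is given below. The caption does not specify the weighting formula, so the score used here (a class's share of the k nearest neighbors divided by its share of the whole database) is an assumption. Dispersion is ignored entirely, and `class_priority` and the toy data are hypothetical.

```python
import math
from collections import Counter


def class_priority(query, points, labels, k=20):
    """Rank classes by neighbor share divided by database prevalence.

    Assumed scoring rule for illustration; not the paper's class priority.
    """
    order = sorted(
        range(len(points)), key=lambda i: math.dist(query, points[i])
    )
    neighbor_counts = Counter(labels[i] for i in order[:k])
    prevalence = Counter(labels)
    total = len(labels)
    scores = {
        label: (count / k) / (prevalence[label] / total)
        for label, count in neighbor_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy database: class "A" is common, class "B" is rare but close to the query.
points = [
    (0.1, 0.0), (0.0, 0.1),              # class B
    (0.5, 0.0), (0.0, 0.5),              # class A, fairly close
    (3.0, 3.0), (3.0, 4.0), (4.0, 3.0), (4.0, 4.0),  # class A, far away
]
labels = ["B", "B", "A", "A", "A", "A", "A", "A"]

ranked = class_priority((0.0, 0.0), points, labels, k=4)
```

With k = 4 both classes contribute two neighbors, but "B" makes up only a quarter of the database, so it is ranked first. The weighting is there to keep common classes from dominating the ranking on prevalence alone.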

**Figure 5. Clinical Face Phenotype Space recapitulates features of functional gene links between syndromes.**
Protein–protein interaction distances of 1–3 for genetically characterized syndromes are associated with significantly shorter Euclidean distance (arbitrary units) between syndromes in Clinical Face Phenotype Space as compared to syndromes with distance 4 or no known interaction distance (shown in orange) (Kruskal–Wallis tests with Bonferroni corrected p-values indicated as *p<0.05, **p<0.01, ***p<0.001). The Spearman correlation across all distances was r = 0.09, p<0.001. The numbers of pairwise syndrome comparisons underlying each of the interaction distances are listed within the respective boxes. **DOI:** http://dx.doi.org/10.7554/eLife.02020.016
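
For readers unfamiliar with the Spearman statistic reported here, the sketch below computes a rank correlation in plain Python, with tied values sharing their average rank. The helper names and the toy values are made up for illustration; they are not the study's data and do not reproduce r = 0.09.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical pairs: interaction distance vs distance in phenotype space.
ppi_distance = [1, 1, 2, 3, 4, 4]
face_distance = [0.8, 1.1, 1.0, 1.6, 1.5, 2.0]
rho = spearman(ppi_distance, face_distance)
```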

**Figure 6. Class priority of diagnostic classifications for images.**
The full computer vision algorithm and Clinical Face Phenotype Space analysis procedure with diagnostic hypothesis generation exemplified by: (A) a patient (Ferrero et al., 2007) with Williams-Beuren syndrome. (B) Abraham Lincoln. The former US President is thought to have had a marfanoid disorder, if not Marfan syndrome (Gordon, 1962; Sotos, 2012). Bar graphs show class prioritization of diagnostic hypotheses determined by 20 nearest neighbors weighted by prevalence in the database. As expected, the classification of Marfan is not successfully assigned in the first instance as there were only 18 faces of individuals with Marfan in the database (making this an example of a difficult case with the current database). However, the seventh suggestion is Marfan, despite this being among 90 different syndromes and 2754 faces. **DOI:** http://dx.doi.org/10.7554/eLife.02020.017
