Sci Rep. 2023 Jul 14;13(1):11396. doi: 10.1038/s41598-023-28632-x

Diverse types of expertise in facial recognition

Alice Towler et al.

Abstract

Facial recognition errors can jeopardize national security, criminal justice, public safety and civil rights. Here, we compare the most accurate humans and facial recognition technology in a detailed lab-based evaluation and an international proficiency test for forensic scientists involving 27 forensic departments from 14 countries. We find striking cognitive and perceptual diversity between naturally skilled super-recognizers, trained forensic examiners and deep neural networks, despite their equivalent accuracy. Clear differences emerged in super-recognizers' and forensic examiners' perceptual processing, errors, and response patterns: super-recognizers were fast, biased to respond 'same person' and misidentified people with extreme confidence, whereas forensic examiners were slow, unbiased and strategically avoided misidentification errors. Further, these human experts and deep neural networks disagreed on the similarity of faces, pointing to differences in how they represent faces. Our findings therefore reveal multiple types of facial recognition expertise, each lending itself to particular facial recognition roles in operational settings. Finally, we show that harnessing the diversity between individual experts provides a robust method of maximizing facial recognition accuracy. This can be achieved either via collaboration between experts in forensic laboratories or, most promisingly, by statistical fusion of match scores provided by different types of expert.
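
To make the fusion idea concrete, here is a minimal sketch of score-level fusion, assuming each rater (an examiner, a super-recognizer or a DNN) assigns one similarity score per face pair and that ground-truth same/different labels are available. The scores, rating scales and numbers below are invented for illustration and are not the study's data.

```python
# Minimal, hypothetical sketch of score-level fusion (not the authors' code).
# Ground truth: 1 = same person, 0 = different people (invented toy data).
import numpy as np
from scipy.stats import zscore
from sklearn.metrics import roc_auc_score

labels   = np.array([1, 0, 1, 1, 0, 0])                    # toy ground truth
examiner = np.array([4, 2, 5, 2, 1, 3])                    # 1-5 confidence ratings
dnn      = np.array([0.91, 0.40, 0.55, 0.62, 0.33, 0.85])  # similarity scores

# Put both raters on a common scale, then average: simple statistical fusion.
fused = (zscore(examiner) + zscore(dnn)) / 2

print("Examiner AUC:", roc_auc_score(labels, examiner))
print("DNN AUC:     ", roc_auc_score(labels, dnn))
print("Fused AUC:   ", roc_auc_score(labels, fused))
```

On these toy numbers the fused score separates same-person from different-people pairs at least as well as either rater alone, which is the intuition behind fusing diverse experts.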


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Different processing underlies face recognition expertise in super-recognizers and forensic examiners. (A): An example trial from the Expertise in Facial Comparison Test (EFCT). These images show different people. (B): Super-recognizers demonstrate superior accuracy after seeing face images for just 2 s, suggesting that fast, intuitive processes underlie their expertise whereas examiners’ expertise only becomes apparent when given sufficient time to deploy their slow, feature-by-feature comparison strategy. Violin plots show the distribution of performance for student controls, forensic examiners and super-recognizers on the upright conditions of the EFCT. Red lines show group means.
Figure 2
Comparison of the best available face recognition solutions. (A) Example 1-to-1 comparison from the 2018 ENFSI proficiency test. These images show the same person. (B) Ranked accuracy of the best available face recognition solutions. Red lines show group means. Forensic examiners, super-recognizers and DNNs achieved equivalent levels of accuracy and were all superior to novice participants. Group-based laboratory decisions (right) were more accurate than decisions reached by individuals, pointing to benefits of collective decision-making.
Figure 3
Forensic examiners avoid costly errors made by super-recognizers. (A): A large proportion of super-recognizers’ errors were high confidence ‘same person’ errors (responses of 4 and 5). Forensic examiners never made these errors. In forensic settings, false positive errors of this sort may lead to wrongful convictions, especially when made with high confidence. (B): Unlike forensic examiners, super-recognizers tended towards high confidence responses. This tendency was most apparent for ‘same person’ decisions, reflecting a response bias for super-recognizers to respond ‘same person’. Also, nearly 10% of forensic examiners’ responses and almost none of super-recognizers’ responses (0.27%) were ‘inconclusive’. Error bars show standard error of the mean. See supplementary materials for an extended version of this figure including novices.
Figure 4
Correlation heatmaps of the similarity of responses between 177 human participants and 10 facial recognition DNNs. Red pixels indicate a positive Spearman’s rank-order correlation, blue pixels indicate a negative correlation, and yellow pixels indicate zero correlation. While humans and DNNs tended to agree on the similarity of same-person face pairs (A), they showed striking disagreement on the similarity of different-people face pairs (B), indicated by the increase in the number of blue pixels visible on the top and right-hand edge of the heatmap. The heatmaps were generated using the ggcorrplot package in R.
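
As a rough illustration of how such agreement matrices can be computed (the authors used the ggcorrplot package in R; the toy data and Python tooling below are our own assumptions), one can correlate raters' item-level similarity ratings pairwise with Spearman's rho and plot the resulting matrix:

```python
# Illustrative sketch only, not the authors' code. Each column is one rater's
# similarity ratings over a shared set of face pairs (invented data).
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(20, 5))   # 20 face pairs x 5 raters (toy data)

corr, _ = spearmanr(ratings)                 # 5 x 5 rank-order correlation matrix

plt.imshow(corr, vmin=-1, vmax=1, cmap="RdYlBu_r")  # red = +1, yellow = 0, blue = -1
plt.colorbar(label="Spearman's rho")
plt.title("Between-rater agreement on face-pair similarity")
plt.show()
```
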
Figure 5
Optimal accuracy in face recognition is achieved by aggregating responses of diverse experts. Violin plots show the distribution of accuracy scores (AUCs) for each fusion. Red lines show median accuracy. Comparison of individuals, pairs and triplets shows increased accuracy with increasing group size. The best results occur when fusing responses from both humans and DNNs (yellow), resulting in more accurate decisions compared to either human–human (purple) or DNN–DNN fusions. Fusing human experts' decisions models decisions made by forensic laboratories, and the benefits of doing so may explain the superiority of forensic laboratory decisions. Here, we report the best-performing DNN (DNN10) for the individuals analysis, and the DNNs that produce the strongest fusion effects with DNN10 for the pairs (DNN3) and triplets (DNN3 and DNN1) analyses. However, we note the results are consistent for almost all DNNs. SR = super-recognizer; EX = forensic examiner.
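
The group-size analysis can be sketched in the same spirit, again on invented data: enumerate every pair and triplet drawn from a pool of raters, fuse each group by averaging z-scored ratings, and summarise accuracy per group size with the median AUC. The rater names and noise levels below are hypothetical.

```python
# Hypothetical sketch of the group-size analysis (toy data, not the study's).
from itertools import combinations
import numpy as np
from scipy.stats import zscore
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=40)                        # toy same/different labels
pool = {f"rater{i}": labels + rng.normal(0, 0.8, size=40)   # six noisy toy raters
        for i in range(6)}

for size in (1, 2, 3):
    # Fuse each group of raters by averaging z-scored ratings, then score with AUC.
    aucs = [roc_auc_score(labels, np.mean([zscore(pool[r]) for r in group], axis=0))
            for group in combinations(pool, size)]
    print(f"group size {size}: median AUC = {np.median(aucs):.3f}")
```
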
Figure 6
Super-recognizers show consistent and face-specific identification expertise. Violin plots show the distribution of performance for super-recognizers and controls on the test battery. Red lines show group means. Super-recognizers outperformed controls on a battery of standardised tests measuring face matching (GFMT, Models), face recognition memory (CFMT+, CFMT-Aus), and general face identification abilities (UNSW Face Test). To a lesser extent, they were also better than controls on both the Primate Matching Test and Fingerprint Matching Test, but not the MFFT, suggesting some overlapping perceptual matching ability across domains. We are unable to show the primate and fingerprint stimuli used in the test. Example primate faces were obtained from Pixabay (https://pixabay.com/images/search/monkey%20face/) and are released under the Pixabay License. Fingerprint images are by Metrónomo and licensed under CC BY-SA 2.5 AR (https://creativecommons.org/licenses/by-sa/2.5/ar/deed.en).
Figure 7
Super-recognizers’ and forensic examiners’ accuracy on professional face matching tasks. Violin plots show the distribution of performance for student controls, forensic examiners and super-recognizers. Red lines show group means. Super-recognizers outperformed controls on both tests but were statistically equivalent to forensic examiners. The top row shows example stimuli for the PICT and Facial Recognition Candidate List Test. Facial Recognition Candidate List Test images are representative examples as the test stimuli are real passport images which we cannot show for privacy reasons.
