Multisite Comparison of MRI Defacing Software Across Multiple Cohorts

Athena E Theyers et al. Front Psychiatry. 2021 Feb 24;12:617997. doi: 10.3389/fpsyt.2021.617997. eCollection 2021.

Abstract

With improvements to both scan quality and facial recognition software, there is an increased risk of participants being identified from a 3D render of their structural neuroimaging scans, even when all other personal information has been removed. To prevent this, facial features should be removed before data are shared or openly released; however, while several publicly available software algorithms exist for this purpose, there has been no comprehensive review of their accuracy within the general population. To address this, we tested multiple algorithms on 300 scans from three neuroscience research projects, funded in part by the Ontario Brain Institute, covering a wide age range (3-85 years) and multiple patient cohorts. Although skull stripping removes identifiable features more thoroughly, we focused mainly on defacing software, as skull stripping also removes potentially useful information that may be required for future analyses. We tested six publicly available defacing algorithms (afni_refacer, deepdefacer, mri_deface, mridefacer, pydeface, quickshear), with one skull stripper (FreeSurfer) included for comparison. Accuracy was measured through a pass/fail system with two criteria: first, that all facial features had been removed, and second, that no brain tissue was removed in the process. A subset of defaced scans was also run through several preprocessing pipelines to ensure that none of the algorithms would alter the resulting outputs. We found that success rates varied strongly between defacers, with afni_refacer (89%) and pydeface (83%) having the highest rates overall. In both cases, the primary source of failure was a single dataset that the defacer appeared to struggle with: the youngest cohort (3-20 years) for afni_refacer and the oldest (44-85 years) for pydeface, demonstrating not only that defacer performance depends on the data provided, but that this effect varies between algorithms. While there were some very minor differences between the preprocessing results for defaced and original scans, none was significant, and all were within the range of variation observed when using different NIfTI converters or raw DICOM files.
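The two-criterion pass/fail system described above can be sketched as a small scoring routine. The function names and the majority-vote pooling rule below are illustrative assumptions; the paper only states that pooled ratings reflect "rater consensus," not the exact rule used.

```python
# Hypothetical sketch of the pass/fail rating scheme: a scan passes only if
# all facial features were removed AND no brain tissue was removed.
from collections import Counter

def scan_passes(face_removed: bool, brain_intact: bool) -> bool:
    """Both criteria must hold for a scan to pass."""
    return face_removed and brain_intact

def pooled_rating(ratings: list[bool]) -> bool:
    """Pool per-rater pass/fail judgments for one scan.
    Majority vote is an assumption; the paper says only 'rater consensus'."""
    counts = Counter(ratings)
    return counts[True] > counts[False]

def pass_rate(results: list[bool]) -> float:
    """Percentage of scans that passed."""
    return 100.0 * sum(results) / len(results)

# Example: three scans, two raters each
ratings_per_scan = [[True, True], [True, False], [False, False]]
pooled = [pooled_rating(r) for r in ratings_per_scan]
print(pass_rate(pooled))
```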

Keywords: 3D rendering; de-identification; defacing; facial recognition; privacy-preserving; structural MRI.


Conflict of interest statement

The authors declare that this study received funding from Lundbeck, Bristol-Myers Squibb, Pfizer, and Servier. The funders were not involved in the study design, collection, analysis, interpretation of data, the writing of this article, or the decision to submit it for publication. RM has received consulting and speaking honoraria from AbbVie, Allergan, Janssen, KYE, Lundbeck, Otsuka, and Sunovion, and research grants from CAN-BIND, CIHR, Janssen, Lallemand, Lundbeck, Nubiyota, OBI, and OMHF. RL has received honoraria or research funds from Allergan, Asia-Pacific Economic Cooperation, BC Leading Edge Foundation, CIHR, CANMAT, Canadian Psychiatric Association, Hansoh, Healthy Minds Canada, Janssen, Lundbeck, Lundbeck Institute, MITACS, Myriad Neuroscience, Ontario Brain Institute, Otsuka, Pfizer, St. Jude Medical, University Health Network Foundation, and VGH-UBCH Foundation. SCS is the Chief Scientific Officer of ADMdx, Inc., which receives NIH funding, and he currently has research grants from Brain Canada, Canada Foundation for Innovation (CFI), Canadian Institutes of Health Research (CIHR), and the Ontario Brain Institute in Canada. BF has received a research grant from Pfizer. SK has received research funding or honoraria from Abbott, Alkermes, Allergan, Bristol-Myers Squibb, Brain Canada, Canadian Institutes for Health Research (CIHR), Janssen, Lundbeck, Lundbeck Institute, Ontario Brain Institute (OBI), Ontario Research Fund (ORF), Otsuka, Pfizer, Servier, Sunovion, and Xian-Janssen. EA has served as a consultant to Roche, has received grant funding from Sanofi Canada and SynapDx, has received royalties from APPI and Springer, and received kind support from AMO Pharmaceuticals, honoraria from Wiley, and honorarium from Simons Foundations. GM has received consultancy/speaker fees from Lundbeck, Pfizer, Johnson & Johnson and Janssen. 
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
True facial recognition task images. Top row: Sample original (pre-defaced) 3D rendered T1 image. Three perspectives of the head were generated, including 45° left, straight on, and 45° right. Bottom: The same image after undergoing defacing (in this case, pydeface) and de-earring. Consent was given by the participant to include their non-defaced MRI render in the publication.
Figure 2
Percentage of scans passed by each rater, split by dataset and defacing algorithm. Markers indicate the average percentage for each algorithm. Pooled ratings indicate the percentage of scans that passed based on rater consensus for each scan. *Disclaimer: afni_refacer_run ratings had to be redone due to a major software update after initial data collection. Due to the unavailability of the original Rater 2, these ratings were completed by a different person.
Figure 3
Percentage of scans where errors were detected for each of the seven algorithms, split based on error class. “Face” refers to scans that were failed due to at least one identifiable facial feature (eyes, nose, mouth) remaining after defacing, “brain” refers to scans that were failed due to the algorithm removing neuronal tissue, while “both” references scans where both of these errors occurred. Pooled ratings were calculated from the rater consensus for each scan. *Disclaimer: afni_refacer_run ratings had to be redone due to a major software update after initial data collection. Due to the unavailability of the original Rater 2, these ratings were completed by a different person.
Figure 4
Total percentage of scans where OpenCV or human raters detected a face within the 3D render, segmented by whether a face was detected by both (blue), by OpenCV only (gray), or through manual ratings only, subdivided into partial faces (one feature, orange) and full faces (two or more features, red). FreeSurfer was excluded, as none of its scans were determined to have any faces by either manual ratings or OpenCV.
Figure 5
Average probability of a face within the 3D render as calculated by OpenCV, split based on manual rating consensus. Partial indicates scans where only one facial feature remained in the render, while Face indicates any scans where two or more features remained. FreeSurfer was excluded as all scans were rated as having been fully defaced. The gray line indicates the default threshold used by OpenCV to decide whether or not a face is present within the render.
Figure 6
Bar plot indicating the percent distribution of facial recognition quiz results across the nine raters, for the renders of the defaced and original scans. Renders that were correctly matched to their corresponding photograph are considered “correct,” renders that were matched to the wrong photograph are considered “incorrect,” and those that the raters felt were not clear enough to attempt an identification are labeled “can't identify”.
Figure 7
Boxplot showing the percent difference between each of the defaced images and the original pre-defaced scan (left) for several global measures generated from the FreeSurfer pipeline, including total brain volume, estimated total intracranial volume (eTIV), cortical and subcortical gray matter volumes, white matter volume, average left and right hemisphere cortical thickness and the contrast-to-noise ratio (CNR) between cortical gray and white matter. These are compared to the variations in the same measures (right) with FreeSurfer output initialized with the original scan in different file formats (raw DICOM and NIfTI files converted using a different method).
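The percent difference plotted in Figure 7 is presumably computed per measure against the value from the original scan; a one-line sketch (the function name and sign convention are our assumptions):

```python
def percent_difference(defaced_value: float, original_value: float) -> float:
    """Signed percent difference of a FreeSurfer-derived measure (e.g., eTIV)
    relative to the value obtained from the original, non-defaced scan."""
    return 100.0 * (defaced_value - original_value) / original_value

# A defaced scan yielding 1010 mm^3 against an original 1000 mm^3 differs by 1%
print(percent_difference(1010.0, 1000.0))
```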
Figure 8
Boxplot showing the percent overlap of FreeSurfer segmented tissues for the defaced scans (left) and different file formats (raw DICOM and NIfTI files converted using alternate method, right), as defined by the area of the intersection with the original input scan, divided by the area of the union with the original, for that tissue.
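The overlap measure in Figure 8 (intersection divided by union, per tissue) is the Jaccard index. A minimal NumPy sketch, assuming each segmented tissue is represented as a boolean mask (the function name is illustrative):

```python
import numpy as np

def percent_overlap(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Jaccard overlap |A ∩ B| / |A ∪ B| as a percentage, where mask_a and
    mask_b are boolean arrays marking one segmented tissue in each scan."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 100.0  # both masks empty: treat as perfect agreement
    return 100.0 * intersection / union

a = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
b = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(percent_overlap(a, b))  # 2 voxels shared out of 4 in the union -> 50.0
```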
Figure 9
Boxplot of the difference between FLIRT parameters for the original scans and the defaced scans (left) and different file formats (right) when aligned to the MNI 152 brain template. Parameters have been split by translation in mm (A), rotation converted to degrees (B), scale (C), and skew (D).
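Figure 9 reports FLIRT registration parameters with rotations converted from radians to degrees. The sketch below illustrates that extraction for the simple case of a rotation about z plus a translation; it is not FSL code (real FLIRT affines also carry scale and skew, which FSL's avscale decomposes fully).

```python
import numpy as np

def rigid_params(affine: np.ndarray):
    """Extract translation (mm) and z-rotation (degrees) from a 4x4 affine
    assumed to be a pure rotation about z plus a translation."""
    tx, ty, tz = affine[:3, 3]
    theta_deg = np.degrees(np.arctan2(affine[1, 0], affine[0, 0]))
    return (tx, ty, tz), theta_deg

# Build a test affine: 30-degree rotation about z, then translate
angle = np.radians(30.0)
aff = np.eye(4)
aff[0, 0] = aff[1, 1] = np.cos(angle)
aff[1, 0] = np.sin(angle)
aff[0, 1] = -np.sin(angle)
aff[:3, 3] = [2.0, -1.5, 0.5]

t, rot_deg = rigid_params(aff)
print(t, rot_deg)  # recovers the translation and the 30-degree rotation
```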

