PLoS One. 2017 Sep 25;12(9):e0184661.
doi: 10.1371/journal.pone.0184661. eCollection 2017.

MRIQC: Advancing the automatic prediction of image quality in MRI from unseen sites


Oscar Esteban et al. PLoS One. 2017.

Abstract

Quality control of MRI is essential for excluding problematic acquisitions and avoiding bias in subsequent image processing and analysis. Visual inspection is subjective and impractical for large-scale datasets. Although automated quality assessments have been demonstrated on single-site datasets, it is unclear whether such solutions generalize to unseen data acquired at new sites. Here, we introduce the MRI Quality Control tool (MRIQC) for extracting quality measures and fitting a binary (accept/exclude) classifier. Our tool can be run both locally and as a free online service via the OpenNeuro.org portal. The classifier is trained on a publicly available, multi-site dataset (17 sites, N = 1102). We perform model selection, evaluating different normalization and feature exclusion approaches aimed at maximizing across-site generalization, and estimate an accuracy of 76%±13% on new sites using leave-one-site-out cross-validation. We confirm this result on a held-out dataset (2 sites, N = 265), again obtaining 76% accuracy. Even though the performance of the trained classifier is statistically above chance, we show that it is susceptible to site effects and unable to account for artifacts specific to new sites. MRIQC performs with high accuracy in intra-site prediction, but performance on unseen sites leaves room for improvement, which might require more labeled data and new approaches to handling between-site variability. Overcoming these limitations is crucial for a more objective quality assessment of neuroimaging data, and to enable the analysis of extremely large and multi-site samples.
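The leave-one-site-out (LoSo) cross-validation scheme mentioned in the abstract can be sketched with scikit-learn's LeaveOneGroupOut splitter. The data, site labels, and classifier settings below are synthetic placeholders, not the actual MRIQC training setup.

```python
# Sketch of leave-one-site-out (LoSo) cross-validation: each fold holds out
# every scan from one site, so the test site is entirely unseen at training.
# All data here are random stand-ins for the image quality metrics (IQMs).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n_subjects, n_features, n_sites = 170, 10, 17
X = rng.normal(size=(n_subjects, n_features))   # stand-in for the IQMs
y = rng.integers(0, 2, size=n_subjects)         # accept (0) / exclude (1)
sites = np.repeat(np.arange(n_sites), n_subjects // n_sites)

loso = LeaveOneGroupOut()
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, groups=sites, cv=loso, scoring="accuracy")
print(f"LoSo accuracy: {scores.mean():.2f} ± {scores.std():.2f}")
```

Because the labels here are random, the reported accuracy is near chance; the point is the grouping structure, which guarantees that no scan from the test site contributes to training.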


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Visual assessment of MR scans.
Two images with prominent artifacts from the Autism Brain Imaging Data Exchange (ABIDE) dataset are presented on the left. An example scan (top) is shown with severe motion artifacts. The arrows point to signal spillover along the phase-encoding axis (right-to-left, RL) due to eye movements (green) and vessel pulsations (red). A second example scan (bottom) shows severe coil artifacts. On the right, the panel displays one representative image frame extracted from the animations corresponding to the subjects presented on the left, as they are inspected by the raters during the animation. This figure caption is extended in Block 1 of S1 File.
Fig 2
Fig 2. Inter-rater variability.
The heatmap shows the overlap of the quality labels assigned by two different domain experts on 100 data points of the ABIDE dataset, using the protocol described in section Labeling protocol. We also compute Cohen's kappa for both ratings and obtain a value of κ = 0.39. Using the table for interpretation of κ by Viera et al. [16], the agreement of both raters is "fair" to "moderate". When the labels are binarized by mapping "doubtful" and "accept" to a single "good" label, the agreement increases to κ = 0.51 ("moderate"). The "fair" to "moderate" agreement of observers demonstrates a substantial inter-rater variability. The inter- and intra-rater variabilities introduce class-noise into the problem, since a fair number of data points are assigned noisy labels that are not consistent with the labels assigned to the rest of the dataset. An extended investigation of the inter- and intra-rater variabilities is presented in Block 5 of S1 File.
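The kappa computation and the binarization step described in the caption can be illustrated as follows. The ratings below are invented for demonstration; they are not the actual ABIDE labels.

```python
# Cohen's kappa quantifies inter-rater agreement beyond chance.
# The two rating vectors are made-up examples, not real expert labels.
from sklearn.metrics import cohen_kappa_score

rater_a = ["accept", "doubtful", "exclude", "accept", "accept",
           "doubtful", "exclude", "accept", "doubtful", "exclude"]
rater_b = ["accept", "accept", "exclude", "accept", "doubtful",
           "doubtful", "accept", "accept", "exclude", "exclude"]

kappa = cohen_kappa_score(rater_a, rater_b)

# Binarization as in the caption: "doubtful" and "accept" collapse to "good".
def binarize(ratings):
    return ["good" if r in ("accept", "doubtful") else "exclude"
            for r in ratings]

kappa_bin = cohen_kappa_score(binarize(rater_a), binarize(rater_b))
print(f"three-class kappa = {kappa:.2f}, binarized kappa = {kappa_bin:.2f}")
```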
Fig 3
Fig 3. Inter-site variability renders as a batch effect on the calculated IQMs.
These plots display features extracted by MRIQC (columns) for all participants (rows), clustered by site (17 centers from the ABIDE datasets, plus the two centers where DS030 was acquired, "BMC" and "CCN"). The plot of original features (left panel) shows how easily they can be clustered by the site they belong to. After site-wise normalization, including centering and scaling within site (right), the measures are more homogeneous across sites. Features are represented in arbitrary units. For easier comparison, the features axis (x) has been mirrored between plots.
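The site-wise normalization described in the caption (centering and scaling each feature within its site) can be sketched with a pandas groupby. Column names and values below are invented stand-ins for real IQMs.

```python
# Sketch of site-wise normalization to remove a batch effect: each IQM is
# z-scored within its acquisition site. Data and feature names are synthetic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "site": np.repeat(["A", "B", "C"], 4),
    # An additive per-site offset simulates the batch effect seen in Fig 3.
    "snr": rng.normal(10, 2, 12) + np.repeat([0.0, 5.0, -3.0], 4),
    "cjv": rng.normal(0.4, 0.1, 12),
})

iqms = ["snr", "cjv"]
# Center and scale each feature within its site.
df[iqms] = df.groupby("site")[iqms].transform(lambda x: (x - x.mean()) / x.std())
```

After this step, every site has zero mean and unit variance per feature, so the per-site clusters visible in the left panel of Fig 3 disappear.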
Fig 4
Fig 4. MRIQC’s processing data flow.
Images undergo a minimal processing pipeline to obtain the necessary corrected images and masks required for the computation of the IQMs.
Fig 5
Fig 5. Visual reports.
MRIQC generates one individual report per subject in the input folder and one group report including all subjects. To visually assess MRI samples, the first step (1) is opening the group report. This report shows boxplots and strip-plots for each of the IQMs. Looking at the distributions, it is possible to find images of potentially low quality, as these generally appear as outliers in one or more strip-plots. For instance, in (2), hovering over a suspicious sample within the coefficient of joint variation (CJV) plot displays the subject identifier ("sub-51296"). Clicking on that sample opens the individual report for that specific subject (3). This particular individual report is available online at https://web.stanford.edu/group/poldracklab/mriqc/reports/sub-51296_T1w.html.
Fig 6
Fig 6. Nested cross-validation for model selection.
The plots on the left represent the scores (AUC on top, ACC below) obtained in the outer loop of nested cross-validation, using the LoSo split. The plots show how certain sites are harder to predict than others. On the right, corresponding violin plots summarize the overall performance. In both plots, the dashed lines represent the averaged cross-validated performance for the three models: RFC (blue line, AUC = 0.73±0.15, ACC = 76.15%±13.38%), SVC-lin (light orange, AUC = 0.68±0.18, ACC = 67.54%±20.82%), and SVC-rbf (dark orange, AUC = 0.64±0.17, ACC = 69.05%±18.90%).
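The nested cross-validation procedure (an inner loop for hyperparameter selection inside a LoSo outer loop) can be sketched as below. The data, hyperparameter grid, and fold counts are placeholders, not the settings used in the paper.

```python
# Rough sketch of nested cross-validation with a leave-one-site-out outer
# loop, in the spirit of Fig 6. All data and settings here are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut

rng = np.random.default_rng(42)
X = rng.normal(size=(90, 8))
# Each block of 10 subjects is one site; both classes appear at every site.
y = np.tile([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], 9)
sites = np.repeat(np.arange(9), 10)

outer = LeaveOneGroupOut()
aucs = []
for train, test in outer.split(X, y, groups=sites):
    # Inner loop: tune hyperparameters using only the training sites.
    inner = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [50, 100]},
        cv=3, scoring="roc_auc",
    )
    inner.fit(X[train], y[train])
    # Outer loop: score the tuned model on the held-out site.
    aucs.append(inner.score(X[test], y[test]))

print(f"AUC = {np.mean(aucs):.2f} ± {np.std(aucs):.2f}")
```

Keeping the held-out site outside both loops is what makes the outer score an honest estimate of performance on unseen sites.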
Fig 7
Fig 7. Evaluation on the held-out dataset.
A. A total of 50 features are selected by the preprocessing steps. The features are ordered from highest median importance (the QI2 [12]) to lowest (the 5th percentile of the intensities within the GM mask). The boxplots represent the distribution of importances of a given feature across all trees in the ensemble. B. (Left) Four different examples of false negatives from the DS030 dataset. The red boxes indicate a ghosting artifact, present in more than 20% of the images. Only extreme cases, where the ghost overlaps the cortical GM layer of the occipital lobes, are presented. (Right) Two examples of false positives. The two examples are borderline cases that were rated as "doubtful". Due to the intra- and inter-rater variabilities, some data points with poorer overall quality are rated merely "doubtful". These images demonstrate the effects of noise in the quality labels.
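Panel A is built from the per-tree feature importances of the random-forest ensemble, which can be read off a fitted scikit-learn model as sketched below. The feature names and data are invented stand-ins for the 50 selected IQMs.

```python
# Illustration of extracting feature importances from a random-forest
# classifier, analogous to panel A of Fig 7. Names and data are synthetic;
# "qi2", "cjv", etc. are hypothetical stand-ins for real IQM names.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
feature_names = ["qi2", "cjv", "snr_gm", "efc", "fber"]
X = rng.normal(size=(200, len(feature_names)))
# Make the first two features carry the signal so importances differ.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Mean importances across the ensemble; the per-tree spread of
# tree.feature_importances_ is what the boxplots in Fig 7A display.
order = np.argsort(clf.feature_importances_)[::-1]
for i in order:
    print(f"{feature_names[i]:8s} {clf.feature_importances_[i]:.3f}")
```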

References

    1. Power JD, Barnes KA, Snyder AZ, Schlaggar BL, Petersen SE. Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. NeuroImage. 2012;59(3):2142–2154. doi:10.1016/j.neuroimage.2011.10.018
    2. Yendiki A, Koldewyn K, Kakunoori S, Kanwisher N, Fischl B. Spurious group differences due to head motion in a diffusion MRI study. NeuroImage. 2014;88:79–90. doi:10.1016/j.neuroimage.2013.11.027
    3. Reuter M, Tisdall MD, Qureshi A, Buckner RL, van der Kouwe AJW, Fischl B. Head motion during MRI acquisition reduces gray matter volume and thickness estimates. NeuroImage. 2015;107:107–115. doi:10.1016/j.neuroimage.2014.12.006
    4. Alexander-Bloch A, Clasen L, Stockman M, Ronan L, Lalonde F, Giedd J, et al. Subtle in-scanner motion biases automated measurement of brain anatomy from in vivo MRI. Human Brain Mapping. 2016;37(7):2385–2397. doi:10.1002/hbm.23180
    5. Kaufman L, Kramer DM, Crooks LE, Ortendahl DA. Measuring signal-to-noise ratios in MR imaging. Radiology. 1989;173(1):265–267. doi:10.1148/radiology.173.1.2781018