J Cardiovasc Magn Reson. 2019 Mar 14;21(1):18.
doi: 10.1186/s12968-019-0523-x.

Automated quality control in image segmentation: application to the UK Biobank cardiovascular magnetic resonance imaging study

Robert Robinson et al. J Cardiovasc Magn Reson.

Abstract

Background: The trend towards large-scale studies including population imaging poses new challenges in terms of quality control (QC). This is a particular issue when automatic processing tools such as image segmentation methods are employed to derive quantitative measures or biomarkers for further analyses. Manual inspection and visual QC of each segmentation result is not feasible at large scale. However, it is important to be able to automatically detect when a segmentation method fails in order to avoid inclusion of wrong measurements into subsequent analyses which could otherwise lead to incorrect conclusions.

Methods: To overcome this challenge, we explore an approach for predicting segmentation quality based on Reverse Classification Accuracy (RCA), which enables us to discriminate between successful and failed segmentations on a per-case basis. We validate this approach on a new, large-scale, manually annotated set of 4800 cardiovascular magnetic resonance (CMR) scans. We then apply our method to a large cohort of 7250 CMR scans on which we have performed manual QC.

Results: We report results for predicting segmentation quality metrics including Dice Similarity Coefficient (DSC) and surface-distance measures. As initial validation, we present data for 400 scans demonstrating 99% accuracy for classifying low- and high-quality segmentations using the predicted DSC scores. As further validation, we show high correlation between real and predicted scores and 95% classification accuracy on 4800 scans for which manual segmentations were available. We mimic real-world application of the method on 7250 CMR scans, where we show good agreement between predicted quality metrics and manual visual QC scores.

Conclusions: We show that Reverse Classification Accuracy has the potential for accurate and fully automatic segmentation QC on a per-case basis in the context of large-scale population imaging, as in the UK Biobank Imaging Study.

Keywords: Automatic quality control; Population imaging; Segmentation.
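The quality classification reported in the Results can be illustrated with a minimal sketch. It assumes binary masks stored as flat 0/1 sequences, treats a segmentation as "good" when its DSC is at or above the 0.70 threshold quoted in the figure captions, and takes "good" as the positive class for TPR and FPR. The function names `dice` and `qc_agreement`, the use of `>=` at the threshold, and the choice of positive class are assumptions made for illustration, not details taken from the study.

```python
def dice(seg_a, seg_b):
    """Dice Similarity Coefficient between two binary masks (flat 0/1 sequences)."""
    if len(seg_a) != len(seg_b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(seg_a, seg_b) if a and b)
    total = sum(1 for a in seg_a if a) + sum(1 for b in seg_b if b)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / total


def qc_agreement(real_dsc, predicted_dsc, threshold=0.70):
    """Compare good/poor labels derived from real and predicted DSC scores."""
    if len(real_dsc) != len(predicted_dsc) or not real_dsc:
        raise ValueError("need two non-empty score lists of equal length")
    # Assumption: "good" (DSC >= threshold) is the positive class.
    pairs = [(r >= threshold, p >= threshold)
             for r, p in zip(real_dsc, predicted_dsc)]
    tp = sum(1 for r, p in pairs if r and p)
    tn = sum(1 for r, p in pairs if not r and not p)
    fp = sum(1 for r, p in pairs if not r and p)
    fn = sum(1 for r, p in pairs if r and not p)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tp / (tp + fn) if tp + fn else float("nan"),
        "fpr": fp / (fp + tn) if fp + tn else float("nan"),
    }
```

Calling `qc_agreement(real, predicted)` on two lists of per-scan DSC values returns the accuracy, TPR and FPR of the predicted labels against the real ones at the chosen threshold.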

Conflict of interest statement

Ethics approval and consent to participate

The UKBB has approval from the North West Research Ethics Committee (REC reference: 11/NW/0382). All participants have given written informed consent.

Consent for publication

Not applicable.

Competing interests

Steffen E. Petersen provides consultancy to Circle Cardiovascular Imaging Inc. (Calgary, Alberta, Canada). Ben Glocker receives research funding from HeartFlow Inc. (Redwood City, CA, USA).

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Reverse Classification Accuracy - Single-atlas Registration Classifier. Reverse Classification Accuracy (RCA), with single-atlas registration classifier, as applied in our study. A set of reference images is first registered to the test-image before the resulting transformations are used to warp the corresponding reference segmentations. Dice Similarity Coefficient (DSC) is calculated between the warped segmentations and the test-segmentation, with the maximum DSC taken as a proxy for the accuracy of the test-segmentation. Note that in practice, the ground truth test-segmentation is absent. Images and segmentations are annotated as referred to in the text.
Fig. 2
Example Results from RCA. Examples of RCA results on one proposed segmentation. The panels in the top row show (left to right) the MRI scan, the predicted segmentation, an overlay and the manual annotation. The array below shows a subset of the 100 reference images ordered by Dice similarity coefficient (DSC) and equally spaced from highest to lowest DSC. The array shows (left) the reference image, (middle) its ground truth segmentation and (right) the test-segmentation from the upper row which has been warped to the reference image. The real DSC between each reference image and warped segmentation is shown for each pair. RCA-predicted and real GT-calculated DSCs are shown for the whole-heart binary classification case at the top alongside the metrics for each individual class in the segmentation
Fig. 3
RCA Validation on 400 cardiac MRI. 400 cardiac MRI segmentations were generated with a Random Forest classifier. 500 trees and depths in the range [5, 40] were used to simulate various degrees of segmentation quality. RCA with single-atlas classifier was used to predict the Dice Similarity Coefficient (DSC), mean surface distance (MSD), root mean-squared surface distance (RMS) and Hausdorff distance (HD). Ground truth for the scans is known, so real metrics are also calculated. All calculations are for the whole-heart binary classification task. We report low mean absolute error (MAE) for all metrics and 99% binary classification accuracy (TPR = 0.98, FPR = 0.00) with a DSC threshold of 0.70. Accuracy is also high for individual segmentation classes. Absolute error for each image is shown for each metric. We note increasing error with decreasing quality of segmentation based on the real metric score.
Fig. 4
Validation on 4805 Random Forest segmentations of UKBB Imaging Study with Ground Truth. 4,805 cardiac MRI were segmented with a Random Forest classifier. 500 trees and depths in the range [5, 40] were used to simulate various degrees of segmentation quality. Manual contours were available through Biobank Application 2964. RCA with single-atlas classifier was used to predict the Dice Similarity Coefficient (DSC), mean surface distance (MSD), root mean-squared surface distance (RMS) and Hausdorff distance (HD). All calculations are for the whole-heart binary classification task. We report low mean absolute error (MAE) for all metrics and 95% binary classification accuracy (TPR = 0.97 and FPR = 0.15) with a DSC threshold of 0.70. Accuracy is also high for individual segmentation classes.
Fig. 5
Extensive Reverse Classification Accuracy Validation on 900 UKBB Segmentations. Convolutional neural network (CNN) segmentation as in Bai et al. [4]. Manual contours were available through Biobank Application 2964. RCA with single-atlas classifier was used to predict the Dice Similarity Coefficient (DSC), mean surface distance (MSD), root mean-squared surface distance (RMS) and Hausdorff distance (HD). All calculations for the binary quality classification task on (top) ’Whole Heart’ average and (bottom) Left Ventricular Myocardium. We report low mean absolute error (MAE) for all metrics and 99.8% binary classification accuracy (TPR = 1.00 and FPR = 0.00) with a DSC threshold of 0.70
Fig. 6
RCA Application on 7250 Cardiac MRI segmentations of UKBB Imaging Study. 7,250 cardiac MRI segmentations were generated with a multi-atlas segmentation approach [18]. Manual QC scores are given in the range [0, 6] (i.e. [0, 2] for each of basal, mid and apical slices). RCA with single-atlas classifier was used to predict the Dice Similarity Coefficient (DSC), mean surface distance (MSD), root mean-squared surface distance (RMS) and Hausdorff distance (HD). All calculations are for the LV Myocardium binary classification task. We show correlation in all metrics. Examples show: a) and b) agreement between low predicted DSC and low manual QC score, c) successful automated identification of poor segmentation with low predicted DSC despite high manual QC score and d) agreement between high predicted DSC and high manual QC score. Insets in the top row display an extended range of the y-axis.
Fig. 7
Investigating the Effect of Reference Set Size on Prediction Accuracy. 4,805 automated segmentations from Experiment B were processed with Reverse Classification Accuracy (RCA) using differing numbers of reference images. Random subsets of 10, 15, 35, 50, 65 and 75 reference images were taken from the full set of 100 available reference images. Five random runs were performed to obtain error bars for each setting. Average prediction accuracy increases with increasing number of reference images and the variance between runs also decreases
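
The captions for Fig. 1 and Fig. 7 describe taking the maximum DSC over warped reference segmentations as the quality proxy and repeating the prediction with random subsets of the reference set. The sketch below illustrates those two steps only. It assumes image registration has already been performed, so each reference segmentation arrives pre-warped as a flat 0/1 mask; the function names, the fixed random seed and the use of absolute error against a known real DSC are assumptions for illustration rather than the authors' implementation.

```python
import random
from statistics import mean, stdev


def dice(seg_a, seg_b):
    """Dice Similarity Coefficient between two binary masks (flat 0/1 sequences)."""
    overlap = sum(1 for a, b in zip(seg_a, seg_b) if a and b)
    total = sum(1 for a in seg_a if a) + sum(1 for b in seg_b if b)
    return 1.0 if total == 0 else 2.0 * overlap / total


def rca_predict_dsc(test_seg, warped_ref_segs):
    """Predicted DSC: the best overlap between the test segmentation and any
    reference segmentation that was (hypothetically) warped into its space."""
    return max(dice(test_seg, ref) for ref in warped_ref_segs)


def subset_error(test_seg, warped_ref_segs, real_dsc, subset_size, runs=5, seed=0):
    """Mean and standard deviation of the absolute prediction error over
    several random reference subsets of the given size."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    errors = []
    for _ in range(runs):
        subset = rng.sample(warped_ref_segs, subset_size)
        errors.append(abs(rca_predict_dsc(test_seg, subset) - real_dsc))
    return mean(errors), stdev(errors)
```

Because the proxy is a maximum, a larger reference subset can only keep or raise the predicted DSC for a given test segmentation, which is one way to read the dependence on reference set size that the Fig. 7 caption reports.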

References

    1. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, Downey P, Elliott P, Green J, Landray M, Liu B, Matthews P, Ong G, Pell J, Silman A, Young A, Sprosen T, Peakman T, Collins R. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLoS Med. 2015; 12(3):1–10. doi: 10.1371/journal.pmed.1001779.
    2. Shariff A, Kangas J, Coelho LP, Quinn S, Murphy RF. Automated Image Analysis for High-Content Screening and Analysis. J Biomol Screen. 2010; 15(7):726–34. doi: 10.1177/1087057110370894.
    3. de Bruijne M. Machine learning approaches in medical image analysis: From detection to diagnosis. Med Image Anal. 2016; 33:94–97. doi: 10.1016/j.media.2016.06.032.
    4. Bai W, Sinclair M, Tarroni G, Oktay O, Rajchl M, Vaillant G, Lee AM, Aung N, Lukaschuk E, Sanghvi MM, Zemrak F, Fung K, Paiva JM, Carapella V, Kim YJ, Suzuki H, Kainz B, Matthews PM, Petersen SE, Piechnik SK, Neubauer S, Glocker B, Rueckert D. Human-level CMR image analysis with deep fully convolutional networks. http://arxiv.org/abs/1710.09289v3.
    5. Crum WR, Camara O, Hill DLG. Generalized overlap measures for evaluation and validation in medical image analysis. IEEE Trans Med Imaging. 2006; 25(11):1451–61. doi: 10.1109/TMI.2006.880587.
