J Cardiovasc Magn Reson. 2019 Mar 14;21(1):18.

doi: 10.1186/s12968-019-0523-x.

Automated quality control in image segmentation: application to the UK Biobank cardiovascular magnetic resonance imaging study

Robert Robinson¹, Vanya V Valindria², Wenjia Bai², Ozan Oktay², Bernhard Kainz², Hideaki Suzuki³, Mihir M Sanghvi⁴,⁵, Nay Aung⁴,⁵, José Miguel Paiva⁴, Filip Zemrak⁴,⁵, Kenneth Fung⁴,⁵, Elena Lukaschuk⁶, Aaron M Lee⁴,⁵, Valentina Carapella⁶, Young Jin Kim⁶,⁷, Stefan K Piechnik⁶, Stefan Neubauer⁶, Steffen E Petersen⁴,⁵, Chris Page⁸, Paul M Matthews³,⁹, Daniel Rueckert², Ben Glocker²

Affiliations

¹ Biomedical Image Analysis Group, Department of Computing, Imperial College London, Queen's Gate, London, SW7 2AZ, UK. r.robinson16@imperial.ac.uk.
² Biomedical Image Analysis Group, Department of Computing, Imperial College London, Queen's Gate, London, SW7 2AZ, UK.
³ Division of Brain Sciences, Dept. of Medicine, Imperial College London, Queen's Gate, London, SW7 2AZ, UK.
⁴ William Harvey Research Institute, NIHR Barts Biomedical Research Centre, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ, UK.
⁵ Barts Heart Centre, Barts Health NHS Trust, West Smithfield, London, EC1A 7BE, UK.
⁶ Division of Cardiovascular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DU, UK.
⁷ Department of Radiology, Severance Hospital, Yonsei University College of Medicine, Seoul, South Korea.
⁸ GlaxoSmithKline Research and Development, Stockley Park, Uxbridge, UB11 1BT, UK.
⁹ UK Dementia Research Institute, Imperial College London, Queen's Drive, London, SW7 2AZ, UK.

PMID: 30866968
PMCID: PMC6416857
DOI: 10.1186/s12968-019-0523-x

Robert Robinson et al. J Cardiovasc Magn Reson. 2019.

Abstract

Background: The trend towards large-scale studies including population imaging poses new challenges in terms of quality control (QC). This is a particular issue when automatic processing tools such as image segmentation methods are employed to derive quantitative measures or biomarkers for further analyses. Manual inspection and visual QC of each segmentation result is not feasible at large scale. However, it is important to be able to automatically detect when a segmentation method fails, in order to avoid including wrong measurements in subsequent analyses, which could otherwise lead to incorrect conclusions.

Methods: To overcome this challenge, we explore an approach for predicting segmentation quality based on Reverse Classification Accuracy, which enables us to discriminate between successful and failed segmentations on a per-case basis. We validate this approach on a new, large-scale manually-annotated set of 4800 cardiovascular magnetic resonance (CMR) scans. We then apply our method to a large cohort of 7250 CMR scans on which we have performed manual QC.

Results: We report results for predicting segmentation quality metrics including Dice Similarity Coefficient (DSC) and surface-distance measures. As initial validation, we present data for 400 scans demonstrating 99% accuracy for classifying low and high quality segmentations using the predicted DSC scores. As further validation we show high correlation between real and predicted scores and 95% classification accuracy on 4800 scans for which manual segmentations were available. We mimic real-world application of the method on 7250 CMR scans, where we show good agreement between predicted quality metrics and manual visual QC scores.

Conclusions: We show that Reverse Classification Accuracy has the potential for accurate and fully automatic segmentation QC on a per-case basis in the context of large-scale population imaging as in the UK Biobank Imaging Study.

Keywords: Automatic quality control; Population imaging; Segmentation.

Conflict of interest statement

Ethics approval and consent to participate

The UKBB has approval from the North West Research Ethics Committee (REC reference: 11/NW/0382). All participants have given written informed consent.

Consent for publication

Not applicable.

Competing interests

Steffen E. Petersen provides consultancy to Circle Cardiovascular Imaging Inc. (Calgary, Alberta, Canada). Ben Glocker receives research funding from HeartFlow Inc. (Redwood City, CA, USA).

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

**Fig. 1**
Reverse Classification Accuracy - Single-atlas Registration Classifier. Reverse Classification Accuracy (RCA), with single-atlas registration classifier, as applied in our study. A set of reference images are first registered to the test-image before the resulting transformations are used to warp the corresponding reference segmentations. Dice Similarity Coefficient (DSC) is calculated between the warped segmentations and the test-segmentation with the maximum DSC taken as a proxy for the accuracy of the test-segmentation. Note that in practice, the ground truth test-segmentation is absent. Images and segmentation annotated as referred to in the text
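
The scoring step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the registration and warping have already been done elsewhere, so the input is simply a list of warped reference segmentations as binary masks. The function names `dice` and `rca_predict_dsc` are hypothetical.

```python
import numpy as np

def dice(a, b):
    """Dice Similarity Coefficient between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / denom

def rca_predict_dsc(test_seg, warped_ref_segs):
    """Return the maximum DSC over all warped reference segmentations,
    used as a proxy for the accuracy of test_seg, plus the per-reference scores."""
    scores = [dice(test_seg, warped) for warped in warped_ref_segs]
    return max(scores), scores

# Toy example: one 2x2 test mask and two "warped" reference masks.
test_seg = np.zeros((4, 4), dtype=bool)
test_seg[1:3, 1:3] = True
ref_a = test_seg.copy()              # identical to the test mask
ref_b = np.zeros((4, 4), dtype=bool)
ref_b[1:3, 1:2] = True               # covers half of the test mask
predicted, per_reference = rca_predict_dsc(test_seg, [ref_a, ref_b])
```

Taking the maximum rather than the mean follows the caption: a single well-matching reference is enough to suggest that the test segmentation is plausible.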

**Fig. 2**
Example Results from RCA. Examples of RCA results on one proposed segmentation. The panels in the top row show (left to right) the MRI scan, the predicted segmentation, an overlay and the manual annotation. The array below shows a subset of the 100 reference images ordered by Dice similarity coefficient (DSC) and equally spaced from highest to lowest DSC. The array shows (left) the reference image, (middle) its ground truth segmentation and (right) the test-segmentation from the upper row which has been warped to the reference image. The real DSC between each reference image and warped segmentation is shown for each pair. RCA-predicted and real GT-calculated DSCs are shown for the whole-heart binary classification case at the top alongside the metrics for each individual class in the segmentation

**Fig. 3**
RCA Validation on 400 cardiac MRI. 400 cardiac MRI segmentations were generated with a Random Forest classifier. 500 trees and depths in the range [5, 40] were used to simulate various degrees of segmentation quality. RCA with single-atlas classifier was used to predict the Dice Similarity Coefficient (DSC), mean surface distance (MSD), root mean-squared surface distance (RMS) and Hausdorff distance (HD). Ground truth for the scans is known so real metrics are also calculated. All calculations on the whole-heart binary classification task. We report low mean absolute error (MAE) for all metrics and 99% binary classification accuracy (TPR = 0.98, FPR = 0.00) with a DSC threshold of 0.70. High accuracy for individual segmentation classes. Absolute error for each image is shown for each metric. We note increasing error with decreasing quality of segmentation based on the real metric score
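
The summary figures quoted in this caption (MAE, binary classification accuracy, TPR, FPR at a DSC threshold of 0.70) could be computed from paired real and predicted scores roughly as below. This is a sketch under one stated assumption: a segmentation counts as "positive" when its DSC is at or above the threshold. The source does not say which class is treated as positive, and the function name `evaluate_predictions` is hypothetical.

```python
import numpy as np

def evaluate_predictions(real, pred, threshold=0.70):
    """Compare predicted and real DSC values.

    Assumption: "positive" means DSC >= threshold (a good segmentation).
    """
    real = np.asarray(real, dtype=float)
    pred = np.asarray(pred, dtype=float)
    mae = float(np.mean(np.abs(real - pred)))

    real_pos = real >= threshold
    pred_pos = pred >= threshold
    tp = int(np.sum(real_pos & pred_pos))
    fp = int(np.sum(~real_pos & pred_pos))
    tn = int(np.sum(~real_pos & ~pred_pos))
    fn = int(np.sum(real_pos & ~pred_pos))

    return {
        "mae": mae,
        "accuracy": (tp + tn) / real.size,
        "tpr": tp / (tp + fn) if (tp + fn) else float("nan"),
        "fpr": fp / (fp + tn) if (fp + tn) else float("nan"),
    }

# Made-up scores for four segmentations, purely for illustration.
summary = evaluate_predictions(
    real=[0.90, 0.80, 0.50, 0.30],
    pred=[0.85, 0.65, 0.55, 0.75],
)
```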

**Fig. 4**
Validation on 4805 Random Forest segmentations of UKBB Imaging Study with Ground Truth. 4,805 cardiac MRI were segmented with a Random Forest classifier. 500 trees and depths in the range [5, 40] were used to simulate various degrees of segmentation quality. Manual contours were available through Biobank Application 2964. RCA with single-atlas classifier was used to predict the Dice Similarity Coefficient (DSC), mean surface distance (MSD), root mean-squared surface distance (RMS) and Hausdorff distance (HD). All calculations on the whole-heart binary classification task. We report low mean absolute error (MAE) for all metrics and 95% binary classification accuracy (TPR = 0.97 and FPR = 0.15) with a DSC threshold of 0.70. High accuracy for individual segmentation classes
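
For readers unfamiliar with the surface-distance measures named here, the following sketch computes MSD, RMS and HD between two 2D binary masks. It is an illustration under several assumptions that the source does not confirm: boundaries are defined by 4-connectivity, distances are pooled symmetrically over both directions, and both masks are non-empty. The helper names `boundary_points` and `surface_distances` are hypothetical.

```python
import numpy as np

def boundary_points(mask):
    """Coordinates of foreground pixels with at least one 4-neighbour in the background."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior when all four direct neighbours are foreground.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~interior)

def surface_distances(a, b, spacing=(1.0, 1.0)):
    """Symmetric surface-distance metrics between two non-empty 2D masks."""
    pa = boundary_points(a) * np.asarray(spacing)
    pb = boundary_points(b) * np.asarray(spacing)
    # Brute-force pairwise Euclidean distances between boundary points.
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=2)
    both = np.concatenate([d.min(axis=1), d.min(axis=0)])
    return {
        "MSD": float(both.mean()),
        "RMS": float(np.sqrt(np.mean(both ** 2))),
        "HD": float(both.max()),
    }

# Toy example: a 3x3 square and the same square shifted one pixel to the right.
square = np.zeros((6, 6), dtype=bool)
square[1:4, 1:4] = True
shifted = np.zeros((6, 6), dtype=bool)
shifted[1:4, 2:5] = True
metrics = surface_distances(square, shifted)
```

The brute-force distance matrix keeps the sketch dependency-free but scales quadratically with the number of boundary points; a distance transform would be the usual choice for full-size volumes.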

**Fig. 5**
Extensive Reverse Classification Accuracy Validation on 900 UKBB Segmentations. Convolutional neural network (CNN) segmentation as in Bai et al. [4]. Manual contours were available through Biobank Application 2964. RCA with single-atlas classifier was used to predict the Dice Similarity Coefficient (DSC), mean surface distance (MSD), root mean-squared surface distance (RMS) and Hausdorff distance (HD). All calculations for the binary quality classification task on (top) ’Whole Heart’ average and (bottom) Left Ventricular Myocardium. We report low mean absolute error (MAE) for all metrics and 99.8% binary classification accuracy (TPR = 1.00 and FPR = 0.00) with a DSC threshold of 0.70

**Fig. 6**
RCA Application on 7250 Cardiac MRI segmentations of UKBB Imaging Study. 7,250 cardiac MRI segmentations generated with a multi-atlas segmentation approach [18]. Manual QC scores given in the range [0, 6] (i.e. [0, 2] for each of basal, mid and apical slices). RCA with single-atlas classifier was used to predict the Dice Similarity Coefficient (DSC), mean surface distance (MSD), root mean-squared surface distance (RMS) and Hausdorff distance (HD). All calculations on the LV Myocardium binary classification task. We show correlation in all metrics. Examples show: a) and b) agreement between low predicted DSC and low manual QC score, c) successful automated identification of poor segmentation with low predicted DSC despite high manual QC score and d) agreement between high predicted DSC and high manual QC score. Insets in top row display extended range of y-axis

**Fig. 7**
Investigating the Effect of Reference Set Size on Prediction Accuracy. 4,805 automated segmentations from Experiment B were processed with Reverse Classification Accuracy (RCA) using differing numbers of reference images. Random subsets of 10, 15, 35, 50, 65 and 75 reference images were taken from the full set of 100 available reference images. Five random runs were performed to obtain error bars for each setting. Average prediction accuracy increases with increasing number of reference images and the variance between runs also decreases
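
The subsampling protocol in this caption can be sketched as below. The sketch assumes that the per-reference DSC scores for every test segmentation are already available as a matrix (rows: test segmentations, columns: reference images), and it uses mean absolute error against the real DSC as a stand-in for "prediction accuracy", which may differ from the measure the authors plotted. The function name `subset_size_experiment` and the synthetic data are hypothetical.

```python
import numpy as np

def subset_size_experiment(ref_scores, real_dsc,
                           sizes=(10, 15, 35, 50, 65, 75), runs=5, seed=0):
    """For each subset size, draw `runs` random reference subsets, predict DSC
    as the maximum over the subset, and report mean and std of the MAE."""
    rng = np.random.default_rng(seed)
    ref_scores = np.asarray(ref_scores, dtype=float)
    real_dsc = np.asarray(real_dsc, dtype=float)
    n_refs = ref_scores.shape[1]
    results = {}
    for size in sizes:
        maes = []
        for _ in range(runs):
            cols = rng.choice(n_refs, size=size, replace=False)
            pred = ref_scores[:, cols].max(axis=1)
            maes.append(np.mean(np.abs(pred - real_dsc)))
        results[size] = (float(np.mean(maes)), float(np.std(maes)))
    return results

# Synthetic stand-in data: 20 test segmentations, 100 reference images.
demo_rng = np.random.default_rng(42)
demo_scores = demo_rng.random((20, 100))
demo_real = demo_scores.max(axis=1)   # pretend the full set predicts perfectly
demo_results = subset_size_experiment(demo_scores, demo_real, sizes=(10, 100))
```

The standard deviation across runs plays the role of the error bars mentioned in the caption.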

References

1. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, Downey P, Elliott P, Green J, Landray M, Liu B, Matthews P, Ong G, Pell J, Silman A, Young A, Sprosen T, Peakman T, Collins R. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLoS Med. 2015; 12(3):1–10. 10.1371/journal.pmed.1001779.
2. Shariff A, Kangas J, Coelho LP, Quinn S, Murphy RF. Automated Image Analysis for High-Content Screening and Analysis. J Biomol Screen. 2010; 15(7):726–34. 10.1177/1087057110370894.
3. de Bruijne M. Machine learning approaches in medical image analysis: From detection to diagnosis. Med Image Anal. 2016; 33:94–97. 10.1016/j.media.2016.06.032.
4. Bai W, Sinclair M, Tarroni G, Oktay O, Rajchl M, Vaillant G, Lee AM, Aung N, Lukaschuk E, Sanghvi MM, Zemrak F, Fung K, Paiva JM, Carapella V, Kim YJ, Suzuki H, Kainz B, Matthews PM, Petersen SE, Piechnik SK, Neubauer S, Glocker B, Rueckert D. Human-level CMR image analysis with deep fully convolutional networks. http://arxiv.org/abs/1710.09289v3.
5. Crum WR, Camara O, Hill DLG. Generalized overlap measures for evaluation and validation in medical image analysis. IEEE Trans Med Imaging. 2006; 25(11):1451–61. 10.1109/TMI.2006.880587.
