IEEE Trans Med Imaging. 2010 Mar;29(3):771-80.

doi: 10.1109/TMI.2009.2036011.

Estimation of inferential uncertainty in assessing expert segmentation performance from STAPLE

Olivier Commowick¹, Simon K Warfield

Affiliation

¹ Computational Radiology Laboratory, Department of Radiology, Children's Hospital, Boston, MA 02115, USA. olivier.commowick@childrens.harvard.edu

PMID: 20199913
PMCID: PMC3183509
DOI: 10.1109/TMI.2009.2036011

Abstract

The evaluation of the quality of segmentations of an image, and the assessment of intra- and inter-expert variability in segmentation performance, has long been recognized as a difficult task. For a segmentation validation task, it may be effective to compare the results of an automatic segmentation algorithm to multiple expert segmentations. Recently, an expectation-maximization (EM) algorithm for simultaneous truth and performance level estimation (STAPLE) was developed to this end to compute both an estimate of the reference standard segmentation and performance parameters from a set of segmentations of an image. The performance is characterized by the rate of detection of each segmentation label by each expert in comparison to the estimated reference standard. This previous work provides estimates of performance parameters, but does not provide any information regarding the uncertainty of the estimated values. An estimate of this inferential uncertainty, if available, would allow the estimation of confidence intervals for the values of the parameters. This would facilitate the interpretation of the performance of segmentation generators and help determine if sufficient data size and number of segmentations have been obtained to precisely characterize the performance parameters. We present a new algorithm to estimate the inferential uncertainty of the performance parameters for binary and multi-category segmentations. It is derived for the special case of the STAPLE algorithm based on established theory for general purpose covariance matrix estimation for EM algorithms. The bounds on the performance parameters are estimated by the computation of the observed information matrix. We use this algorithm to study the bounds on performance parameter estimates from simulated images with specified performance parameters, and from interactive segmentations of neonatal brain MRIs.
We demonstrate that confidence intervals for expert segmentation performance parameters can be estimated with our algorithm. We investigate the influence of the number of experts and of the segmented data size on these bounds, showing that it is possible to determine the number of image segmentations and the size of images necessary to achieve a chosen level of accuracy in segmentation performance assessment.
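
As a rough illustration of the idea in the abstract, the following Python sketch runs the binary STAPLE EM updates and then derives approximate 95% confidence intervals from the observed information matrix. It is a minimal sketch under stated assumptions, not the authors' implementation: the observed information is obtained here as a finite-difference Hessian of the observed-data log-likelihood rather than through the analytic EM-based derivation the paper describes, the foreground prior `gamma` is treated as known, and all function names (`staple_binary`, `observed_information`, `confidence_intervals`, `simulate_raters`) are hypothetical.

```python
import numpy as np

def _class_likelihoods(theta, D, gamma):
    """Joint likelihood of each voxel's rater decisions with truth = 1 and truth = 0."""
    J = D.shape[1]
    p, q = theta[:J], theta[J:]          # sensitivities, specificities
    fg = gamma * np.prod(np.where(D == 1, p, 1 - p), axis=1)
    bg = (1 - gamma) * np.prod(np.where(D == 1, 1 - q, q), axis=1)
    return fg, bg

def staple_binary(D, gamma, n_iter=200):
    """Binary STAPLE EM on an (N voxels, J raters) 0/1 array; returns [p_1..p_J, q_1..q_J]."""
    theta = np.full(2 * D.shape[1], 0.9)  # start from fairly reliable raters
    for _ in range(n_iter):
        fg, bg = _class_likelihoods(theta, D, gamma)
        W = fg / (fg + bg)                          # E-step: P(truth = 1 | decisions)
        p = (W @ D) / W.sum()                       # M-step: sensitivities
        q = ((1 - W) @ (1 - D)) / (1 - W).sum()     # M-step: specificities
        theta = np.concatenate([p, q])
    return theta

def log_likelihood(theta, D, gamma):
    fg, bg = _class_likelihoods(theta, D, gamma)
    return np.log(fg + bg).sum()

def observed_information(theta, D, gamma, h=1e-4):
    """Negative Hessian of the log-likelihood by central finite differences.

    Assumes every parameter lies more than 2*h away from 0 and 1.
    """
    n = theta.size
    H = np.zeros((n, n))
    f = lambda t: log_likelihood(t, D, gamma)
    for i in range(n):
        ei = np.zeros(n); ei[i] = h
        for k in range(n):
            ek = np.zeros(n); ek[k] = h
            H[i, k] = (f(theta + ei + ek) - f(theta + ei - ek)
                       - f(theta - ei + ek) + f(theta - ei - ek)) / (4 * h * h)
    return -H

def confidence_intervals(D, gamma, z=1.96):
    """Return (estimate, lower bound, upper bound) for every sensitivity and specificity."""
    D = np.asarray(D, dtype=float)
    theta = staple_binary(D, gamma)
    cov = np.linalg.inv(observed_information(theta, D, gamma))
    half = z * np.sqrt(np.diag(cov))
    return theta, theta - half, theta + half

def simulate_raters(truth, sens, spec, rng):
    """Draw rater decisions with the given sensitivities and specificities."""
    truth = np.asarray(truth)[:, None]
    u = rng.random((truth.shape[0], len(sens)))
    return np.where(truth == 1, u < sens, u >= spec).astype(int)
```

With simulated raters whose true parameters are known, the returned bounds can be compared against those true values, which mirrors the kind of check the abstract describes for simulated images.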

Figures

**Fig. 1. Illustration of the confidence interval on one parameter**
We aim to compute the lower bound (LB) and upper bound (UB) for each parameter $\hat{\theta}_{jl'l}$ estimated by STAPLE. In the case of a known ground truth (experiments on simulated data), this range can be compared to the true value $\theta_{jl'l}$ to check the accuracy of parameter estimation in STAPLE.
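
One standard way to write such bounds, given here as a sketch that assumes asymptotic normality of the maximum likelihood estimate and not as the paper's exact derivation, uses the inverse of the observed information matrix. The symbols $I(\hat{\theta})$ (observed information at the estimate), $k$ (the position of $\theta_{jl'l}$ in the parameter vector) and $z_{1-\alpha/2}$ (the standard normal quantile, about 1.96 for a 95% interval) are notation introduced for this illustration:

$$
\mathrm{LB} = \hat{\theta}_{jl'l} - z_{1-\alpha/2}\sqrt{\left[I(\hat{\theta})^{-1}\right]_{kk}},
\qquad
\mathrm{UB} = \hat{\theta}_{jl'l} + z_{1-\alpha/2}\sqrt{\left[I(\hat{\theta})^{-1}\right]_{kk}}
$$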

**Fig. 2. Database of Simulated Images**
Simulated images used for the validation of our confidence interval estimation method: (a) original segmentation, (b) simulated segmentation of group 1 (sensitivity: 0.7, specificity: 0.8), (c) simulated segmentation of group 2 (sensitivity: 0.9, specificity: 0.9).

**Fig. 3. Illustration of one image from the database**
Coronal slice of (a) a newborn T1 MRI and (b-f) its repeated manual segmentation into 5 classes by one expert (cortical gray matter: grey; sub-cortical gray matter: white; unmyelinated white matter: red; myelinated white matter: orange; CSF: blue). Other images in the database were similar to this specific example.

**Fig. 4. Confidence bounds of the sensitivity and specificity parameters**
Expert parameters and their confidence intervals ((a, c): Sensitivity, (b, d): Specificity) for the white matter segmentation (a, b) and the gray matter segmentation (c, d). Each child’s segmentations were treated separately; for each child, each column represents one expert’s segmentation. The results on five datasets (each column of each graph) show that the confidence intervals of the estimated sensitivities and specificities are very tight.

**Fig. 5. Influence of the image dimension on confidence bounds of the parameters**
95% confidence intervals on the estimated values of the sensitivity (a) and specificity (b) parameters for the image at original size (blue), subsampled once (red), and subsampled twice (green). The intervals widen as the image is subsampled, reflecting that confidence in the estimates decreases when the amount of available data is reduced.

**Fig. 6. Influence of the number of experts on the confidence intervals of the performance parameters**
Average relative confidence interval values (as a percentage of the average performance parameter) as a function of the number of experts used in STAPLE. For each number of experts K, all combinations of K experts among the 15 available were used to compute the average. The three curves show the results using the whole images (blue), half of the images (red), and the upper left quarter of the images (green) to compute the STAPLE performance estimates.
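
The dependence of interval width on data size shown in Figs. 5 and 6 can be illustrated with a much simpler model than STAPLE. The sketch below uses the normal-approximation interval for a binomial proportion, which treats the reference standard as known and each voxel as independent; both are simplifying assumptions of this illustration, and the function names and the reduction factor of 4 per subsampling level (a 2D image halved along each axis) are hypothetical choices, not values from the paper.

```python
import math

def binomial_half_width(rate, n_voxels, z=1.96):
    """Half-width of the normal-approximation interval for a rate estimated from n_voxels."""
    return z * math.sqrt(rate * (1.0 - rate) / n_voxels)

def half_widths_under_subsampling(rate, n_voxels, levels=3, factor_per_level=4):
    """Half-widths at the original size and after each successive subsampling level.

    factor_per_level is the assumed reduction in voxel count per level
    (4 for a 2D image halved along each axis, 8 for a 3D volume).
    """
    return [
        binomial_half_width(rate, n_voxels / factor_per_level ** level)
        for level in range(levels)
    ]
```

Under this simplified model the half-width scales as the inverse square root of the voxel count, so each subsampling level with a factor of 4 doubles the interval width.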

Cited by

The effect of imaging modality (magnetic resonance imaging vs. computed tomography) and patient position (supine vs. prone) on target and organ at risk doses in partial breast irradiation.
Brown E, Dundas K, Surjan Y, Miller D, Lim K, Boxer M, Ahern V, Papadatos G, Batumalai V, Harvey J, Lee D, Delaney GP, Holloway L. J Med Radiat Sci. 2021 Jun;68(2):157-166. doi: 10.1002/jmrs.453. Epub 2020 Dec 7. PMID: 33283982. Free PMC article.
Validating retinal fundus image analysis algorithms: issues and a proposal.
Trucco E, Ruggeri A, Karnowski T, Giancardo L, Chaum E, Hubschman JP, Al-Diri B, Cheung CY, Wong D, Abràmoff M, Lim G, Kumar D, Burlina P, Bressler NM, Jelinek HF, Meriaudeau F, Quellec G, Macgillivray T, Dhillon B. Invest Ophthalmol Vis Sci. 2013 May 1;54(5):3546-59. doi: 10.1167/iovs.12-10347. PMID: 23794433. Free PMC article.
A collaborative resource to build consensus for automated left ventricular segmentation of cardiac MR images.
Suinesiaputra A, Cowan BR, Al-Agamy AO, Elattar MA, Ayache N, Fahmy AS, Khalifa AM, Medrano-Gracia P, Jolly MP, Kadish AH, Lee DC, Margeta J, Warfield SK, Young AA. Med Image Anal. 2014 Jan;18(1):50-62. doi: 10.1016/j.media.2013.09.001. Epub 2013 Sep 13. PMID: 24091241. Free PMC article.
Comparative performance evaluation of automated segmentation methods of hippocampus from magnetic resonance images of temporal lobe epilepsy patients.
Hosseini MP, Nazem-Zadeh MR, Pompili D, Jafari-Khouzani K, Elisevich K, Soltanian-Zadeh H. Med Phys. 2016 Jan;43(1):538. doi: 10.1118/1.4938411. PMID: 26745947. Free PMC article.
Two-dimensional segmentation fusion tool: an extensible, free-to-use, user-friendly tool for combining different bidimensional segmentations.
Piccinini F, Drudi L, Pyun JC, Lee M, Kwak B, Ku B, Carbonaro A, Martinelli G, Castellani G. Front Bioeng Biotechnol. 2024 Jan 31;12:1339723. doi: 10.3389/fbioe.2024.1339723. eCollection 2024. PMID: 38357706. Free PMC article.
