IEEE Trans Med Imaging. 2010 Mar;29(3):771-80.
doi: 10.1109/TMI.2009.2036011.

Estimation of inferential uncertainty in assessing expert segmentation performance from STAPLE

Olivier Commowick et al. IEEE Trans Med Imaging. 2010 Mar.

Abstract

The evaluation of the quality of segmentations of an image, and the assessment of intra- and inter-expert variability in segmentation performance, has long been recognized as a difficult task. For a segmentation validation task, it may be effective to compare the results of an automatic segmentation algorithm to multiple expert segmentations. Recently an expectation-maximization (EM) algorithm for simultaneous truth and performance level estimation (STAPLE) was developed to this end to compute both an estimate of the reference standard segmentation and performance parameters from a set of segmentations of an image. The performance is characterized by the rate of detection of each segmentation label by each expert in comparison to the estimated reference standard. This previous work provides estimates of performance parameters, but does not provide any information regarding the uncertainty of the estimated values. An estimate of this inferential uncertainty, if available, would allow the estimation of confidence intervals for the values of the parameters. This would facilitate the interpretation of the performance of segmentation generators and help determine if sufficient data size and number of segmentations have been obtained to precisely characterize the performance parameters. We present a new algorithm to estimate the inferential uncertainty of the performance parameters for binary and multi-category segmentations. It is derived for the special case of the STAPLE algorithm based on established theory for general purpose covariance matrix estimation for EM algorithms. The bounds on the performance parameters are estimated by the computation of the observed information matrix. We use this algorithm to study the bounds on performance parameter estimates from simulated images with specified performance parameters, and from interactive segmentations of neonatal brain MRIs.
We demonstrate that confidence intervals for expert segmentation performance parameters can be estimated with our algorithm. We investigate the influence of the number of experts and of the segmented data size on these bounds, showing that it is possible to determine the number of image segmentations and the size of images necessary to achieve a chosen level of accuracy in segmentation performance assessment.
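
To make the estimation step described in the abstract concrete, here is a minimal sketch of the binary STAPLE EM iteration, assuming independent experts, a single global foreground prior, and flattened images. The function name `staple_binary`, its arguments, and the initial values are illustrative assumptions and do not come from the paper. The sketch only produces the point estimates (sensitivity and specificity per expert), not the confidence bounds that the paper contributes.

```python
import numpy as np

def staple_binary(D, prior=None, n_iter=50, p0=0.9, q0=0.9):
    """Sketch of binary STAPLE (hypothetical helper, not the authors' code).

    D      : (n_voxels, n_experts) array of 0/1 expert decisions.
    prior  : assumed global probability that a voxel is foreground;
             defaults to the mean of all decisions (an assumption).
    Returns (W, p, q): per-voxel posterior foreground probability,
    per-expert sensitivity, per-expert specificity.
    """
    D = np.asarray(D, dtype=float)
    n_vox, n_exp = D.shape
    if prior is None:
        prior = D.mean()
    p = np.full(n_exp, p0)  # initial sensitivities
    q = np.full(n_exp, q0)  # initial specificities
    for _ in range(n_iter):
        # E-step: posterior probability that each voxel is truly foreground.
        a = prior * np.prod(np.where(D == 1, p, 1 - p), axis=1)
        b = (1 - prior) * np.prod(np.where(D == 1, 1 - q, q), axis=1)
        W = a / (a + b)
        # M-step: re-estimate each expert's sensitivity and specificity
        # as weighted agreement rates with the current soft reference.
        p = (W @ D) / W.sum()
        q = ((1 - W) @ (1 - D)) / (1 - W).sum()
    return W, p, q
```

Calling `staple_binary(D)` on a stack of flattened binary segmentations returns the soft reference standard `W` together with one `(p, q)` pair per expert. The uncertainty of those pairs is the quantity the paper sets out to bound.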

Figures

Fig. 1. Illustration of the confidence interval on one parameter
We aim to compute the lower bound (LB) and upper bound (UB) for each parameter $\hat{\theta}_{jl'l}$ estimated by STAPLE. When the ground truth is known (experiments on simulated data), this range can be compared to the true value $\theta_{jl'l}$ to check the accuracy of parameter estimation in STAPLE.
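
For orientation, the generic textbook form of bounds obtained from an observed information matrix is shown here. It is the standard asymptotic (Wald-type) interval, given as an assumption about the general shape of such bounds, not as the paper's exact derivation. The index $i$ stands for one $(j, l', l)$ triple, $I(\hat{\theta})$ is the observed information matrix evaluated at the estimate, and $z_{1-\alpha/2} \approx 1.96$ for a 95% interval.

```latex
\[
\widehat{\operatorname{Cov}}(\hat{\theta}) \approx I(\hat{\theta})^{-1},
\qquad
\mathrm{LB}_i = \hat{\theta}_i - z_{1-\alpha/2}\sqrt{\big[I(\hat{\theta})^{-1}\big]_{ii}},
\qquad
\mathrm{UB}_i = \hat{\theta}_i + z_{1-\alpha/2}\sqrt{\big[I(\hat{\theta})^{-1}\big]_{ii}}
\]
```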
Fig. 2. Database of Simulated Images
Simulated images used to validate our confidence interval estimation method: (a) original segmentation, (b) simulated segmentation of group 1 (sensitivity: 0.7, specificity: 0.8), (c) simulated segmentation of group 2 (sensitivity: 0.9, specificity: 0.9).
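
One plausible way to generate such simulated segmentations is to flip labels of a reference image independently per voxel, so that the expected sensitivity and specificity match the specified values. The caption does not describe the authors' actual simulation procedure, so the independent-flip model and the helper names `simulate_expert` and `empirical_rates` are assumptions.

```python
import numpy as np

def simulate_expert(truth, sensitivity, specificity, rng):
    """Draw one noisy binary segmentation from a reference (hypothetical helper).

    A foreground voxel stays foreground with probability `sensitivity`;
    a background voxel is wrongly marked foreground with probability
    1 - `specificity`. Voxels are treated as independent (an assumption).
    """
    truth = np.asarray(truth).astype(bool)
    u = rng.random(truth.shape)
    return np.where(truth, u < sensitivity, u >= specificity).astype(np.uint8)

def empirical_rates(truth, seg):
    """Sensitivity and specificity of `seg` measured against `truth`."""
    truth = np.asarray(truth).astype(bool)
    seg = np.asarray(seg).astype(bool)
    sens = (seg & truth).sum() / truth.sum()
    spec = (~seg & ~truth).sum() / (~truth).sum()
    return sens, spec
```

For example, `simulate_expert(truth, 0.7, 0.8, np.random.default_rng(0))` would mimic a group 1 rater, and `empirical_rates` can confirm that the realized rates are close to the requested ones.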
Fig. 3. Illustration of one image from the database
Coronal slice of (a) a newborn T1 MRI and (b-f) its repeated manual segmentation into 5 classes by one expert (cortical gray matter: grey; sub-cortical gray matter: white; unmyelinated white matter: red; myelinated white matter: orange; CSF: blue). Other images in the database were similar to this example.
Fig. 4. Confidence bounds of the sensitivity and specificity parameters
Expert parameters and their confidence intervals ((a, c): sensitivity, (b, d): specificity) for the white matter segmentation (a, b) and the gray matter segmentation (c, d). Each child's segmentations were treated separately; each column for a child represents one expert's segmentation. The results on five datasets (each column of each graph) show that the confidence intervals of the estimated sensitivities and specificities are very tight.
Fig. 5. Influence of the image dimension on confidence bounds of the parameters
95% confidence intervals on the estimated sensitivity (a) and specificity (b) parameters for the image at original size (blue), subsampled once (red), and subsampled twice (green). The intervals widen as the image is subsampled, reflecting that confidence in the estimates decreases when less data is available.
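
The qualitative effect in this figure can be illustrated with a deliberately crude proxy: the Wald half-width of a binomial proportion, which shrinks as one over the square root of the number of samples. This is a swapped-in simplification, not the paper's information-matrix computation. The subsampling scheme (keeping every second voxel along each axis of a 2D array) and the helper names are also assumptions.

```python
import numpy as np

def subsample(image, times):
    """Keep every 2**times-th voxel along each axis of a 2D array (assumed scheme)."""
    step = 2 ** times
    return image[::step, ::step]

def wald_halfwidth(p_hat, n, z=1.96):
    """Half-width of a binomial Wald interval, used here only as a crude proxy."""
    return z * np.sqrt(p_hat * (1 - p_hat) / n)
```

Under this proxy, each 2D subsampling step divides the voxel count by four and therefore doubles the interval half-width, which matches the direction of the trend the caption reports.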
Fig. 6. Influence of the number of experts on the confidence intervals of the performance parameters
Average relative confidence interval values (as a percentage of the average performance parameter) as a function of the number of experts used in STAPLE. For each number of experts K, all combinations of K experts among the 15 available were used to compute the average. The three curves show the results using: the whole images (blue), half of the images (red), and the upper left quarter of the images (green) to compute the STAPLE performance estimates.
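
The averaging over all K-expert subsets described in this caption can be sketched as follows. The callback `relative_ci_fn` is a hypothetical placeholder for whatever routine runs STAPLE on a subset of experts and returns its relative confidence interval value; it is not defined by the source.

```python
from itertools import combinations
from math import comb
import numpy as np

def average_relative_ci(n_experts, k, relative_ci_fn):
    """Average a per-subset score over all k-subsets of the experts.

    relative_ci_fn(subset) is a hypothetical callback taking a tuple of
    expert indices and returning a relative confidence interval value.
    Returns (mean value, number of subsets evaluated).
    """
    subsets = list(combinations(range(n_experts), k))
    values = [relative_ci_fn(s) for s in subsets]
    return float(np.mean(values)), len(subsets)
```

With 15 experts the number of subsets is `comb(15, k)`, which peaks at K = 7 and K = 8, so an exhaustive average stays cheap in terms of subset count even though each subset needs its own STAPLE run.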
