IEEE Trans Med Imaging. 2012 Jun;31(6):1326-36.
doi: 10.1109/TMI.2012.2190992. Epub 2012 Mar 15.

Formulating spatially varying performance in the statistical fusion framework

Andrew J Asman et al. IEEE Trans Med Imaging. 2012 Jun.

Erratum in

  • IEEE Trans Med Imaging. 2012 Jul;31(7):1505

Abstract

To date, label fusion methods have primarily relied either on global [e.g., simultaneous truth and performance level estimation (STAPLE), globally weighted vote] or voxelwise (e.g., locally weighted vote) performance models. Optimality of the statistical fusion framework hinges upon the validity of the stochastic model of how a rater errs (i.e., the labeling process model). Hitherto, approaches have tended to focus on the extremes of potential models. Herein, we propose an extension to the STAPLE approach to seamlessly account for spatially varying performance by extending the performance level parameters to account for a smooth, voxelwise performance level field that is unique to each rater. This approach, Spatial STAPLE, provides significant improvements over state-of-the-art label fusion algorithms in both simulated and empirical data sets.
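The global performance model that Spatial STAPLE extends can be made concrete with a small sketch. The following is my own minimal illustration of classic (global, binary-label) STAPLE via expectation-maximization, not the authors' implementation: each rater is summarized by a single sensitivity/specificity pair shared across all voxels, which is exactly the assumption Spatial STAPLE relaxes by letting those parameters vary smoothly over the image.

```python
import numpy as np

def staple_binary(decisions, n_iter=50, prior=0.5):
    """Classic (global) STAPLE for binary labels via EM.

    decisions : (R, V) array of 0/1 votes from R raters over V voxels.
    Returns (w, p, q): consensus posterior P(truth = 1) per voxel, plus
    each rater's estimated sensitivity p and specificity q.
    """
    d = np.asarray(decisions, dtype=float)
    R, V = d.shape
    p = np.full(R, 0.9)  # initial sensitivity, P(vote 1 | truth 1)
    q = np.full(R, 0.9)  # initial specificity, P(vote 0 | truth 0)
    for _ in range(n_iter):
        # E-step: posterior that the true label is 1 at each voxel,
        # given the current rater performance estimates.
        like1 = np.prod(p[:, None] ** d * (1 - p[:, None]) ** (1 - d), axis=0)
        like0 = np.prod(q[:, None] ** (1 - d) * (1 - q[:, None]) ** d, axis=0)
        num = prior * like1
        w = num / (num + (1 - prior) * like0 + 1e-12)
        # M-step: re-estimate each rater's performance against the consensus.
        p = (d * w).sum(axis=1) / (w.sum() + 1e-12)
        q = ((1 - d) * (1 - w)).sum(axis=1) / ((1 - w).sum() + 1e-12)
    return w, p, q
```

Note that a single (p, q) per rater cannot express a rater who is accurate in one brain region and unreliable in another; Spatial STAPLE's voxelwise performance field is designed to capture precisely that case.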

Figures

Fig 1
Registered atlases exhibit spatially varying behavior. Representative slices from an expertly labeled MR brain image and CT head and neck image are shown in (A). Example registered atlases with their local performance can be seen in (B) and (C). Note that atlases exhibit smooth spatially varying performance that is unique to each atlas.
Fig 2
Demonstration of the Spatial STAPLE performance level field estimation procedure. An example expert segmentation can be seen in (A) with a collection of registered atlas observations seen in (B). Spatial STAPLE estimates local confusion matrices (C) in order to construct a whole-image estimate of performance that is smooth and spatially varying. The true performance for the atlas seen in (B) can be seen in (D) and the estimated performance from Spatial STAPLE presented in (E). Note that the intensity in (E) is an indication of average “performance” – i.e., the average diagonal element of Θ.
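The "average performance" intensity described in (E) can be illustrated with a simplified sketch (my own, not the paper's EM-coupled estimator): over overlapping windows, form the confusion matrix between one rater's labels and a reference, and record its mean diagonal at each window seed point to obtain a coarse performance field. The window size and step here are arbitrary choices for illustration.

```python
import numpy as np

def local_performance_field(obs, ref, window=16, step=8):
    """Coarse voxelwise performance map for one rater (cf. Fig. 2(E)).

    Over overlapping windows, form the 2x2 confusion matrix between the
    rater's binary labels `obs` and a reference `ref`, and record its
    mean diagonal (average of sensitivity and specificity) per window.
    """
    H, W = obs.shape
    ys = range(0, H - window + 1, step)
    xs = range(0, W - window + 1, step)
    perf = np.zeros((len(ys), len(xs)))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            o = obs[y:y + window, x:x + window]
            r = ref[y:y + window, x:x + window]
            pos, neg = (r == 1).sum(), (r == 0).sum()
            sens = ((o == 1) & (r == 1)).sum() / pos if pos else np.nan
            spec = ((o == 0) & (r == 0)).sum() / neg if neg else np.nan
            # mean diagonal of Theta; skip a class absent from the window
            perf[i, j] = np.nanmean([sens, spec])
    return perf
```

In the paper the field is estimated jointly with the consensus inside the EM loop and interpolated smoothly between seed points; this sketch only shows why the map in (E) varies across the image for a single rater.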
Fig 3
Results for the human rater simulation. The cross-sectional view of the truth model used in this simulation can be seen in (A). An example observation using the boundary model of human behavior can be seen in (B); note that each rater is perfect within a unique region of these observations. The corresponding label estimates from majority vote, STAPLE, and Spatial STAPLE can be seen in (C)-(E), respectively. All displayed estimates were constructed using 10 raters. Lastly, an accuracy analysis can be seen in (F); note that with increasing volumes, Spatial STAPLE consistently outperforms both STAPLE and majority vote.
Fig 4
Assessment of Spatial STAPLE sensitivity with respect to various model parameters for the human rater simulation. For each plot the percent improvement exhibited by Spatial STAPLE over STAPLE is assessed. The plot seen in (A) indicates the sensitivity of Spatial STAPLE to the impact of the global estimate of the performance level parameters. (B) indicates the sensitivity to the size of the pooling region (or window) associated with the voxelwise performance estimate. Lastly, plot (C) indicates the sensitivity to the amount of overlap between windows. The window overlap is a proxy for the number of seed points used in the estimation of the performance level field.
Fig 5
Quantitative results for the human rater cancer labeling experiment. The accuracy of majority vote, STAPLE, and Spatial STAPLE is considered with varying numbers of observations per slice (or “coverages”). For all numbers of observations per slice, Spatial STAPLE exhibits statistically significant improvement over both majority vote and STAPLE.
Fig 6
Qualitative results for the human rater cancer labeling experiment. Four separate slices are shown, with the expert labels, majority vote, STAPLE and Spatial STAPLE presented for each example using 8 observations per slice. For all examples Spatial STAPLE is qualitatively superior to both majority vote and STAPLE. The arrows indicate areas of particular improvement exhibited by Spatial STAPLE.
Fig 7
Quantitative results for the simulation of meta-analysis fusion for whole brain segmentation. The results show the accuracy of majority vote, STAPLE, and Spatial STAPLE for all 26 labels across the 15 atlases considered in this experiment. Spatial STAPLE significantly outperforms the other algorithms for nearly all labels (excluding the left amygdala, pallidum, and putamen).
Fig 8
Quantitative results for the segmented CT head and neck data. The mean DSC for all structures can be seen in (A). The DSC values for each of the individual algorithms can be seen in (B)-(E). Spatial STAPLE statistically significantly outperforms locally weighted vote for all labels other than the thyroid, despite the fact that Spatial STAPLE does not utilize intensity information.
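The DSC reported throughout Fig. 8 is the standard Dice similarity coefficient between an estimated and a true binary mask; a minimal reference implementation (my own sketch, not the authors' code):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), for binary masks."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    denom = a.sum() + b.sum()
    # Convention: two empty masks are in perfect agreement.
    return 2.0 * (a & b).sum() / denom if denom else 1.0
```

DSC ranges from 0 (no overlap) to 1 (identical masks), which is why a mean improvement of roughly 0.01 DSC, as discussed for Fig. 9, is modest and worth inspecting qualitatively.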
Fig 9
Qualitative results for the segmented CT head and neck data. The average mean DSC improvement exhibited by Spatial STAPLE was approximately 0.01 DSC (Fig. 8A); thus, it is important to assess whether this improvement is qualitatively visible. The truth labels can be seen in (A), with the corresponding majority vote, locally weighted vote, STAPLE, and Spatial STAPLE estimates seen in (B)-(E).

