Med Image Anal. 2014 Oct;18(7):1070-81.
doi: 10.1016/j.media.2014.06.005. Epub 2014 Jul 4.

Hierarchical performance estimation in the statistical label fusion framework


Andrew J Asman et al. Med Image Anal. 2014 Oct.

Abstract

Label fusion is a critical step in many image segmentation frameworks (e.g., multi-atlas segmentation), as it provides a mechanism for generalizing a collection of labeled examples into a single estimate of the underlying segmentation. In the multi-label case, typical label fusion algorithms treat all labels equally, fully neglecting the known, yet complex, anatomical relationships exhibited in the data. To address this problem, we propose a generalized statistical fusion framework using hierarchical models of rater performance. Building on the seminal work in statistical fusion, we reformulate the traditional rater performance model from a multi-tiered hierarchical perspective. The proposed approach provides a natural framework for leveraging known anatomical relationships and accurately modeling the types of errors that raters (or atlases) make within a hierarchically consistent formulation. Herein, the primary contributions of this manuscript are: (1) we provide a theoretical advancement to the statistical fusion framework that enables the simultaneous estimation of multiple (hierarchical) confusion matrices for each rater, (2) we highlight the amenability of the proposed hierarchical formulation to many of the state-of-the-art advancements to the statistical fusion framework, and (3) we demonstrate statistically significant improvement on both simulated and empirical data. Specifically, both theoretically and empirically, we show that the proposed hierarchical performance model provides substantial and significant accuracy benefits when applied to two disparate multi-atlas segmentation tasks: (1) 133-label whole-brain anatomy on structural MR, and (2) orbital anatomy on CT.

Keywords: Hierarchical segmentation; Label fusion; Multi-atlas segmentation; Rater performance models; STAPLE.
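For reference, the abstract reformulates the "traditional rater performance model" of statistical fusion. Below is a minimal sketch of that flat, non-hierarchical model: multi-label STAPLE fitted by expectation-maximization. Each rater j has a single confusion matrix, theta[j][a][s] = P(rater j reports a | true label s). The sketch assumes independent voxels, a global prior taken from overall label frequencies, and a fixed number of iterations. The function name `staple` and its parameters are illustrative. This is not the authors' implementation. Per the abstract, the hierarchical version instead estimates several confusion matrices per rater, one for each level of the label hierarchy.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def staple(obs: List[List[int]], num_labels: int, iters: int = 50,
           init_diag: float = 0.95) -> Tuple[List[int], List[Matrix]]:
    """Flat multi-label STAPLE via EM (illustrative sketch).

    obs[i][j] is the label that rater j assigns to voxel i (0..num_labels-1).
    Returns the fused labels and, for each rater j, a confusion matrix with
    theta[j][a][s] = P(rater j reports a | true label is s).
    Assumes num_labels >= 2 and iters >= 1.
    """
    n_raters = len(obs[0])
    L = num_labels
    eps = 1e-12

    # Global prior: fraction of all rater observations carrying each label.
    counts = [0] * L
    for row in obs:
        for lab in row:
            counts[lab] += 1
    total = sum(counts)
    prior = [c / total for c in counts]

    # Initial performance: mostly correct, errors spread evenly over labels.
    off = (1.0 - init_diag) / (L - 1)
    theta = [[[init_diag if a == s else off for s in range(L)] for a in range(L)]
             for _ in range(n_raters)]

    for _ in range(iters):
        # E-step: posterior W[i][s] over the true label at each voxel.
        W = []
        for row in obs:
            w = []
            for s in range(L):
                p = prior[s]
                for j, lab in enumerate(row):
                    p *= theta[j][lab][s]
                w.append(p)
            z = sum(w) or eps
            W.append([x / z for x in w])

        # M-step: column s of theta[j] is the W-weighted distribution of
        # rater j's reported labels over voxels whose truth is s.
        mass = [sum(w[s] for w in W) + eps for s in range(L)]
        for j in range(n_raters):
            acc = [[0.0] * L for _ in range(L)]
            for w, row in zip(W, obs):
                for s in range(L):
                    acc[row[j]][s] += w[s]
            theta[j] = [[acc[a][s] / mass[s] for s in range(L)] for a in range(L)]

    # Fused estimate: most probable true label at each voxel.
    fused = [max(range(L), key=lambda s: w[s]) for w in W]
    return fused, theta
```

Each column of a returned `theta[j]` sums to one, because it is a distribution over reported labels for one true label. The diagonal entries summarize how often rater j is correct for each label.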


Figures

Figure 1
Hierarchical representation of rater performance. Volumetric renderings of the brain anatomy at the various levels are shown. At each level, the rater performance is quantified using a representative confusion matrix. Each level is then unified through a complete hierarchical performance model.
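To illustrate "a representative confusion matrix at each level", the sketch below collapses leaf labels into their groups at each level of a user-supplied hierarchy. It then counts one rater's errors against a known reference segmentation, similar in spirit to the "ideal" parameters mentioned for Figure 8. It does not estimate parameters by EM, and it does not reproduce how the paper unifies the levels into one model. The function name, the four-label hierarchy, and the data are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def level_confusions(observed: List[int], truth: List[int],
                     hierarchy: List[Dict[int, int]]) -> List[Matrix]:
    """One confusion matrix per hierarchy level (illustrative sketch).

    hierarchy[k] maps every leaf label to its group index at level k; group
    indices at each level are assumed to be 0..n_k-1. Entry [o][t] of level k
    is P(observed group o | true group t) at that level, estimated by counting
    against a known reference segmentation.
    """
    matrices = []
    for mapping in hierarchy:
        n = max(mapping.values()) + 1
        counts = [[0.0] * n for _ in range(n)]
        for o, t in zip(observed, truth):
            counts[mapping[o]][mapping[t]] += 1.0
        # Normalize each column so it is a distribution over observed groups.
        for t in range(n):
            col = sum(counts[o][t] for o in range(n))
            if col > 0:
                for o in range(n):
                    counts[o][t] /= col
        matrices.append(counts)
    return matrices


# Hypothetical 3-level hierarchy over leaf labels
# 0 = background, 1 = left ventricle, 2 = right ventricle, 3 = cortex.
HIERARCHY = [
    {0: 0, 1: 1, 2: 1, 3: 1},  # background vs. brain
    {0: 0, 1: 1, 2: 1, 3: 2},  # background, ventricles, cortex
    {0: 0, 1: 1, 2: 2, 3: 3},  # individual labels
]
```

Take truth = [0, 0, 1, 1, 2, 2, 3, 3] and a rater whose output is observed = [0, 0, 2, 1, 1, 2, 3, 0]. This rater swaps the two ventricles at two voxels. At the finest level, the left-ventricle diagonal entry drops to 0.5. At the middle level, the "ventricles" group is still 1.0. A per-level model can therefore separate fine-grained confusions from coarse ones.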
Figure 2
Motivating simulation data and results. A simple 2D simulated dataset was constructed, with observations generated using a boundary error model (A). Given a pre-defined hierarchical structure (B), the accuracy of all possible unique hierarchies via label permutation was quantified (C). Representative estimates using the “logical” (D), “best” (E), and “worst” (F) hierarchies are also presented.
Figure 3
Mean accuracy of the various benchmarks and their corresponding hierarchical implementations for both the affine and the non-rigid registration frameworks. The accuracy of a majority vote (MV), locally-weighted vote (LWV), and joint label fusion (JLF) are presented to provide a reference baseline. The hierarchical implementations for STAPLE, Spatial STAPLE (SS), Non-Local STAPLE (NLS), and Non-Local Spatial STAPLE (NLSS) provide consistent and statistically significant improvement over their non-hierarchical counterparts.
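The majority vote (MV) and locally weighted vote (LWV) baselines both reduce to per-voxel voting. The sketch below implements a generic weighted vote. With no weights it is plain majority vote. With per-voxel, per-atlas weights it has the shape of a locally weighted vote. The caption does not say how the local weights are computed, so the sketch takes them as an input rather than guessing a formula. The function name and tie-breaking rule (smallest label wins) are assumptions.

```python
from typing import List, Optional


def weighted_vote(obs: List[List[int]], num_labels: int,
                  weights: Optional[List[List[float]]] = None) -> List[int]:
    """Fuse labels by (weighted) voting; ties go to the smallest label.

    obs[i][j] is atlas j's label at voxel i. With weights=None every atlas
    gets weight 1 (majority vote); otherwise weights[i][j] is atlas j's
    weight at voxel i, as in a locally weighted vote.
    """
    fused = []
    for i, row in enumerate(obs):
        score = [0.0] * num_labels
        for j, lab in enumerate(row):
            score[lab] += 1.0 if weights is None else weights[i][j]
        # max() returns the first maximal index, i.e. the smallest tied label.
        fused.append(max(range(num_labels), key=lambda s: score[s]))
    return fused
```

Unlike the statistical fusion methods in the caption, neither vote models rater performance. Each atlas's influence is fixed by its weight rather than estimated from its error pattern.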
Figure 4
Per-label accuracy for non-cortical labels for hierarchical implementations of NLS and NLSS using the affine registration framework. The hierarchical reformulations provide substantial and significant improvement for many of the considered labels. A “*” over the hierarchical NLS or NLSS results indicates statistically significant improvement over the non-hierarchical implementation.
Figure 5
Per-label accuracy for non-cortical labels for hierarchical implementations of NLS and NLSS using the non-rigid registration framework. As with the affine-only registration framework (Figure 4), the hierarchical implementations provide substantial and significant improvement for many of the considered labels. A “*” over the hierarchical NLS or NLSS results indicates statistically significant improvement over the non-hierarchical implementation.
Figure 6
Mean per-label accuracy improvement for cortical labels using the hierarchical implementations of NLS and NLSS for both of the considered registration frameworks. Particularly for the affine registration framework, the hierarchical reformulations provide substantial improvement in mean DSC accuracy for many of the cortical labels.
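Accuracy in these figures is reported as the Dice similarity coefficient (DSC), 2|A ∩ B| / (|A| + |B|), computed per label. The sketch below computes per-label DSC on flattened label arrays. It also averages per-label DSC differences between two methods, which is one plausible reading of "mean per-label accuracy improvement" and is an assumption here. Both function names are hypothetical. Scoring a label that is absent from both segmentations as 1.0 is also an assumed convention.

```python
from typing import Dict, Iterable, List


def dice_per_label(seg: List[int], ref: List[int],
                   labels: Iterable[int]) -> Dict[int, float]:
    """Dice similarity coefficient 2|A & B| / (|A| + |B|) for each label."""
    scores = {}
    for lab in labels:
        a = sum(1 for s in seg if s == lab)
        b = sum(1 for r in ref if r == lab)
        inter = sum(1 for s, r in zip(seg, ref) if s == lab and r == lab)
        # Convention (an assumption): a label absent from both scores 1.0.
        scores[lab] = 2.0 * inter / (a + b) if a + b else 1.0
    return scores


def mean_improvement(new: Dict[int, float], old: Dict[int, float]) -> float:
    """Mean per-label DSC difference (new minus old) over shared labels."""
    common = [lab for lab in new if lab in old]
    return sum(new[lab] - old[lab] for lab in common) / len(common)
```

For example, with reference [0, 1, 2, 2, 2, 1] and segmentation [0, 1, 1, 2, 2, 2], label 1 has two voxels in each and one in common, so its DSC is 2 × 1 / (2 + 2) = 0.5.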
Figure 7
Qualitative improvement exhibited by several state-of-the-art statistical fusion algorithms with the reformulated hierarchical performance model for the affine registration framework. For each of the considered statistical fusion algorithms, we see substantial visual improvement for many of the considered labels. In particular, there appears to be marked improvement in the quality of the lateral ventricle labels and many of the cortical labels. The ellipses highlight regions exhibiting particular qualitative improvement.
Figure 8
Empirical evaluation of the impact of the various logical hierarchical representations on STAPLE applied to multi-atlas segmentation of orbital anatomy on CT. The three considered logical hierarchical representations are shown in (A). The quantitative comparison (B) demonstrates that Hierarchical STAPLE provides significant improvement using both the “ideal” performance parameters and the parameters estimated via EM. The quantitative accuracy benefits support the qualitative improvement shown in (C).

