Comparative Study
Med Phys. 2017 Feb;44(2):479-496.
doi: 10.1002/mp.12041.

Multi-site quality and variability analysis of 3D FDG PET segmentations based on phantom and clinical image data

Reinhard R Beichel et al. Med Phys. 2017 Feb.

Abstract

Purpose: Radiomics utilizes a large number of image-derived features for quantifying tumor characteristics that can in turn be correlated with response and prognosis. Unfortunately, extraction and analysis of such image-based features is subject to measurement variability and bias. The challenge for radiomics is particularly acute in Positron Emission Tomography (PET) where limited resolution, a high noise component related to the limited stochastic nature of the raw data, and the wide variety of reconstruction options confound quantitative feature metrics. Extracted feature quality is also affected by tumor segmentation methods used to define regions over which to calculate features, making it challenging to produce consistent radiomics analysis results across multiple institutions that use different segmentation algorithms in their PET image analysis. Understanding each element contributing to these inconsistencies in quantitative image feature and metric generation is paramount for ultimate utilization of these methods in multi-institutional trials and clinical oncology decision making.

Methods: To assess segmentation quality and consistency at the multi-institutional level, we conducted a study of seven institutional members of the National Cancer Institute Quantitative Imaging Network. Members were asked to segment a common set of phantom PET scans acquired over a range of imaging conditions as well as a second set of head and neck cancer (HNC) PET scans. Each institution generated segmentations using its preferred approach. In addition, participants were asked to repeat the segmentations after a time interval. In total, this procedure yielded 806 phantom insert and 641 lesion segmentations. Subsequently, the volume was computed from each segmentation and compared to the corresponding reference volume by means of statistical analysis.
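The volume comparison described in the Methods can be illustrated with a minimal sketch. It assumes a segmentation is available as a binary voxel mask with known voxel spacing, and that relative volume error is the signed difference from the reference volume expressed as a percentage of that reference. The function names, the toy mask, and the reference value are hypothetical and chosen only for illustration; the abstract does not specify the study's actual implementation.

```python
import numpy as np


def segmentation_volume_ml(mask, voxel_spacing_mm):
    """Volume of a binary mask in millilitres (1 mL = 1000 mm^3).

    Assumption: `mask` is a 3D array whose nonzero voxels belong to the
    segmented object, and `voxel_spacing_mm` gives the voxel edge lengths.
    """
    voxel_volume_mm3 = float(np.prod(voxel_spacing_mm))
    return float(np.count_nonzero(mask)) * voxel_volume_mm3 / 1000.0


def relative_volume_error_pct(measured_ml, reference_ml):
    """Signed volume error as a percentage of the reference volume (assumed definition)."""
    if reference_ml <= 0:
        raise ValueError("reference volume must be positive")
    return 100.0 * (measured_ml - reference_ml) / reference_ml


# Toy example: a 4 x 4 x 4 voxel cube on a 2 mm isotropic grid.
mask = np.zeros((10, 10, 10), dtype=bool)
mask[2:6, 2:6, 2:6] = True            # 64 voxels
spacing_mm = (2.0, 2.0, 2.0)          # 8 mm^3 per voxel

measured_ml = segmentation_volume_ml(mask, spacing_mm)      # 64 * 8 / 1000 mL
error_pct = relative_volume_error_pct(measured_ml, 0.4)     # hypothetical 0.4 mL reference
print(f"measured {measured_ml:.3f} mL, relative error {error_pct:.1f}%")
```

With this sign convention, a positive error indicates over-segmentation relative to the reference and a negative error indicates under-segmentation.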

Results: The performance of the seven segmentation approaches on the two test sets (phantom and HNC PET scans) was as follows. On the phantom test set, the mean relative volume errors ranged from 29.9% to 87.8% of the ground truth reference volumes, and the repeat difference for each institution ranged from -36.4% to 39.9%. On the HNC test set, the mean relative volume error ranged from -50.5% to 701.5%, and the repeat difference for each institution ranged from -37.7% to 31.5%. Performance measures per phantom insert and lesion size category are given in the paper. On phantom data, regression analysis resulted in coefficient of variation (CV) components of 42.5% for scanners, 26.8% for institutional approaches, 21.1% for repeated segmentations, 14.3% for relative contrasts, 5.3% for count statistics (acquisition times), and 0.0% for repeated scans. The CV components for approaches and repeated segmentations were significantly larger on the HNC test set, with increases of 112.7% and 102.4%, respectively.

Conclusion: The analysis results underline the importance of PET scanner reconstruction harmonization and imaging protocol standardization for quantification of lesion volumes. In addition, to enable a distributed multi-site analysis of FDG PET images, harmonization of analysis approaches and operator training, in combination with highly automated segmentation methods, seem advisable. Future work will focus on quantifying the impact of segmentation variation on radiomics system performance.

Keywords: FDG PET; head and neck cancer; multi-site performance analysis; phantom; radiomics; segmentation.

Conflict of interest statement

The authors have no conflict of interest to report.

Figures

Figure 1. Modified NEMA IEC Body Phantom with spherical and ellipsoid inserts and corresponding naming scheme.
Figure 2. Example of indicator images.
Figure 3. Distributions of measured volumes by reference volumes. The gray diamonds (phantom) and line (HNC) represent the reference volumes. Note that phantom and HNC plots have different scales.
Figure 4. Approach‐specific relative mean error as a function of reference volume.
Figure 5. Distributions of relative errors in volume measurements by approach.
Figure 6. Distributions of relative repeat errors in volume measurements by approach.
Figure 7. Examples of segmentation results for a primary cancer site, which is part of test set HNC. (a–g) Segmentations generated with approaches 1 to 7. (h) One example of the six manual reference segmentations. The corresponding indicator image is given in Fig. 2(a). For each segmentation approach, the relative volume error Ve, Dice coefficient D, and mean unsigned distance error de are provided.
Figure 8. Examples of segmentation results for a hot lymph node, which is part of test set HNC. (a–g) Segmentations generated with approaches 1 to 7. (h) One example of the six manual reference segmentations. The corresponding indicator image is given in Fig. 2(b). For each segmentation approach, the relative volume error Ve, Dice coefficient D, and mean unsigned distance error de are provided.
Figure 9. Relative mean errors from regression modeling of the phantom and HNC data.
Figure 10. Comparison of segmentation performance of methods 1 to 7 on (a) phantom and (b) HNC data.
Figure 11. Overview of overall segmentation performance, comparing Methods 1 to 7.
Figure 12. Comparison of the absolute mean relative volume error of approaches 1 to 7 against the corresponding mean Dice coefficient (a) and mean unsigned distance error (b).
Figure 13. Boxplots of measured Dice coefficients (a) and unsigned distance errors (b) as well as corresponding repeat differences (c and d).
