Proc SPIE Int Soc Opt Eng. 2023 Feb:12467:124670R.
doi: 10.1117/12.2647894. Epub 2023 Apr 3.

Need for objective task-based evaluation of AI-based segmentation methods for quantitative PET

Ziping Liu et al. Proc SPIE Int Soc Opt Eng. 2023 Feb.

Abstract

Artificial intelligence (AI)-based methods are showing substantial promise in segmenting oncologic positron emission tomography (PET) images. For clinical translation of these methods, it is important to assess their performance on clinically relevant tasks. However, these methods are typically evaluated using metrics that may not correlate with task performance. One widely used metric is the Dice score, a figure of merit that measures the spatial overlap between the estimated segmentation and a reference standard (e.g., manual segmentation). In this work, we investigated whether evaluating AI-based segmentation methods using Dice scores yields a similar interpretation as evaluating them on the clinical tasks of quantifying the metabolic tumor volume (MTV) and total lesion glycolysis (TLG) of the primary tumor from PET images of patients with non-small cell lung cancer. The investigation was conducted as a retrospective analysis of data from the ECOG-ACRIN 6668/RTOG 0235 multi-center clinical trial. Specifically, we evaluated different structures of a commonly used AI-based segmentation method using both Dice scores and the accuracy in quantifying MTV/TLG. Our results show that evaluation using Dice scores can lead to findings that are inconsistent with evaluation using the task-based figure of merit. Thus, our study motivates the need for objective task-based evaluation of AI-based segmentation methods for quantitative PET.

Keywords: Task-based evaluation; artificial intelligence; positron emission tomography; quantification; segmentation.
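The abstract contrasts a task-agnostic overlap metric (Dice) with task-based quantities (MTV, TLG). As a minimal sketch, assuming binary masks and SUV values stored as flat Python lists of voxels and a known voxel volume in mL, these three quantities could be computed as below. The function names are hypothetical, and this is not the authors' code: Dice is 2|A ∩ B| / (|A| + |B|), MTV is the segmented voxel count times the voxel volume, and TLG is MTV times the mean SUV inside the mask.

```python
from typing import Sequence


def dice_score(pred: Sequence[bool], ref: Sequence[bool]) -> float:
    """Spatial overlap: Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    if len(pred) != len(ref):
        raise ValueError("masks must contain the same number of voxels")
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(map(bool, pred)) + sum(map(bool, ref))
    # Convention (an assumption): two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def mtv(mask: Sequence[bool], voxel_volume_ml: float) -> float:
    """Metabolic tumor volume: number of segmented voxels times voxel volume."""
    return sum(map(bool, mask)) * voxel_volume_ml


def tlg(mask: Sequence[bool], suv: Sequence[float], voxel_volume_ml: float) -> float:
    """Total lesion glycolysis: MTV x SUVmean, i.e. summed SUV in the mask times voxel volume."""
    if len(mask) != len(suv):
        raise ValueError("mask and SUV image must contain the same number of voxels")
    return sum(s for m, s in zip(mask, suv) if m) * voxel_volume_ml
```

In a real pipeline the masks would be 3-D arrays and the voxel volume would come from the image header; flat lists simply keep the sketch dependency-free.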


Figures

Figure 1.
Architecture of the commonly used U-Net-based convolutional neural network (CNN) model considered in this study.
Figure 2.
Task-based evaluation of the impact of network depth on the performance of the considered CNN model. (A) Performance quantified using the task-agnostic Dice score and the task-based figure of merit of absolute ensemble normalized bias; (B) Violin plots showing the distributions of Dice scores and of normalized errors in estimated MTV/TLG for the shallower and deeper networks.
Figure 3.
Segmentations of the primary tumor in a representative patient obtained with a shallower and a deeper network, shown for every slice that contains the tumor. For this patient, the Dice scores of the shallower and deeper networks were very similar, even though their errors in estimating MTV/TLG were very different. The segmentations from the two networks also differ visually.
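The observation in Figure 3, near-identical Dice scores but very different MTV errors, can be reproduced in a toy setting. The sketch below is entirely hypothetical: a 20-voxel "image", made-up masks for two networks, and a unit voxel volume. It also assumes that the per-patient normalized error is (estimate minus truth) / truth and that the absolute ensemble normalized bias of Figure 2A is the absolute value of the mean of those errors across patients; the paper's exact definitions may differ.

```python
from typing import Iterable, List, Sequence


def mask_from_indices(indices: Iterable[int], n_voxels: int) -> List[bool]:
    """Hypothetical helper: flat binary mask with the given voxel indices set."""
    on = set(indices)
    return [i in on for i in range(n_voxels)]


def dice_score(pred: Sequence[bool], ref: Sequence[bool]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|); two empty masks are treated as agreeing."""
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(map(bool, pred)) + sum(map(bool, ref))
    return 1.0 if total == 0 else 2.0 * overlap / total


def normalized_error(estimate: float, truth: float) -> float:
    """Per-patient normalized error (assumed form): (estimate - truth) / truth."""
    return (estimate - truth) / truth


def abs_ensemble_normalized_bias(estimates: Sequence[float], truths: Sequence[float]) -> float:
    """Assumed definition: |mean of per-patient normalized errors|."""
    errors = [normalized_error(e, t) for e, t in zip(estimates, truths)]
    return abs(sum(errors) / len(errors))


# Hypothetical 20-voxel image; the reference tumor occupies voxels 0-9.
N_VOXELS = 20
VOXEL_ML = 1.0  # hypothetical voxel volume
ref = mask_from_indices(range(10), N_VOXELS)
# Network A misses 2 tumor voxels but adds 2 background voxels (10 voxels total).
pred_a = mask_from_indices(list(range(8)) + [10, 11], N_VOXELS)
# Network B covers the whole tumor plus 5 background voxels (15 voxels total).
pred_b = mask_from_indices(range(15), N_VOXELS)

true_mtv = sum(ref) * VOXEL_ML
for name, pred in (("A", pred_a), ("B", pred_b)):
    est_mtv = sum(pred) * VOXEL_ML
    print(f"{name}: Dice = {dice_score(pred, ref):.2f}, "
          f"normalized MTV error = {normalized_error(est_mtv, true_mtv):+.2f}")
# A: Dice = 0.80, normalized MTV error = +0.00
# B: Dice = 0.80, normalized MTV error = +0.50
```

Because the assumed bias averages signed errors, over- and underestimates in different patients can cancel: `abs_ensemble_normalized_bias([12.0, 8.0], [10.0, 10.0])` is 0.0 even though each estimate is 20% off, which is why the per-patient error distributions in Figure 2B add information.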
