Need for objective task-based evaluation of AI-based segmentation methods for quantitative PET

Ziping Liu¹, Joyce C Mhlanga², Barry A Siegel²,³, Abhinav K Jha¹,²,³

Affiliations

¹ Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO, USA.
² Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO, USA.
³ Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA.

PMID: 37990707
PMCID: PMC10659582
DOI: 10.1117/12.2647894

Ziping Liu et al. Proc SPIE Int Soc Opt Eng. 2023 Feb;12467:124670R. doi: 10.1117/12.2647894. Epub 2023 Apr 3.

Abstract

Artificial intelligence (AI)-based methods are showing substantial promise in segmenting oncologic positron emission tomography (PET) images. For clinical translation of these methods, assessing their performance on clinically relevant tasks is important. However, these methods are typically evaluated using metrics that may not correlate with task performance. One such widely used metric is the Dice score, a figure of merit that measures the spatial overlap between the estimated segmentation and a reference standard (e.g., manual segmentation). In this work, we investigated whether evaluating AI-based segmentation methods using Dice scores yields the same interpretation as evaluation on the clinical tasks of quantifying metabolic tumor volume (MTV) and total lesion glycolysis (TLG) of the primary tumor from PET images of patients with non-small cell lung cancer. The investigation was conducted via a retrospective analysis of the ECOG-ACRIN 6668/RTOG 0235 multi-center clinical trial data. Specifically, we evaluated different structures of a commonly used AI-based segmentation method using both Dice scores and the accuracy in quantifying MTV/TLG. Our results show that evaluation using Dice scores can lead to findings that are inconsistent with evaluation using the task-based figure of merit. Thus, our study motivates the need for objective task-based evaluation of AI-based segmentation methods for quantitative PET.

Keywords: Task-based evaluation; artificial intelligence; positron emission tomography; quantification; segmentation.
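As an illustration of the quantities named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes binary masks stored as flat 0/1 sequences, the standard definitions Dice = 2|A∩B| / (|A| + |B|), MTV = number of segmented voxels × voxel volume, and TLG = mean SUV inside the mask × MTV. The function names, the voxel volume of 0.5 mL, and the toy SUV values are all hypothetical and are not taken from the study.

```python
from typing import Sequence


def dice_score(est: Sequence[int], ref: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal length."""
    if len(est) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for e, r in zip(est, ref) if e and r)
    total = sum(1 for e in est if e) + sum(1 for r in ref if r)
    return 2.0 * overlap / total if total else 1.0


def mtv_ml(mask: Sequence[int], voxel_ml: float) -> float:
    """Metabolic tumor volume: segmented voxel count times voxel volume (mL)."""
    return sum(1 for m in mask if m) * voxel_ml


def tlg(mask: Sequence[int], suv: Sequence[float], voxel_ml: float) -> float:
    """Total lesion glycolysis: mean SUV inside the mask times MTV."""
    inside = [s for m, s in zip(mask, suv) if m]
    if not inside:
        return 0.0
    return (sum(inside) / len(inside)) * mtv_ml(mask, voxel_ml)


# Hypothetical toy data: 20 voxels, tumor occupies voxels 0..11.
VOXEL_ML = 0.5                      # assumed voxel volume, for illustration only
suv = [6.0] * 12 + [1.0] * 8        # assumed uptake: tumor 6.0, background 1.0
ref = [1] * 12 + [0] * 8            # reference standard (12 voxels)
under = [1] * 9 + [0] * 11          # under-segmentation (9 voxels, all in tumor)
over = [1] * 16 + [0] * 4           # over-segmentation (12 tumor + 4 background)

for name, est in (("under", under), ("over", over)):
    mtv_err = (mtv_ml(est, VOXEL_ML) - mtv_ml(ref, VOXEL_ML)) / mtv_ml(ref, VOXEL_ML)
    tlg_err = (tlg(est, suv, VOXEL_ML) - tlg(ref, suv, VOXEL_ML)) / tlg(ref, suv, VOXEL_ML)
    print(f"{name}: Dice={dice_score(est, ref):.3f} "
          f"MTV error={mtv_err:+.1%} TLG error={tlg_err:+.1%}")
```

In this toy case both estimates have the same Dice score (6/7), yet one underestimates MTV by 25% and the other overestimates it by about 33%. This is only a constructed example of how overlap and quantification error can diverge; it does not reproduce the study's data or results.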

Figures

**Figure 1.**
Architecture of the considered commonly used U-net-based convolutional neural network model.

**Figure 2.**
Task-based evaluation of the impact of network depth on the performance of the considered CNN model. (A) Performance quantified using the task-agnostic Dice scores and task-based figure of merit of absolute ensemble normalized bias; (B) Violin plot showing the distributions of the Dice scores and normalized error in estimated MTV/TLG for a shallower and deeper network, respectively.
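The task-based figure of merit named in the Figure 2 caption can be sketched as follows. This page does not define it, so the sketch assumes one common reading: the per-patient normalized error is (estimated − reference) / reference, the ensemble normalized bias is its mean over patients, and the reported value is the absolute value of that mean. The function names and the MTV values are hypothetical.

```python
from typing import List, Sequence


def normalized_errors(estimated: Sequence[float],
                      reference: Sequence[float]) -> List[float]:
    """Per-patient normalized error: (estimated - reference) / reference."""
    if len(estimated) != len(reference) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    return [(e - r) / r for e, r in zip(estimated, reference)]


def absolute_ensemble_normalized_bias(estimated: Sequence[float],
                                      reference: Sequence[float]) -> float:
    """Absolute value of the mean normalized error over all patients (assumed definition)."""
    errors = normalized_errors(estimated, reference)
    return abs(sum(errors) / len(errors))


# Hypothetical MTV values (mL) for three patients; not data from the study.
reference_mtv = [10.0, 20.0, 40.0]
estimated_mtv = [12.0, 18.0, 40.0]

print(normalized_errors(estimated_mtv, reference_mtv))
print(absolute_ensemble_normalized_bias(estimated_mtv, reference_mtv))
```

Under this assumed definition, errors of opposite sign partly cancel in the ensemble bias, which is why the distribution of per-patient normalized errors is worth inspecting alongside it.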

**Figure 3.**
Segmentations of the primary tumor in a representative patient obtained with a shallower and a deeper network. The segmentations for each of the slices that contained the tumor are shown. For this patient, the Dice scores were very similar between the shallower and deeper network, even though the errors in estimating MTV/TLG were very different. We also note that visually, the segmentations are different for the shallower vs. deeper networks.

Update of

Need for objective task-based evaluation of AI-based segmentation methods for quantitative PET.
Liu Z, Mhlanga JC, Siegel BA, Jha AK. ArXiv [Preprint]. 2023 Mar 1:arXiv:2303.00640v1. Update in: Proc SPIE Int Soc Opt Eng. 2023 Feb;12467:124670R. doi: 10.1117/12.2647894. PMID: 36911274. Free PMC article.
