EJNMMI Phys. 2025 Jun 6;12(1):54.
doi: 10.1186/s40658-025-00767-y.

Interobserver ground-truth variability limits performance of automated glioblastoma segmentation on [18F]FET PET

Selene De Sutter et al. EJNMMI Phys.

Abstract

Background: Positron emission tomography (PET) with the tracer O-(2-[18F]fluoroethyl)-L-tyrosine ([18F]FET) is of growing importance in the management of glioblastoma for estimating tumor extent and extracting diagnostic and prognostic parameters. Robust and accurate glioblastoma segmentation methods are essential to maximize the benefits of this imaging modality. Given the importance of setting the foreground threshold during manual tumor delineation, this study investigates the added value of incorporating such prior knowledge to guide automated segmentation and improve performance. Two segmentation networks were trained based on the nnU-Net guidelines: one with the [18F]FET PET image as sole input, and one with an additional input channel for the threshold map. For the latter, we investigate the benefit of manually obtained thresholds and explore automated prediction and generation of such maps. A fully automated pipeline was constructed by selecting the best-performing threshold prediction approach and cascading it with the tumor segmentation model.

Results: The proposed two-channel network performs better when guided by threshold maps from the same reader whose ground-truth tumor label serves as the reference for the prediction (DSC = 0.901). When threshold maps were generated by a different reader, performance reverted to levels comparable to the one-channel network and to inter-reader variability. The proposed full pipeline achieves results on par with the current state of the art (DSC = 0.807).
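As an illustration of the metric reported above, the Dice similarity coefficient (DSC) between two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch using only the Python standard library; the function name `dice_similarity`, the flattened-list mask representation, and the toy masks are assumptions made for this example and are not taken from the study's code. Returning 1.0 when both masks are empty is a convention chosen here, not something the abstract specifies.

```python
from typing import Sequence


def dice_similarity(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled as tumor in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Sum of the tumor voxel counts of the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical toy masks (1 = tumor voxel, 0 = background).
reader_a = [0, 1, 1, 1, 0, 0]
reader_b = [0, 0, 1, 1, 1, 0]
print(dice_similarity(reader_a, reader_b))
```

In this toy case each mask contains three tumor voxels and two of them overlap, so the DSC is 2·2 / (3 + 3), roughly 0.67.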

Conclusions: Incorporating a threshold map can significantly improve tumor segmentation performance when it aligns well with the ground-truth label. However, the current inability to reliably reproduce these maps (both manually and automatically) or the ground-truth tumor labels restricts the achievable accuracy of automated glioblastoma segmentation on [18F]FET PET, highlighting the need for more consistent definitions of such ground-truth delineations.

Keywords: Brain; Deep learning; Glioblastoma; Positron emission tomography; Segmentation.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: This single-center retrospective study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of Universitair Ziekenhuis Brussel (Commissie Medische Ethiek; protocol code EC-2021-137; date of approval 28-07-2021). This study is a retrospective analysis of data obtained during prospective studies (Axig (NCT01562197), GliAvAx (NCT03291314), and GlitIpNi (NCT03233152)), during which all patients signed informed consent for the use of their data. Consent for publication: The authors affirm that human research participants provided informed consent for publication of the images in Figs. 2 and 6. Competing interests: The authors declare that they have no competing interests.

Figures

Fig. 1
Overview of data partitioning and annotation strategy, including fivefold cross-validation for training, and a fully independent test set for validation of the final model
Fig. 2
Overview of the proposed approach. The full pipeline consists of initial prediction of a threshold map from the [18F]FET PET image using an automated threshold estimation network. An overview of the investigated threshold prediction networks is shown (green) in correspondence with the manual segmentation workflow (orange): from the PET image, U-NetBKG predicts the background VOI, DenseNetTH predicts the threshold value, and U-NetTM predicts the threshold map. The image and threshold map are subsequently fed as input channels to the segmentation network, a two-channel U-Net, for the prediction of the tumor label. A multi-slice representation of the background VOI is shown below. VOI = Volume Of Interest
Fig. 3
Bland–Altman plots illustrating the differences between ground-truth thresholds and thresholds predicted using the various approaches for automated threshold prediction. Each point corresponds to a pair of predicted and ground-truth threshold values. Inter-reader differences are shown in (d) for different pairs of readers (1–4), where each point corresponds to a pair of threshold values, each determined by a different reader. The plots display the mean difference (bias) and 95% limits of agreement. Red crosses in (a) indicate cases where the network failed to segment a background VOI, resulting in a threshold set to 0. SD = Standard Deviation
Fig. 4
Bland–Altman plots illustrating the differences between ground-truth MTV and MTV predicted using 2C-U-Net with threshold maps generated by the various automated threshold prediction approaches. Each point corresponds to a pair of predicted and ground-truth volumes. Inter-reader differences are shown in (d) for the different pairs of readers (1–4), where each point corresponds to a pair of tumor volumes, each determined by a different reader. The plots display the mean difference (bias) and 95% limits of agreement. GT = Ground Truth; MTV = Metabolic Tumor Volume; SD = Standard Deviation
Fig. 5
Performance of the full pipeline as a function of lesion volume (a–c) and scanner (d–f). DSC = Dice Similarity Coefficient; MTV = Metabolic Tumor Volume; NSD = Normalized Surface Dice; AVE = Absolute Volume Error
Fig. 6
Example segmentations of a representative subject. Threshold maps from reader A, from reader B, and automatically generated by U-NetTM are shown in the first column. Tumor label predictions from 2C-U-Net using these threshold maps are shown in the second column and compared to the ground-truth labels of both readers. Overlap between the labels of both readers is visualized with corresponding metrics. AVE = Absolute Volume Error; DSC = Dice Similarity Coefficient; NSD = Normalized Surface Dice
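
The Bland–Altman quantities named in the captions of Figs. 3 and 4 (mean difference, or bias, and 95% limits of agreement) can be sketched as follows. This is a minimal illustration using the standard definition, bias ± 1.96 × SD of the paired differences; the function name `bland_altman` and the sample threshold values are hypothetical and do not come from the study.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(predicted: Sequence[float],
                 reference: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(predicted) != len(reference):
        raise ValueError("inputs must contain the same number of pairs")
    if len(predicted) < 2:
        raise ValueError("at least two pairs are needed to estimate the SD")
    # Paired differences: predicted minus reference.
    diffs = [p - r for p, r in zip(predicted, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    # 95% limits of agreement under the usual normality assumption.
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical threshold values, for illustration only.
predicted_thresholds = [1.8, 2.1, 1.5, 2.4]
ground_truth_thresholds = [1.7, 2.0, 1.7, 2.2]
bias, lower, upper = bland_altman(predicted_thresholds, ground_truth_thresholds)
print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f})")
```

The same function applies to inter-reader comparisons by passing the values of one reader as `predicted` and those of the other as `reference`.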
