Phys Imaging Radiat Oncol. 2023 May 16;26:100450.
doi: 10.1016/j.phro.2023.100450. eCollection 2023 Apr.

Multi-centre radiomics for prediction of recurrence following radical radiotherapy for head and neck cancers: Consequences of feature selection, machine learning classifiers and batch-effect harmonization

Amal Joseph Varghese et al. Phys Imaging Radiat Oncol.

Abstract

Background and purpose: Radiomics models trained with limited single institution data are often not reproducible and generalisable. We developed radiomics models that predict loco-regional recurrence within two years of radiotherapy with private and public datasets and their combinations, to simulate small and multi-institutional studies and study the responsiveness of the models to feature selection, machine learning algorithms, centre-effect harmonization and increased dataset sizes.

Materials and methods: We included 562 patients with histologically confirmed locally advanced head-and-neck cancer (LA-HNC), treated with radiotherapy, from two public and two private datasets; one private dataset was reserved exclusively for validation. The clinical contours of the primary tumours were used without recontouring for Pyradiomics-based feature extraction. ComBat harmonization was applied, and LASSO-Logistic Regression (LR) and Support Vector Machine (SVM) models were built. Predictive performance was reported as the 95% confidence interval (CI) of the area under the receiver operating characteristic curve (AUC) over 1000 bootstrap samples. We evaluated how responsive model performance was to the choice of feature selection method, ComBat harmonization, machine learning classifier, and single versus pooled data.
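As an illustration of the bootstrapped AUC interval described above, the following is a minimal sketch and not the authors' code. It assumes the interval is a percentile interval obtained by resampling validation cases with replacement, which the abstract does not specify. The function names (`auc_score`, `bootstrap_auc_ci`) and the synthetic labels and scores are hypothetical.

```python
import random


def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Approximate percentile CI of the AUC from n_boot resamples of (label, score) pairs."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            # Resample contains a single class, so the AUC is undefined; draw again.
            continue
        aucs.append(auc_score(ys, [scores[i] for i in idx]))
    aucs.sort()
    low = aucs[int((alpha / 2) * n_boot)]
    high = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return low, high


# Synthetic example (assumption): 20 recurrences and 40 controls with overlapping scores.
rng = random.Random(1)
labels = [1] * 20 + [0] * 40
scores = [rng.gauss(0.5, 1.0) for _ in range(20)] + [rng.gauss(0.0, 1.0) for _ in range(40)]
point = auc_score(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(f"AUC = {point:.3f} (95% CI {ci_low:.3f}-{ci_high:.3f})")
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients; a rank-based formulation would scale better.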

Results: LASSO and SelectKBest selected 14 and 16 features, respectively; three features were common to both. Without ComBat, the LR and SVM models trained on data pooled from three institutions showed AUCs (CI) of 0.513 (0.481-0.559) and 0.632 (0.586-0.665), respectively. After ComBat, the corresponding AUCs were 0.559 (0.536-0.590) and 0.662 (0.606-0.690). Compared to single-cohort AUCs (0.562-0.629), SVM models from pooled data performed significantly better at AUC = 0.680.
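To put the reported feature overlap and AUC changes in proportion, the short calculation below uses only the numbers stated in the Results. The helper name `overlap_stats` and the choice of the Jaccard index as an agreement measure are illustrative assumptions, not part of the study.

```python
def overlap_stats(n_a, n_b, n_shared):
    """Summarise agreement between two selected feature sets from their sizes."""
    union = n_a + n_b - n_shared
    return {
        "union": union,
        "jaccard": n_shared / union,
        "share_of_a": n_shared / n_a,
        "share_of_b": n_shared / n_b,
    }


# 14 LASSO features, 16 SelectKBest features, 3 in common (from the Results).
stats = overlap_stats(14, 16, 3)

# Change in point-estimate AUC after ComBat (from the Results).
delta_lr = round(0.559 - 0.513, 3)
delta_svm = round(0.662 - 0.632, 3)

print(stats["union"], round(stats["jaccard"], 3))  # 27 0.111
print(delta_lr, delta_svm)                         # 0.046 0.03
```

A Jaccard index near 0.11 indicates that the two selection methods agree on only a small fraction of the features they pick.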

Conclusions: Multi-institutional retrospective data accentuates the existing variabilities that affect radiomics. Carefully designed prospective, multi-institutional studies and data sharing are necessary for clinically relevant head-and-neck cancer prognostication models.

Keywords: Head-and-neck cancer; Loco-regional recurrence; Machine learning; Multi-institutional; Prognosis; Radiomics.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
Features selected by LASSO for the single and multi-institutional pooled datasets.
Fig. 2
Features selected by SelectKBest for the single and multi-institutional pooled datasets.
Fig. 3
The Kernel Density Estimate (KDE) (A and C) and Box (B and D) plots for one representative feature (GLCM Maximum Probability) before (A and B) and after (C and D) ComBat harmonization.
Fig. 4
Performance of the LR and SVM models trained on pooled datasets prior to and post ComBat harmonization. Model performance is reported on validation data HN3-MAASTRO.
Fig. 5
Performance of the models trained with data from single institution versus multi-institutional pooled data. Validation ROC of Logistic Regression (A) and SVM LRR models (B) for an example single dataset (HN-CMC) and its pooled dataset combinations. The ROCs correspond to the HN-CMC (red), HN-CMC + HN1-MAASTRO (blue), HN-CMC + HN-MONTREAL (green) and HN-CMC + HN1-MAASTRO + HN-MONTREAL datasets (orange), respectively. C) Test AUC across all single and pooled datasets in this experiment. Validation data was HN3-MAASTRO. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
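
Fig. 3 compares the per-centre distribution of one feature before and after harmonization. The sketch below is not ComBat itself: it is a simplified location-scale adjustment that aligns each centre's mean and spread to pooled values, without ComBat's empirical Bayes shrinkage or covariate preservation. The function name, centre labels, and feature values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def location_scale_harmonize(values, centres):
    """Shift and rescale each centre's values to the grand mean and pooled within-centre SD."""
    grand_mean = mean(values)

    by_centre = defaultdict(list)
    for v, c in zip(values, centres):
        by_centre[c].append(v)

    # Per-centre mean and (population) standard deviation.
    centre_stats = {c: (mean(vs), pstdev(vs)) for c, vs in by_centre.items()}

    # Pooled within-centre variance: squared deviations from each centre's own mean.
    ss_within = sum(
        (v - centre_stats[c][0]) ** 2 for v, c in zip(values, centres)
    )
    pooled_sd = (ss_within / len(values)) ** 0.5

    harmonized = []
    for v, c in zip(values, centres):
        m, s = centre_stats[c]
        z = (v - m) / s if s > 0 else 0.0  # constant centres map to the grand mean
        harmonized.append(grand_mean + pooled_sd * z)
    return harmonized


# Toy feature values (assumption): centre_B is shifted and more spread out than centre_A.
values = [1.0, 2.0, 3.0, 11.0, 14.0, 17.0]
centres = ["centre_A"] * 3 + ["centre_B"] * 3
harmonized = location_scale_harmonize(values, centres)
print([round(h, 3) for h in harmonized])
```

After the adjustment both centres share the same mean and standard deviation, which is the kind of alignment the box plots in Fig. 3 are meant to show. ComBat additionally pools information across features when estimating the per-centre shifts and scales.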
