Radiology. 2023 Apr;307(1):e220715.
doi: 10.1148/radiol.220715. Epub 2022 Dec 20.

Inconsistent Partitioning and Unproductive Feature Associations Yield Idealized Radiomic Models

Mishka Gidwani et al. Radiology. 2023 Apr.

Abstract

Background: Radiomics is the extraction of predefined mathematical features from medical images to predict variables of clinical interest. While some studies report superlative accuracy for radiomic machine learning (ML) models, the published methodology is often incomplete, and the results are rarely validated in external test data sets.

Purpose: To characterize the type, prevalence, and statistical impact of methodologic errors in radiomic ML studies.

Materials and Methods: Radiomic ML publications were reviewed for performance-inflating methodologic flaws. Common flaws were then reproduced with randomly generated features interpolated from publicly available radiomic data sets to demonstrate the precarious nature of the reported findings.

Results: The assessment of radiomic ML publications uncovered two general categories of data analysis errors: inconsistent partitioning and unproductive feature associations. In simulations, the authors demonstrated that inconsistent partitioning inflates radiomic ML accuracy by a factor of 1.4 relative to unbiased performance, and that correcting the flawed methodology yields areas under the receiver operating characteristic curve approaching 0.5 (random chance). Using randomly generated features, the authors illustrated that unproductive associations between radiomic features and gene sets can imply false causality for biologic phenomena.

Conclusion: Radiomic ML studies may contain methodologic flaws that undermine their validity. This study provides a review template to avoid such flaws.

© RSNA, 2022

Supplemental material is available for this article.

See also the editorial by Jacobs in this issue.

Conflict of interest statement

Disclosures of conflicts of interest: M.G. No relevant relationships. K.C. No relevant relationships. J.B.P. No relevant relationships. K.V.H. No relevant relationships. S.R.A. No relevant relationships. P.S. No relevant relationships. C.D.F. Travel reimbursement and speaking honoraria from Elekta, National Institutes of Health, American Association of Physicists in Medicine, European Society for Therapeutic Radiation Oncology, American Society of Clinical Oncology, Varian Medical Systems, and Philips; unpaid service on an advisory committee for the Dartmouth-Hitchcock Cancer Center Department of Radiation Oncology; serves in a committee or leadership capacity for the National Institutes of Health, American Society of Clinical Oncology, Radiological Society of North America, American Association of Physicists in Medicine, NRG Oncology, American Cancer Society, Dutch Cancer Society, and Rice University; receives in-kind support from Elekta. J.K.C. Grants or contracts from the NIH; member of the Radiology-AI editorial board.

Figures

Figure 1:
Diagrams of inconsistent partitioning. Random features (R) based on published radiomics data form the basis of our experimentation (an approach atypical of radiomics machine learning [ML] studies). (A) The upper level (blue and yellow) illustrates consistent partitioning that prevents information leakage, while the lower level (green) shows how using the entire data set for radiomics feature normalization, feature selection, hyperparameter selection, model selection, and performance reporting results in an unrealistically optimistic assessment of the radiomics ML model. (B) Diagrams show normalization strategies. Data set normalization (green) is an example of inconsistent partitioning: features are scaled with a mean and SD calculated from all samples, both training and test. Train normalization (right) and split normalization (bottom) are different approaches to consistent partitioning (more details in Appendix S1).
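The difference between data set normalization and train normalization in panel B can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, z-scoring with the population SD is assumed, and split normalization is not shown because its details are given only in Appendix S1. The check at the end shows the leak directly: under data set normalization, the normalized training features change when only the test set changes.

```python
import numpy as np

def zscore(X, mean, sd):
    """Scale each feature column with a given mean and SD."""
    return (X - mean) / sd

def dataset_normalization(X_train, X_test):
    """Inconsistent partitioning (hypothetical helper): statistics pooled
    over training AND test samples, so test information leaks into training."""
    pooled = np.vstack([X_train, X_test])
    mean, sd = pooled.mean(axis=0), pooled.std(axis=0)
    return zscore(X_train, mean, sd), zscore(X_test, mean, sd)

def train_normalization(X_train, X_test):
    """Consistent partitioning (hypothetical helper): statistics computed on
    the training set only, then applied unchanged to the test set."""
    mean, sd = X_train.mean(axis=0), X_train.std(axis=0)
    return zscore(X_train, mean, sd), zscore(X_test, mean, sd)

rng = np.random.default_rng(1)
X_train = rng.normal(10, 2, size=(80, 5))   # synthetic stand-in features
X_test = rng.normal(13, 2, size=(20, 5))    # deliberately shifted test set

tr_a, te_a = dataset_normalization(X_train, X_test)
tr_b, te_b = train_normalization(X_train, X_test)

# Change ONLY the test set and re-normalize.
X_test2 = X_test + 5
tr_a2, _ = dataset_normalization(X_train, X_test2)
tr_b2, _ = train_normalization(X_train, X_test2)

print(np.allclose(tr_a, tr_a2))  # training features depend on the test set
print(np.allclose(tr_b, tr_b2))  # training features independent of the test set
```

Under train normalization the test set is generally not centered at zero; that is expected, because it is treated as unseen data.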
Figure 2:
Receiver operating characteristic curves illustrate the performance inflation gained from each successive radiomics machine learning methodologic mistake, as demonstrated on random radiomics features. Without mistakes, the area under the receiver operating characteristic curve (ROC AUC) approximates 0.5 (random chance); compounding enough mistakes leads to idealized performance with an AUC of 1.0.
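A single one of these mistakes, feature selection on the entire data set, is enough to lift a model built on pure noise well above chance. The simulation below is a sketch, not the authors' pipeline: it assumes standard-normal random features (rather than features interpolated from published data), random balanced labels, a univariate class-mean-difference filter for feature selection, and a simple linear scorer whose weights are the training-set class-mean differences. All names and sizes (n, p, k) are hypothetical. The two pipelines differ only in whether feature selection sees the test rows; AUC is computed with the Mann-Whitney rank formula.

```python
import numpy as np

def auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic (assumes no tied scores)."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def mean_diff(X, y):
    """Per-feature difference of class means (class 1 minus class 0)."""
    return X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)

def one_replicate(rng, leaky, n=100, p=2000, k=10):
    X = rng.standard_normal((n, p))                  # random "radiomic" features
    y = rng.permutation(np.repeat([0, 1], n // 2))   # random labels: no real signal
    idx = rng.permutation(n)
    train, test = idx[: n // 2], idx[n // 2 :]
    # Feature selection: leaky uses ALL rows, consistent uses training rows only.
    rows = idx if leaky else train
    top = np.argsort(-np.abs(mean_diff(X[rows], y[rows])))[:k]
    # Model fitting always uses training rows only.
    w = mean_diff(X[train][:, top], y[train])
    scores = X[test][:, top] @ w
    return auc(scores, y[test])

rng = np.random.default_rng(0)
leaky = np.mean([one_replicate(rng, True) for _ in range(20)])
clean = np.mean([one_replicate(rng, False) for _ in range(20)])
print(f"leaky selection: mean test AUC {leaky:.2f}")
print(f"consistent partitioning: mean test AUC {clean:.2f}")
```

The leaky pipeline should report a mean test AUC well above 0.5 even though the labels are random, while the consistently partitioned pipeline stays near 0.5; exact values depend on the seed and the assumed sizes.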
Figure 3:
(A) Strip chart shows mean accuracy loss from changing inconsistent partitioning (data set normalization and feature selection) to consistent partitioning (train normalization and feature selection) in 100 replicates. (B) Lollipop plot shows loss of mean model efficiency (LassoCV R²) over 100 iterations after changing from inconsistent to consistent partitioning. (C) Line chart shows the effect of sample size on model performance, holding the number of radiomics features (10) and the feature selection method constant. CIs are wide at small sample sizes because the choice of data partition drastically alters the distribution of features in each partition. Performance plateaus at an area under the receiver operating characteristic curve (ROC AUC) of 0.5 because the features and label are randomly generated. CV = cross validation, HNSCC = head and neck squamous cell carcinoma, LGG = low-grade glioma, SE = standard error.
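The partition effect described in panel C can be isolated with a much simpler setup than the one in the figure. The sketch below assumes a single random feature used directly as a score, random balanced labels, and repeated stratified 50% test partitions of the same data set; `partition_spread` is a hypothetical helper, and this is not the 10-feature model from the figure. It reports the mean test AUC and the 2.5th to 97.5th percentile range across partitions, which should be far wider at small sample sizes.

```python
import numpy as np

def auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic (assumes no tied scores)."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def partition_spread(n, n_splits=200, seed=0):
    """Test AUC of one random feature across many random test partitions."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                       # one random "radiomic" feature
    y = rng.permutation(np.repeat([0, 1], n // 2))   # random balanced label
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    aucs = []
    for _ in range(n_splits):
        # Stratified 50% test partition: half of each class.
        test = np.concatenate([
            rng.choice(pos, len(pos) // 2, replace=False),
            rng.choice(neg, len(neg) // 2, replace=False),
        ])
        aucs.append(auc(x[test], y[test]))
    aucs = np.array(aucs)
    return aucs.mean(), np.percentile(aucs, [2.5, 97.5])

for n in (20, 100, 500):
    mean, (lo, hi) = partition_spread(n)
    print(f"n={n:4d}  mean test AUC {mean:.2f}  95% range {lo:.2f}-{hi:.2f}")
```

At small n, a single lucky partition can report an AUC far from 0.5 for a feature that carries no information, which is one reason small-sample radiomics results are fragile.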
Figure 4:
Case-based consensus clustering of random radiomics features associated with overall survival (OS) in The Cancer Genome Atlas Low-Grade Glioma (left) and head and neck squamous cell carcinoma (HNSCC) (right) data sets. Despite sharp feature distribution differences, as seen in the heat maps, no statistically significant difference in outcome distribution exists between the assigned clusters. LGG = low-grade glioma.
Figure 5:
Combination of radiomics and biologic variables. (A) Receiver operating characteristic curves show support vector machine models fit to combinations of radiomics and biologic variables. (B) Dot plot with error bars shows the concordance index for the radiomics score (RadScore) in Cox proportional hazards models. A concordance index of 0.5 represents random chance. The random radiomics features have higher concordance with the true outcome (overall survival) than the authentic features. (C) Bar chart shows significant (Pearson) associations between random radiomics features and authentic gene ontology pathways in The Cancer Genome Atlas Low-Grade Glioma data set. (D) Kaplan-Meier curves show overall survival stratified by the median value of a random feature that was spuriously yet significantly correlated with the glycosphingolipid biosynthesis gene ontology pathway. Fts = features, HNSCC = head and neck squamous cell carcinoma, LGG = low-grade glioma.
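Panel C reflects a multiple-testing effect: correlating many random features with many pathways produces roughly 5% "significant" pairs at P < .05 by chance alone. The sketch below assumes standard-normal stand-ins for both the radiomic features and the pathway scores (the figure uses authentic pathways), two-sided P values from the Fisher z approximation, and Benjamini-Hochberg false discovery rate control as one common correction; the helper names and sizes are hypothetical, not the authors' analysis.

```python
import math
import numpy as np

def pearson_matrix(A, B):
    """Pearson r between every column of A and every column of B (same rows)."""
    Az = (A - A.mean(axis=0)) / A.std(axis=0)
    Bz = (B - B.mean(axis=0)) / B.std(axis=0)
    return Az.T @ Bz / A.shape[0]

def fisher_p(r, n):
    """Two-sided P values for Pearson r via the Fisher z approximation."""
    z = np.arctanh(r) * math.sqrt(n - 3)
    return np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z.ravel()])

def benjamini_hochberg(p, alpha=0.05):
    """Boolean mask of discoveries under the Benjamini-Hochberg procedure."""
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    mask = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.flatnonzero(below))   # largest rank meeting its threshold
        mask[order[: k + 1]] = True
    return mask

rng = np.random.default_rng(42)
n = 100
features = rng.standard_normal((n, 100))   # random "radiomic" features
pathways = rng.standard_normal((n, 50))    # hypothetical stand-in pathway scores
p = fisher_p(pearson_matrix(features, pathways), n)

print("nominal P < .05:", int((p < 0.05).sum()), "of", p.size)
print("after Benjamini-Hochberg:", int(benjamini_hochberg(p).sum()))
```

Hundreds of nominally significant associations appear among 5000 tests of pure noise, and essentially all of them disappear after false discovery rate control. Any one of them, split at the median as in panel D, could be presented as a biologically meaningful finding.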
Figure 6:
Flow diagram shows reviewer questions when auditing radiomics machine learning studies for problem areas highlighted in this study: inconsistent partitioning and unproductive feature associations.
