BMC Med Imaging. 2025 Aug 4;25(1):312.
doi: 10.1186/s12880-025-01855-2.

Open-radiomics: a collection of standardized datasets and a technical protocol for reproducible radiomics machine learning pipelines

Khashayar Namdar et al. BMC Med Imaging. 2025.

Abstract

Background: As an important branch of machine learning pipelines in medical imaging, radiomics faces two major challenges: reproducibility and accessibility. In this work, we introduce open-radiomics, a set of radiomics datasets together with a comprehensive radiomics pipeline based on our proposed technical protocol, to investigate the effects of radiomics feature extraction on the reproducibility of results.

Methods: We curated large-scale radiomics datasets from three open-source datasets: BraTS 2020 for high-grade glioma (HGG) versus low-grade glioma (LGG) classification and survival analysis, BraTS 2023 for O6-methylguanine-DNA methyltransferase (MGMT) classification, and non-small cell lung cancer (NSCLC) survival analysis from the Cancer Imaging Archive (TCIA). We used the BraTS 2020 open-source Magnetic Resonance Imaging (MRI) dataset to demonstrate how our proposed technical protocol can be used in radiomics-based studies. The cohort includes 369 adult patients with brain tumors (76 LGG and 293 HGG). Using the PyRadiomics library for LGG vs. HGG classification, we created 288 radiomics datasets, one for each combination of 4 MRI sequences, 3 binWidths, 6 image normalization methods, and 4 tumor subregions (4 × 3 × 6 × 4 = 288). We used Random Forest classifiers and, for each radiomics dataset, repeated the training-validation-test (60%/20%/20%) experiment 100 times with different data splits and model random states (288 × 100 = 28,800 test results), calculating the Area Under the Receiver Operating Characteristic Curve (AUROC) for each run.
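The experiment grid described in the Methods can be sketched in a few lines of standard-library Python. This is only an illustration of the counting and splitting logic, not the authors' pipeline: the sequence names follow the usual BraTS naming, while the binWidth, normalization, and fourth subregion labels are placeholders, because the abstract gives only the number of options per factor. Stratifying the 60%/20%/20% split by tumor grade and using the integers 0 to 99 as seeds are likewise assumptions.

```python
import itertools
import random

# Sequence names follow BraTS conventions. All other labels are placeholders:
# the abstract states how many options each factor has, not their values.
SEQUENCES = ["T1", "T1CE", "T2", "FLAIR"]
BIN_WIDTHS = ["binWidth_1", "binWidth_2", "binWidth_3"]
NORMALIZATIONS = [f"norm_{i}" for i in range(1, 7)]
SUBREGIONS = ["ED", "AT", "NETnNCR", "subregion_4"]


def enumerate_configs():
    """Return every (sequence, binWidth, normalization, subregion) combination."""
    return list(itertools.product(SEQUENCES, BIN_WIDTHS, NORMALIZATIONS, SUBREGIONS))


def stratified_split(labels, seed, fractions=(0.6, 0.2, 0.2)):
    """Split sample indices into train/validation/test, class by class.

    Stratification is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Cohort composition as reported: 76 LGG and 293 HGG patients.
labels = ["LGG"] * 76 + ["HGG"] * 293
configs = enumerate_configs()
splits = [stratified_split(labels, seed) for seed in range(100)]

print(len(configs))                # number of radiomics datasets
print(len(configs) * len(splits))  # number of test results
```

Each of the 288 configurations would then be paired with each of the 100 splits, with a classifier trained and scored once per pair.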

Results: Unlike binWidth and image normalization, the tumor subregion and imaging sequence significantly affected the performance of the models. The T1 contrast-enhanced sequence combined with the union of the necrotic and non-enhancing tumor core subregions yielded the highest AUROCs (average test AUROC 0.951; 95% confidence interval 0.949 to 0.952). Although several settings and data splits (28 out of 28,800) yielded a test AUROC of 1, these results were irreproducible.
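To make the reported quantities concrete, the sketch below computes AUROC from ranked scores and summarizes repeated runs with a mean and a 95% confidence interval. Several things here are assumptions rather than details from the paper: the interval uses a normal approximation (the abstract does not say how its interval was computed), the classifier scores are synthetic Gaussian draws, HGG is treated as the positive class, and the test set size of 15 LGG and 58 HGG is simply about 20% of the cohort.

```python
import math
import random
import statistics


def auroc(y_true, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_ci95(values):
    """Mean with a normal-approximation 95% confidence interval (an assumption)."""
    mean = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


def simulate_repeats(n_repeats=100, n_neg=15, n_pos=58, shift=2.0):
    """Synthetic stand-in for repeated test runs; scores are Gaussian, not model output."""
    results = []
    for seed in range(n_repeats):
        rng = random.Random(seed)
        y_true = [0] * n_neg + [1] * n_pos
        scores = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
        scores += [rng.gauss(shift, 1.0) for _ in range(n_pos)]
        results.append(auroc(y_true, scores))
    return results


aurocs = simulate_repeats()
mean, low, high = mean_ci95(aurocs)
n_perfect = sum(a == 1.0 for a in aurocs)
print(f"mean AUROC {mean:.3f}, 95% CI ({low:.3f}, {high:.3f})")
print(f"runs with AUROC of 1: {n_perfect} of {len(aurocs)}")
```

Reporting the mean and interval over many repeats, rather than the best single run, is what keeps an occasional perfect score on a small test set from being mistaken for the expected performance.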

Conclusions: Our experiments demonstrate that sources of variability in radiomics pipelines (e.g., tumor subregion) can significantly affect the results and may lead to superficially perfect performance that is irreproducible.

Clinical trial number: Not applicable.

Keywords: Brain cancer; Dataset; Open-source; Radiomics; Reproducibility.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
An example BraTS 2020 image (the FLAIR sequence) and its corresponding segmentation mask. The orange area is AT, the green area is ED, and the gray parts are NETnNCR [–9]
Fig. 2
An example TCIA NSCLC CT image and its corresponding segmentation masks: left lung, right lung, spine, and GTV annotated with green, blue, yellow, and red, respectively [17]
Fig. 3
The repetitive classification approach
Fig. 4
Effect of the studied factors on AUROC performance of the classifiers: (a) binWidth (b) image normalization (c) VOI subregion (d) MRI sequence
Fig. 5
Histograms of the top feature on the top dataset: The horizontal axis represents bins of the feature values, and the vertical axis shows the number of VOIs with values in each bin
Fig. 6
The 10 top-performing datasets
Fig. 7
Effect of the studied factors on AUROC performance of the multisequence classifiers: (a) binWidth (b) image normalization (c) VOI subregion
Fig. 8
Axial non-normalized T1CE MRI images demonstrating examples of high-grade glioma (a) and low-grade glioma (b). The outlined regions represent segmented tumor subregions: necrotic and non-enhancing tumor core (NETnNCR, red contour), active (enhancing) tumor core (AT, cyan contour), and peritumoral edema (ED, green contour)
Fig. 9
Intensity histograms comparing T1CE MRI signal distributions for tumor subregions between HGG (blue) and LGG (orange). Subregions analyzed include: (a) ED, (b) AT, and (c) NETnNCR. The distinct separations between HGG and LGG intensity distributions in the NETnNCR region (c) highlight its higher discriminatory potential
Fig. 10
Effect of Test/Train Split Ratio on RF AUROC Performance. Mean AUROC and AUROC range (shaded area) are plotted across 36 test/train split ratios (from 0.9 to 0.1), each repeated 100 times using stratified random splits. As the training set increases, mean AUROC improves, while AUROC variability is minimized near balanced splits (0.6 − 0.4), indicating optimal stability
Fig. 11
Comparison of Data Splitting Methods. Boxplots of AUROC scores from six evaluation strategies: repeated 4-Fold CV, one-time 4-Fold, 5-Fold, 10-Fold, repeated 75/25 split (our pipeline), and LOO. Our method closely matches LOO in mean performance, while repeated 4-Fold better captures variance at the cost of higher computation
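The splitting strategies compared in Fig. 11 differ mainly in how many train/test partitions they generate, and therefore in how many model fits they cost. The following standard-library sketch generates the index partitions for k-fold cross-validation, leave-one-out (LOO, treated as k-fold with k equal to the sample count), and a repeated hold-out split. It is a simplified illustration: the splits are not stratified, the function names are hypothetical, and the 100 hold-out repeats are an assumption carried over from the repeat count given in the abstract.

```python
import random


def k_fold_indices(n_samples, k, seed=None):
    """Yield (train, test) index lists for k-fold cross-validation (unstratified)."""
    idx = list(range(n_samples))
    if seed is not None:
        random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def leave_one_out_indices(n_samples):
    """LOO is k-fold with one sample per fold."""
    return k_fold_indices(n_samples, n_samples)


def repeated_holdout_indices(n_samples, test_fraction, repeats):
    """Yield (train, test) index lists for repeated random hold-out splits."""
    n_test = round(test_fraction * n_samples)
    for seed in range(repeats):
        idx = list(range(n_samples))
        random.Random(seed).shuffle(idx)
        yield idx[n_test:], idx[:n_test]


N = 369  # cohort size reported in the abstract
fits = {
    "4-fold": len(list(k_fold_indices(N, 4, seed=0))),
    "5-fold": len(list(k_fold_indices(N, 5, seed=0))),
    "10-fold": len(list(k_fold_indices(N, 10, seed=0))),
    "LOO": len(list(leave_one_out_indices(N))),
    "repeated 75/25": len(list(repeated_holdout_indices(N, 0.25, 100))),
}
for name, count in fits.items():
    print(f"{name}: {count} model fits")
```

A repeated k-fold scheme multiplies the k-fold count by the number of repeats, which is the extra computation the caption refers to.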

References

    1. Liu X, et al. Application of radiomic MRI quantitative features in diagnosis of combined hepatocellular-cholangiocarcinoma, cholangiocarcinoma and hepatocellular carcinoma using machine learning. In: RSNA; 2019.
    2. Liu X, et al. Can machine learning radiomics provide pre-operative differentiation of combined hepatocellular cholangiocarcinoma from hepatocellular carcinoma and cholangiocarcinoma to inform optimal treatment planning? Eur Radiol. 2020. doi: 10.1007/s00330-020-07119-7.
    3. Wagner MW, Namdar K, Biswas A, Monah S, Khalvati F, Ertl-Wagner BB. Radiomics, machine learning, and artificial intelligence—what the neuroradiologist needs to know. Neuroradiology. 2021;63(12):1957–67. doi: 10.1007/s00234-021-02813-9.
    4. Yadav SP. The wholeness in suffix -omics, -omes, and the word om. J Biomol Tech. 2007;18(5):277.
    5. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

Grants and funding