Med Phys. 2019 May;46(5):2145-2156. doi: 10.1002/mp.13455. Epub 2019 Mar 12.

Radiomics robustness assessment and classification evaluation: A two-stage method demonstrated on multivendor FFDM


Kayla Robinson et al. Med Phys. 2019 May.

Abstract

Purpose: Radiomic texture analysis is typically performed on images acquired under specific, homogeneous imaging conditions. These controlled conditions may not be representative of the range of imaging conditions implemented clinically. We aim to develop a two-stage method of radiomic texture analysis that incorporates the reproducibility of individual texture features across imaging conditions to guide the development of texture signatures which are robust across mammography unit vendors.

Methods: Full-field digital mammograms were retrospectively collected for women who underwent screening mammography on both a Hologic Lorad Selenia and GE Senographe 2000D system. Radiomic features were calculated on manually placed regions of interest in each image. In stage one (robustness assessment), we identified a set of nonredundant features that were reproducible across the two different vendors. This was achieved through hierarchical clustering and application of robustness metrics. In stage two (classification evaluation), we performed stepwise feature selection and leave-one-out quadratic discriminant analysis (QDA) to construct radiomic signatures. We refer to this two-stage method as robustness assessment, classification evaluation (RACE). These radiomic signatures were used to classify the risk of breast cancer through receiver operating characteristic (ROC) analysis, using the area under the ROC curve as a figure of merit in the task of distinguishing between women with and without high-risk factors present. Generalizability was investigated by comparing the classification performance of a feature set on the images from which it was selected (intravendor) to the classification performance on images from the vendor on which it was not selected (intervendor). Intervendor and intravendor performances were also compared to the performance obtained by implementing ComBat, a feature-level harmonization method, and to the performance obtained by implementing ComBat followed by RACE.
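The two-stage pipeline described above can be sketched in a few lines. This is a hypothetical illustration on synthetic data, not the authors' implementation: stage one clusters correlated features and keeps one representative per cluster whose values agree best across the two vendors, and stage two runs leave-one-out QDA on the surviving features, scored by ROC AUC. The cluster count, agreement metric, and classifier settings here are assumptions for the sketch.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_cases, n_features, n_clusters = 60, 20, 5
X_m1 = rng.normal(size=(n_cases, n_features))          # features from machine one (M1)
X_m2 = X_m1 + rng.normal(scale=0.3, size=X_m1.shape)   # same cases on machine two (M2)
y = rng.integers(0, 2, size=n_cases)                   # high-risk indicator (synthetic)

# Stage one (robustness assessment): cluster features (columns) by correlation,
# then keep, per cluster, the feature whose values agree best across vendors.
Z = linkage(X_m1.T, method="average", metric="correlation")
labels = fcluster(Z, t=n_clusters, criterion="maxclust")
robust = []
for c in np.unique(labels):
    idx = np.flatnonzero(labels == c)
    agree = [np.corrcoef(X_m1[:, j], X_m2[:, j])[0, 1] for j in idx]
    robust.append(int(idx[int(np.argmax(agree))]))

# Stage two (classification evaluation): leave-one-out QDA on the robust subset.
scores = np.empty(n_cases)
for train, test in LeaveOneOut().split(X_m1):
    qda = QuadraticDiscriminantAnalysis(reg_param=0.5)
    qda.fit(X_m1[np.ix_(train, robust)], y[train])
    scores[test] = qda.predict_proba(X_m1[np.ix_(test, robust)])[:, 1]
auc = roc_auc_score(y, scores)  # intravendor figure of merit on M1
```

Intervendor performance would then be estimated by applying the same selected feature indices to the M2 feature matrix, as the paper does when swapping which vendor plays the role of M1.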

Results: Generalizability, defined as the difference between intervendor and intravendor classification performance, was shown to monotonically decrease as the number of clusters used in stage one increased (Mann-Kendall P < 0.001). Intravendor performance was not statistically different from that of ComBat harmonization, while intervendor performance was significantly higher than that of ComBat. No significant difference was observed between either of the single methods and the use of ComBat followed by RACE.
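The monotonic-trend claim above rests on the Mann-Kendall test. A textbook implementation of that test (normal approximation, no tie correction; not the authors' code) is short enough to sketch:

```python
import math

def mann_kendall(x):
    """Return (S, two-sided p) for the Mann-Kendall trend test.

    S sums the signs of all pairwise differences x[j] - x[i] for j > i;
    a strongly negative S indicates a monotonically decreasing series.
    Uses the normal approximation without a tie correction.
    """
    n = len(x)
    s = sum((x[j] > x[i]) - (x[j] < x[i])
            for i in range(n) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # two-sided p-value from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return s, p

# A strictly decreasing series attains the most negative possible S.
s, p = mann_kendall([5, 4, 3, 2, 1])
```

For five strictly decreasing values, all ten pairs contribute -1, so S = -10 and the trend is significant at the 0.05 level even for this short series.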

Conclusions: A two-stage method for robust radiomic signature construction is proposed and demonstrated in the task of breast cancer risk assessment. The proposed method was used to assess generalizability of radiomic texture signatures at varying levels of feature robustness criteria. The results suggest that generalizability of feature sets monotonically decreases as reproducibility of features decreases. This trend suggests that considerations of feature robustness in feature selection methodology could improve classifier generalizability in multifarious full-field digital mammography datasets collected on various vendor units. Additionally, harmonization methods such as ComBat may hold utility in classification schemes and should continue to be investigated.

Keywords: breast cancer; radiomics; robustness.

PubMed Disclaimer

Figures

Figure 1
Histogram demonstrating the interval of time between the date of the GE exam and the Hologic exam, for each patient included in the study. The time between exams was not found to be significantly different between women with and without high-risk factors present (P = 0.29). [Color figure can be viewed at wileyonlinelibrary.com]
Figure 2
Diagrammatic illustration of steps involved in the robustness assessment, classification evaluation method. Texture features are first clustered and assessed in terms of robustness using only feature values and vendor information, remaining blinded to risk classification. The union of features identified by clustering features from M1 (machine one) and M2 (machine two) is the set considered to be robust and nonredundant. The most robust and nonredundant features are identified, and only these features are used as feature candidates in classification evaluation. Solid and dashed arrows show two different data pathways followed to evaluate the generalization of classification of the heterogeneous image datasets. The full analysis was repeated twice: once with the GE unit as M1 and the Hologic unit as M2, and again with the GE unit as M2 and the Hologic unit as M1.
Figure 3
Resulting performance of classifiers trained on varying quantities of clusters and therefore varying degrees of stringency on the robustness of input features. Parts (a) and (b) show performance of intra‐ and intervendor feature selection and classifier construction as the number of clusters, and therefore stringency on robustness, is varied. Parts (c) and (d) show the difference between intra‐ and intervendor classifier performance to demonstrate generalizability. Parts (a) and (c) show results for when GE is designated M1 and Hologic is designated M2. Parts (b) and (d) show results for when Hologic is designated M1 and GE is designated M2. [Color figure can be viewed at wileyonlinelibrary.com]
Figure 4
Results of the Mann–Kendall test for the presence of monotonic trends, and the Theil–Sen estimator of such trends for the performance as a function of the number of clusters. Statistically significant values are denoted by boldface font. Colored results (blue, red) correspond to intravendor comparisons using GE and Hologic images, respectively. Gray results correspond to intervendor comparisons. [Color figure can be viewed at wileyonlinelibrary.com]
Figure 5
Summary of features selected for the classifier when robustness assessment, classification evaluation is performed either with GE designated as M1 or Hologic designated as M1. The results presented in this figure are specifically from selection after grouping features into 46 clusters. This number of clusters was chosen as it provides the best intervendor performance for each manufacturer. Selected features were recorded from each leave-one-out iteration during stepwise feature selection, and the 18 features most frequently selected for each manufacturer are recorded here. [Color figure can be viewed at wileyonlinelibrary.com]
Figure 6
Performance in the task of classifying the presence of risk factors of breast cancer of three analysis methods: (a) robustness assessment, classification evaluation, (b) ComBat, and (c) ComBat followed by robustness assessment, classification evaluation. In each method, 18 features were included in the ultimate radiomic signature construction, and leave‐one‐out cross‐validation was performed. While intravendor comparisons were not significantly different between the three methods, intervendor comparisons were significantly different, with the two‐stage method performing better as judged by the area under the curve (AUC). Recall that M1 refers to the vendor on whose images features were selected as machine one, and M2 refers to the vendor used to assess generalizability. By the Holm–Bonferroni correction for multiple comparisons, P < 0.017 is required to demonstrate statistical significance. [Color figure can be viewed at wileyonlinelibrary.com]
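The P < 0.017 threshold quoted in the caption above follows from the Holm–Bonferroni step-down procedure with three pairwise comparisons: the smallest p-value must beat 0.05/3 ≈ 0.017. A minimal sketch of the procedure (a standard formulation, not the authors' code):

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-down rule: sort p-values ascending; the k-th smallest (0-based
    rank) is tested against alpha / (m - k), stopping at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are also retained
    return reject

# With m = 3 comparisons, the strictest (first) threshold is 0.05 / 3.
thresh_first = 0.05 / 3  # ≈ 0.0167, i.e., the "P < 0.017" bar in the caption
```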
Figure 7
Summary of trends in robustness metrics computed on features before and after ComBat harmonization. Mean of feature ratio (MFR) near zero indicates high robustness, and correlation near 1 indicates high robustness. [Color figure can be viewed at wileyonlinelibrary.com]
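The two per-feature robustness metrics named in the caption can be sketched as follows. The MFR here is implemented as a mean log feature ratio, one plausible reading of "near zero indicates high robustness"; the paper's exact formula may differ, so treat this as an assumption for illustration only:

```python
import numpy as np

def robustness_metrics(f_m1, f_m2):
    """Compute (mfr, corr) for one feature measured on the same cases
    on two vendors.

    mfr  : mean log feature ratio; 0 when the vendors agree exactly
           (assumes strictly positive feature values; hypothetical form).
    corr : inter-vendor Pearson correlation; 1 when fully reproducible.
    """
    f_m1 = np.asarray(f_m1, dtype=float)
    f_m2 = np.asarray(f_m2, dtype=float)
    mfr = float(np.mean(np.log(f_m1 / f_m2)))
    corr = float(np.corrcoef(f_m1, f_m2)[0, 1])
    return mfr, corr

# Identical measurements on both vendors: perfectly robust feature.
mfr, corr = robustness_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```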
