Breast Cancer Res. 2024 Dec 4;26(1):177.
doi: 10.1186/s13058-024-01915-5.

Image analysis-based identification of high risk ER-positive, HER2-negative breast cancers

Dong Neuck Lee et al. Breast Cancer Res. 2024;26(1):177.

Abstract

Background: Breast cancer subtypes Luminal A and Luminal B are classified by the expression of the PAM50 genes and may benefit from different treatment strategies. H&E images may contain features associated with subtype that machine learning models can learn, allowing early identification of tumors with a higher risk of recurrence.

Methods: H&E images (n = 630 ER+/HER2- breast cancers) were segmented at the pixel level into epithelium and stroma. A convolutional neural network and multiple instance learning were used to extract image features from the original and segmented images. Patient-level classification models were trained to discriminate Luminal A from Luminal B image features using tenfold cross-validation, with or without adjustment for grade. The best-performing visual classifier was incorporated into envisioned diagnostic protocols as an alternative to genomic testing (PAM50). The protocols were then compared in time-to-recurrence models.
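Because the classifiers operate at the patient level, a reasonable cross-validation scheme keeps every core from one patient inside a single fold, so that no patient contributes to both training and testing. The Python sketch below assigns patients to ten folds while keeping the Luminal A/Luminal B mix similar across folds. It assumes each patient may contribute several cores; the function names and the `"LumA"`/`"LumB"` label strings are hypothetical, and the authors' actual implementation is the one in their GitHub repository.

```python
import random
from collections import defaultdict


def assign_patient_folds(patient_labels, n_folds=10, seed=0):
    """Map each patient ID to a fold index in [0, n_folds), stratified by label.

    patient_labels: dict of hypothetical patient ID -> subtype label.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for pid, label in patient_labels.items():
        by_label[label].append(pid)

    folds = {}
    counter = 0  # running counter keeps fold sizes within one of each other
    for label in sorted(by_label):
        pids = sorted(by_label[label])
        rng.shuffle(pids)
        # Deal shuffled patients round-robin so each fold gets a similar label mix.
        for pid in pids:
            folds[pid] = counter % n_folds
            counter += 1
    return folds


def split_cores(core_to_patient, folds, test_fold):
    """Split core IDs into (train, test); a core inherits its patient's fold."""
    train, test = [], []
    for core, pid in core_to_patient.items():
        (test if folds[pid] == test_fold else train).append(core)
    return train, test
```

Each of the ten rounds would call `split_cores` with a different `test_fold`, fit the classifier on `train`, and evaluate it on `test`.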

Results: Among ER+/HER2- tumors, the image-based protocol differentiated recurrence times with a hazard ratio (HR) of 2.81 (95% CI: 1.73-4.56), similar to the HR for PAM50 (2.66, 95% CI: 1.65-4.28). Grade adjustment did not improve subtype prediction accuracy, but it did help balance sensitivity and specificity. Among participants with high-grade tumors, sensitivity and specificity (0.734 and 0.474, respectively) became more similar (0.732 and 0.624, respectively) in grade-adjusted models. The original and epithelium-specific images had similar performance and the highest accuracy, followed by stroma images or binarized images showing only the epithelial-stromal interface.
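Sensitivity and specificity here summarize a binary Luminal A versus Luminal B call. Assuming Luminal B (the higher-risk subtype) is the positive class, which the abstract does not state explicitly, they are computed from confusion-matrix counts as sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A minimal Python sketch with a hypothetical function name:

```python
def sensitivity_specificity(y_true, y_pred, positive="LumB"):
    """Return (sensitivity, specificity) for a binary subtype call.

    Sensitivity = TP / (TP + FN): share of true positives called positive.
    Specificity = TN / (TN + FP): share of true negatives called negative.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```

Under this convention, the first high-grade pair (0.734, 0.474) would mean most Luminal B tumors were caught while more than half of Luminal A tumors were called Luminal B; the grade-adjusted pair narrows that gap.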

Conclusions: Given low national rates of genomic testing uptake, image-based methods may help identify ER+/HER2- patients who could benefit from testing.

Keywords: Breast cancer; CBCS3; Distance weighted learning; Histology; Image segmentation; Multiple instance learning.

Conflict of interest statement

Declarations.

Ethics approval: The study was approved by the University of North Carolina Institutional Review Board in accordance with the U.S. Common Rule. All study participants provided written informed consent prior to study entry. This study complied with relevant ethical regulations, including the Declaration of Helsinki.

Code availability: The pre-trained model and code used in this study are publicly available on GitHub (https://github.com/eastk90/cbcs-lumAB/).

Competing interests: The University of North Carolina, Chapel Hill has a license of intellectual property interest in GeneCentric Diagnostics and BioClassifier, LLC, which may be used in this study. The University of North Carolina, Chapel Hill may benefit from this interest, which is related to this research. The terms of this arrangement have been reviewed and approved by the University of North Carolina, Chapel Hill Conflict of Interest Program in accordance with its conflict of interest policies.

Figures

Fig. 1
Scenarios for screening low-stage ER+/HER2- cancers. (A) A conceptualized reference breast cancer diagnosis protocol that includes genomic testing. (B) An alternative protocol in which genomic testing is replaced by a machine learning model. (C) A hybrid protocol that recommends genomic testing only for patients the machine learning model predicts to be at potentially high risk. The numbers below the boxes give the corresponding patient counts in the CBCS cohort.
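The three protocols can be read as simple decision rules. The sketch below is one reading of the caption, not the authors' specification: it assumes a Luminal B call (from PAM50 or from the image classifier) maps to the high-risk group, and that in the hybrid protocol an image-based low-risk call is final while an image-based high-risk call is resolved by the genomic test. Function names are hypothetical.

```python
from typing import Callable, Tuple


def reference_protocol(pam50_subtype: str) -> str:
    """(A) Genomic testing for everyone; risk follows the PAM50 subtype."""
    return "high" if pam50_subtype == "LumB" else "low"


def image_protocol(image_subtype: str) -> str:
    """(B) The image classifier replaces the genomic test."""
    return "high" if image_subtype == "LumB" else "low"


def hybrid_protocol(image_subtype: str,
                    run_pam50: Callable[[], str]) -> Tuple[str, bool]:
    """(C) Order the genomic test only when the image model flags high risk.

    Returns (risk_group, test_ordered). run_pam50 is called lazily, so no
    test is "ordered" for patients the image model calls low risk.
    """
    if image_subtype != "LumB":
        return "low", False
    return reference_protocol(run_pam50()), True
```

Summing `test_ordered` over a cohort gives the number of genomic tests the hybrid rule would order under these assumptions.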
Fig. 2
Pipeline for extracting a 15-dimensional feature vector from a core image. The figure illustrates the process for an epithelial image, one of the four types of segmented core images shown in B. (A) Every core was stain-normalized to reduce variation in stain intensity across slides. (B) The color-normalized H&E core image was separated into two tissue types, epithelium and collagenous stroma, using pixel-level image segmentation. Additionally, we constructed binary images to investigate the regional shape of the epithelium and collagenous stroma. (C) Each core was divided into k patches of 200 × 200 pixels. (D) Non-informative patches with background pixels above a patch-specific threshold were excluded. (E) Patches with artifacts were excluded by a trained artifact detector. (F) Image features were extracted from each informative patch using the convolutional layers of the pre-trained VGG16 architecture. (G) A one-dimensional patch score was calculated by projecting the patch features onto the estimated direction that discriminates between Luminal A and Luminal B subtypes. To construct the core-level image feature vector, the k patch scores were summarized by 15 equally spaced quantiles.
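Steps C, D, and G can be sketched in plain Python once patch features are available; step F (VGG16 feature extraction) and step E (artifact detection) are outside the scope of this sketch and are represented by precomputed feature vectors. Two details are assumptions rather than the authors' choices: patches are non-overlapping tiles with partial edge tiles dropped, and the 15 "equally spaced quantiles" are taken at probabilities 0, 1/14, ..., 1 with linear interpolation. The discriminating direction would come from the trained Luminal A/B classifier (Fig. 4 refers to a DWD distribution); here it is simply an input, and all names are hypothetical.

```python
import math

PATCH_SIZE = 200  # patch side length in pixels (from the caption)


def tile_patches(height, width, size=PATCH_SIZE):
    """Step C: top-left (row, col) of non-overlapping size x size tiles.

    Partial tiles at the right and bottom edges are dropped (an assumption).
    """
    return [(r, c)
            for r in range(0, height - size + 1, size)
            for c in range(0, width - size + 1, size)]


def quantile(sorted_vals, p):
    """Linearly interpolated quantile of an already sorted list, p in [0, 1]."""
    pos = p * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def core_feature_vector(patches, direction, bg_threshold, n_quantiles=15):
    """Steps D and G: drop background-heavy patches, score, summarize.

    patches: list of (background_fraction, feature_vector) pairs.
    direction: vector discriminating Luminal A from Luminal B.
    """
    scores = sorted(
        sum(f * d for f, d in zip(features, direction))  # 1-D projection
        for bg_frac, features in patches
        if bg_frac <= bg_threshold                         # step D filter
    )
    if not scores:
        raise ValueError("core has no informative patches")
    return [quantile(scores, i / (n_quantiles - 1)) for i in range(n_quantiles)]
```

Summarizing by quantiles gives every core a fixed-length vector regardless of how many patches survive filtering, which is what a downstream patient-level classifier needs.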
Fig. 3
Performance of subtype classifiers in the CBCS validation set. Average sensitivities, specificities, and AUC scores from tenfold cross-validation, along with their standard errors, are provided for both low/intermediate-grade and high-grade tumors. (A) Models trained on image features extracted from original core images: unadjusted, grade-adjusted, and stratified models. (B) Grade-adjusted models by image type (epithelium, stroma, or binary).
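The caption reports fold-averaged metrics with standard errors. A common convention, assumed here because the caption does not give the formula, is to compute each metric in every held-out fold and report the mean and the standard error (sample standard deviation divided by the square root of the number of folds). The sketch also computes AUC in its rank form: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. Function names are hypothetical.

```python
import math
import statistics


def auc(scores, labels):
    """Rank-based AUC: P(positive score > negative score), ties count 1/2.

    labels: truthy for the positive class (e.g. Luminal B), falsy otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_and_se(fold_values):
    """Mean across folds and its standard error, stdev / sqrt(k)."""
    k = len(fold_values)
    return statistics.fmean(fold_values), statistics.stdev(fold_values) / math.sqrt(k)
```

The pairwise loop is quadratic, which is acceptable for folds of roughly 63 patients (630 / 10) but would be replaced by a sort-based rank computation for large cohorts.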
Fig. 4
Representative images of selected CBCS cores located at the 1st, 10th, 90th, and 99th percentiles of the Luminal A and Luminal B DWD distribution, along with their representative patches. Each panel includes two sets of representative images: those from the grade-adjusted model using original images (left) and those from the grade-adjusted model using binary images (right). In the original images, 1st and 10th percentile Luminal A cores and patches (upper row) exhibit dense collagenous stroma with a wavy collagen fiber pattern. Conversely, 90th and 99th percentile Luminal B cores (lower row) predominantly display highly cellular invasive carcinoma surrounded by thin bands of collagenous stroma. Binarized images highlight a more irregular interface between epithelial (red) and collagenous stroma (green) regions in Luminal B tumors.
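Picking representative cores at fixed percentiles of a score distribution can be done by ranking cores by their classifier score and taking the nearest-rank percentile. The sketch assumes one DWD score per core, with low scores Luminal A-like and high scores Luminal B-like, and uses the nearest-rank definition of a percentile; the authors may have used a different definition. Names are hypothetical.

```python
import math


def cores_at_percentiles(core_scores, percentiles=(1, 10, 90, 99)):
    """Return {percentile: core_id} using the nearest-rank definition.

    core_scores: dict of hypothetical core ID -> DWD score (higher = more
    Luminal B-like, by assumption).
    """
    ranked = sorted(core_scores, key=core_scores.get)
    n = len(ranked)
    picks = {}
    for p in percentiles:
        rank = max(1, math.ceil(p * n / 100))  # 1-based nearest rank
        picks[p] = ranked[rank - 1]
    return picks
```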
Fig. 5
Survival analysis results for the three protocols in CBCS data. Each panel displays Kaplan-Meier curves for the low-risk and high-risk groups classified by the corresponding protocol: (A) genomics-based risk groups, (B) image-based risk groups, or (C) hybrid image- and genomics-based risk groups. Results of univariate and multivariate Cox models for each protocol, including hazard ratios and p-values for differences between the risk groups, are provided below the Kaplan-Meier plots.
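Each Kaplan-Meier curve is a product-limit estimate of recurrence-free probability within one risk group: at every distinct event time t, the running estimate is multiplied by (1 - d_t / n_t), where d_t is the number of recurrences at t and n_t is the number still at risk just before t. A minimal pure-Python sketch, assuming right-censored follow-up times and an event indicator of 1 for an observed recurrence; the Cox models behind the reported hazard ratios are not reproduced here, and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate S(t) at each distinct event time.

    times:  follow-up time per patient.
    events: 1 if recurrence was observed at that time, 0 if censored.
    Returns a list of (time, survival_probability) steps.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        recurrences = censored = 0
        # Group all patients sharing this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                recurrences += 1
            else:
                censored += 1
            i += 1
        if recurrences:
            surv *= 1 - recurrences / n_at_risk
            curve.append((t, surv))
        # Everyone with time t (event or censored) leaves the risk set afterwards.
        n_at_risk -= recurrences + censored
    return curve
```

Running it separately on the low-risk and high-risk groups of a protocol yields the two step curves compared in each panel.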
Fig. 6
Survival analysis results in TCGA data. Each panel displays Kaplan-Meier curves for the low-risk and high-risk groups classified by the corresponding protocol: (A) genomics-based risk groups, (B) image-based risk groups, or (C) hybrid image- and genomics-based risk groups. Results of univariate and multivariate Cox models for each protocol, including hazard ratios and p-values for differences between the risk groups, are provided below the Kaplan-Meier plots.
