Breast Cancer Res. 2025 Nov 17;27(1):206.
doi: 10.1186/s13058-025-02149-9.

Automated quantification of Ki-67 expression in breast cancer from H&E-stained slides using a transformer-based regression model


Abadh K Chaurasia et al. Breast Cancer Res. 2025.

Abstract

Background: Accurate quantification of the Ki-67 proliferation index is essential for breast cancer prognosis and treatment planning. Current automated methods, including classical and deep learning approaches based on cell detection or segmentation, often face challenges due to densely packed nuclei, morphological variability, and inter-laboratory differences. Since Hematoxylin and Eosin (H&E) staining is routinely performed, accurately estimating Ki-67 from these slides could save resources by eliminating the need for additional immunohistochemical (IHC) staining. We developed and validated a transformer-based regression model to estimate Ki-67 expression directly from H&E-stained Whole Slide Images (WSIs).

Methods: We used seven public datasets to select optimal transformer-based architectures and hyperparameters. WSIs underwent preprocessing to filter out poor-quality patches, with a classification model identifying gradable patches. Only gradable patches proceeded to Ki-67 quantification. First, a regression model was trained on IHC-stained patches using independently annotated datasets, bypassing segmentation methods. This model generated pseudo-labels for unlabeled IHC patches, which were then paired with the corresponding H&E images, and a separate model was trained using only these H&E patches. Both models were evaluated separately across 1153 H&E-stained and 843 IHC-stained WSIs, using metrics such as the coefficient of determination (R²).
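The pseudo-labelling step described in the Methods can be sketched roughly as follows. This is an illustrative outline only, not the authors' code: `PatchPair`, `build_he_training_set`, `is_gradable`, and `ihc_regressor` are hypothetical names, and requiring both patches of a pair to be gradable is an assumption, since the abstract states only that gradable patches proceed to quantification.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple

Counts = Tuple[float, float]  # (Ki-67-positive count, Ki-67-negative count)


@dataclass
class PatchPair:
    """A hypothetical container for one IHC patch and its matching H&E patch."""
    ihc_patch: Any
    he_patch: Any


def build_he_training_set(
    pairs: Iterable[PatchPair],
    is_gradable: Callable[[Any], bool],      # stand-in for the classification model
    ihc_regressor: Callable[[Any], Counts],  # stand-in for the trained IHC regression model
) -> List[Tuple[Any, Counts]]:
    """Pair each gradable H&E patch with counts predicted on its IHC counterpart."""
    training_set = []
    for pair in pairs:
        # Assumption: skip the pair if either patch is judged ungradable.
        if not (is_gradable(pair.ihc_patch) and is_gradable(pair.he_patch)):
            continue
        # The IHC model's prediction serves as the pseudo-label for the H&E patch.
        pseudo_label = ihc_regressor(pair.ihc_patch)
        training_set.append((pair.he_patch, pseudo_label))
    return training_set
```

The H&E model would then be trained on the returned (patch, pseudo-label) tuples alone, so it never sees an IHC image at training or inference time.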

Results: Our regression model had good predictive accuracy, with R² values exceeding 0.90 for quantifying positive cells, negative cells, and Ki-67 ratios. The classification model effectively distinguished gradable patches, achieving a near-perfect AUROC (~100%) across independent and unseen datasets. Cross-modality performance was robust, achieving R² values over 0.95 for positive and negative cell counts. Additionally, the model accurately captured the proliferation patterns from H&E-stained WSIs.
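For readers less familiar with the reported quantities, the sketch below shows how a Ki-67 index is conventionally derived from positive and negative cell counts and how R² is computed from actual and predicted values. The function names are illustrative, and the abstract does not state which exact R² variant the authors used; the standard definition, 1 minus the ratio of residual to total sum of squares, is assumed here.

```python
from typing import Sequence


def ki67_index(positive: float, negative: float) -> float:
    """Percentage of Ki-67-positive cells among all counted cells."""
    total = positive + negative
    # Returning 0.0 for an empty patch is a convention chosen for this sketch.
    return 100.0 * positive / total if total > 0 else 0.0


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are identical")
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the mean of the actual values scores 0, and a perfect model scores 1, which gives a sense of scale for the reported values above 0.90.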

Conclusion: Our approach precisely quantifies Ki-67 expression and automates hotspot detection from WSIs, providing a scalable tool for digital pathology workflows. The cross-modality model potentially quantifies molecular expression from morphological features using H&E-stained WSIs.

Keywords: Breast Cancer; Digital Pathology; Hematoxylin and Eosin; Immunohistochemistry; Ki-67 Index; Regression Analysis; Vision Transformer.


Conflict of interest statement

Declarations. Ethics approval and consent to participate: We incorporated seven publicly accessible histopathology datasets without direct patient interaction or personally identifiable information; no additional ethical approval was required. The study complies with ethical guidelines, and all dataset sources follow data-sharing policies. Consent for publication: Not applicable. Competing interests: A.K.C., M.T.B., P.W.T., and A.W.H. are co-founders of Pandani Solutions Pty Ltd, Australia, specialising in computational pathology.

Figures

Fig. 1
Schematic overview of the study
Fig. 2
Patch-level agreement and morphologic correlation for Ki-67 quantification using the IHC regression model. (A) Bland–Altman plots show the agreement between actual and predicted values for cell counts and the Ki-67 index in the testing set, highlighting mean differences and 95% limits of agreement. The left subplot displays positive counts, the middle subplot negative counts, and the right subplot the Ki-67 index. (B) On the IHC4BC dataset, the predicted Ki-67 index follows a right-tailed distribution, with moderate to strong positive correlations (correlation coefficient r) with DAB intensity and total nuclei count. (C) The cross-modality regression model quantified the Ki-67 counts on the testing set (4132), visualising the relationship between actual and estimated values for positive counts, negative counts, and the Ki-67 index
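The Bland–Altman quantities named in this caption (mean difference and 95% limits of agreement) can be computed as in the following sketch. It assumes the usual convention of mean difference ± 1.96 sample standard deviations and takes differences as predicted minus actual; the function name is illustrative and the figure itself may follow slightly different conventions.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman_limits(
    actual: Sequence[float], predicted: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (mean difference, lower limit, upper limit) of agreement."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    # Differences are taken as predicted minus actual (an assumed sign convention).
    diffs = [p - a for a, p in zip(actual, predicted)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - spread, bias + spread
```

A mean difference near zero with narrow limits indicates that predicted counts neither systematically over- nor under-estimate the annotated counts.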
Fig. 3
Our model predicted Ki-67 cell counts against the actual label from the testing set (BCData). The heatmap overlay displays the model-predicted distribution of Ki-67-positive cell counts, with warmer colours indicating a higher predicted Ki-67 index, aligning with ground truth
Fig. 4
The classification model was evaluated on randomly selected patches from the testing set (ACROBAT, BCData testing set, and SICAPv2), which included gradable H&E, gradable IHC, and ungradable patches. The model’s class predictions with confidence levels were compared to the ground truth labels. Attention maps identify critical areas that influence the model’s decisions, highlighted by brighter regions
Fig. 5
Visualisation of Ki-67 expression from paired IHC and H&E-stained patches is illustrated as follows: (IHC → H&E) | (IHC → H&E). The IHC patches indicate Ki-67 positive and negative cell counts inferred from the IHC regression model, while the corresponding H&E patches display Ki-67 cell counts quantified by the cross-modality model for the same tissue region across different staining modalities
Fig. 6
Quantification of the Ki-67 index at slide level and visualisation of hotspots using H&E-stained WSIs from the ACROBAT dataset. For each slide, the left panel displays the original WSI thumbnail; the middle panel overlays predicted Ki-67 hotspots in red, with the largest hotspot region outlined in white, together with high and low Ki-67 index patches; and the right panel presents bar plots of predicted Ki-67-positive (red) and Ki-67-negative (green) cell counts
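One plausible way to go from patch-level counts to a slide-level index and a largest hotspot region is sketched below. The abstract does not describe the authors' aggregation or hotspot rule, so everything here is an assumption made for illustration: pooling counts across patches, the 30% patch threshold, 4-neighbour connectivity, and the names `slide_ki67_index` and `largest_hotspot`.

```python
from collections import deque
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]                   # (row, col) of a patch in the slide grid
Grid = Dict[Cell, Tuple[float, float]]   # patch -> (positive count, negative count)


def slide_ki67_index(grid: Grid) -> float:
    """Pool predicted counts over all patches, then take the percentage positive."""
    pos = sum(p for p, _ in grid.values())
    neg = sum(n for _, n in grid.values())
    total = pos + neg
    return 100.0 * pos / total if total > 0 else 0.0


def largest_hotspot(grid: Grid, threshold: float = 30.0) -> Set[Cell]:
    """Largest 4-connected group of patches whose own index meets the threshold."""
    hot = {
        cell for cell, (p, n) in grid.items()
        if p + n > 0 and 100.0 * p / (p + n) >= threshold
    }
    seen: Set[Cell] = set()
    best: Set[Cell] = set()
    for start in hot:
        if start in seen:
            continue
        # Breadth-first search over neighbouring hot patches.
        region = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in hot and nb not in seen:
                    seen.add(nb)
                    region.add(nb)
                    queue.append(nb)
        if len(region) > len(best):
            best = region
    return best
```

Pooling counts before dividing weights each patch by its cellularity; averaging per-patch indices instead would weight sparse and dense patches equally, which is a different design choice.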
