J Pathol Inform. 2025 Mar 18:17:100435.
doi: 10.1016/j.jpi.2025.100435. eCollection 2025 Apr.

Fully automatic HER2 tissue segmentation for interpretable HER2 scoring

Mathias Öttl et al. J Pathol Inform.

Abstract

Breast cancer is the most common cancer in women, with HER2 (human epidermal growth factor receptor 2) overexpression playing a critical role in regulating cell growth and division. HER2 status, assessed according to established scoring guidelines, offers important information for treatment selection. However, the complexity of the task leads to variability in human rater assessments. In this work, we propose a fully automated, interpretable HER2 scoring pipeline based on pixel-level semantic segmentations, designed to align with clinical guidelines. Using polygon annotations, our method balances annotation effort with the ability to capture fine-grained details and larger structures, such as non-invasive tumor tissue. To enhance HER2 segmentation, we propose the use of a Wasserstein Dice loss to model class relationships, ensuring robust segmentation and HER2 scoring performance. Additionally, based on observations of pathologists' behavior in clinical practice, we propose a calibration step to the scoring rules, which positively impacts the accuracy and consistency of automated HER2 scoring. Our approach achieves an F1 score of 0.832 on HER2 scoring, demonstrating its effectiveness. This work establishes a potent segmentation pipeline that can be further leveraged to analyze HER2 expression in breast cancer tissue.
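The idea of using a Wasserstein Dice loss to model class relationships can be illustrated with a simplified sketch. It is not the authors' implementation: the class order, the ordinal distance matrix `DIST`, and the function names are assumptions made for illustration, and the class weighting of the published generalized Wasserstein Dice formulation is omitted. The point is only that a confusion between neighboring HER2 levels costs less than a confusion between distant ones.

```python
# Hypothetical class order: 0 = HER2 0, 1 = HER2 1+, 2 = HER2 2+, 3 = HER2 3+.
# Assumed ground distance: ordinal gap between classes, scaled to [0, 1].
NUM_CLASSES = 4
DIST = [[abs(a - b) / (NUM_CLASSES - 1) for b in range(NUM_CLASSES)]
        for a in range(NUM_CLASSES)]


def pixel_wasserstein(probs, label):
    """Wasserstein distance between a predicted distribution and a one-hot label.

    With a one-hot target, all probability mass has to move to `label`, so the
    distance reduces to the expected ground distance under `probs`.
    """
    return sum(p * DIST[k][label] for k, p in enumerate(probs))


def wasserstein_dice_loss(pred_probs, labels, eps=1e-8):
    """Simplified Dice-style loss built from per-pixel Wasserstein errors."""
    errors = [pixel_wasserstein(p, g) for p, g in zip(pred_probs, labels)]
    true_pos = sum(1.0 - e for e in errors)   # soft count of correct pixels
    total_err = sum(errors)                   # soft count of wrong pixels
    return 1.0 - (2.0 * true_pos) / (2.0 * true_pos + total_err + eps)


# Two pixels labeled HER2 1+ but predicted as HER2 2+ (a near miss).
near_miss = wasserstein_dice_loss([[0, 0, 1, 0], [0, 0, 1, 0]], [1, 1])
# Two pixels labeled HER2 0 but predicted as HER2 3+ (a far miss).
far_miss = wasserstein_dice_loss([[0, 0, 0, 1], [0, 0, 0, 1]], [0, 0])
```

Under these assumptions the near miss yields a smaller loss than the far miss, whereas a plain Dice loss would penalize both equally.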

Keywords: Deep learning; HER2; HER2 scoring; Histopathology; Semantic segmentation.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.

Figures

Fig. 1
Overview of the annotation procedure for all data used in this work, which comprises data from 626 patients.
Fig. 2
(a) The ASCO/CAP scoring guidelines, for determining patient-wise HER2 scores based on invasive tumor cells. (b) Example tissue images illustrating all four invasive HER2 expression levels (HER2 0: no staining or faint/incomplete staining, HER2 1+: incomplete, faint staining, HER2 2+: weak to moderate complete staining, HER2 3+: complete, intense circumferential staining) as well as non-invasive tumor tissue, which can express any level of membrane staining.
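A rule-based score of this kind might be derived from a segmentation roughly as follows. This is a minimal sketch, not the pipeline from the paper: it assumes that the share of invasive tumor area per class is used as a proxy for the share of tumor cells, it applies the guideline's 10% cut-off as a simple cascade from strongest to weakest staining, and it ignores special cases in the guideline. The function name `her2_score` and the `thresholds` override are hypothetical.

```python
# Staining levels checked from strongest to weakest.
CASCADE = ("3+", "2+", "1+")


def her2_score(fractions, thresholds=None):
    """Return a slide-level HER2 score from per-class tumor fractions.

    fractions:  dict mapping "0", "1+", "2+", "3+" to the share of invasive
                tumor assigned to that class (values between 0 and 1).
    thresholds: optional per-class overrides of the default 10% cut-off,
                e.g. to experiment with a recalibrated 2+ threshold.
    """
    limits = {"3+": 0.10, "2+": 0.10, "1+": 0.10}
    if thresholds:
        limits.update(thresholds)
    for level in CASCADE:
        # The first level whose share exceeds its cut-off decides the score.
        if fractions.get(level, 0.0) > limits[level]:
            return level
    return "0"


example = {"0": 0.50, "1+": 0.20, "2+": 0.25, "3+": 0.05}
score = her2_score(example)  # "2+": the 3+ share is below 10%, the 2+ share is above
```

Passing something like `thresholds={"2+": 0.05}` changes only the 2+ cut-off; the value 0.05 is purely illustrative and not taken from the paper.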
Fig. 3
(a) The distribution of the four HER2 classes and non-invasive tumor tissue as annotated by the five raters, illustrating the variability in labeling across different annotators. (b) Visualization of the agreement with the consensus for all classes. (c) A confusion matrix displaying the mean F1 score (average of class-wise F1 scores), comparing each rater to every other rater and to the consensus, thereby highlighting the level of agreement and disagreement among raters.
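The mean F1 score used in panel (c), the average of class-wise F1 scores, can be computed between any two label sequences as sketched below. The function name `mean_f1` and the choice to skip classes that neither rater used are assumptions for illustration; the paper may handle such classes differently.

```python
def mean_f1(labels_a, labels_b, classes):
    """Average of class-wise F1 scores between two label sequences.

    F1 is symmetric in its two inputs, so it can serve as an agreement
    measure between two raters as well as between a rater and a consensus.
    """
    scores = []
    for c in classes:
        tp = sum(1 for a, b in zip(labels_a, labels_b) if a == c and b == c)
        fp = sum(1 for a, b in zip(labels_a, labels_b) if a != c and b == c)
        fn = sum(1 for a, b in zip(labels_a, labels_b) if a == c and b != c)
        denom = 2 * tp + fp + fn
        if denom == 0:
            continue  # class used by neither sequence: skipped (assumption)
        scores.append(2 * tp / denom)
    return sum(scores) / len(scores) if scores else 0.0


rater_1 = [0, 0, 1, 1]
rater_2 = [0, 1, 1, 1]
agreement = mean_f1(rater_1, rater_2, classes=[0, 1])
```

For this toy input the class-wise F1 scores are 2/3 and 4/5, so the mean is 11/15.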
Fig. 4
Confusion matrices of the F1 scores between raters and their consensus for HER2 2+ tissue (a) and non-invasive tumor tissue (b).
Fig. 5
Confusion matrices of our final segmentation method. Left: confusion between the prediction and the consensus ground truth. Middle: concordance confusion, where a prediction is considered correct if at least one annotator agreed with the label. Right: confusion matrix when our final method is trained without the SP annotated data.
Fig. 6
Examples of discrepancies between the annotator consensus and the model prediction with regard to the segmentation of non-background tissues.
Fig. 7
Visual examples of four cases where a disagreement between the annotators is visible, as well as the consensus and the network predictions.
Fig. 8
Confusion matrices for our scoring pipeline when directly following the ASCO/CAP guidelines on all WSIs (upper left), when considering only HER2-positive cases (lower left), and when considering only HER2-negative cases (lower right). Results with the adjusted HER2 2+ thresholds are shown in the upper right.
Fig. 9
Example illustrating how segmentation results can enhance the interpretability of HER2 scoring. The figure shows the segmentation of a single WSI, along with the overall HER2 distribution derived from it. Three distinct regions within the WSI are highlighted, each exhibiting a different HER2 expression pattern. The corresponding HER2 distributions for these regions are also displayed, demonstrating how local variations in staining intensity and completeness can be easily visualized and analyzed.
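The regional HER2 distributions described in this caption could be derived from a label mask along the following lines. This is a sketch under stated assumptions: the label encoding (0 to 3 for the invasive HER2 levels, 4 for non-invasive tumor, 5 for background), the rectangular region format, and the function name `her2_distribution` are hypothetical and not taken from the paper.

```python
from collections import Counter

# Hypothetical label encoding of the segmentation mask.
HER2_CLASSES = (0, 1, 2, 3)   # invasive tumor: HER2 0, 1+, 2+, 3+
NON_INVASIVE, BACKGROUND = 4, 5


def her2_distribution(mask, region=None):
    """Share of each invasive HER2 class within a mask or a sub-region.

    mask:   2D list of integer labels (rows of pixels).
    region: optional (row_start, row_end, col_start, col_end), end-exclusive;
            defaults to the whole mask.
    Non-invasive tumor and background pixels are excluded from the total.
    """
    r0, r1, c0, c1 = region or (0, len(mask), 0, len(mask[0]))
    counts = Counter(
        mask[r][c]
        for r in range(r0, r1)
        for c in range(c0, c1)
        if mask[r][c] in HER2_CLASSES
    )
    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in HER2_CLASSES}


toy_mask = [
    [0, 0, 5, 5],
    [3, 3, 4, 5],
]
whole_slide = her2_distribution(toy_mask)                   # half HER2 0, half HER2 3+
top_row = her2_distribution(toy_mask, region=(0, 1, 0, 4))  # only HER2 0
```

Comparing the whole-slide distribution with the distributions of individual regions exposes local heterogeneity that a single slide-level score hides.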
