Med Phys. 2025 May;52(5):2999-3014. doi: 10.1002/mp.17650. Epub 2025 Jan 30.

Hybrid transformer-based model for mammogram classification by integrating prior and current images

Afsana Ahsan Jeny et al. Med Phys. 2025 May.

Abstract

Background: Breast cancer screening via mammography plays a crucial role in early detection, significantly impacting women's health outcomes worldwide. However, the manual analysis of mammographic images is time-consuming and requires specialized expertise, presenting substantial challenges in medical practice.

Purpose: To address these challenges, we introduce a CNN-Transformer based model tailored for breast cancer classification through mammographic analysis. This model leverages both prior and current images to monitor temporal changes, aiming to enhance the efficiency and accuracy (ACC) of computer-aided diagnosis systems by mimicking the detailed examination process of radiologists.

Methods: In this study, our proposed model incorporates a novel integration of a position-wise feedforward network and multi-head self-attention, enabling it to detect abnormal or cancerous changes in mammograms over time. Additionally, the model employs positional encoding and channel attention methods to accurately highlight critical spatial features, thus precisely differentiating between normal and cancerous tissues. Our methodology utilizes focal loss (FL) to address challenging instances that are difficult to classify, reducing false negatives and false positives and thereby improving diagnostic ACC.
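As a concrete illustration of the FL term, here is a minimal NumPy sketch of the binary focal loss; the alpha and gamma values are the common defaults from the focal-loss literature and are not necessarily the settings used in this work:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p : predicted probability of the positive class, shape (N,)
    y : binary labels in {0, 1}, shape (N,)
    """
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balancing weight
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

# The (1 - p_t)**gamma factor down-weights well-classified examples,
# so hard examples dominate the average loss:
easy = focal_loss(np.array([0.95]), np.array([1]))  # confident, correct
hard = focal_loss(np.array([0.55]), np.array([1]))  # barely correct
```

This down-weighting of easy examples is what lets the loss concentrate on the hard-to-classify cases the abstract mentions.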

Results: We compared our model with eight baseline models in terms of accuracy (ACC), sensitivity (SEN), precision (PRE), specificity (SPE), F1 score, and area under the curve (AUC); the single-image baseline, ResNet50, used only current images, while the remaining models used both prior and current images. The results demonstrate that the proposed model outperforms the baseline models, achieving an ACC of 90.80%, SEN of 90.80%, PRE of 90.80%, SPE of 90.88%, an F1 score of 90.95%, and an AUC of 92.58%. The code and related information are available at https://github.com/NabaviLab/PCTM.

Conclusions: Our proposed CNN-Transformer model integrates both prior and current images, captures long-range dependencies, and enhances its capability for nuanced classification. The application of FL reduces the false positive rate (FPR) and false negative rate (FNR), improving both SEN and SPE. Furthermore, the model achieves the lowest false discovery rate and FNR across various abnormalities, including masses, calcifications, and architectural distortions (ADs). These low error rates highlight the model's reliability and underscore its potential to improve early breast cancer detection in clinical practice.

Keywords: CNN; prior and current mammograms; transformer.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

FIGURE 1
Overall architecture of the proposed model. It utilizes a pair of ResNet networks with shared weights to process the previous-year and current-year mammogram images simultaneously. Features extracted from both images (denoted as Fip and Fic) are then concatenated along with PE to capture temporal information. This combined feature set is passed through a TE, further processed by a Conv layer, and then through a CA module to refine the feature representation. The output is finally directed through an FC layer for the binary classification task. Summation and concatenation operators are indicated in the diagram. CA, channel attention; Conv, convolutional; FC, fully connected; PE, positional encoding; TE, transformer encoder.
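The concatenation of prior/current feature tokens with PE described in the caption can be sketched as follows. This is a NumPy sketch under stated assumptions: the standard sinusoidal encoding is assumed (the caption does not specify the encoding scheme), and the feature dimensions are illustrative, not the paper's.

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model):
    """Standard sinusoidal positional encoding (assumed here for illustration)."""
    pos = np.arange(seq_len)[:, None]          # token position
    i = np.arange(d_model)[None, :]            # feature index
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    # even feature indices get sin, odd indices get cos
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

# Hypothetical 256-d feature vectors from the shared-weight ResNet pair,
# stacked as a 2-token sequence (prior image, current image):
rng = np.random.default_rng(0)
f_prior = rng.standard_normal((1, 256))
f_current = rng.standard_normal((1, 256))
tokens = np.concatenate([f_prior, f_current], axis=0)  # (2, 256)
tokens = tokens + sinusoidal_pe(2, 256)                # add temporal position info
```

Adding PE lets the downstream TE distinguish which token is the prior image and which is the current one, which is the temporal signal the model exploits.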
FIGURE 2
The diagram outlines the proposed architecture of a TE block, detailing the MSA mechanism with position‐wise FFN and individual self‐attention heads. Part (a) shows the overall encoder structure, with layers for MSA, LN, and position‐wise FFN, plus RCs. Part (b) expands on the MSA component, illustrating the process of linear transformation of queries (Q), keys (K), and values (V) followed by concatenation. Part (c) zooms into a SA, depicting the sequence of operations: scaling, masking, softmax activation, and matrix multiplication to compute attention scores. FFN, feed‐forward networks; LN, layer normalization; MSA, multi‐head self‐attention; RC, residual connection; SA, single self‐attention head; TE, transformer encoder.
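The SA operations in part (c) implement the standard scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal NumPy sketch (shapes are illustrative; masking is optional, as in the figure):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V, as in part (c) of the figure."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)  # scaling
    if mask is not None:                            # optional masking
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)              # softmax activation
    return weights @ V, weights                     # matrix multiplication

# A 2-token sequence, e.g., prior- and current-image feature tokens;
# MSA (part b) runs h copies of this on linearly projected Q/K/V
# and concatenates the head outputs.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((2, 64))
out, attn = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each row of `attn` sums to 1, so the output is a convex combination of the value vectors, weighted by query-key similarity.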
FIGURE 3
The figure illustrates the proposed CA module, which begins with GP and is then followed by two 1 × 1 Conv layers separated by a ReLU activation function. The module concludes with a sigmoid gate that adjusts the input features by element‐wise multiplication. CA, channel attention; Conv, convolutional; GP, global pooling.
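The CA module in the figure can be sketched in NumPy as below. Since the two 1 × 1 Conv layers act on globally pooled (1 × 1 spatial) features, they reduce to per-channel dense layers here; the weights and dimensions are random placeholders for illustration, not the paper's parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, W1, W2):
    """CA module: global pooling -> 1x1 Conv -> ReLU -> 1x1 Conv -> sigmoid gate.

    x  : feature map, shape (C, H, W)
    W1 : (C_mid, C) weights of the first 1x1 conv (channel reduction)
    W2 : (C, C_mid) weights of the second 1x1 conv (channel expansion)
    """
    z = x.mean(axis=(1, 2))                    # global average pooling -> (C,)
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0.0))  # ReLU between convs, sigmoid gate
    return x * s[:, None, None]                # element-wise channel rescaling

# Illustrative example with 8 channels and a 2-channel bottleneck:
rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
x = rng.standard_normal((C, H, W))
y = channel_attention(x, rng.standard_normal((2, C)), rng.standard_normal((C, 2)))
```

Because the sigmoid gate lies in (0, 1), each channel is scaled down in proportion to how little attention it receives, which is how the module emphasizes the most informative channels.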
FIGURE 4
UCHC dataset visualization showing pairs of mammogram images. The top row displays previous-year mammograms, while the bottom row presents current-year mammograms with cancerous tumors highlighted by yellow circles. UCHC, University of Connecticut Health Center.
FIGURE 5
Training workflow for the proposed model. (a) The model begins with a ResNet50 backbone pretrained on the ImageNet dataset. This backbone is then further pretrained on the DDSM, CMMD, BDs2D, and VDM datasets. The pretrained weights are transferred to the backbone of our proposed model, using the shared weights of ResNet. (b) The pretrained ResNet is further optimized by fine‐tuning it on paired mammograms from both the current and earlier years of the UCHC dataset, as part of our proposed model. BCS‐DBT, breast cancer screening‐digital breast tomosynthesis; BDs2D, BCS‐DBT; CMMD, Chinese mammography database; DDSM, digital database for screening mammography; UCHC, University of Connecticut Health Center.
FIGURE 6
Comparison of model performance across balanced accuracy, AUC‐PR, and Brier Score metrics.
FIGURE 7
The graph represents the ROC curves for various baseline models, comparing their TPR against FPR. FPR, false positive rate; ROC, receiver operating characteristic; TPR, true positive rate.
FIGURE 8
Visualization of Grad‐CAM heatmaps applied to prior and current mammogram images. The first two rows depict four cancer cases from four different patients, with red circles indicating specific cancerous areas, while the last row represents two normal cases from two other patients. Grad‐CAMs, gradient‐weighted class activation maps.
