Clinical Trial
Nat Commun. 2026 Feb 2;17(1):2263.
doi: 10.1038/s41467-026-69212-7.

An interpretable AI system reduces false-positive MRI diagnoses by stratifying high-risk breast lesions

Yanting Liang et al. Nat Commun. 2026.

Abstract

Breast cancer diagnosis using magnetic resonance imaging remains limited by high false-positive rates and substantial inter-reader variability, especially for lesions classified as Breast Imaging Reporting and Data System (BI-RADS) category 4, often leading to unnecessary biopsies. Here we show that the BI-RADS 4 Lesions Analysis System (BL4AS), an artificial intelligence system powered by foundation models and leveraging the rich spatiotemporal information of dynamic contrast-enhanced MRI, addresses these diagnostic challenges. Developed on a multicenter dataset of 2,803 lesions from 2,686 female patients, BL4AS demonstrates robust performance with areas under the curve of 0.892-0.930 and significantly outperforms radiologists in specificity (0.889 versus 0.491). BL4AS-assisted interpretation significantly improves diagnostic accuracy for both senior and junior radiologists, reducing inter-reader variability by 24.5% and decreasing false-positive rates by 27.3%. BL4AS further stratifies lesions into subcategories (4A, 4B and 4C) for refined risk assessment, offering a practical tool for precision breast cancer management.
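The specificity and false-positive figures quoted above all derive from a 2x2 table of predicted versus pathological status. The following is a minimal sketch of those definitions, not the authors' code; the `Confusion` class, the function name, and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # malignant lesions called malignant
    fp: int  # benign lesions called malignant
    tn: int  # benign lesions called benign
    fn: int  # malignant lesions called benign


def diagnostic_metrics(c: Confusion) -> dict:
    """Standard binary diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "fpr": c.fp / (c.fp + c.tn),  # equals 1 - specificity
        "fnr": c.fn / (c.fn + c.tp),  # equals 1 - sensitivity
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
    }


# Hypothetical counts for illustration only (not data from the study).
example = diagnostic_metrics(Confusion(tp=90, fp=20, tn=80, fn=10))
```

Because the false-positive rate is one minus specificity, the reported specificities of 0.889 and 0.491 correspond to false-positive rates of 0.111 and 0.509, respectively.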

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Overview of BL4AS designed in this study.
a Data acquisition. A multicenter breast MRI data set was constructed from three medical centers, encompassing 2,686 patients in total. b Workflow without and with BL4AS assistance. c Foundation model pre-training. The backbone network was initialized with weights from a foundation model pre-trained on 17,149 MRI volumes with 2.5 million slices. d Model development. BL4AS consisted of two tasks: lesion segmentation and lesion classification. e Model assessment. The performance of BL4AS was evaluated using receiver operating characteristic (ROC) curves, radar plots, and subgroup analyses. Gradient-weighted Class Activation Mapping was employed to visualize the diagnostic focus regions of the system. f Clinical applications. A reader study was conducted on a prospective test set (ChiCTR2400081831). Clinical management strategies and risk stratification capabilities of BL4AS were further analyzed to facilitate clinical translation of the system. Some illustrations were generated with figdraw.com (License ID: TOPUS0e999). Source data are provided as a Source Data file.
Fig. 2. Ablation experiments of BL4AS for breast lesion classification.
a Performance comparison of models with and without foundation model pre-training across the validation set (n = 397), external test set A (n = 108), and external test set B (n = 666). The box shows the interquartile range (IQR), with the bottom edge at Q1 (25th percentile), middle line at the median, and top edge at Q3 (75th percentile); the whiskers extend to the most extreme points within 1.5 × IQR beyond the box edges. Radar plots showing the performance of Vision Transformer models with fusion of multiphase DCE images or with single-phase DCE images (Csub1, Cpeak, Csub2) in the validation set (b), external test set A (c), and external test set B (d), respectively. Radar plots illustrating the performance of Vision Transformer models using different feature integration methods (FeatAvg, FeatConv, FeatMLP and ImageConv) in the validation set (e), external test set A (f), and external test set B (g), respectively. AUC Area under the receiver operating characteristic curve, ACC Accuracy, SEN Sensitivity, SPE Specificity, PPV Positive predictive value, NPV Negative predictive value. Source data are provided as a Source Data file.
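The box-and-whisker rule described in this caption can be sketched as follows. This is an illustrative implementation only: the function name `box_stats`, the choice of the "inclusive" quantile method, and the sample values are assumptions, since the caption does not state which quantile convention or plotting software was used.

```python
import statistics


def box_stats(values):
    """Quartiles and whisker ends using the 1.5 x IQR rule."""
    data = sorted(values)
    # Quartile cut points; the "inclusive" interpolation method is an assumption.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations that lie inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical AUC values for illustration only (not data from the study).
stats = box_stats([0.86, 0.88, 0.89, 0.90, 0.91, 0.92, 0.70])
```

With these hypothetical values, 0.70 falls below the lower fence and is treated as an outlier, so the lower whisker stops at 0.86.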
Fig. 3. Performance of BL4AS and all readers for discrimination of malignant from benign breast lesions.
a ROC curves evaluating the diagnostic performance of BL4AS on different retrospective data sets. b Diagnostic performance of BL4AS compared with each reader in the prospective test set. Round dots indicate the diagnostic sensitivities and specificities of individual readers, the red star indicates the pooled sensitivity and specificity of all junior readers, the green star indicates the pooled sensitivity and specificity of all senior readers, and the square indicates the pooled sensitivity and specificity of all readers. c Diagnostic performance of readers alone and readers assisted by BL4AS. Dark-blue round dots indicate the sensitivity and specificity of the first diagnosis, and light-blue round dots indicate the sensitivity and specificity of the second diagnosis with BL4AS assistance. Source data are provided as a Source Data file.
Fig. 4. Comparison between readers’ independent diagnosis and readers’ diagnosis with the assistance of BL4AS in the prospective test set.
a, b The sensitivity and specificity of different groups of readers [all readers (n = 8), junior readers (n = 4), and senior readers (n = 4)] with or without BL4AS assistance. c, d The false negative rate (FNR) and false positive rate (FPR) of different groups of readers [all readers (n = 8), junior readers (n = 4), and senior readers (n = 4)] with or without BL4AS assistance. The degree of agreement between pairs of readers with (e) and without (f) BL4AS assistance in the prospective test set. g Subgroup analysis in the prospective test set. The sensitivity and specificity with or without BL4AS-assisted diagnosis in subgroups defined by degree of background parenchymal enhancement and by lesion size. In (a, b), the performance of the model and readers is presented with 95% confidence intervals (95% CIs) based on 1000 bootstrap resamples. In (a, b), the box shows the interquartile range (IQR) containing 50% of the data, with the bottom edge at Q1 (25th percentile), middle line at the median, and top edge at Q3 (75th percentile); the whiskers extend to the most extreme points within 1.5 × IQR beyond the box edges. In (c, d), error bars represent the standard error of the mean and P values were calculated using the two-sided Wilcoxon signed-rank test without adjustment. NS not significant. Source data are provided as a Source Data file.
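A percentile bootstrap of the kind mentioned in this caption can be sketched as below. The caption does not say which bootstrap variant or resampling unit was used, so this assumes lesion-level resampling with replacement and a percentile interval; the function names and the toy labels are hypothetical.

```python
import random


def sensitivity(labels, preds):
    """Fraction of malignant cases (label 1) that were called malignant."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp / (tp + fn)


def bootstrap_ci(labels, preds, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI; resamples cases with replacement (assumed scheme)."""
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(metric([labels[i] for i in idx],
                                    [preds[i] for i in idx]))
        except ZeroDivisionError:
            continue  # resample lacked the class this metric needs; draw again
    estimates.sort()
    lo = estimates[int(n_boot * alpha / 2)]
    hi = estimates[min(n_boot - 1, int(n_boot * (1 - alpha / 2)))]
    return lo, hi


# Toy data for illustration only: 40 malignant cases (34 detected), 60 benign.
labels = [1] * 40 + [0] * 60
preds = [1] * 34 + [0] * 6 + [0] * 50 + [1] * 10
point = sensitivity(labels, preds)
ci_low, ci_high = bootstrap_ci(labels, preds, sensitivity)
```

Fixing the seed makes the interval reproducible; passing a different metric function, such as a specificity function with the same signature, reuses the same resampling loop.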
Fig. 5. Examples of BL4AS prediction basis.
Heatmap analysis of four controversial cases. The red areas indicate a high level of model attention, while the blue areas indicate a low level of model attention. a Example of a woman with invasive lobular carcinoma. BL4AS correctly identified the lesion as malignant. In the second assessment conducted 4 weeks later with BL4AS assistance, four out of five readers who initially diagnosed it as benign changed their assessment to malignant. b Example of a woman with invasive ductal carcinoma. BL4AS correctly identified the lesion as malignant. In the second assessment with BL4AS assistance, three out of five readers who initially diagnosed it as benign changed their assessment to malignant. c Example of a woman with adenosis. BL4AS correctly identified the lesion as benign. In the second assessment with BL4AS assistance, seven out of eight readers who initially diagnosed it as malignant changed their assessment to benign. d Example of a woman with intraductal papilloma. BL4AS correctly identified the lesion as benign. In the second assessment with BL4AS assistance, four readers who initially diagnosed it as malignant changed their assessment to benign. Source data are provided as a Source Data file.
Fig. 6. Clinical management strategies with BL4AS assistance.
a The performance and odds ratios of BL4AS across different probabilistic thresholds in the pooled training and validation sets. Two operating points are shown with symbols and are described in the text: high-NPV point (blue circle) and high-PPV point (purple circle). b Risk stratification and threshold analysis. Composite chart depicting the relationship between BL4AS predictions and actual results, along with a risk stratification in accordance with the ACR BI-RADS guideline in the pooled training and validation sets. c The distribution of the three risk classes (4A, 4B and 4C) identified by the original report and BL4AS, and malignancy proportion in each category of risk levels in the external test set B. d The detection performance of original reports and BL4AS for breast cancer when the 4B and 4C categories were classified as malignant in the external test set B. Source data are provided as a Source Data file.
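To illustrate what stratifying into 4A, 4B and 4C means, the sketch below maps a likelihood of malignancy onto the ACR BI-RADS category 4 ranges (4A: above 2% up to 10%; 4B: above 10% up to 50%; 4C: above 50% and below 95%). It assumes the input is a calibrated probability. The caption does not give the score thresholds BL4AS actually uses, so the function name and this direct mapping are hypothetical.

```python
def birads4_subcategory(p_malignant: float) -> str:
    """Map a malignancy likelihood to an ACR BI-RADS category 4 subcategory.

    Assumes p_malignant is a calibrated probability in [0, 1]; the
    thresholds BL4AS applies to its own scores are not stated in the source.
    """
    if not 0.0 <= p_malignant <= 1.0:
        raise ValueError("p_malignant must lie in [0, 1]")
    if p_malignant <= 0.02:
        return "below category 4 (<=2%)"
    if p_malignant <= 0.10:
        return "4A"
    if p_malignant <= 0.50:
        return "4B"
    if p_malignant < 0.95:
        return "4C"
    return "above category 4 (>=95%)"


def is_called_malignant(subcategory: str) -> bool:
    """Binary call used in panel d: 4B and 4C are classified as malignant."""
    return subcategory in {"4B", "4C"}
```

For example, a likelihood of 0.30 maps to 4B and would count as a malignant call under the panel d rule, whereas 0.05 maps to 4A and would not.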
