Observational Study
Nat Med. 2025 Mar;31(3):917-924.
doi: 10.1038/s41591-024-03408-6. Epub 2025 Jan 7.

Nationwide real-world implementation of AI for cancer detection in population-based mammography screening

Nora Eisemann et al. Nat Med. 2025 Mar.

Abstract

Artificial intelligence (AI) in mammography screening has shown promise in retrospective evaluations, but few prospective studies exist. PRAIM is an observational, multicenter, real-world, noninferiority, implementation study comparing the performance of AI-supported double reading to standard double reading (without AI) among women (50-69 years old) undergoing organized mammography screening at 12 sites in Germany. Radiologists in this study voluntarily chose whether to use the AI system. From July 2021 to February 2023, a total of 463,094 women were screened (260,739 with AI support) by 119 radiologists. Radiologists in the AI-supported screening group achieved a breast cancer detection rate of 6.7 per 1,000, which was 17.6% (95% confidence interval: +5.7%, +30.8%) higher than and statistically superior to the rate (5.7 per 1,000) achieved in the control group. The recall rate in the AI group was 37.4 per 1,000, which was lower than and noninferior to that (38.3 per 1,000) in the control group (percentage difference: -2.5% (-6.5%, +1.7%)). The positive predictive value (PPV) of recall was 17.9% in the AI group compared to 14.9% in the control group. The PPV of biopsy was 64.5% in the AI group versus 59.2% in the control group. Compared to standard double reading, AI-supported double reading was associated with a higher breast cancer detection rate without negatively affecting the recall rate, strongly indicating that AI can improve mammography screening metrics.
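The headline figures in the abstract can be roughly cross-checked from the rounded rates it reports. The sketch below is illustrative only, and its function names are hypothetical. It assumes that the percentage difference is the relative difference between the two group rates and that the PPV of recall is cancers detected divided by women recalled. Because the inputs are rounded, and the published estimates may come from an adjusted analysis (an assumption here), the crude relative differences land close to, but not exactly on, the reported +17.6% and -2.5%.

```python
def relative_difference_pct(ai_rate, control_rate):
    """Relative difference of the AI-group rate versus the control rate, in percent."""
    return (ai_rate - control_rate) / control_rate * 100.0


def ppv_of_recall_pct(detection_per_1000, recall_per_1000):
    """Share of recalled women in whom breast cancer was detected, in percent."""
    return detection_per_1000 / recall_per_1000 * 100.0


# Rounded rates per 1,000 screened women, copied from the abstract.
AI_GROUP = {"detection": 6.7, "recall": 37.4}
CONTROL_GROUP = {"detection": 5.7, "recall": 38.3}

detection_diff = relative_difference_pct(AI_GROUP["detection"], CONTROL_GROUP["detection"])
recall_diff = relative_difference_pct(AI_GROUP["recall"], CONTROL_GROUP["recall"])
ppv_ai = ppv_of_recall_pct(AI_GROUP["detection"], AI_GROUP["recall"])
ppv_control = ppv_of_recall_pct(CONTROL_GROUP["detection"], CONTROL_GROUP["recall"])

print(f"Detection rate difference (crude): {detection_diff:+.1f}%")
print(f"Recall rate difference (crude):    {recall_diff:+.1f}%")
print(f"PPV of recall, AI group:           {ppv_ai:.1f}%")
print(f"PPV of recall, control group:      {ppv_control:.1f}%")
```

Under these assumptions the PPV of recall values reproduce the reported 17.9% and 14.9%, which suggests the abstract's figures are internally consistent.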

Conflict of interest statement

Competing interests: The study was funded by Vara. Vara was involved in the study design, collection and interpretation of data, and writing of the report. All authors had access to all the data and were responsible for the decision to submit the paper. S.B., T.M. and C.L. are current employees of Vara with stock options as part of the standard compensation package. G.H., R.R., T.G., T.T. and T.W.V. actively participated in the study as radiologists and as customers of Vara. T.T. received speaker fees from Vara. A.K. received general consulting and speaker fees from Vara. K.S.-L. received consulting fees from Hologic. S.H.-K. has research cooperations with iCAD and ScreenPoint (no payments). The other authors declare no competing interests.

Figures

Fig. 1. Study profile.
The flowchart shows the inclusion of study participants and their assignment into groups.
Extended Data Fig. 1. AI-supported Viewer Screenshots.
a) The screenshot shows a worklist with a subset of examinations tagged as normal. Radiologists can also choose to read only normal examinations or only not normal examinations (‘potentially suspicious’). Names and dates of birth are not real and were sampled randomly from a list of common first and last names. b) and c) When radiologists assess a case as normal (BI-RADS 1 or 2) but the safety net is triggered, an alert is shown (b) and a suspicious region is highlighted in the viewer (c), asking the radiologists to reconsider.
Extended Data Fig. 2. Causal Graph (Directed Acyclic Graph, DAG).
Causal graph of assumed relationships between intervention, characteristics of the radiologists and the women screened, and the endpoint (breast cancer detection; recall; consensus conference; biopsy). Before being included in the DAG, possible direct paths between all variables considered were evaluated for plausibility based on theory, domain knowledge and previous empirical evidence. BC: Breast cancer; AI: Artificial Intelligence. Box: observed variable; round and dashed: latent variable.
Extended Data Fig. 3. Propensity Scores and Weights.
a) Distribution of propensity scores (for being in AI group), stratified by being in control group or in AI group. b) Extent to which screening exams with a given propensity score (for being in AI group) contribute to the overall sample weight.
Extended Data Fig. 4. Reading Time.
Reading times in the AI group for normal, safety net, and unconfident predictions. On average, examinations that were tagged as normal were read more quickly, with a median reading time of 16 seconds, compared to unclassified examinations, which had a median reading time of 30 seconds, and safety net examinations, which had a median reading time of 99 seconds.
Extended Data Fig. 5. Example Examinations.
a) This cancer was only diagnosed because of the safety net activation. Neither reader initially saw the invasive carcinoma (BI-RADS 1/2), but both changed their assessment to BI-RADS 4A after the safety net was displayed. The MLO view of the right breast shows an architectural distortion. Ultrasound during recall identified highly suspicious malignant findings. Histology: Invasive breast cancer, no special type, pT1b (9 mm) pre-therapy, pT1c (19 mm) post-op, N0, M0, G2. b) This examination was classified as ‘normal’ by the AI. Both readers used the AI-supported viewer and overruled the AI (BI-RADS 4B and BI-RADS 4A respectively). The MLO view of the right breast shows a mass. Histology: Invasive breast cancer, no special type, pT1b (9 mm) pre-therapy, ypT1c (12 mm) post-op, N0, M0, G2.
Extended Data Fig. 6. Bias due to self-selection of radiologists to AI or control group.
Excel file illustrating the effect of the reading behavior on the measured unadjusted breast cancer detection rate across study groups, even if the breast cancer detection rate were identical in both study groups. The figure shows that even a minor adoption rate difference from 61.7% to 66.8% would lead to an unadjusted measured breast cancer detection rate difference of 13.6% (bias). Therefore, it was necessary to control for this reading behavior in the main analysis. The Excel file is available for further analysis (doi: 10.5281/zenodo.10822135).
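The idea behind Extended Data Fig. 3b, how much exams at a given propensity score contribute to the overall sample weight, can be sketched in a few lines. The captions above do not state which weighting scheme the study used, so the sketch assumes overlap weights (1 - e for AI-group exams, e for control exams, where e is the propensity score for being in the AI group), one common choice. The function names and the propensity scores are hypothetical and are not study data.

```python
from collections import defaultdict


def overlap_weights(scores, in_ai_group):
    """Assumed scheme: weight 1 - e for AI-group exams, e for control exams."""
    return [(1.0 - e) if treated else e for e, treated in zip(scores, in_ai_group)]


def weight_share_by_bin(scores, weights, n_bins=10):
    """Fraction of the total weight contributed by each propensity-score bin."""
    totals = defaultdict(float)
    for e, w in zip(scores, weights):
        bin_index = min(int(e * n_bins), n_bins - 1)  # keep e == 1.0 in the last bin
        totals[bin_index] += w
    grand_total = sum(weights)
    return {b: totals[b] / grand_total for b in sorted(totals)}


# Hypothetical propensity scores for six exams (illustration only).
scores = [0.1, 0.5, 0.9, 0.2, 0.5, 0.95]
in_ai_group = [False, False, False, True, True, True]

weights = overlap_weights(scores, in_ai_group)
shares = weight_share_by_bin(scores, weights)

for bin_index, share in shares.items():
    lower = bin_index / 10
    print(f"propensity {lower:.1f}-{lower + 0.1:.1f}: {share:.1%} of total weight")
```

Under this assumed scheme, exams whose propensity score makes their actual group assignment unsurprising (a score near 1 in the AI group, or near 0 in the control group) receive little weight, so exams that could plausibly have fallen in either group dominate the comparison.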
