Comparative Study
Eur Radiol. 2025 Jun;35(6):3134-3143.
doi: 10.1007/s00330-024-11287-1. Epub 2024 Dec 19.

Evaluation of a deep learning prostate cancer detection system on biparametric MRI against radiological reading

Noëlie Debs et al. Eur Radiol. 2025 Jun.

Abstract

Objectives: This study aims to evaluate a deep learning pipeline for detecting clinically significant prostate cancer (csPCa), defined as Gleason Grade Group (GGG) ≥ 2, using biparametric MRI (bpMRI) and compare its performance with radiological reading.

Materials and methods: The training dataset included 4381 bpMRI cases (3800 positive and 581 negative) across three continents, with 80% annotated using PI-RADS and 20% with Gleason Scores. The testing set comprised 328 cases from the PROSTATEx dataset, including 34% positive (GGG ≥ 2) and 66% negative cases. A 3D nnU-Net was trained on bpMRI for lesion detection, evaluated using histopathology-based annotations, and assessed with patient- and lesion-level metrics, as well as by lesion volume and GGG. The algorithm was compared to non-expert radiologists using multi-parametric MRI (mpMRI).
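
To illustrate what a patient-level metric can look like for a lesion detector, the sketch below derives one score per patient and computes the AUC from it. This is a minimal illustration, not the authors' evaluation code: the abstract does not say how lesion outputs were aggregated per patient, so taking the maximum lesion confidence is an assumption, and the names `patient_score` and `roc_auc` and the toy data are hypothetical.

```python
from typing import Sequence


def patient_score(lesion_confidences: Sequence[float]) -> float:
    """Assumed aggregation: highest lesion confidence, or 0.0 if nothing was detected."""
    return max(lesion_confidences, default=0.0)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative case count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): detected lesion confidences per patient,
# and whether the patient has csPCa (1) or not (0).
detections = [[0.9, 0.2], [], [0.8], [0.7], [0.1]]
labels = [1, 0, 0, 1, 0]

scores = [patient_score(d) for d in detections]
auc = roc_auc(scores, labels)
print(f"patient-level AUC: {auc:.3f}")
```

The pairwise formulation is quadratic in the number of cases, which is acceptable for a test set of a few hundred patients.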

Results: The model achieved an AUC of 0.83 (95% CI: 0.80, 0.87). Lesion-level sensitivity was 0.85 (95% CI: 0.82, 0.94) at 0.5 False Positives per volume (FP/volume) and 0.88 (95% CI: 0.79, 0.92) at 1 FP/volume. Average Precision was 0.55 (95% CI: 0.46, 0.64). The model showed over 0.90 sensitivity for lesions larger than 650 mm³ and exceeded 0.85 across GGGs. It had higher true positive rates (TPRs) than radiologists at equivalent FP rates, achieving TPRs of 0.93 and 0.79 compared to radiologists' 0.87 and 0.68 for PI-RADS ≥ 3 and PI-RADS ≥ 4 lesions (p ≤ 0.05).
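
Lesion-level sensitivity at a fixed number of false positives per volume is one operating point on a FROC curve. The sketch below shows one way such a value could be read off, assuming each candidate detection has already been matched against the histopathology-based annotations and that every true-positive candidate corresponds to a distinct lesion. The function name, the input layout, and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def sensitivity_at_fp_rate(
    candidates: Sequence[Tuple[float, bool]],
    n_lesions: int,
    n_volumes: int,
    max_fp_per_volume: float,
) -> float:
    """Best lesion-level sensitivity with at most max_fp_per_volume false positives per volume.

    candidates: (confidence, is_true_positive) pairs pooled over all volumes.
    """
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        # Lowering the threshold admits all candidates that share a confidence value.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / n_volumes > max_fp_per_volume:
            break  # this threshold exceeds the allowed false-positive rate
        best = tp / n_lesions
        i = j
    return best


# Toy data (hypothetical): 2 volumes containing 4 true lesions in total.
candidates = [(0.95, True), (0.9, False), (0.8, True),
              (0.6, False), (0.5, True), (0.3, False)]

print(sensitivity_at_fp_rate(candidates, n_lesions=4, n_volumes=2, max_fp_per_volume=0.5))
print(sensitivity_at_fp_rate(candidates, n_lesions=4, n_volumes=2, max_fp_per_volume=1.0))
```

Allowing more false positives per volume can only keep or raise the sensitivity, which matches the direction of the point estimates reported at 0.5 and 1 FP/volume.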

Conclusion: The DL model showed strong performance in detecting csPCa on an independent test cohort, surpassing radiological interpretation and demonstrating AI's potential to improve diagnostic accuracy for non-expert radiologists. However, detecting small lesions remains challenging.

Key points: Question: Current prostate cancer detection methods often do not involve non-expert radiologists, highlighting the need for more accurate deep learning approaches using biparametric MRI. Findings: Our model significantly outperforms radiologists, showing consistent performance across Gleason Grade Groups and for medium to large lesions. Clinical relevance: This AI model improves cancer detection accuracy in prostate imaging, serves as a benchmark with reference performance on a public dataset, and offers public PI-RADS annotations, enhancing transparency and facilitating further research and development.

Keywords: Deep learning; Magnetic resonance imaging; Neoplasm grading; Prostatic neoplasms; Radiologists.

Conflict of interest statement

Compliance with ethical standards. Guarantor: The scientific guarantor of this publication is Noëlie Debs. Conflict of interest: The authors of this manuscript declare relationships with the following companies: Guerbet. Statistics and biometry: No complex statistical methods were necessary for this paper. Informed consent: Written informed consent was obtained from all subjects (patients) in this study. Ethical approval: Institutional Review Board approval was not required because this study is retrospective and does not require any new intervention or interaction with human subjects. Also, our data were de-identified, and there was no way to link it back to individual patients. Study subjects or cohorts overlap: None. Methodology: Retrospective; diagnostic or prognostic study; multicenter study.
