Comparative Study
Lancet Oncol. 2024 Jul;25(7):879-887.
doi: 10.1016/S1470-2045(24)00220-1. Epub 2024 Jun 11.

Artificial intelligence and radiologists in prostate cancer detection on MRI (PI-CAI): an international, paired, non-inferiority, confirmatory study

Anindo Saha et al. Lancet Oncol. 2024 Jul.

Abstract

Background: Artificial intelligence (AI) systems can potentially aid the diagnostic pathway of prostate cancer by alleviating the increasing workload, preventing overdiagnosis, and reducing the dependence on experienced radiologists. We aimed to investigate the performance of AI systems at detecting clinically significant prostate cancer on MRI in comparison with radiologists using the Prostate Imaging-Reporting and Data System version 2.1 (PI-RADS 2.1) and the standard of care in multidisciplinary routine practice at scale.

Methods: In this international, paired, non-inferiority, confirmatory study, we trained and externally validated an AI system (developed within an international consortium) for detecting Gleason grade group 2 or greater cancers using a retrospective cohort of 10 207 MRI examinations from 9129 patients. Of these examinations, 9207 cases from three centres (11 sites) based in the Netherlands were used for training and tuning, and 1000 cases from four centres (12 sites) based in the Netherlands and Norway were used for testing. In parallel, we facilitated a multireader, multicase observer study with 62 radiologists (45 centres in 20 countries; median 7 [IQR 5-10] years of experience in reading prostate MRI) using PI-RADS (2.1) on 400 paired MRI examinations from the testing cohort. Primary endpoints were the sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC) of the AI system in comparison with that of all readers using PI-RADS (2.1) and in comparison with that of the historical radiology readings made during multidisciplinary routine practice (ie, the standard of care with the aid of patient history and peer consultation). Histopathology and at least 3 years (median 5 [IQR 4-6] years) of follow-up were used to establish the reference standard. The statistical analysis plan was prespecified with a primary hypothesis of non-inferiority (considering a margin of 0·05) and a secondary hypothesis of superiority towards the AI system, if non-inferiority was confirmed. This study was registered at ClinicalTrials.gov, NCT05489341.
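The prespecified decision logic (a non-inferiority margin of 0·05, with superiority assessed only once non-inferiority is confirmed) can be sketched as follows. This Python sketch assumes that each difference is defined as AI minus comparator, that non-inferiority holds when the lower bound of the two-sided 95% Wald CI lies above -0·05, and that superiority holds when that lower bound lies above 0. The function names and the standard error in the usage lines are hypothetical. This is not the study's analysis code, and the abstract does not say how the standard error was estimated for the multireader comparison.

```python
from statistics import NormalDist


def wald_ci(diff: float, se: float, level: float = 0.95) -> tuple[float, float]:
    """Two-sided Wald confidence interval: diff +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return diff - z * se, diff + z * se


def hierarchical_decision(lower_bound: float, margin: float = 0.05) -> str:
    """Fixed-sequence testing: superiority is assessed only after non-inferiority.

    Assumes the difference is (AI minus comparator), so non-inferiority
    requires the lower CI bound to lie strictly above -margin.
    """
    if lower_bound <= -margin:
        return "non-inferiority not shown"
    if lower_bound > 0:
        return "non-inferior and superior"
    return "non-inferior"


if __name__ == "__main__":
    # Reported AUROC comparison: lower bound of the 95% Wald CI for the difference was 0.02.
    print(hierarchical_decision(0.02))  # non-inferior and superior
    # Hypothetical difference and standard error, for illustration only.
    low, high = wald_ci(diff=0.05, se=0.015)
    print(round(low, 3), round(high, 3))
```

Ordering the two checks this way mirrors the fixed-sequence design: the superiority question is never reached if the non-inferiority bound fails.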

Findings: Of the 10 207 examinations included from Jan 1, 2012, through Dec 31, 2021, 2440 cases had histologically confirmed Gleason grade group 2 or greater prostate cancer. In the subset of 400 testing cases in which the AI system was compared with the radiologists participating in the reader study, the AI system showed a statistically superior and non-inferior AUROC of 0·91 (95% CI 0·87-0·94; p<0·0001), in comparison to the pool of 62 radiologists with an AUROC of 0·86 (0·83-0·89), with a lower boundary of the two-sided 95% Wald CI for the difference in AUROC of 0·02. At the mean PI-RADS 3 or greater operating point of all readers, the AI system detected 6·8% more cases with Gleason grade group 2 or greater cancers at the same specificity (57·7%, 95% CI 51·6-63·3), or 50·4% fewer false-positive results and 20·0% fewer cases with Gleason grade group 1 cancers at the same sensitivity (89·4%, 95% CI 85·3-92·9). In all 1000 testing cases where the AI system was compared with the radiology readings made during multidisciplinary practice, non-inferiority was not confirmed, as the AI system showed lower specificity (68·9% [95% CI 65·3-72·4] vs 69·0% [65·5-72·5]) at the same sensitivity (96·1%, 94·0-98·2) as the PI-RADS 3 or greater operating point. The lower boundary of the two-sided 95% Wald CI for the difference in specificity (-0·04) was greater than the non-inferiority margin (-0·05) and a p value below the significance threshold was reached (p<0·001).
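The comparisons "at the same specificity" and "at the same sensitivity" depend on moving the AI system's score threshold until it matches a reference operating point, such as the readers' PI-RADS 3 or greater point. A minimal sketch of that matching step on toy data follows. The helper names, the toy scores, and the reading of "fewer false-positive results" as a relative reduction in counts are assumptions for illustration, not the study's implementation.

```python
import math


def threshold_for_sensitivity(scores, labels, target_sensitivity):
    """Return the highest score threshold whose sensitivity is at least the target.

    A case is called positive when its score is >= the threshold.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one positive case")
    # Smallest number of positives that must be flagged; the tiny epsilon guards
    # against floating-point products landing just above a whole number.
    needed = max(1, math.ceil(target_sensitivity * len(positives) - 1e-9))
    return positives[min(needed, len(positives)) - 1]


def operating_point(scores, labels, threshold):
    """Sensitivity, specificity and false-positive count at a given threshold."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if y == 1:
            tp += flagged
            fn += not flagged
        else:
            fp += flagged
            tn += not flagged
    return tp / (tp + fn), tn / (tn + fp), fp


def relative_reduction(reference_count, new_count):
    """Fractional reduction of new_count relative to reference_count."""
    return (reference_count - new_count) / reference_count


if __name__ == "__main__":
    # Toy data: 1 = Gleason grade group 2 or greater cancer, 0 = otherwise.
    scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
    labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
    t = threshold_for_sensitivity(scores, labels, target_sensitivity=0.75)
    sens, spec, fp = operating_point(scores, labels, t)
    print(t, sens, round(spec, 3), fp)  # 0.7 0.75 0.833 1
```

Fixing sensitivity and then reading off specificity (or the reverse) turns a comparison of two curves into a comparison of two single numbers, which is what makes a paired difference with a non-inferiority margin testable.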

Interpretation: An AI system was superior to radiologists using PI-RADS (2.1), on average, at detecting clinically significant prostate cancer and comparable to the standard of care. Such a system shows the potential to be a supportive tool within a primary diagnostic setting, with several associated benefits for patients and radiologists. Prospective validation is needed to test clinical applicability of this system.

Funding: Health~Holland and EU Horizon 2020.

Conflict of interest statement

Declaration of interests: NAO provides statistical consultation to Siemens Healthineers, Takeda, and Qure, and serves as a committee member of the Eastern Cooperative Oncology Group–American College of Radiology Imaging Network, the Tomosynthesis Mammographic Imaging Screening Trial, and the National Cancer Institute's Clinical Imaging Steering Committee (Bethesda, MD, USA). AB has been a consultant and advisor for Astellas and Bayer; has been a board member, officer, and trustee for Glactone Pharma and LIDDS Pharma; has received lecture honoraria for Accord, Astellas, AstraZeneca, Bayer, Ipsen, Janssen, and Merck; has participated in trials run by Astellas, Ferring, and Janssen; and holds stock in Glactone Pharma, LIDDS Pharma, Noviga, and WntResearch. BvG holds stocks in and is a founder of Thirona. JKC has received research funding from GE Healthcare and Genentech and is the co-inventor of software that has been licensed to Siloam Vision. JKC has equity ownership in Siloam Vision. GS has been an advisory board member of Exact Imaging and Angiogenesis and has received a lecture honorarium from Hitachi. OR has received funding for travel expenses from Philips Medical Systems. ARP has received research funding from Siemens Healthineers, holds stocks in Lucida Medical, and has received lecture honoraria for Siemens Healthineers and Bayer. HH has received research funding from Siemens Healthineers and Canon Medical Systems. GV has been a clinical advisory board member of AGFA Healthcare. VK has received lecture honoraria on prostate cancer diagnosis from the European Association of Urology and Singapore Urology Association and has received research funding from Prostate Cancer UK and the John Black Charitable Foundation. DB has received a lecture honorarium from Bayer Vital and holds stocks in NVIDIA, Microsoft, and MSCI-World ETF. RvdB has been an advisory board member for Janssen; has received lecture honoraria from Amgen, Astellas, Ipsen, Janssen, and MSD; has received research support from Astellas and Janssen; and has participated in trials run by Janssen. All other authors declare no competing interests.

Figures

Figure: Performance of the AI system at clinically significant prostate cancer diagnosis in the hidden testing cohort
(A) Receiver operating characteristic curves of the AI system and the pool of 62 radiologists, considering the subset of 400 testing cases used to facilitate the reader study. Light grey circle, star, and triangle markers indicate the PI-RADS operating points of each individual radiologist. The diagonal dashed line represents the receiver operating characteristic curve for a random classifier with an AUROC of 0·50. (B) Receiver operating characteristic curve of the AI system and the PI-RADS operating points of the radiology reads made during multidisciplinary routine practice, considering all 1000 testing cases. The diagonal dashed line represents the receiver operating characteristic curve for a random classifier with an AUROC of 0·50. (C) Difference in the AUROC metric between the AI system and the pool of 62 radiologists, considering the subset of 400 testing cases used to facilitate the reader study. (D) Difference in specificity when the threshold of the AI system was adjusted to match the same sensitivity (96·1%) as the PI-RADS 3 or greater operating point of the radiology reads made during multidisciplinary routine practice, considering all 1000 testing cases. AI=artificial intelligence. AUROC=area under the receiver operating characteristic curve. PI-RADS=Prostate Imaging Reporting and Data System.
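Panels A and B plot receiver operating characteristic curves, and each radiologist contributes a single PI-RADS operating point. The sketch below shows one way such a curve and its AUROC could be computed from case-level scores, using the rank (Mann-Whitney) interpretation of the AUROC with ties counted as one half. The function names and toy data are hypothetical, and this is not how the study computed its curves. A classifier that assigns every case the same score falls on the diagonal with an AUROC of 0·50, matching the dashed reference line in the figure.

```python
def auroc(scores, labels):
    """AUROC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(false-positive rate, true-positive rate) pairs, one per distinct threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


if __name__ == "__main__":
    scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
    labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
    print(round(auroc(scores, labels), 3))  # 0.917
    # Constant scores behave like a random classifier on the diagonal.
    print(auroc([0.5] * 4, [1, 0, 1, 0]))  # 0.5
    print(roc_points([0.5] * 4, [1, 0, 1, 0]))  # [(0.0, 0.0), (1.0, 1.0)]
```

The pairwise loop is quadratic in the number of cases. That is acceptable for a sketch, but a rank-based formula would scale better to a cohort of 1000 examinations.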

