Review

The Low Rate of Adherence to Checklist for Artificial Intelligence in Medical Imaging Criteria Among Published Prostate MRI Artificial Intelligence Algorithms

Mason J Belue et al. J Am Coll Radiol. 2023 Feb;20(2):134-145. doi: 10.1016/j.jacr.2022.05.022. Epub 2022 Jul 31.

Abstract

Objective: To determine the rigor, generalizability, and reproducibility of published classification and detection artificial intelligence (AI) models for prostate cancer (PCa) on MRI using the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) guidelines, a 42-item checklist that is considered a measure of best practice for presenting and reviewing medical imaging AI research.

Materials and methods: This review searched the English-language literature for studies proposing PCa AI detection and classification models on MRI. Each study was evaluated with the CLAIM checklist. Additional outcomes for which data were sought included measures of AI model performance (eg, area under the curve [AUC], sensitivity, specificity, free-response operating characteristic curves); training, validation, and testing group sample sizes; AI approach; detection versus classification AI; public data set utilization; MRI sequences used; and definition of the gold standard for ground truth. The percentage of CLAIM checklist fulfillment was used to stratify studies into quartiles. Wilcoxon's rank-sum test was used for pair-wise comparisons.
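The quartile stratification and rank-sum comparison described above can be sketched in Python. This is an illustrative reconstruction, not the authors' actual analysis code: the function names (`quartiles`, `rank_sum_u`) and all data values are assumptions for demonstration, and the rank-sum statistic is computed directly rather than via a statistics library.

```python
# Hypothetical sketch of the analysis: stratify studies into quartiles by
# percentage of CLAIM items fulfilled, then compare AUC scores between
# quartiles with a rank-sum (Mann-Whitney U) statistic.
# All names and data here are illustrative, not the paper's actual values.

def quartiles(studies):
    """Split (claim_pct, auc) pairs into four quartiles by CLAIM fulfillment."""
    ordered = sorted(studies, key=lambda s: s[0])
    n = len(ordered)
    return [ordered[i * n // 4:(i + 1) * n // 4] for i in range(4)]

def rank_sum_u(xs, ys):
    """Mann-Whitney U statistic for sample xs vs sample ys (ties get midranks)."""
    combined = sorted((v, grp) for grp, vals in ((0, xs), (1, ys)) for v in vals)
    rank_total = 0.0  # sum of ranks assigned to members of xs
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1  # extend over a run of tied values
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        for k in range(i, j):
            if combined[k][1] == 0:
                rank_total += midrank
        i = j
    n1 = len(xs)
    return rank_total - n1 * (n1 + 1) / 2.0

# Example: AUCs from a low-fulfillment quartile vs a high-fulfillment quartile.
low_q_aucs = [0.74, 0.76, 0.78]
high_q_aucs = [0.86, 0.88, 0.90]
u = rank_sum_u(low_q_aucs, high_q_aucs)  # U = 0: all low-quartile AUCs rank below
```

In practice one would use `scipy.stats.mannwhitneyu` for the P value; the manual version above just makes the ranking logic explicit.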

Results: In all, 75 studies were identified, and 53 studies qualified for analysis. The original CLAIM items that most studies did not fulfill include item 12 (77% no): de-identification methods; item 13 (68% no): handling missing data; item 15 (47% no): rationale for choosing the ground truth reference standard; item 18 (55% no): measurements of inter- and intrareader variability; item 31 (60% no): inclusion of validated interpretability maps; and item 37 (92% no): inclusion of failure analysis to elucidate AI model weaknesses. Comparison of AUC scores across percentage CLAIM fulfillment quartiles revealed a significant difference in mean AUC between quartile 1 and quartile 2 (0.78 versus 0.86, P = .034) and between quartile 1 and quartile 4 (0.78 versus 0.89, P = .003). Based on additional information and outcome metrics gathered in this study, additional measures of best practice are defined. These new items include disclosure of public data set usage, ground truth definition in comparison with other referenced works in the defined task, and sample size power calculation.

Conclusion: A large proportion of AI studies do not fulfill key items in CLAIM guidelines within their methods and results sections. The percentage of CLAIM checklist fulfillment is weakly associated with improved AI model performance. Additions or supplementations to CLAIM are recommended to improve publishing standards and aid reviewers in determining study rigor.

Keywords: AI; CLAIM; classification; detection; prostate cancer; study rigor.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1: Flow diagram of paper/study selection.

Figure 2: Total paper performance per 42 CLAIM items. "Yes" was assigned if the specific CLAIM item was included in the text or supplementary material of the study; "No" was assigned if the CLAIM item could not be found; and "N/A" was assigned if the CLAIM item did not apply to a particular study.

Figure 3: Overall results by CLAIM section (represented as a percentage out of 100) across all 53 studies.

Figure 4: Overall results by CLAIM section stratified by method (machine learning [A] vs deep learning [B]).

Figure 5: Impact of CLAIM fulfillment on AUC score, stratified by CLAIM binary quartiles (excluding N/A).
