Diagn Interv Radiol. 2025 Sep 8;31(5):440-455. doi: 10.4274/dir.2025.243182. Epub 2025 Feb 10.

Adherence to the Checklist for Artificial Intelligence in Medical Imaging (CLAIM): an umbrella review with a comprehensive two-level analysis

Burak Koçak et al. Diagn Interv Radiol. 2025.

Abstract

Purpose: To comprehensively assess Checklist for Artificial Intelligence in Medical Imaging (CLAIM) adherence in medical imaging artificial intelligence (AI) literature by aggregating data from previous systematic and non-systematic reviews.

Methods: A systematic search of PubMed, Scopus, and Google Scholar identified reviews using the CLAIM to evaluate medical imaging AI studies. Reviews were analyzed at two levels: review level (33 reviews; 1,458 studies) and study level (421 unique studies from 15 reviews). The CLAIM adherence metrics (scores and compliance rates), baseline characteristics, factors influencing adherence, and critiques of the CLAIM were analyzed.

Results: A review-level analysis of 26 reviews (874 studies) found a weighted mean CLAIM score of 25 [standard deviation (SD): 4] and a median of 26 [interquartile range (IQR): 8; 25th-75th percentiles: 20-28]. In a separate review-level analysis involving 18 reviews (993 studies), the weighted mean CLAIM compliance was 63% (SD: 11%), with a median of 66% (IQR: 4%; 25th-75th percentiles: 63%-67%). A study-level analysis of 421 unique studies published between 1997 and 2024 found a median CLAIM score of 26 (IQR: 6; 25th-75th percentiles: 23-29) and a median compliance of 68% (IQR: 16%; 25th-75th percentiles: 59%-75%). Adherence was independently associated with the journal impact factor quartile, publication year, and specific radiology subfields. After guideline publication, CLAIM compliance improved (P = 0.004). Multiple readers provided an evaluation in 85% (28/33) of reviews, but only 11% (3/28) included a reliability analysis. An item-wise evaluation identified 11 underreported items (missing in ≥50% of studies). Among the 10 identified critiques, the most common were item inapplicability to diverse study types and subjective interpretations of fulfillment.
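The review-level aggregates above weight each review's mean CLAIM score by the number of studies it covers. A minimal sketch of that weighted-mean aggregation (the review data below are hypothetical, for illustration only):

```python
# Review-level aggregation: weighted mean CLAIM score across reviews,
# weighting each review's mean score by its number of included studies.
# These review entries are hypothetical examples, not data from the paper.
reviews = [
    {"mean_score": 24.0, "n_studies": 50},
    {"mean_score": 27.0, "n_studies": 120},
    {"mean_score": 22.5, "n_studies": 30},
]

total_n = sum(r["n_studies"] for r in reviews)
weighted_mean = sum(r["mean_score"] * r["n_studies"] for r in reviews) / total_n
print(f"Weighted mean CLAIM score: {weighted_mean:.1f}")
```

Weighting by study count keeps large reviews from being diluted by small ones, which is why the weighted mean (25) can differ from the median of review-level means (26).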

Conclusion: Our two-level analysis revealed considerable reporting gaps, underreported items, factors related to adherence, and common CLAIM critiques, providing actionable insights for researchers and journals to improve transparency, reproducibility, and reporting quality in AI studies.

Clinical significance: By combining data from systematic and non-systematic reviews on CLAIM adherence, our comprehensive findings may serve as targets to help researchers and journals improve transparency, reproducibility, and reporting quality in AI studies.

Keywords: Artificial intelligence; checklist; diagnostic imaging; machine learning; radiology.

Conflict of interest statement

Burak Koçak, MD, is Section Editor in Diagnostic and Interventional Radiology. He had no involvement in the peer-review of this article and had no access to information regarding its peer-review. Other authors have nothing to disclose.

Figures

Figure 1. Identification of eligible studies for the review- and study-level analyses. CLAIM, Checklist for Artificial Intelligence in Medical Imaging.

Figure 2. Consideration of item applicability and resultant CLAIM adherence metrics in the review-level analysis, emphasizing the methodological variability among reviews evaluating CLAIM adherence. CLAIM, Checklist for Artificial Intelligence in Medical Imaging.

Figure 3. Tabulated bar charts for the study-level analysis of the median CLAIM score and compliance by journal, sorted by publication frequency (a) and CLAIM compliance (b). CLAIM, Checklist for Artificial Intelligence in Medical Imaging.

Figure 4. Study-level analysis of the publication year, CLAIM score, and compliance. Scatterplots with marginal distributions showing the correlation between the publication year and CLAIM score (a) and compliance (b). Combined box and violin plots illustrating the CLAIM score (c) and compliance (d) in relation to the release of the CLAIM guidelines in 2020. CLAIM, Checklist for Artificial Intelligence in Medical Imaging; CI, confidence interval.

Figure 5. Box plots for the study-level analysis of the CLAIM score (a) and compliance (b) by radiology subfield, with pairwise comparisons. The Kruskal–Wallis test showed statistically significant differences across all categories in both analyses (a, b). Only statistically significant pairwise comparisons are displayed for clarity. MS, multi-system; CLAIM, Checklist for Artificial Intelligence in Medical Imaging; CI, confidence interval.

Figure 6. Box plots for the study-level analysis of the CLAIM score (a) and compliance (b) by impact factor quartile, with pairwise comparisons. The Kruskal–Wallis test showed statistically significant differences across all categories in both analyses (a, b). Only statistically significant pairwise comparisons are displayed for clarity. CLAIM, Checklist for Artificial Intelligence in Medical Imaging; CI, confidence interval.

Figure 7. Item-wise analysis of the study-level data, ranked by compliance rates [calculated as reported / (reported + not reported) × 100], considering the applicability of items. The compliance rates are based on the actual number of publications that reported or did not report each item. Note that item names have been abbreviated.

Figure 8. Eleven underreported items (i.e., missing in ≥50% of studies), categorized by relevant domains.
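The item-wise compliance rate in Figure 7 is reported / (reported + not reported) × 100, with "not applicable" ratings excluded from the denominator. A minimal sketch of that calculation (the counts below are hypothetical):

```python
# Item-wise compliance rate as defined in Figure 7:
# compliance = reported / (reported + not reported) * 100,
# where "not applicable" ratings are already excluded from both counts.
def compliance_rate(reported: int, not_reported: int) -> float:
    """Percentage of applicable studies that reported the item."""
    applicable = reported + not_reported
    return 100 * reported / applicable

# Hypothetical example: an item reported by 34 of 100 applicable studies
# would fall below the 50% threshold used to flag underreported items.
print(compliance_rate(34, 66))  # → 34.0
```

Excluding not-applicable ratings matters: dividing by the total study count instead would penalize items that simply do not apply to some study designs, one of the CLAIM critiques noted in the Results.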
