NPJ Digit Med. 2022 Jan 27;5(1):11.
doi: 10.1038/s41746-021-00544-y.

Quality assessment standards in artificial intelligence diagnostic accuracy systematic reviews: a meta-research study

Shruti Jayakumar et al. NPJ Digit Med. 2022.

Abstract

Artificial intelligence (AI) centred diagnostic systems are increasingly recognised as robust solutions in healthcare delivery pathways. In turn, there has been a concurrent rise in secondary research studies regarding these technologies in order to influence key clinical and policymaking decisions. It is therefore essential that these studies accurately appraise methodological quality and risk of bias within shortlisted trials and reports. In order to assess whether this critical step is performed, we undertook a meta-research study evaluating adherence to the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool within AI diagnostic accuracy systematic reviews. A literature search was conducted on all studies published from 2000 to December 2020. Of 50 included reviews, 36 performed a quality assessment, of which 27 utilised the QUADAS-2 tool. Bias was reported across all four domains of QUADAS-2. Two hundred forty-three of 423 studies (57.5%) across all systematic reviews utilising QUADAS-2 reported a high or unclear risk of bias in the patient selection domain, 110 (26%) reported a high or unclear risk of bias in the index test domain, 121 (28.6%) in the reference standard domain and 157 (37.1%) in the flow and timing domain. This study demonstrates the incomplete uptake of quality assessment tools in reviews of AI-based diagnostic accuracy studies and highlights inconsistent reporting across all domains of quality assessment. Poor standards of reporting act as barriers to clinical implementation. The creation of an AI-specific extension to quality assessment tools for AI diagnostic accuracy studies may facilitate the safe translation of AI tools into clinical practice.
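The proportions quoted in the abstract can be re-derived from the raw counts. The following is a minimal sketch, not the authors' analysis code: the variable and function names are hypothetical, and it assumes the denominator for every QUADAS-2 domain is the 423 studies stated in the abstract, with percentages rounded to one decimal place.

```python
# Counts taken from the abstract; names below are illustrative assumptions.
TOTAL_REVIEWS = 50            # systematic reviews included
REVIEWS_WITH_QA = 36          # reviews that performed a quality assessment
REVIEWS_WITH_QUADAS2 = 27     # of those, reviews that used QUADAS-2
TOTAL_STUDIES = 423           # studies across reviews that used QUADAS-2

# Studies rated high or unclear risk of bias, per QUADAS-2 domain.
high_or_unclear = {
    "patient selection": 243,
    "index test": 110,
    "reference standard": 121,
    "flow and timing": 157,
}


def percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Share of studies at high or unclear risk of bias in each domain.
domain_percentages = {
    domain: percent(count, TOTAL_STUDIES)
    for domain, count in high_or_unclear.items()
}

# Uptake of quality assessment among the included reviews.
qa_uptake = percent(REVIEWS_WITH_QA, TOTAL_REVIEWS)
quadas2_among_qa = percent(REVIEWS_WITH_QUADAS2, REVIEWS_WITH_QA)
quadas2_overall = percent(REVIEWS_WITH_QUADAS2, TOTAL_REVIEWS)

if __name__ == "__main__":
    for domain, value in domain_percentages.items():
        print(f"{domain}: {value}%")
    print(f"Reviews performing quality assessment: {qa_uptake}%")
    print(f"QUADAS-2 among reviews with quality assessment: {quadas2_among_qa}%")
    print(f"QUADAS-2 among all included reviews: {quadas2_overall}%")
```

Under this rounding, 243 of 423 comes out at 57.4%, marginally below the 57.5% quoted in the abstract; the other three domain figures match. The same counts imply that 72% of included reviews performed a quality assessment and 54% used QUADAS-2.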

Conflict of interest statement

HA is Chief Scientific Officer, Preemptive Medicine and Health Security, Flagship Pioneering. AD is Executive Chairman of Preemptive Medicine and Health Security, Flagship Pioneering.

Figures

Fig. 1. PRISMA flow diagram for systematic literature search and study selection.
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Fig. 2. Systematic reviews undertaking quality assessment and utilising QUADAS.
QUADAS: Quality Assessment of Diagnostic Accuracy Studies.

Fig. 3. Pie charts demonstrating the risk of bias among axial imaging studies, as assessed through QUADAS.
Low, high and unclear risks are shown for the four QUADAS categories: patient selection, reference standard, index test and flow and timing (panels a, b, c and d, respectively).

Fig. 4. Pie charts demonstrating the risk of bias among non-axial imaging studies, as assessed through QUADAS.
Low, high and unclear risks are shown for the four QUADAS categories: patient selection, reference standard, index test and flow and timing (panels a, b, c and d, respectively).

Fig. 5. Pie charts demonstrating the risk of bias among photographic image studies, as assessed through QUADAS.
Low, high and unclear risks are shown for the four QUADAS categories: patient selection, reference standard, index test and flow and timing (panels a, b, c and d, respectively).

Fig. 6. Types of biases affecting quality and applicability of artificial intelligence-based diagnostic accuracy studies.
Biases are listed under the QUADAS domain they primarily affect.
