Review
Radiology. 2023 May;307(3):e221437.
doi: 10.1148/radiol.221437. Epub 2023 Mar 14.

How to Critically Appraise and Interpret Systematic Reviews and Meta-Analyses of Diagnostic Accuracy: A User Guide

Robert A Frank et al. Radiology. 2023 May.

Abstract

Systematic reviews of diagnostic accuracy studies can provide the best available evidence to inform decisions regarding the use of a diagnostic test. In this guide, the authors provide a practical approach for clinicians to appraise diagnostic accuracy systematic reviews and apply their results to patient care. The first step is to identify an appropriate systematic review with a research question matching the clinical scenario. The user should evaluate the rigor of the review methods to evaluate its credibility (Did the review use clearly defined eligibility criteria, a comprehensive search strategy, structured data collection, risk of bias and applicability appraisal, and appropriate meta-analysis methods?). If the review is credible, the next step is to decide whether the diagnostic performance is adequate for clinical use (Do sensitivity and specificity estimates exceed the threshold that makes them useful in clinical practice? Are these estimates sufficiently precise? Is variability in the estimates of diagnostic accuracy across studies explained?). Diagnostic accuracy systematic reviews that are judged to be credible and provide diagnostic accuracy estimates with sufficient certainty and relevance are the most useful to inform patient care. This review discusses comparative, noncomparative, and emerging approaches to systematic reviews of diagnostic accuracy using a clinical scenario and examples based on recent publications.


Conflict of interest statement

Disclosures of conflicts of interest: R.A.F. No relevant relationships. J.P.S. No relevant relationships. N.I. No relevant relationships. B.Y. No relevant relationships. M.H.M. No relevant relationships. R.M. No relevant relationships. M.L. No relevant relationships. P.M.B. Consultant for Radiology. Y.T. No relevant relationships. P.W. No relevant relationships. H.D. No relevant relationships. S.K.K. Grants from NIH/NCI, NIH/MIDCR, ACRQ, and Doris Duke Foundation; royalties from Wolters Kluwer; honoraria for editorial board work from ARRS and teaching from RSNA; Chair of American College of Radiology Steering Committee on Incidental Findings and American College of Radiology Expert Panel on Gynecological and Obstetrical Imaging. S.E. No relevant relationships. B.L. No relevant relationships. B.H. Patents planned, issued, or pending from Ottawa Hospital Research Institute. M.D.F.M. CIHR operating grant to institution; associate editor for Radiology.

Figures

Graphical abstract
Figure 1: Example of presentation of Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) result. Chart shows risk of bias and applicability summary with authors’ judgments about each domain for each included study. Reprinted, with permission, from reference 84.
Figure 2: Example of a coupled forest plot of sensitivity and specificity. Published data from Duke et al (24) were re-analyzed by fitting a bivariate meta-analysis using the lme4 package in R (version 4.2.2; The R Foundation for Statistical Computing). This figure was generated using RevMan software (Cochrane; https://training.cochrane.org/online-learning/core-software/revman). Each study is identified by name of first author and year of publication, with blue squares representing individual study point estimates and horizontal lines indicating 95% CIs. For each primary study, values of sensitivity and specificity with associated upper and lower limits of the 95% CIs are provided, in addition to 2 × 2 data. Summary estimates of sensitivity and specificity are 0.965 (95% CI: 0.948, 0.977) and 0.967 (95% CI: 0.949, 0.979), respectively (not displayed in this forest plot). FN = false-negative, FP = false-positive, TN = true-negative, TP = true-positive.
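The per-study estimates in such a forest plot come directly from each study's 2 × 2 table. As a minimal sketch of that step (the function names and the example counts below are hypothetical, and the Wilson score interval is one common CI choice, not necessarily the one the authors used), sensitivity, specificity, and 95% CIs can be computed as:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score 95% CI for a binomial proportion."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (max(0.0, centre - half), min(1.0, centre + half))

def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity and specificity (with 95% CIs) from a 2 x 2 table."""
    sens = tp / (tp + fn)   # true-positive rate among diseased
    spec = tn / (tn + fp)   # true-negative rate among non-diseased
    return {
        "sensitivity": (sens, wilson_ci(tp, tp + fn)),
        "specificity": (spec, wilson_ci(tn, tn + fp)),
    }

# Hypothetical counts for one study, for illustration only
result = accuracy_from_2x2(tp=90, fp=5, fn=10, tn=95)
```

Each study's point estimate and interval plotted in the forest plot corresponds to one such computation; the summary estimates, by contrast, come from the bivariate meta-analysis, not from pooling the 2 × 2 tables directly.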
Figure 3: Examples of summary receiver operating characteristic (ROC) plots. Published data from Duke et al (24) were re-analyzed by fitting a bivariate meta-analysis using the lme4 package in R (The R Foundation for Statistical Computing) (3) to obtain (A) a summary point and (B) a summary curve. Figures were generated using RevMan software. The open circles represent accuracy estimates for included primary studies, with the position relative to the y-axis representing the sensitivity and the position relative to the x-axis representing the false-positive rate (1 − specificity). Individual study points are weighted using sample size. In A, the position of the solid black circle in ROC space represents the summary estimate of sensitivity and specificity (sensitivity = 0.965 [95% CI: 0.948, 0.977]; specificity = 0.967 [95% CI: 0.949, 0.979]). The dashed circle represents the 95% prediction region, and the dotted line represents the 95% confidence region. In B, the solid line represents the hierarchical summary ROC curve. The dashed line denotes an uninformative ROC curve. Points along this curve correspond to where sensitivity = 1 − specificity (ie, the true- and false-positive rates are equal).
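The summary point in such plots comes from a bivariate random-effects model that jointly pools logit sensitivity and logit specificity across studies. Purely as intuition for the pooling step, the sketch below does univariate fixed-effect pooling on the logit scale with a continuity correction; this is a deliberate simplification, not the bivariate model and not the authors' code, and all names and counts are hypothetical:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def pooled_proportion(events, totals):
    """Fixed-effect inverse-variance pooling on the logit scale.
    Simplification only: diagnostic accuracy meta-analyses use a
    bivariate random-effects model (e.g. glmer from lme4 in R)."""
    weights, estimates = [], []
    for e, n in zip(events, totals):
        p = (e + 0.5) / (n + 1)                     # continuity correction
        var = 1 / (e + 0.5) + 1 / (n - e + 0.5)     # approx. var of logit(p)
        weights.append(1 / var)
        estimates.append(logit(p))
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    return inv_logit(pooled)

# Hypothetical true-positive counts and diseased totals from three studies
summary_sens = pooled_proportion([45, 88, 130], [50, 95, 140])
```

The real bivariate model additionally estimates the between-study variance and the correlation between logit sensitivity and logit specificity, which is what produces the confidence and prediction regions around the summary point.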
Figure 4: Example of summary receiver operating characteristic (ROC) plot with studies demonstrating higher between-study variability and lower accuracy estimates compared with those in Figure 3. The summary point is indicated by the black circle. Individual studies are indicated by open circles (scale = study sample size). The dotted border and the dashed border represent 95% confidence regions and 95% prediction regions, respectively. The dashed diagonal line denotes an uninformative ROC curve. Points along this curve correspond to where sensitivity = 1 − specificity (ie, the true- and false-positive rates are equal). Reprinted, under a CC BY license, from reference 38.
Figure 5: Example of summary receiver operating characteristic (ROC) plot with an intermediate degree of between-study variability compared with the previously presented examples. Hierarchical summary ROC (HSROC) curve for fracture-detection algorithms. The 95% prediction region is a visual representation of between-study heterogeneity. Reprinted, with permission, from reference 41.
Figure 6: Linked summary receiver operating characteristic (ROC) plot displays direct comparisons of two index tests. Plot is from the study by Chan et al (51), who compared the accuracy of chest US (CUS) and chest radiography (CXR) in the diagnosis of pneumothorax in trauma patients in the emergency department. Solid circles (summary points) represent the summary estimates of sensitivity and specificity for chest US (black circle) and chest radiography (red circle). Each summary point is surrounded by a dotted line of the same color representing the 95% confidence region and a dashed line of the same color representing the 95% prediction region. The dashed diagonal line denotes an uninformative ROC curve. Points along this curve correspond to where sensitivity = 1 − specificity (ie, the true- and false-positive rates are equal). Reprinted, under a CC BY license, from reference 51.

References

    1. Salameh JP, Bossuyt PM, McGrath TA, et al. Preferred reporting items for systematic review and meta-analysis of diagnostic test accuracy studies (PRISMA-DTA): explanation, elaboration, and checklist. BMJ 2020;370:m2632.
    2. Cohen JF, Deeks JJ, Hooft L, et al. Preferred reporting items for journal and conference abstracts of systematic reviews and meta-analyses of diagnostic test accuracy studies (PRISMA-DTA for Abstracts): checklist, explanation, and elaboration. BMJ 2021;372:n265.
    3. Frank RA, McInnes MDF, Levine D, et al. Are Study and Journal Characteristics Reliable Indicators of “Truth” in Imaging Research? Radiology 2018;287(1):215–223.
    4. Higgins JPT, Thomas J, Chandler J, et al., eds. Cochrane Handbook for Systematic Reviews of Interventions version 6.0 (updated July 2019). Cochrane, 2019. https://training.cochrane.org/handbook.
    5. Patsopoulos NA, Analatos AA, Ioannidis JP. Relative citation impact of various study designs in the health sciences. JAMA 2005;293(19):2362–2366.
