BMC Med Res Methodol. 2006 Oct 19;6:52.
doi: 10.1186/1471-2288-6-52.

A system for rating the stability and strength of medical evidence

Jonathan R Treadwell et al. BMC Med Res Methodol.

Abstract

Background: Methods for describing one's confidence in the available evidence are useful for end-users of evidence reviews. Analysts inevitably make judgments about the quality, quantity, consistency, robustness, and magnitude of effects observed in the studies identified. Because several of these judgments are subjective, they need to be made transparently.

Discussion: This paper introduces a new system for rating medical evidence. The system requires explicit judgments and provides explicit rules for balancing these judgments. Unlike other systems for rating the strength of evidence, our system draws a distinction between two types of conclusions: quantitative and qualitative. A quantitative conclusion addresses the question, "How well does it work?", whereas a qualitative conclusion addresses the question, "Does it work?" In our system, quantitative conclusions are tied to stability ratings, and qualitative conclusions are tied to strength ratings. Our system emphasizes extensive a priori criteria for judgments to reduce the potential for bias. Further, the system makes explicit the impact of heterogeneity testing, meta-analysis, and sensitivity analyses on evidence ratings. This article provides details of our system, including graphical depictions of how the numerous judgments that an analyst makes can be combined. We also describe two worked examples of how the system can be applied to both interventional and diagnostic technologies.

Summary: Although explicit judgments and formal combination rules are two important steps on the path to a comprehensive system for rating medical evidence, many additional steps must also be taken. Foremost among these are the distinction between quantitative and qualitative conclusions, an extensive set of a priori criteria for making judgments, and the direct impact of analytic results on evidence ratings. These attributes form the basis for a logically consistent system that can improve the usefulness of evidence reviews.

Figures

Figure 1
Entry Into System.
Figure 2
Overview of the High Quality Arm.
Figure 3
High Quality Arm: Homogeneous Data.
Figure 4
High Quality Arm: Heterogeneous Data.
Figure 5
High Quality Arm: Small Evidence Base.
Figure 6
Forest Plot Demonstrating the Quantitative/Qualitative Distinction. Plot showing the results of 14 randomized trials that compared drug-eluting stents (DES) to bare metal stents (BMS) and reported the percentages of patients who underwent target lesion revascularization after stent implantation. Sizes of the squares are proportional to study size, and 95% confidence intervals are shown as horizontal lines. There was unexplained heterogeneity among the trial results, so we did not estimate the size of the difference between groups. However, the random-effects meta-analytic confidence interval at the bottom of the plot showed the summary statistic was statistically significant.
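The situation in the Figure 6 caption (pool the trials, find heterogeneity, withhold the size estimate, but still report a direction) can be sketched in Python. This is a minimal sketch under stated assumptions: the caption names neither the random-effects estimator nor the heterogeneity criterion, so the DerSimonian-Laird estimator and an I² cut-off of 50% (`I2_THRESHOLD`) are illustrative choices, any I² above the cut-off is treated as unexplained heterogeneity, and all function names are hypothetical. Inputs are per-study effects (for example, log risk ratios) with their variances; the sign convention is left to the analyst.

```python
import math
from dataclasses import dataclass

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile
I2_THRESHOLD = 50.0  # hypothetical cut-off (%) above which no size estimate is reported


@dataclass
class PooledResult:
    estimate: float   # random-effects summary effect
    ci_low: float     # lower 95% confidence limit
    ci_high: float    # upper 95% confidence limit
    q: float          # Cochran's Q statistic
    i_squared: float  # I^2: percentage of variation attributed to heterogeneity


def random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling (an assumed estimator)."""
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    est = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return PooledResult(est, est - Z_95 * se, est + Z_95 * se, q, i2)


def conclusions(effects, variances):
    """Separate the quantitative ("How well?") from the qualitative ("Does it work?") answer."""
    r = random_effects(effects, variances)
    if r.ci_low > 0:
        direction = "positive"
    elif r.ci_high < 0:
        direction = "negative"
    else:
        direction = "inconclusive"
    # Withhold the size estimate when heterogeneity exceeds the hypothetical cut-off
    quantitative = None if r.i_squared > I2_THRESHOLD else r.estimate
    return {"quantitative": quantitative, "qualitative": direction}
```

For instance, `conclusions([0.2, 1.0, 1.8], [0.01, 0.01, 0.01])` (made-up data) pools three trials whose effects clearly disagree (I² of about 98%); it withholds the size estimate but still reports a positive direction, because the random-effects confidence interval excludes zero.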
Figure 7
Example of a Qualitative Robustness Test. Cumulative meta-analytic test of the qualitative robustness of the evidence from seven randomized trials that compared pharmacotherapeutic treatment for bulimia to placebo and reported mean purging frequency. We performed a cumulative meta-analysis in which the study with the largest weight was entered first (the topmost horizontal segment in the plot), then the study with the next largest weight (the second segment from the top), and so on. Each horizontal segment in the plot is a 95% confidence interval around a random-effects summary Hedges' d, a standardized mean difference. (Point estimates are not shown, to emphasize that the analysis concerns only the confidence intervals.) In each of the last five analyses, the effect was statistically significant in the same direction. This met our a priori definition of qualitative robustness: the qualitative conclusion must remain the same after each of the last three or more studies is added.
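The robustness test in the Figure 7 caption can be sketched as a cumulative meta-analysis. This minimal Python sketch assumes per-study effects (such as Hedges' d) with known variances, uses a DerSimonian-Laird random-effects confidence interval (an assumed estimator; the caption does not name one), and reads the a priori rule as "the last `min_run` cumulative analyses reach the same qualitative conclusion". The function names and the `min_run` parameter are hypothetical. Sorting studies by smallest within-study variance gives the same order as sorting by random-effects weight, since every study's weight adds the same between-study variance.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def random_effects_ci(effects, variances):
    """95% CI of a DerSimonian-Laird random-effects summary (assumed estimator)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    est = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    half_width = Z_95 * math.sqrt(1.0 / sum(w_re))
    return est - half_width, est + half_width


def qualitative_label(low, high):
    """Qualitative conclusion: significant in one direction, or not significant."""
    if low > 0:
        return "positive"
    if high < 0:
        return "negative"
    return "inconclusive"


def cumulative_labels(effects, variances):
    """Enter studies from largest to smallest weight and label each cumulative CI."""
    order = sorted(range(len(effects)), key=lambda i: variances[i])
    labels = []
    for k in range(1, len(order) + 1):
        chosen = order[:k]
        low, high = random_effects_ci([effects[i] for i in chosen],
                                      [variances[i] for i in chosen])
        labels.append(qualitative_label(low, high))
    return labels


def is_qualitatively_robust(labels, min_run=3):
    """True if the last `min_run` cumulative analyses share one conclusion."""
    return len(labels) >= min_run and len(set(labels[-min_run:])) == 1
```

As written, a conclusion that stays "inconclusive" across the last `min_run` additions also counts as robust; the caption's example involves a significant effect, so whether a stable null conclusion should qualify is left to the analyst.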
Figure 8
Informative and Non-Informative Effect Sizes. This figure is adapted from Armitage and Berry.[23] Each open diamond denotes a hypothetical meta-analytic summary statistic, and the horizontal segments denote 95% confidence intervals. The dashed vertical line indicates the effect size that was determined a priori to represent the minimum effect size that is considered clinically important. A meta-analytic summary statistic is considered informative if its confidence interval either excludes 0 or excludes a clinically important effect (or both). Thus, meta-analyses A through D each show informative results, whereas meta-analysis E shows a non-informative result.
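The informativeness rule in the Figure 8 caption reduces to two interval checks. A minimal Python sketch, assuming larger effects are better and the minimum clinically important effect is a single value fixed a priori (`min_important`, a hypothetical parameter name). The intervals in the usage block are made up and are not read from the figure.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A 95% confidence interval around a meta-analytic summary statistic."""
    low: float
    high: float


def contains(ci: Interval, value: float) -> bool:
    """True if the value lies inside the closed interval."""
    return ci.low <= value <= ci.high


def is_informative(ci: Interval, min_important: float) -> bool:
    """Informative if the CI excludes 0, excludes the minimum important effect, or both."""
    return not contains(ci, 0.0) or not contains(ci, min_important)


if __name__ == "__main__":
    delta = 0.5  # hypothetical a priori minimum clinically important effect
    examples = {
        "excludes both": Interval(0.6, 1.2),
        "excludes 0 only": Interval(0.1, 0.9),
        "excludes delta only": Interval(-0.2, 0.3),
        "spans both": Interval(-0.3, 0.9),
    }
    for name, ci in examples.items():
        print(name, is_informative(ci, delta))
```

Only the last interval, which spans both 0 and the clinically important effect, is classified as non-informative; this is the pattern the caption attributes to meta-analysis E.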

References

    1. Harris RP, Helfand M, Woolf SH, Lohr KN, Mulrow CD, Teutsch SM, Atkins D. Current methods of the U.S. Preventive Services Task Force. A review of the process. Am J Prev Med. 2001;20:21–35. doi: 10.1016/S0749-3797(01)00261-6.
    2. Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, Flottorp S, Guyatt GH, Harbour RT, Haugh MC, Henry D, Hill S, Jaeschke R, Leng G, Liberati A, Magrini N, Mason J, Middleton P, Mrukowicz J, O'Connell D, Oxman AD, Phillips B, Schunemann HJ, Edejer TT, Varonen H, Vist GE, Williams JW Jr, Zaza S. Grading quality of evidence and strength of recommendations. BMJ. 2004 Jun 19;328:1490. http://bmj.bmjjournals.com/cgi/reprint/328/7454/1490
    3. Atkins D, Eccles M, Flottorp S, Guyatt GH, Henry D, Hill S, Liberati A, O'Connell D, Oxman AD, Phillips B, Schunemann H, Edejer TT, Vist GE, Williams JW Jr; GRADE Working Group. Systems for grading the quality of evidence and the strength of recommendations I: critical appraisal of existing approaches. The GRADE Working Group. BMC Health Serv Res. 2004 Dec 22;4:38. doi: 10.1186/1472-6963-4-38.
    4. Atkins D, Briss PA, Eccles M, Flottorp S, Guyatt GH, Harbour RT, Hill S, Jaeschke R, Liberati A, Magrini N, Mason J, O'Connell D, Oxman AD, Phillips B, Schunemann H, Edejer TT, Vist GE, Williams JW Jr; GRADE Working Group. Systems for grading the quality of evidence and the strength of recommendations II: pilot study of a new system. BMC Health Serv Res. 2005 Mar 23;5:25. doi: 10.1186/1472-6963-5-25.
    5. Guyatt G, Gutterman D, Baumann MH, Addrizzo-Harris D, Hylek EM, Phillips B, Raskob G, Lewis SZ, Schunemann H. Grading strength of recommendations and quality of evidence in clinical guidelines: report from an American College of Chest Physicians task force. Chest. 2006;129:174–81. doi: 10.1378/chest.129.1.174.
