Reproducibility of the STARD checklist: an instrument to assess the quality of reporting of diagnostic accuracy studies

Nynke Smidt¹, Anne W S Rutjes, Daniëlle A W M van der Windt, Raymond W J G Ostelo, Patrick M Bossuyt, Johannes B Reitsma, Lex M Bouter, Henrica C W de Vet

Affiliation

¹ Institute for Research in Extramural Medicine, VU University Medical Center, Van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands. n.smidt@amc.uva.nl

PMID: 16539705
PMCID: PMC1522016
DOI: 10.1186/1471-2288-6-12

Nynke Smidt et al. BMC Med Res Methodol. 2006 Mar 15;6:12. doi: 10.1186/1471-2288-6-12.

Abstract

Background: In January 2003, STAndards for the Reporting of Diagnostic accuracy studies (STARD) were published in a number of journals, to improve the quality of reporting in diagnostic accuracy studies. We designed a study to investigate the inter-assessment reproducibility, and intra- and inter-observer reproducibility of the items in the STARD statement.

Methods: Thirty-two diagnostic accuracy studies published in 2000 in medical journals with an impact factor of at least 4 were included. Two reviewers independently evaluated the quality of reporting of these studies using the 25 items of the STARD statement. A consensus evaluation was obtained by discussing and resolving disagreements between reviewers. Almost two years later, the same reviewers evaluated the same studies again. For each item, the percentage agreement and Cohen's kappa between the first and second consensus assessments (inter-assessment) were calculated. Intraclass correlation coefficients (ICC) were calculated to evaluate the reliability of the STARD checklist.
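The two per-item agreement measures named in the Methods can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes each STARD item is scored as reported (1) or not reported (0) for every article, and the eight ratings in the example are hypothetical, not study data.

```python
from collections import Counter


def percentage_agreement(first, second):
    """Percentage of articles given the same rating in both assessments."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


def cohens_kappa(first, second):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(first)
    p_observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: product of the marginal proportions, summed per category.
    count_first = Counter(first)
    count_second = Counter(second)
    categories = set(first) | set(second)
    p_expected = sum(count_first[c] * count_second[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        # Both assessments used a single identical category; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings of one STARD item for eight articles (1 = reported).
first_assessment = [1, 1, 1, 0, 0, 1, 0, 1]
second_assessment = [1, 1, 0, 0, 0, 1, 1, 1]

print(percentage_agreement(first_assessment, second_assessment))  # 75.0
print(round(cohens_kappa(first_assessment, second_assessment), 2))  # 0.47
```

In this example the raw agreement is 75% while kappa is only 0.47, which illustrates why an item can show fairly high percentage agreement and still have a low kappa when most articles fall in the same category.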

Results: The overall inter-assessment agreement for all items of the STARD statement was 85% (Cohen's kappa 0.70) and varied from 63% to 100% for individual items. The largest differences between the two assessments were found for the reporting of the rationale of the reference standard (kappa 0.37), the number of included participants that underwent tests (kappa 0.28), the distribution of the severity of the disease (kappa 0.23), a cross tabulation of the results of the index test by the results of the reference standard (kappa 0.33) and how indeterminate results, missing data and outliers were handled (kappa 0.25). Large differences were also observed for these items within and between reviewers. The inter-assessment reliability of the STARD checklist was satisfactory (ICC = 0.79 [95% CI: 0.62 to 0.89]).

Conclusion: Although the overall reproducibility of the quality of reporting on diagnostic accuracy studies using the STARD statement was found to be good, substantial disagreements were found for specific items. These disagreements were caused less by differences in the reviewers' interpretation of the items than by difficulties in assessing the reporting of these items due to lack of clarity within the articles. Including a flow diagram in all reports on diagnostic accuracy studies would be very helpful in reducing confusion among readers and reviewers.

Figures

**Figure 1**
Overview of the design of the reproducibility study. * Papers were included in the pre-STARD evaluation, described elsewhere [18]. † Four reviewers (AWSR, DAWMW, RWJGO, and HCWV) acted as second reviewer, and each evaluated 8 articles. At the second assessment, the same reviewers evaluated the same studies. ‡ The first assessment was carried out together with the pre-STARD evaluation (March – May 2003). ¶ The second assessment was carried out together with the post-STARD evaluation (January – March 2005).

**Figure 2**
Differences between first and second assessment for each article (n = 32), plotted against the mean value of both assessments for the total number of reported STARD items. Solid line: mean difference (0.39) between the two assessments, short striped lines: 95% Confidence Intervals (-0.4, 1.2) of systematic differences, long striped lines: 95% limits of agreement (-4.3, 5.0).
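The quantities drawn in Figure 2 can be computed along the following lines. This is a sketch under stated assumptions rather than the authors' method: differences are taken as first minus second assessment, the confidence interval and limits of agreement use the normal approximation (multiplier 1.96), and the function name `bland_altman` and the example totals are hypothetical.

```python
import math
import statistics


def bland_altman(first_totals, second_totals, z=1.96):
    """Mean difference, its 95% CI, and 95% limits of agreement.

    Assumes paired totals (number of reported STARD items per article)
    and a normal approximation for both intervals.
    """
    if len(first_totals) != len(second_totals) or len(first_totals) < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    diffs = [a - b for a, b in zip(first_totals, second_totals)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    se_mean = sd_diff / math.sqrt(len(diffs))
    return {
        # Systematic difference between the two assessments (solid line).
        "mean_difference": mean_diff,
        # Uncertainty of the systematic difference (short striped lines).
        "ci_mean_difference": (mean_diff - z * se_mean, mean_diff + z * se_mean),
        # Range expected to contain about 95% of differences (long striped lines).
        "limits_of_agreement": (mean_diff - z * sd_diff, mean_diff + z * sd_diff),
    }


# Hypothetical totals for four articles, not study data.
result = bland_altman([12, 15, 10, 18], [11, 16, 10, 15])
print(result["mean_difference"])  # 0.75
```

The limits of agreement are wider than the confidence interval because they describe the spread of individual differences, whereas the confidence interval describes only the uncertainty of the mean difference.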

References

1. Chan AW, Altman DG. Epidemiology and reporting of randomized trials published in PubMed Journals. Lancet. 2005;365:1159–1162. doi: 10.1016/S0140-6736(05)71879-1.
2. Honest H, Khan KS. Reporting of measures of accuracy in systematic reviews of diagnostic literature. BMC Health Serv Res. 2002;2:4. doi: 10.1186/1472-6963-2-4. Epub 2002 Mar 7.
3. Pocock SJ, Collier TJ, Dandreo KJ, De Stavola BL, Goldman MB, Kalish LA, Kasten LE, McCormack VA. Issues in the reporting of epidemiological studies: a survey of recent practice. British Medical Journal. 2004;329:883.
4. Ernst E, Pittler MH. Assessment of therapeutic safety in systematic reviews: literature review. British Medical Journal. 2001;323:546.
5. Ioannidis JPA, Lau J. Completeness of safety reporting in randomized trials. An evaluation of 7 medical areas. Journal of the American Medical Association. 2001;285:437–443. doi: 10.1001/jama.285.4.437.
