Br J Radiol. 2023 Aug;96(1148):20220972.
doi: 10.1259/bjr.20220972. Epub 2023 Jun 29.

Interobserver variability studies in diagnostic imaging: a methodological systematic review

Laura Quinn et al. Br J Radiol. 2023 Aug.

Abstract

Objectives: To review the methodology of interobserver variability studies, including current practice and the quality of study conduct and reporting.

Methods: Interobserver variability studies between January 2019 and January 2020 were included. Extracted data comprised study characteristics, populations, variability measures, key results, and conclusions. Risk of bias was assessed using the COSMIN tool for assessing reliability and measurement error.

Results: Seventy-nine full-text studies were included, covering various imaging tests and clinical areas. The median number of patients was 47 (IQR: 23-88) and the median number of observers was 4 (IQR: 2-7); sample size was justified in 12 (15%) studies. Most studies used static images (n = 75, 95%), and in most studies all observers interpreted images for all patients (n = 67, 85%). Intraclass correlation coefficients (ICC) (n = 41, 52%), Kappa (κ) statistics (n = 31, 39%) and percentage agreement (n = 15, 19%) were the most commonly used measures. Interpretation of variability estimates often did not correspond with study conclusions. The COSMIN risk of bias tool gave a very good/adequate rating for 52 studies (66%), including any studies that used variability measures listed in the tool. For studies using static images, some study design standards were not applicable and did not contribute to the overall rating.

Conclusions: Interobserver variability studies have diverse study designs and methods, and the impact of this diversity requires further evaluation. Sample sizes for patients and observers were often small and lacked justification. Most studies reported ICC and κ values, which did not always agree with the study conclusions. High ratings were assigned to many studies using the COSMIN risk of bias tool, with certain standards scored 'not applicable' when static images were used.

Advances in knowledge: The sample size for both patients and observers was often small without justification. For most studies, observers interpreted static images and did not evaluate the process of acquiring the imaging test, meaning it was not possible to assess many COSMIN risk of bias standards for studies with this design. Most studies reported intraclass correlation coefficient and κ statistics; study conclusions often did not correspond with results.
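As a concrete illustration of the ICC that most included studies reported, the sketch below computes one common form, ICC(2,1) (two-way random effects, absolute agreement, single observer), for a fully crossed design in which every observer rates every patient. The abstract does not say which ICC form individual studies used, so the choice of ICC(2,1), the function name `icc_2_1`, and the ratings table are all assumptions made for illustration; none of the numbers come from the review.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a patients-by-observers table of numeric ratings.

    Assumes a fully crossed design with no missing values:
    every observer rates every patient exactly once.
    """
    n = len(ratings)       # number of patients (rows)
    k = len(ratings[0])    # number of observers (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (patients, observers, residual).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute-agreement, single-observer ICC.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative ratings only (6 patients x 4 observers), not data from the review.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example)
print(round(icc, 2))
```

In this made-up table the observers differ systematically in how high they score, which an absolute-agreement ICC penalises; a consistency-type ICC on the same data would be higher. This is one reason the reported ICC form matters when interpreting a variability estimate.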

Figures

Figure 1. Flow diagram of systematic review

Figure 2. COSMIN Risk of Bias tool – Standards for study design. N/A – Standard is not applicable for study.

Figure 3. Example of the effect of marginal probabilities on κ statistics using a hypothetical example looking at the interobserver variability of an imaging test to detect the presence of a condition.
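The dependence of κ on marginal probabilities described in the Figure 3 caption can be sketched numerically. The example below assumes two observers and a binary finding (condition present or absent) and uses Cohen's κ; the function name `cohen_kappa` and both 2x2 tables are hypothetical and are not the values shown in the paper's figure. Both tables have the same percentage agreement but different prevalence of positive readings.

```python
def cohen_kappa(both_pos, a_only, b_only, both_neg):
    """Return (percentage agreement, Cohen's kappa) for two observers.

    Arguments are the four cell counts of a 2x2 agreement table:
    both positive, only observer A positive, only observer B positive,
    both negative.
    """
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n

    # Marginal probability that each observer calls the finding present.
    a_pos = (both_pos + a_only) / n
    b_pos = (both_pos + b_only) / n

    # Agreement expected by chance given those marginals.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return observed, (observed - expected) / (1 - expected)


# Hypothetical tables of 100 patients each, both with 90% raw agreement.
balanced = cohen_kappa(both_pos=45, a_only=5, b_only=5, both_neg=45)
skewed = cohen_kappa(both_pos=85, a_only=5, b_only=5, both_neg=5)

print("balanced marginals:", [round(v, 2) for v in balanced])
print("skewed marginals:  ", [round(v, 2) for v in skewed])
```

With balanced marginals κ is about 0.80, while with skewed marginals it falls to about 0.44 even though the observers agree on 90 of 100 patients in both tables. Reporting κ alongside percentage agreement and the underlying table makes this effect visible to readers.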
