Statistical inference for agreement between multiple raters on a binary scale

Sophie Vanbelle¹

PMID: 38233946
DOI: 10.1111/bmsp.12333

Sophie Vanbelle. Br J Math Stat Psychol. 2024 May;77(2):245-260. doi: 10.1111/bmsp.12333. Epub 2024 Jan 17.

Affiliation

¹ Department of Methodology and Statistics, CAPHRI, Maastricht University, Maastricht, The Netherlands.

Abstract

Agreement studies often involve more than two raters or repeated measurements. In the presence of two raters, the proportion of agreement and of positive agreement are simple and popular agreement measures for binary scales. These measures were generalized to agreement studies involving more than two raters with statistical inference procedures proposed on an empirical basis. We present two alternatives. The first is a Wald confidence interval using standard errors obtained by the delta method. The second involves Bayesian statistical inference not requiring any specific Bayesian software. These new procedures show better statistical behaviour than the confidence intervals initially proposed. In addition, we provide analytical formulas to determine the minimum number of persons needed for a given number of raters when planning an agreement study. All methods are implemented in the R package simpleagree and the Shiny app simpleagree.

Keywords: confidence interval; credibility interval; dichotomous; raters; sample size.
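
As an illustration of the two-rater measures named in the abstract, the following minimal sketch computes the proportion of agreement and the proportion of positive agreement from a 2x2 table, with Wald intervals whose standard errors come from the delta method under a multinomial model. It covers only the two-rater special case; it is not the multi-rater procedure of the paper and does not reproduce the interface of the simpleagree package. The function name `agreement_two_raters` and the cell labels `a`, `b`, `c`, `d` are assumptions made for this example.

```python
import math


def agreement_two_raters(a, b, c, d, z=1.96):
    """Two-rater agreement on a binary scale (illustrative sketch).

    a: both raters positive, d: both raters negative,
    b, c: the two kinds of discordant ratings (hypothetical labels).
    Returns point estimates, delta-method standard errors and Wald
    intervals truncated to [0, 1].
    """
    n = a + b + c + d
    m = b + c  # total number of discordant pairs

    # Proportion of agreement: a binomial proportion, so the usual Wald SE.
    po = (a + d) / n
    se_po = math.sqrt(po * (1 - po) / n)

    # Proportion of positive agreement: PA = 2a / (2a + b + c).
    # Delta method on the multinomial cell proportions gives
    # Var(PA) = 4 a m (a + m) / (2a + m)^4.
    pa = 2 * a / (2 * a + m)
    se_pa = math.sqrt(4 * a * m * (a + m)) / (2 * a + m) ** 2

    def wald(est, se):
        # Wald interval, truncated because a proportion lies in [0, 1].
        return (max(0.0, est - z * se), min(1.0, est + z * se))

    return {
        "po": po, "se_po": se_po, "ci_po": wald(po, se_po),
        "pa": pa, "se_pa": se_pa, "ci_pa": wald(pa, se_pa),
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    result = agreement_two_raters(a=40, b=5, c=5, d=50)
    for key, value in result.items():
        print(key, value)
```

Wald intervals of this kind can behave poorly when the estimate is close to 0 or 1 or when the sample is small, which is one reason to consider alternatives such as the Bayesian approach the abstract mentions.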

Conflict of interest statement

The authors have no conflict of interest to declare.
