Agreement Analysis: What He Said, She Said Versus You Said
- PMID: 29677066
- DOI: 10.1213/ANE.0000000000002924
Abstract
Correlation and agreement are 2 concepts that are widely applied in the medical literature and clinical practice to assess the presence and strength of an association. However, because correlation and agreement are conceptually distinct, they require the use of different statistics. Agreement is closely related to, but fundamentally different from and often confused with, correlation. Agreement refers to the reproducibility of clinical evaluations or biomedical measurements.

The intraclass correlation coefficient is a commonly applied measure of agreement for continuous data. It can be validly applied specifically to assess intrarater reliability and interrater reliability. As its name implies, the Lin concordance correlation coefficient is another measure of agreement or concordance.

In comparing a new measurement technique with an established one, it is necessary to determine whether the two agree sufficiently for the new to replace the old. Bland and Altman demonstrated that a correlation coefficient is not appropriate for assessing the interchangeability of 2 such measurement methods. They in turn described an alternative approach, the now widely applied graphical Bland-Altman plot, which is based on a simple estimation of the mean and standard deviation of the differences between measurements made by the 2 methods.

In reading a medical journal article that includes the interpretation of diagnostic tests and the application of diagnostic criteria, attention is conventionally focused on aspects like sensitivity, specificity, predictive values, and likelihood ratios. However, if the clinicians who interpret a test cannot agree on its interpretation and the resulting (typically dichotomous or binary) diagnosis, the test results will be of little practical use. Such agreement between observers (interobserver agreement) about a dichotomous or binary variable is often reported as the kappa statistic. Assessing interrater agreement for ordinal variables and data also has important biomedical applicability; this situation typically calls for the Cohen weighted kappa.

Questionnaires, psychometric scales, and diagnostic tests are widespread and increasingly used not only by researchers but also by clinicians in their daily practice. It is essential that these questionnaires, scales, and diagnostic tests have a high degree of agreement between observers. It is therefore vital that biomedical researchers and clinicians apply the appropriate statistical measures of agreement to assess the reproducibility and quality of these measurement instruments and decision-making processes.
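To make the continuous-data measures concrete, the following is a minimal Python sketch (assuming only NumPy; the measurement data are invented for illustration) that computes the Lin concordance correlation coefficient directly from its definition and the Bland-Altman bias and 95% limits of agreement for two hypothetical measurement methods:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Fifty paired readings of the same quantity by two methods (invented data).
x = rng.normal(100.0, 10.0, size=50)    # established method
y = x + rng.normal(1.0, 4.0, size=50)   # new method: small bias plus extra noise

# Lin concordance correlation coefficient:
#   CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)
# It equals 1 only for perfect agreement, unlike Pearson r, which is 1 for
# any exact linear relationship (e.g., y = 2x + 3).
cov_xy = np.cov(x, y, bias=True)[0, 1]
ccc = 2 * cov_xy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

# Bland-Altman analysis: the bias is the mean difference, and the 95% limits
# of agreement are the bias +/- 1.96 standard deviations of the differences.
# The plot itself graphs each difference against the pairwise mean.
d = y - x
bias = d.mean()
lo, hi = bias - 1.96 * d.std(ddof=1), bias + 1.96 * d.std(ddof=1)

print(f"CCC = {ccc:.3f}")
print(f"Bland-Altman bias = {bias:.2f}, limits of agreement = ({lo:.2f}, {hi:.2f})")
```

A high Pearson correlation between x and y would say nothing about interchangeability here; the limits of agreement express, in the units of measurement, how far apart the two methods can plausibly be for an individual reading.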
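For categorical ratings, a similar sketch: the function below implements the Cohen weighted kappa from the standard weighted-disagreement formula (linear or quadratic weights), and the ordinary unweighted kappa for binary diagnoses is the special case in which every off-diagonal cell carries weight 1. The rating data are again invented.

```python
import numpy as np

def weighted_kappa(r1, r2, n_cat, weights="linear"):
    """Cohen weighted kappa for two raters' ratings coded 0..n_cat-1.

    weights is "linear" or "quadratic"; with a 0/1 disagreement matrix this
    reduces to the ordinary (unweighted) Cohen kappa.
    """
    # Observed joint proportion matrix.
    obs = np.zeros((n_cat, n_cat))
    for a, b in zip(r1, r2):
        obs[a, b] += 1
    obs /= obs.sum()
    # Agreement expected by chance, from the two raters' marginal distributions.
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))
    # Disagreement weights grow with the distance between ordinal categories.
    i, j = np.indices((n_cat, n_cat))
    w = np.abs(i - j) if weights == "linear" else (i - j) ** 2
    return 1 - (w * obs).sum() / (w * exp).sum()

# Two clinicians grading the same 12 cases on a 4-point ordinal scale (invented data).
r1 = [0, 1, 2, 3, 1, 2, 0, 3, 2, 1, 0, 2]
r2 = [0, 1, 2, 2, 1, 3, 0, 3, 2, 0, 1, 2]
print(weighted_kappa(r1, r2, n_cat=4))                       # linear weights
print(weighted_kappa(r1, r2, n_cat=4, weights="quadratic"))  # quadratic weights
```

The weighting reflects that, for ordinal scales, a disagreement between adjacent grades is less serious than one between distant grades, which the unweighted kappa ignores.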
Similar articles
- Interrater agreement and interrater reliability: key concepts, approaches, and applications. Res Social Adm Pharm. 2013 May-Jun;9(3):330-8. doi: 10.1016/j.sapharm.2012.04.004. Epub 2012 Jun 12. PMID: 22695215. Review.
- Assessing interrater agreement on binary measurements via intraclass odds ratio. Biom J. 2016 Jul;58(4):962-73. doi: 10.1002/bimj.201500109. Epub 2016 Mar 14. PMID: 26988408.
- Assessment of agreement of a quantitative variable: a new graphical approach. J Clin Epidemiol. 2003 Oct;56(10):963-7. doi: 10.1016/s0895-4356(03)00164-1. PMID: 14568627.
- Assessing intrarater, interrater and test-retest reliability of continuous measurements. Stat Med. 2002 Nov 30;21(22):3431-46. doi: 10.1002/sim.1253. PMID: 12407682.
- Observer variation in the diagnosis of thyroid disorders. Criteria for and impact on diagnostic decision-making. Dan Med Bull. 2000 Nov;47(5):328-39. PMID: 11155660. Review.
Cited by
- The application of a workflow integrating the variable reproducibility and harmonizability of radiomic features on a phantom dataset. PLoS One. 2021 May 7;16(5):e0251147. doi: 10.1371/journal.pone.0251147. PMID: 33961646. Free PMC article.
- Validity of a Smartphone Application in Calculating Measures of Heart Rate Variability. Sensors (Basel). 2022 Dec 15;22(24):9883. doi: 10.3390/s22249883. PMID: 36560256. Free PMC article.
- Test-Retest Reliability and Concurrent Validity of Photoplethysmography Finger Sensor to Collect Measures of Heart Rate Variability. Sports (Basel). 2025 Jan 22;13(2):29. doi: 10.3390/sports13020029. PMID: 39997960. Free PMC article.
- Comparison of 3D T1-SPACE and DSA in evaluation of intracranial in-stent restenosis. Br J Radiol. 2021 Feb 1;94(1118):20190950. doi: 10.1259/bjr.20190950. Epub 2020 Dec 1. PMID: 33259233. Free PMC article.
- MRI texture feature repeatability and image acquisition factor robustness, a phantom study and in silico study. Eur Radiol Exp. 2021 Jan 19;5(1):2. doi: 10.1186/s41747-020-00199-6. PMID: 33462642. Free PMC article.