. 2020 May 11;20(1):147.
doi: 10.1186/s12909-020-02009-4.

Inter-rater reliability in clinical assessments: do examiner pairings influence candidate ratings?


Aileen Faherty et al. BMC Med Educ.

Abstract

Background: The reliability of clinical assessments is known to vary considerably, with inter-rater reliability a key contributor. Many of the mechanisms that contribute to inter-rater reliability, however, remain largely unexplained. While research in other fields suggests that rater personality can affect ratings, few studies have examined personality factors in clinical assessments. Many schools pair examiners in clinical assessments and ask them to agree on a score. Little is known, however, about what occurs when these paired examiners interact to generate a score. Could personality factors have an impact?

Methods: A fully-crossed design was employed, with each participating examiner observing and scoring each performance. A quasi-experimental research design used candidates' observed scores in a mock clinical assessment as the dependent variable. The independent variables were examiner number, demographics and personality, with data collected by questionnaire. A purposeful sample of doctors who examine in the Final Medical examination at our institution was recruited.

Results: Variability between scores given by examiner pairs (N = 6) was less than the variability between individual examiners (N = 12). 75% of examiners (N = 9) scored below average for neuroticism, and 75% also scored high or very high for extroversion. Two-thirds scored high or very high for conscientiousness. The higher an examiner's extroversion score, the less his or her score changed when paired with a co-examiner, possibly reflecting a more dominant role in the process of reaching a consensus score.
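The core comparison above, lower score variability for pairs than for individuals, can be illustrated with a short sketch. The scores below are hypothetical, not the study's data; the comparison simply contrasts the sample standard deviation of 12 individual examiners' scores with that of 6 consensus pair scores for one candidate performance.

```python
# Hypothetical illustration (not the study's data): comparing score
# variability between individual examiners and examiner pairs.
from statistics import stdev

# Assumed example scores for one candidate performance (0-10 scale).
single_scores = [5, 7, 4, 8, 6, 5, 7, 6, 4, 8, 5, 7]  # 12 individual examiners
paired_scores = [6, 6, 5, 7, 6, 6]                     # 6 consensus pair scores

print(f"Single-examiner SD: {stdev(single_scores):.2f}")  # 1.41
print(f"Paired-examiner SD: {stdev(paired_scores):.2f}")  # 0.63
```

In this toy example the pairing step pulls scores toward a consensus, shrinking the spread, which is the pattern the study reports.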

Conclusions: The reliability of clinical assessments using paired examiners is comparable to that of assessments with single examiners. Personality factors, such as extroversion, may influence the magnitude of the change in score an individual examiner agrees to when paired with another examiner. Further studies on personality factors and examiner behaviour are needed to test these associations and to determine whether personality testing has a role in reducing examiner variability.

Keywords: Clinical assessments; Examiner factors; Examiner variability; Reliability.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1 Box and Whisker Plots showing the Variability of Overall Scores for the Weak Performance using Single and Paired Examiners
Fig. 2 Box and Whisker Plots showing the Variability of Overall Scores for the Average Performance using Single and Paired Examiners
Fig. 3 Box and Whisker Plots showing the Variability of Overall Scores for the Good Performance using Single and Paired Examiners

