Med Educ. 2019 Mar;53(3):250-263. doi: 10.1111/medu.13783. Epub 2018 Dec 21.

Developing a video-based method to compare and adjust examiner effects in fully nested OSCEs

Peter Yeates et al. Med Educ. 2019 Mar.

Abstract

Background: Although averaging across multiple examiners' judgements reduces unwanted overall score variability in objective structured clinical examinations (OSCEs), designs involving several parallel circuits of the OSCE require that different examiner cohorts collectively judge performances to the same standard in order to avoid bias. Prior research suggests the potential for important examiner-cohort effects in distributed or national examinations that could compromise fairness or patient safety; despite their importance, these effects are rarely investigated because fully nested assessment designs make them very difficult to study. We describe the initial use of a new method to measure and adjust for examiner-cohort effects on students' scores.

Methods: We developed video-based examiner score comparison and adjustment (VESCA): volunteer students were filmed 'live' on 10 out of 12 OSCE stations. Following the examination, examiners additionally scored station-specific common-comparator videos, producing partial crossing between examiner cohorts. Many-facet Rasch modelling and linear mixed modelling were used to estimate and adjust for examiner-cohort effects on students' scores.
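The paper's models are not reproduced here, but the adjustment idea can be sketched. The following is a minimal, assumption-laden illustration using statsmodels' linear mixed models: `df` is a hypothetical long-format table with columns `score`, `student` and `cohort` (cohort labels assumed to be strings), and the common-comparator videos supply the partial crossing that makes cohort effects estimable. The authors' many-facet Rasch modelling is not shown.

```python
import pandas as pd
import statsmodels.formula.api as smf

def adjust_for_cohort(df: pd.DataFrame) -> pd.DataFrame:
    """Estimate examiner-cohort effects with a linear mixed model
    (cohort as a fixed effect, student as a random intercept) and
    subtract each cohort's estimated effect from its raw scores."""
    # Partial crossing from the common-comparator videos is what lets
    # the model separate cohort stringency from student ability.
    model = smf.mixedlm("score ~ C(cohort)", data=df, groups=df["student"])
    fit = model.fit(reml=True)

    # Fixed-effect names look like "C(cohort)[T.B]"; the reference
    # cohort's effect is 0 by construction.
    effects = {}
    for name, value in fit.fe_params.items():
        if name.startswith("C(cohort)"):
            effects[name.split("[T.")[1].rstrip("]")] = value

    out = df.copy()
    out["adjusted"] = out["score"] - out["cohort"].map(effects).fillna(0.0)
    return out
```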

Results: After accounting for students' ability, examiner cohorts differed substantially in their stringency or leniency (maximal global score difference of 0.47 out of 7.0 [Cohen's d = 0.96]; maximal total percentage score difference of 5.7% [Cohen's d = 1.06] for the same student ability judged by different examiner cohorts). Corresponding adjustment of students' global and total percentage scores altered the theoretical pass/fail classification of 6.0% of students on both measures (either pass to fail or fail to pass), whereas 8.6-9.5% of students' scores were altered by at least 0.5 standard deviations of student ability.
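For orientation, a toy sketch (using assumed score arrays, not the study's data) of how the two summary quantities above can be computed: Cohen's d as a pooled-SD standardised mean difference, and the fraction of students whose theoretical pass/fail classification changes after adjustment.

```python
import numpy as np

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Standardised mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def reclassified_fraction(raw, adjusted, cutoff) -> float:
    """Fraction of students who cross the pass mark after adjustment."""
    raw, adjusted = np.asarray(raw), np.asarray(adjusted)
    return float(np.mean((raw >= cutoff) != (adjusted >= cutoff)))
```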

Conclusions: Despite typical reliability, the examiner cohort that students encountered had a potentially important influence on their score, emphasising the need for adequate sampling and examiner training. Development and validation of VESCA may offer a means to measure and adjust for potential systematic differences in scoring patterns that could exist between locations in distributed or national OSCE examinations, thereby ensuring equivalence and fairness.

Keywords: OSCEs; assessment; assessor variability; psychometrics.

Conflict of interest statement

None declared.

Figures

Figure 1. Bland–Altman plots of global score (left) and total percentage score (right); a positive difference indicates that live scoring was higher than video scoring. Bold dotted lines show the mean difference and the limits of agreement of the differences; light dotted lines show the 95% confidence intervals (CIs) for these values. (A worked sketch of the limits-of-agreement calculation follows this figure list.)

Figure 2. Wright map showing the relative influence of students, stations and examiner cohorts on overall global scores.

Figure 3. Diagram showing the relative influence of students, stations and examiner cohorts on total percentage scores.

Figure 4. Plot of raw versus model-adjusted scores for individual students, for global and total percentage scores. Students marked with triangles (▴) changed from fail to pass when scores were adjusted; diamonds (♦) changed from pass to fail; squares (■) passed under both conditions; circles (●) failed under both conditions.
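For readers unfamiliar with Bland–Altman analysis, a minimal sketch of the limits-of-agreement computation referenced in Figure 1, assuming paired arrays `live` and `video` of hypothetical scores for the same performances:

```python
import numpy as np

def bland_altman_limits(live, video):
    """Mean difference (bias) and 95% limits of agreement
    (bias +/- 1.96 * SD of the paired differences)."""
    diff = np.asarray(live, dtype=float) - np.asarray(video, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```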
