Med Educ. 2019 Mar;53(3):250-263. doi: 10.1111/medu.13783. Epub 2018 Dec 21.

Developing a video-based method to compare and adjust examiner effects in fully nested OSCEs

Peter Yeates et al. Med Educ. 2019 Mar.

Abstract

Background: Although averaging across multiple examiners' judgements reduces unwanted overall score variability in objective structured clinical examinations (OSCEs), designs involving several parallel circuits of the OSCE require that different examiner cohorts collectively judge performances to the same standard in order to avoid bias. Prior research suggests the potential for important examiner-cohort effects in distributed or national examinations that could compromise fairness or patient safety; despite their importance, these effects are rarely investigated because fully nested assessment designs make them very difficult to study. We describe the initial use of a new method to measure and adjust for examiner-cohort effects on students' scores.

Methods: We developed video-based examiner score comparison and adjustment (VESCA): volunteer students were filmed 'live' on 10 out of 12 OSCE stations. Following the examination, examiners additionally scored station-specific common-comparator videos, producing partial crossing between examiner cohorts. Many-facet Rasch modelling and linear mixed modelling were used to estimate and adjust for examiner-cohort effects on students' scores.
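The paper's models are not reproduced here, but the adjustment idea can be sketched. The following is a minimal, assumption-laden illustration using statsmodels' linear mixed models: `df` is a hypothetical long-format table with columns `score`, `student` and `cohort` (cohort labels assumed to be strings), and the common-comparator videos supply the partial crossing that makes cohort effects estimable. The authors' many-facet Rasch modelling is not shown.

```python
import pandas as pd
import statsmodels.formula.api as smf

def adjust_for_cohort(df: pd.DataFrame) -> pd.DataFrame:
    """Estimate examiner-cohort effects with a linear mixed model
    (cohort as a fixed effect, student as a random intercept) and
    subtract each cohort's estimated effect from its raw scores."""
    # Partial crossing from the common-comparator videos is what lets
    # the model separate cohort stringency from student ability.
    model = smf.mixedlm("score ~ C(cohort)", data=df, groups=df["student"])
    fit = model.fit(reml=True)

    # Fixed-effect names look like "C(cohort)[T.B]"; the reference
    # cohort's effect is 0 by construction.
    effects = {}
    for name, value in fit.fe_params.items():
        if name.startswith("C(cohort)"):
            effects[name.split("[T.")[1].rstrip("]")] = value

    out = df.copy()
    out["adjusted"] = out["score"] - out["cohort"].map(effects).fillna(0.0)
    return out
```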

Results: After accounting for students' ability, examiner cohorts differed substantially in their stringency or leniency (maximal global score difference of 0.47 out of 7.0 [Cohen's d = 0.96]; maximal total percentage score difference of 5.7% [Cohen's d = 1.06] for the same student ability judged by different examiner cohorts). Corresponding adjustment of students' global and total percentage scores altered the theoretical pass/fail classification of 6.0% of students on both measures (either pass to fail or fail to pass), whereas 8.6-9.5% of students' scores were altered by at least 0.5 standard deviations of student ability.
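For orientation, a toy sketch (using assumed score arrays, not the study's data) of how the two summary quantities above can be computed: Cohen's d as a pooled-SD standardised mean difference, and the fraction of students whose theoretical pass/fail classification changes after adjustment.

```python
import numpy as np

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Standardised mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def reclassified_fraction(raw, adjusted, cutoff) -> float:
    """Fraction of students who cross the pass mark after adjustment."""
    raw, adjusted = np.asarray(raw), np.asarray(adjusted)
    return float(np.mean((raw >= cutoff) != (adjusted >= cutoff)))
```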

Conclusions: Despite typical reliability, the examiner cohort that students encountered had a potentially important influence on their score, emphasising the need for adequate sampling and examiner training. Development and validation of VESCA may offer a means to measure and adjust for potential systematic differences in scoring patterns that could exist between locations in distributed or national OSCE examinations, thereby ensuring equivalence and fairness.

Keywords: OSCEs; assessment; assessor variability; psychometrics.

Conflict of interest statement

None declared.

Figures

Figure 1. Bland–Altman plots of global score (left) and total percentage score (right); a positive difference indicates that live scoring was higher than video scoring. Bold dotted lines show the mean difference and the limits of agreement of the differences; light dotted lines show the 95% confidence intervals (CIs) for these values. (A worked sketch of the limits-of-agreement calculation follows this figure list.)

Figure 2. Wright map showing the relative influence of students, stations and examiner cohorts on overall global scores.

Figure 3. Diagram showing the relative influence of students, stations and examiner cohorts on total percentage scores.

Figure 4. Plot of raw versus model-adjusted scores for individual students, for global and total percentage scores. Students marked with triangles (▴) changed from fail to pass when scores were adjusted; diamonds (♦) changed from pass to fail; squares (■) passed under both conditions; circles (●) failed under both conditions.
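For readers unfamiliar with Bland–Altman analysis, a minimal sketch of the limits-of-agreement computation referenced in Figure 1, assuming paired arrays `live` and `video` of hypothetical scores for the same performances:

```python
import numpy as np

def bland_altman_limits(live, video):
    """Mean difference (bias) and 95% limits of agreement
    (bias +/- 1.96 * SD of the paired differences)."""
    diff = np.asarray(live, dtype=float) - np.asarray(video, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```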
