2014 Sep 26;14:204.
doi: 10.1186/1472-6920-14-204.

Implementing statistical equating for MRCP(UK) Parts 1 and 2


I C McManus et al. BMC Med Educ.

Abstract

Background: The MRCP(UK) exam, in 2008 and 2010, changed the standard-setting of its Part 1 and Part 2 examinations from a hybrid Angoff/Hofstee method to statistical equating using Item Response Theory, the reference group being UK graduates. The present paper considers the implementation of the change, the question of whether the pass rate increased amongst non-UK candidates, any possible role of Differential Item Functioning (DIF), and changes in examination predictive validity after the change.

Methods: Analysis of data of MRCP(UK) Part 1 exam from 2003 to 2013 and Part 2 exam from 2005 to 2013.

Results: Inspection suggested that Part 1 pass rates were stable after the introduction of statistical equating, but showed greater annual variation, probably because stronger candidates took the examination earlier in the year. Pass rates appeared to increase among non-UK graduates after equating was introduced, but this was not associated with any change in DIF after statistical equating. Statistical modelling of the pass rates for non-UK graduates found that pass rates, in both Part 1 and Part 2, were increasing year on year, with the changes probably beginning before the introduction of equating. The predictive validity of Part 1 for Part 2 was higher with statistical equating than with the previous hybrid Angoff/Hofstee method, confirming the utility of IRT-based statistical equating.

Conclusions: Statistical equating was successfully introduced into the MRCP(UK) Part 1 and Part 2 written examinations, resulting in higher predictive validity than the previous Angoff/Hofstee standard setting. Concerns about an artefactual increase in pass rates for non-UK candidates after equating were shown not to be well founded. Most likely the changes resulted from a genuine increase in candidate ability, albeit for reasons which remain unclear, coupled with a cognitive illusion giving the impression of a step-change immediately after equating began. Statistical equating provides a robust standard-setting method with a better theoretical foundation than judgemental techniques such as Angoff; it is more straightforward, requires far less examiner time, and provides a more valid result. The present study provides a detailed case study of introducing statistical equating and of the issues which may need to be considered with its introduction.
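The central idea of the paper, placing each diet's items and its cut-score on a common IRT difficulty scale by means of anchor items shared with a base form, can be illustrated with a minimal mean-sigma linking sketch. Mean-sigma is one standard linking method; the item difficulties and the cut-score below are entirely hypothetical, and this is not necessarily the exact procedure MRCP(UK) used:

```python
import numpy as np

# Hypothetical anchor-item difficulties (logits) for the same items as
# estimated on the base form (X) and on a new form (Y).
b_x = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
b_y = np.array([-0.9, -0.1, 0.4, 1.1, 1.8])

# Mean-sigma linking: find A, B such that A * b_y + B matches the
# base-form scale in mean and standard deviation.
A = b_x.std() / b_y.std()
B = b_x.mean() - A * b_y.mean()

# Transform the new form's cut-score onto the base-form scale, so the
# standard carried forward is the same regardless of form difficulty.
cut_y = 0.5                       # hypothetical pass mark (logits) on form Y
cut_on_base_scale = A * cut_y + B
```

Because the anchor items here are uniformly 0.3 logits harder on form Y, the transform simply shifts the cut-score down by 0.3; with real data, A and B absorb both a shift and a rescaling.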


Figures

Figure 1
Pass rates at MRCP(UK) Part 1 in the three diets of each year from 2003 to 2013. UK graduates and non-UK graduates are shown separately, for all candidates and those at their first attempt. The blue arrows indicate the third diet of each year (see text), and the green arrows indicate the base form and the re-equating exercise. The red box indicates the period during which statistical equating was used.
Figure 2
Pass rates at MRCP(UK) Part 2 in the three diets of each year from 2005 to 2013. UK graduates and non-UK graduates are shown separately, for all candidates and those at their first attempt. The blue arrows indicate the third diet of each year (see text), and the green arrows indicate the base form and the re-equating exercise. The red box indicates the period during which statistical equating was used.
Figure 3
Example of DIF analysis for a Part 1 diet. The threshold (difficulty) for each item on the exam is calculated separately for UK candidates (ThresUK, horizontal axis) and non-UK candidates (ThresNonUK, vertical axis), with higher scores indicating more difficult questions. The significance of the difference between the two thresholds is calculated by Bilog and indicated by the colour of the points (see legend).
Figure 4
The numbers of items in each diet of the MRCP(UK) Part 1 exam showing DIF at different levels of significance (see legend). The red box indicates the period during which statistical equating was used. Note that DIF was calculated only for scoring items in the exam, so numbers differ slightly between diets.
Figure 5
The numbers of items in each diet of the MRCP(UK) Part 2 exam showing DIF at different levels of significance (see legend). The red box indicates the period during which statistical equating was used. Note that the number of items in the Part 2 exam increased in the earlier years, and DIF was calculated only for scoring items, so numbers differ between diets.
Figure 6
Mean threshold scores for anchor and non-anchor items by UK and non-UK candidates for the MRCP(UK) Part 1 exam. See text for further details.
Figure 7
Mean threshold scores for anchor and non-anchor items by UK and non-UK candidates for the MRCP(UK) Part 2 exam. See text for further details.
Figure 8
Pass rate for non-UK first-time takers of MRCP(UK) Part 1 plotted against diet. “2007” indicates the 2007/1 diet, with other minor tick marks indicating the second and third diets of the year. Open points are pre-statistical equating, and solid points post-statistical equating. The dashed line is a conventional linear regression, and the solid line is a loess curve.
Figure 9
Pass rate for non-UK first-time takers of MRCP(UK) Part 2 plotted against diet. “2007” indicates the 2007/1 diet, with other minor tick marks indicating the second and third diets of the year. Open points are pre-statistical equating, and solid points post-statistical equating. The dashed line is a conventional linear regression, and the solid line is a loess curve.
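The trend analysis in Figures 8 and 9, a conventional linear regression (dashed line) alongside a loess smoother (solid line), can be sketched as follows. The pass-rate series here is synthetic, generated purely for illustration of the method, not taken from the paper's data:

```python
import numpy as np

# Hypothetical pass rates (%) for non-UK first-attempt candidates over
# successive diets (three per year), mimicking a gradual year-on-year rise
# of the kind described in the Results, plus random diet-to-diet noise.
diets = np.arange(18)                          # e.g. 2007/1 .. 2012/3
rng = np.random.default_rng(0)
pass_rate = 30 + 0.8 * diets + rng.normal(0, 2, diets.size)

# Conventional linear regression (the dashed line in Figures 8 and 9):
# slope estimates the per-diet change in pass rate.
slope, intercept = np.polyfit(diets, pass_rate, 1)

# The loess curve (the solid line) would need a smoother such as
# statsmodels.nonparametric.smoothers_lowess.lowess; it is omitted here
# to keep the sketch dependent on NumPy alone.
```

A positive fitted slope on the full series, without a discontinuity at the diet where equating began, is the pattern the authors interpret as a genuine pre-existing trend rather than an artefact of the new standard-setting method.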

Pre-publication history
    1. The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6920/14/204/prepub
