Implementing statistical equating for MRCP(UK) Parts 1 and 2
- PMID: 25257070
- PMCID: PMC4182791
- DOI: 10.1186/1472-6920-14-204
Abstract
Background: The MRCP(UK) exam, in 2008 and 2010, changed the standard-setting of its Part 1 and Part 2 examinations from a hybrid Angoff/Hofstee method to statistical equating using Item Response Theory, the reference group being UK graduates. The present paper considers the implementation of the change, the question of whether the pass rate increased amongst non-UK candidates, any possible role of Differential Item Functioning (DIF), and changes in examination predictive validity after the change.
Methods: Analysis of data of MRCP(UK) Part 1 exam from 2003 to 2013 and Part 2 exam from 2005 to 2013.
Results: Inspection suggested that Part 1 pass rates were stable after the introduction of statistical equating, but showed greater annual variation, probably because stronger candidates took the examination earlier. Pass rates appeared to increase in non-UK graduates after equating was introduced, but this was not associated with any change in DIF after statistical equating. Statistical modelling of the pass rates for non-UK graduates found that pass rates, in both Part 1 and Part 2, were increasing year on year, with the changes probably beginning before the introduction of equating. The predictive validity of Part 1 for Part 2 was higher with statistical equating than with the previous hybrid Angoff/Hofstee method, confirming the utility of IRT-based statistical equating.
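The DIF analyses referred to above test whether an item behaves differently for two groups of candidates matched on overall ability. The paper does not spell out its algorithm here, so the following is an illustrative sketch of one standard approach, the Mantel-Haenszel common odds ratio with the ETS delta transformation; the counts are entirely hypothetical.

```python
from math import log

def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio across ability strata.

    Each stratum is a 2x2 table of item responses:
    (ref_correct, ref_wrong, focal_correct, focal_wrong).
    A ratio near 1 indicates no DIF between the groups.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

# Hypothetical counts for one item, stratified by total-score band,
# comparing a reference group with a focal group of candidates.
strata = [
    (40, 10, 35, 15),   # high-ability band
    (25, 25, 20, 30),   # middle band
    (10, 40, 8, 42),    # low band
]

alpha = mantel_haenszel_dif(strata)
# ETS delta-scale statistic: |delta| < 1 is conventionally treated
# as negligible DIF (category A).
delta = -2.35 * log(alpha)
```

With these made-up counts the common odds ratio is close to 1, i.e. the item shows little DIF once candidates are matched on ability; in an operational exam the statistic would be computed for every item and every group comparison of interest.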
Conclusions: Statistical equating was successfully introduced into the MRCP(UK) Part 1 and Part 2 written examinations, resulting in higher predictive validity than the previous Angoff/Hofstee standard setting. Concerns about an artefactual increase in pass rates for non-UK candidates after equating were shown not to be well-founded. Most likely the changes resulted from a genuine increase in candidate ability, albeit for reasons which remain unclear, coupled with a cognitive illusion giving the impression of a step-change immediately after equating began. Statistical equating provides a robust standard-setting method with a better theoretical foundation than judgemental techniques such as Angoff; it is also more straightforward, requires far less examiner time, and yields a more valid result. The present study provides a detailed case study of introducing statistical equating and of the issues that may need to be considered with its introduction.
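The core idea of IRT-based statistical equating is that anchor items appearing in two examination diets let the new diet's item difficulties be placed on the reference scale, so a fixed pass standard carries over without examiner judgement. The sketch below shows mean-mean linking, one common method for doing this with Rasch difficulty estimates; the numbers and function names are hypothetical and this is not a description of the MRCP(UK) procedure itself.

```python
def mean_mean_link(anchor_old, anchor_new):
    """Shift constant that places new-form Rasch difficulties (logits)
    on the reference scale, using items common to both forms."""
    assert len(anchor_old) == len(anchor_new) and anchor_old
    return sum(o - n for o, n in zip(anchor_old, anchor_new)) / len(anchor_old)

# Hypothetical difficulty estimates for anchor items that appeared
# both in the reference diet and in the new diet.
anchor_old = [-0.50, 0.20, 1.10, -0.30]   # calibrated on reference group
anchor_new = [-0.80, -0.10, 0.80, -0.60]  # calibrated on new cohort

shift = mean_mean_link(anchor_old, anchor_new)

# Applying the shift to every new-form item expresses the new diet on
# the reference logit scale, so the original pass mark (a fixed point
# on that scale) applies unchanged.
new_form = [0.40, -1.20, 0.90]
equated = [b + shift for b in new_form]
```

In this toy example the anchor items are uniformly 0.30 logits easier for the new cohort, so the link shifts the new form up by 0.30; in practice more robust linking methods (mean-sigma, Stocking-Lord) and checks for unstable anchors would be used.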
Similar articles
- PLAB and UK graduates' performance on MRCP(UK) and MRCGP examinations: data linkage study. BMJ. 2014;348:g2621. doi: 10.1136/bmj.g2621. PMID: 24742473.
- Investigating possible ethnicity and sex bias in clinical examiners: an analysis of data from the MRCP(UK) PACES and nPACES examinations. BMC Med Educ. 2013;13:103. doi: 10.1186/1472-6920-13-103. PMID: 23899223.
- Performance at MRCP(UK): when should trainees sit examinations? Clin Med (Lond). 2013;13(2):166-9. doi: 10.7861/clinmedicine.13-2-166. PMID: 23681866.
- Evaluating the competency of ChatGPT in MRCP Part 1 and a systematic literature review of its capabilities in postgraduate medical assessments. PLoS One. 2024;19(7):e0307372. doi: 10.1371/journal.pone.0307372. PMID: 39083455.
- MRCP(UK) PART 2 Clinical Examination (PACES): a review of the first four examination sessions (June 2001 - July 2002). Clin Med (Lond). 2003;3(5):452-9. doi: 10.7861/clinmedicine.3-5-452. PMID: 14601946.
Cited by
- Using differential item functioning to evaluate potential bias in a high stakes postgraduate knowledge based assessment. BMC Med Educ. 2018;18(1):64. doi: 10.1186/s12909-018-1143-0. PMID: 29615016.
- Are Exam Questions Known in Advance? Using Local Dependence to Detect Cheating. PLoS One. 2016;11(12):e0167545. doi: 10.1371/journal.pone.0167545. PMID: 27907190.
- Predictive validity of A-level grades and teacher-predicted grades in UK medical school applicants: a retrospective analysis of administrative data in a time of COVID-19. BMJ Open. 2021;11(12):e047354. doi: 10.1136/bmjopen-2020-047354. PMID: 34916308.
- Exploring the use of Rasch modelling in "common content" items for multi-site and multi-year assessment. Adv Health Sci Educ Theory Pract. 2025;30(2):427-438. doi: 10.1007/s10459-024-10354-y. PMID: 38977526.
- Fitness to practise sanctions in UK doctors are predicted by poor performance at MRCGP and MRCP(UK) assessments: data linkage study. BMC Med. 2018;16(1):230. doi: 10.1186/s12916-018-1214-4. PMID: 30522486.
Pre-publication history
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6920/14/204/prepub