Popul Health Metr. 2023 Mar 14;21(1):3.
doi: 10.1186/s12963-023-00302-0.

Completeness, agreement, and representativeness of ethnicity recording in the United Kingdom's Clinical Practice Research Datalink (CPRD) and linked Hospital Episode Statistics (HES)

Suhail I Shiekh et al. Popul Health Metr. 2023.

Abstract

Background: This descriptive study assessed the completeness, agreement, and representativeness of ethnicity recording in the United Kingdom (UK) Clinical Practice Research Datalink (CPRD) primary care databases alone and, for those patients registered with a GP in England, when linked to secondary care data from Hospital Episode Statistics (HES).

Methods: Ethnicity records were assessed for all UK patients in the May 2021 builds of the CPRD GOLD and CPRD Aurum databases. In UK-wide analyses, data for England came from combined CPRD-HES, whereas data for Northern Ireland, Scotland, and Wales came from CPRD only. The agreement of ethnicity records per patient was assessed within each dataset (CPRD GOLD, CPRD Aurum, and HES) and between datasets at the highest-level ethnicity categorisation ('Asian', 'black', 'mixed', 'white', 'other'). Representativeness was assessed by comparing the ethnic distributions of CPRD-HES at the highest-level categorisation with those from the Census 2011 across the UK's devolved administrations. Additionally, CPRD-HES was compared with the experimental ethnic distributions for England and Wales from the Office for National Statistics in 2019 (ONS2019) and with the English ethnic distribution from May 2021 in NHS Digital's General Practice Extraction Service Data for Pandemic Planning and Research with HES data linkage (GDPPR-HES).
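The between-dataset agreement described in the Methods can be sketched as below. This is a minimal illustration, not the study's code: it assumes each dataset has already been reduced to a single higher-level category per patient (the study's reduction rule is not given here), it stratifies agreement by the CPRD category as one hypothetical choice, and the function name and toy patient data are hypothetical.

```python
from collections import Counter

# Higher-level categories named in the Methods.
HIGHER_LEVEL = ("Asian", "black", "mixed", "white", "other")


def between_dataset_agreement(cprd: dict[str, str], hes: dict[str, str]) -> dict[str, float]:
    """Percentage of patients whose higher-level category agrees between CPRD and HES.

    Assumption: each dict maps a patient ID to ONE higher-level category,
    i.e. multiple records have already been summarised per dataset.
    Only patients present in both datasets are compared.
    """
    shared = cprd.keys() & hes.keys()
    if not shared:
        return {"overall": float("nan")}
    totals: Counter = Counter()
    matches: Counter = Counter()
    for patient_id in shared:
        category = cprd[patient_id]  # stratify by the CPRD category (hypothetical choice)
        totals[category] += 1
        if hes[patient_id] == category:
            matches[category] += 1
    # Per-category agreement, skipping categories with no shared patients.
    agreement = {cat: 100 * matches[cat] / totals[cat] for cat in HIGHER_LEVEL if totals[cat]}
    agreement["overall"] = 100 * sum(matches.values()) / len(shared)
    return agreement


# Hypothetical toy data: p4 is CPRD-only and p5 is HES-only, so both are excluded.
cprd = {"p1": "white", "p2": "mixed", "p3": "Asian", "p4": "other"}
hes = {"p1": "white", "p2": "white", "p3": "Asian", "p5": "black"}
print(between_dataset_agreement(cprd, hes))
```

Restricting the denominator to patients recorded in both datasets mirrors the Results' phrasing ("Of English patients with ethnicity recorded in both CPRD and HES"); a per-category breakdown is what lets low agreement in small groups such as 'mixed' show up despite a high overall figure.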

Results: In CPRD-HES, 81.7% of currently registered patients in the UK had ethnicity recorded in primary care. Among patients with multiple ethnicity records, mismatched ethnicity within individual primary and secondary care datasets was < 10%. Of English patients with ethnicity recorded in both CPRD and HES, 93.3% of records matched at the highest-level categorisation; however, agreement was markedly lower in the 'mixed' and 'other' ethnic groups. CPRD-HES had a lower proportion of 'white' patients than the UK Census 2011 (80.3% vs. 87.2%) and the experimental ONS2019 data (80.4% vs. 84.3%). CPRD-HES closely matched the ethnic distribution from GDPPR-HES ('white' 80.4% vs. 80.7%), although a smaller proportion was classified as 'other' (1.1% vs. 2.8%).
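The completeness figures in the Results, and the breakdown in the Fig. 1 caption, amount to counting patients with at least one ethnicity record, with or without unknown codes, optionally split by the QOF (Quality and Outcomes Framework) incentivisation periods given in that caption. The Python sketch below is illustrative only: it assumes a hypothetical "unknown" label for unknown-ethnicity codes, treats a patient as recorded when unknowns are excluded only if at least one non-unknown record remains, and does not apply the study's "acceptable" or "currently registered" filters. All names and data are hypothetical.

```python
from collections import defaultdict
from datetime import date

UNKNOWN = "unknown"           # hypothetical label for unknown-ethnicity codes
QOF_START = date(2006, 4, 1)  # start of QOF ethnicity recording incentivisation
QOF_END = date(2011, 3, 31)   # last day of the incentivised period


def qof_period(registration: date) -> str:
    """Assign a GP registration date to the periods listed in the Fig. 1 caption."""
    if registration < QOF_START:
        return "pre-QOF"
    if registration <= QOF_END:
        return "during QOF"
    return "post-QOF"


def percent_with_ethnicity(records_per_patient: list[list[str]], include_unknown: bool = True) -> float:
    """Percentage of patients with at least one ethnicity record."""
    if not records_per_patient:
        return float("nan")

    def has_record(records: list[str]) -> bool:
        if include_unknown:
            return len(records) > 0
        # Excluding unknown codes: require at least one known record.
        return any(r != UNKNOWN for r in records)

    n_recorded = sum(1 for records in records_per_patient if has_record(records))
    return 100 * n_recorded / len(records_per_patient)


def completeness_by_qof_period(patients: list[tuple[date, list[str]]],
                               include_unknown: bool = True) -> dict[str, float]:
    """Group patients by QOF period of registration, then compute completeness per group."""
    groups: dict[str, list[list[str]]] = defaultdict(list)
    for registration, records in patients:
        groups[qof_period(registration)].append(records)
    return {period: percent_with_ethnicity(recs, include_unknown) for period, recs in groups.items()}


# Hypothetical toy data: (registration date, ethnicity records) per patient.
patients = [
    (date(2004, 5, 1), ["white"]),
    (date(2008, 1, 10), [UNKNOWN]),
    (date(2012, 6, 30), []),
    (date(2010, 2, 2), ["Asian", UNKNOWN]),
]
all_records = [records for _, records in patients]
print(percent_with_ethnicity(all_records))                         # counts unknown codes
print(percent_with_ethnicity(all_records, include_unknown=False))  # ignores them
print(completeness_by_qof_period(patients, include_unknown=False))
```

The period boundaries are inclusive of 1 April 2006 and 31 March 2011, matching the "1 April 2006–31 March 2011" wording of the caption.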

Conclusions: CPRD-HES has suitable representation of all ethnic categories with some overrepresentation of minority ethnic groups and a smaller proportion classified as 'other' compared to the UK general population from other data sources. CPRD-HES data is useful for studying health risks and outcomes in typically underrepresented groups.

Keywords: Clinical Practice Research Datalink; Data diversity; Electronic healthcare records; Ethnicity; Hospital episode statistics; Representation.

Conflict of interest statement

SIS, REG, PM, HPB, and ELA declare that this work was conducted during their current employment at the CPRD. MH declares that this work was conducted during their previous employment at the CPRD and that, outside of this work, they hold, or have held in the last 36 months, a doctoral studies stipend from the Medical Research Council to conduct research at the London School of Hygiene and Tropical Medicine. MA has nothing to declare.

Figures

Fig. 1
a–c Proportion of CPRD and HES populations with at least one ethnicity recording. Proportions (%) of all acceptable and currently registered acceptable patients with at least one ethnicity record, including and excluding unknown ethnicity codes; additionally, for primary care-only data, the proportions of all acceptable patients registered with their GP before QOF ethnicity recording incentivisation (pre-1 April 2006), during QOF incentivisation (1 April 2006–31 March 2011), and after QOF incentivisation (from 1 April 2011) for a the UK population in CPRD GOLD, b the English population in CPRD Aurum, and c the English population using CPRD-HES
Fig. 2
a–f Proportion of CPRD and HES populations with matching ethnicity recordings. Proportions (%) of acceptable patients with multiple ethnicity recordings within a dataset where those recordings were truly matched (all middle-level classifications were the same per patient), categorically matched (all higher-level ethnicity classifications were the same but one or more of the middle-level ethnicity classifications were mismatched per patient), or truly mismatched (one or more of the higher-level ethnicity classifications were mismatched per patient) in a CPRD GOLD, b CPRD Aurum, c HES A&E, d HES APC, e HES DID, and f HES OP
Fig. 3
a–d Ethnic distribution of the UK population in CPRD, HES, and UK Censuses 2011. Proportions (%) of the currently registered acceptable UK populations of a CPRD GOLD-HES, b CPRD Aurum-HES, and c CPRD-HES in each higher-level ethnic category, as determined using the algorithm with all available data from CPRD and HES, compared with d the proportions of the general UK population in each higher-level ethnic category in Census 2011, obtained by combining the 2011 Census figures for England and Wales, Northern Ireland, and Scotland
Fig. 4
a–f Ethnic distribution of the English populations in CPRD, HES, and English Census 2011. Proportions (%) of the currently registered acceptable English populations of a CPRD GOLD-HES, b CPRD Aurum-HES, and c CPRD-HES in each higher-level ethnic category, as determined using the algorithm with all available data from CPRD and HES, compared with the proportions of the general population of England in each higher-level ethnic category according to d the English Census 2011, e the experimental ethnicity distributions for England from ONS in 2019, and f NHS Digital’s General Practice Extraction Service (GPES) Data for Pandemic Planning and Research (GDPPR) with Hospital Episode Statistics (HES) in May 2021
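The three match classes defined in the Fig. 2 caption (truly matched, categorically matched, truly mismatched) can be expressed as a short classification rule applied to each patient with multiple records in one dataset. The sketch below is illustrative only: the middle-level labels and their mapping to higher-level categories are a hypothetical subset rather than the study's code list, unknown codes are not handled, and the function names and toy patients are hypothetical.

```python
from collections import Counter

# Hypothetical, illustrative subset of middle-level labels mapped to the
# higher-level categories; NOT the study's actual code list.
TO_HIGHER_LEVEL = {
    "Indian": "Asian",
    "Pakistani": "Asian",
    "Caribbean": "black",
    "African": "black",
    "White and Black Caribbean": "mixed",
    "British": "white",
    "Irish": "white",
    "Any other ethnic group": "other",
}
LABELS = ("truly matched", "categorically matched", "truly mismatched")


def classify_agreement(middle_level_records: list[str]) -> str:
    """Classify one patient's multiple records using the Fig. 2 definitions."""
    if len(middle_level_records) < 2:
        raise ValueError("defined only for patients with multiple ethnicity records")
    if len(set(middle_level_records)) == 1:
        return "truly matched"          # all middle-level classifications identical
    higher = {TO_HIGHER_LEVEL[m] for m in middle_level_records}
    if len(higher) == 1:
        return "categorically matched"  # same higher level, differing middle level
    return "truly mismatched"           # at least one higher-level classification differs


def summarise(patients: dict[str, list[str]]) -> dict[str, float]:
    """Percentage of patients with multiple records falling in each class."""
    multi = [records for records in patients.values() if len(records) >= 2]
    if not multi:
        return {}
    counts = Counter(classify_agreement(records) for records in multi)
    return {label: 100 * counts[label] / len(multi) for label in LABELS}


toy = {  # hypothetical patients from a single dataset
    "a": ["Indian", "Indian"],
    "b": ["Indian", "Pakistani"],
    "c": ["Caribbean", "British"],
    "d": ["African"],  # single record: excluded from the denominator
}
print(summarise(toy))
```

Checking middle-level identity first means a patient is "categorically matched" only when the middle-level labels differ but all map to one higher-level group, which follows the caption's wording.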
