BMJ Health Care Inform. 2022 Oct;29(1):e100633.
doi: 10.1136/bmjhci-2022-100633.

Data consistency in the English Hospital Episodes Statistics database

Flavien Hardy et al. BMJ Health Care Inform. 2022 Oct.

Abstract

Background: To gain maximum insight from large administrative healthcare datasets it is important to understand their data quality. Although a gold standard against which to assess criterion validity rarely exists for such datasets, internal consistency can be evaluated. We aimed to identify inconsistencies in the recording of mandatory International Statistical Classification of Diseases and Related Health Problems, tenth revision (ICD-10) codes within the Hospital Episodes Statistics dataset in England.

Methods: Three exemplar medical conditions for which recording is mandatory once diagnosed were chosen: autism, type II diabetes mellitus and Parkinson's disease dementia. For each patient, we identified the first occurrence of the condition's ICD-10 code during the period April 2013 to March 2021 and its presence or absence in subsequent hospital spells. We designed and trained random forest classifiers to identify variables strongly associated with recording inconsistencies.
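The consistency check described in the Methods can be illustrated with a minimal sketch. It assumes each hospital spell is a record with a patient identifier, a discharge date and a set of ICD-10 codes; the field names (`patient_id`, `discharge_date`, `codes`), the helper functions and the toy data are hypothetical and are not taken from the study or from the HES data dictionary. The code string used for the target condition is a placeholder.

```python
from collections import defaultdict
from datetime import date


def flag_inconsistent_spells(spells, target_code):
    """For each patient with the target code, flag later spells that omit it.

    Returns {patient_id: [(discharge_date, is_inconsistent), ...]} covering
    only the spells that follow the first spell carrying the code.
    """
    by_patient = defaultdict(list)
    for spell in spells:
        by_patient[spell["patient_id"]].append(spell)

    flags = {}
    for patient_id, rows in by_patient.items():
        rows.sort(key=lambda s: s["discharge_date"])
        # Index spell: first spell in which the mandatory code appears.
        index_pos = next(
            (i for i, s in enumerate(rows) if target_code in s["codes"]), None
        )
        if index_pos is None:
            continue  # condition never coded for this patient
        flags[patient_id] = [
            (s["discharge_date"], target_code not in s["codes"])
            for s in rows[index_pos + 1:]
        ]
    return flags


def inconsistency_rate(flags):
    """Share of subsequent spells in which the code is missing."""
    all_flags = [f for later in flags.values() for _, f in later]
    return sum(all_flags) / len(all_flags) if all_flags else None


# Toy data (entirely made up): "F84.0" stands in for the target code.
spells = [
    {"patient_id": "A", "discharge_date": date(2014, 1, 1), "codes": {"F84.0", "J18.9"}},
    {"patient_id": "A", "discharge_date": date(2014, 6, 1), "codes": {"J18.9"}},
    {"patient_id": "A", "discharge_date": date(2015, 1, 1), "codes": {"F84.0"}},
    {"patient_id": "B", "discharge_date": date(2013, 5, 1), "codes": {"I10"}},
    {"patient_id": "B", "discharge_date": date(2016, 3, 1), "codes": {"F84.0"}},
    {"patient_id": "C", "discharge_date": date(2017, 2, 1), "codes": {"I10"}},
]
flags = flag_inconsistent_spells(spells, "F84.0")
rate = inconsistency_rate(flags)
```

In this toy example patient A contributes two subsequent spells, one of which omits the code, patient B has no spell after the index spell, and patient C never receives the code and is excluded.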

Results: For autism, diabetes and Parkinson's disease dementia respectively, 43.7%, 8.6% and 31.2% of subsequent spells had inconsistencies. Coding inconsistencies were highly correlated with non-coding of an underlying condition, a change in hospital trust and greater time between the spell with the first coded diagnosis and the subsequent spell. For patients with diabetes or Parkinson's disease dementia, spells without an overnight stay were found to have a higher rate of coding inconsistencies.

Conclusions: Data inconsistencies are relatively common for the three conditions considered. Where these mandatory diagnoses are not recorded in administrative datasets, and where clinical decisions are made based on such data, there is potential for this to impact patient care.

Keywords: information technology.

Conflict of interest statement

Competing interests: None declared.

Figures

Figure 1
Proportion of subsequent spells with inconsistencies over time up to three years after the index spell
Figure 2
Percentage of spells with missing mandatory codes within 3 years of the first diagnosis, for the discharge date of the first spell ranging from Q2-2013 to Q1-2018.
Figure 3
Relative permutation importance of predictors contributing to the identification of coding inconsistencies at the spell level for diagnoses of autism (top), diabetes mellitus with peripheral complications (middle) and Parkinson's disease dementia (bottom). Note: The length of each bar indicates how strongly the classifiers rely on each variable to predict coding consistency at the spell level in the test sets; it is a measure of the relative importance of each predictor. The colour bars indicate the values of the Kendall tau-b correlation coefficient between the values of each variable and the estimated Shapley values. Coefficients close to 1 or -1 correspond to strong positive or negative correlations with coding inconsistencies respectively. HFRS, Hospital Frailty Risk Score; ICD-10, International Statistical Classification of Diseases and Related Health Problems, 10th revision; IMD, Index of Multiple Deprivation.
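The two quantities named in the Figure 3 caption can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: `permutation_importance` and `kendall_tau_b` are hypothetical helpers written for this example, the toy predictor stands in for a trained random forest, accuracy is assumed as the score, and the computation of Shapley values is not shown.

```python
import random
from itertools import combinations
from math import sqrt


def accuracy(predict, X, y):
    """Fraction of rows for which the predictor matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def kendall_tau_b(x, y):
    """Kendall tau-b: (C - D) / sqrt((n0 - n1) * (n0 - n2)), with tie correction."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = len(x) * (len(x) - 1) // 2
    denom = sqrt((n0 - ties_x) * (n0 - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom


# Toy setup: the label depends only on feature 0; feature 1 is noise.
X = [[i % 2, i % 3] for i in range(20)]
y = [row[0] == 1 for row in X]


def toy_predict(row):
    """Stand-in for a trained classifier."""
    return row[0] == 1


importances = permutation_importance(toy_predict, X, y)
tau = kendall_tau_b([1, 1, 2, 3], [1, 2, 2, 3])
```

Shuffling a feature the predictor ignores leaves accuracy unchanged, so its importance is zero, whereas shuffling the feature it relies on lowers accuracy. Dividing each importance by their sum would give relative values comparable to the bar lengths in the figure.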
