Data consistency in the English Hospital Episodes Statistics database
- PMID: 36307148
- PMCID: PMC9621173
- DOI: 10.1136/bmjhci-2022-100633
Abstract
Background: To gain maximum insight from large administrative healthcare datasets it is important to understand their data quality. Although a gold standard against which to assess criterion validity rarely exists for such datasets, internal consistency can be evaluated. We aimed to identify inconsistencies in the recording of mandatory International Statistical Classification of Diseases and Related Health Problems, tenth revision (ICD-10) codes within the Hospital Episodes Statistics dataset in England.
Methods: Three exemplar medical conditions where recording is mandatory once diagnosed were chosen: autism, type II diabetes mellitus and Parkinson's disease dementia. For each patient, we identified the first occurrence of the condition's ICD-10 code during the period April 2013 to March 2021 and checked whether the code was recorded in subsequent hospital spells. We designed and trained random forest classifiers to identify variables strongly associated with recording inconsistencies.
Results: For autism, diabetes and Parkinson's disease dementia respectively, 43.7%, 8.6% and 31.2% of subsequent spells had inconsistencies. Coding inconsistencies were highly correlated with non-coding of an underlying condition, a change in hospital trust and greater time between the spell with the first coded diagnosis and the subsequent spell. For patients with diabetes or Parkinson's disease dementia, spells without an overnight stay had a higher rate of coding inconsistencies.
Conclusions: Data inconsistencies are relatively common for the three conditions considered. Where these mandatory diagnoses are not recorded in administrative datasets, and where clinical decisions are made based on such data, there is potential for this to impact patient care.
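The consistency check described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the record layout (`patient_id`, `admission_date`, `trust`, `codes`), the function name `flag_inconsistent_spells` and the use of the prefix `E11` for type II diabetes mellitus are all assumptions made for the example. For each patient it finds the first spell carrying the condition code, then marks every later spell as inconsistent if the code is absent, alongside two of the associated variables reported in the Results (change of trust and time since the first coded spell).

```python
from collections import defaultdict
from datetime import date


def flag_inconsistent_spells(spells, code_prefix):
    """Return one row per spell that follows a patient's first coded spell.

    `spells` is a list of dicts with hypothetical keys: patient_id,
    admission_date (datetime.date), trust (str) and codes (set of ICD-10 strings).
    """
    by_patient = defaultdict(list)
    for spell in spells:
        by_patient[spell["patient_id"]].append(spell)

    rows = []
    for patient_id, patient_spells in by_patient.items():
        patient_spells.sort(key=lambda s: s["admission_date"])

        def has_code(s):
            return any(c.startswith(code_prefix) for c in s["codes"])

        # Index of the first spell in which the condition is coded, if any.
        first_idx = next(
            (i for i, s in enumerate(patient_spells) if has_code(s)), None
        )
        if first_idx is None:
            continue  # condition never coded for this patient

        first = patient_spells[first_idx]
        for later in patient_spells[first_idx + 1:]:
            rows.append({
                "patient_id": patient_id,
                # Inconsistent: the mandatory code is missing after first coding.
                "inconsistent": not has_code(later),
                "changed_trust": later["trust"] != first["trust"],
                "days_since_first": (
                    later["admission_date"] - first["admission_date"]
                ).days,
            })
    return rows


# Made-up example records, for illustration only.
example_spells = [
    {"patient_id": "A", "admission_date": date(2014, 1, 10), "trust": "T1",
     "codes": {"E119", "I10"}},
    {"patient_id": "A", "admission_date": date(2014, 6, 1), "trust": "T1",
     "codes": {"E119"}},
    {"patient_id": "A", "admission_date": date(2015, 3, 1), "trust": "T2",
     "codes": {"J189"}},
    {"patient_id": "B", "admission_date": date(2016, 5, 5), "trust": "T1",
     "codes": {"I10"}},
]

rows = flag_inconsistent_spells(example_spells, "E11")
inconsistency_rate = sum(r["inconsistent"] for r in rows) / len(rows)
```

Rows of this shape (an outcome flag plus candidate explanatory variables) are the kind of table that could then be passed to a classifier such as a random forest; that step is not shown here.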
Keywords: information technology.
© Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY. Published by BMJ.
Conflict of interest statement
Competing interests: None declared.