Accuracy of identifying hospital acquired venous thromboembolism by administrative coding: implications for big data and machine learning research
- PMID: 33558981
- PMCID: PMC8349368
- DOI: 10.1007/s10877-021-00664-6
Abstract
Big data analytics research using heterogeneous electronic health record (EHR) data requires accurate identification of disease phenotype cases and controls. Overreliance on administrative data to determine ground truth can lead to biased and inaccurate findings. Hospital-acquired venous thromboembolism (HA-VTE) is challenging to identify because of its temporal evolution and variable EHR documentation. To establish ground truth for machine learning modeling, we compared the accuracy of HA-VTE diagnoses made by administrative coding with manual review of gold standard diagnostic test results. We performed a retrospective analysis of EHR data on 3680 adult stepdown unit patients to identify HA-VTE. International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes for VTE were identified. A total of 4544 radiology reports associated with VTE diagnostic tests were screened using terminology extraction and then manually reviewed by a clinical expert to confirm the diagnosis. Of 415 cases with ICD-9-CM codes for VTE, 219 had acute-onset codes. Test report review identified 158 new-onset HA-VTE cases. Only 40% of the acute-onset ICD-9-CM coded cases (n = 87 of 219) were confirmed by a positive diagnostic test report, leaving the majority of administratively coded cases unsubstantiated by a confirmatory diagnostic test. Additionally, 45% of the diagnostic test-confirmed HA-VTE cases lacked corresponding ICD-9-CM codes. ICD-9-CM coding missed test-confirmed HA-VTE cases and assigned VTE codes to cases without confirmed VTE, suggesting that dependence on administrative coding leads to inaccurate HA-VTE phenotyping. Alternative methods that yield more sensitive and specific VTE phenotypes, portable across EHR vendor data, are needed to support case-finding in big data analytics.
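The reported counts can be restated as standard agreement measures for administrative coding against the diagnostic-test reference standard. The sketch below is a hypothetical illustration, not the authors' analysis code. It assumes that the 40% figure is the positive predictive value (87 of the 219 acute-onset coded cases) and that those 87 confirmed coded cases are a subset of the 158 test-confirmed cases. Under that assumption, 158 - 87 = 71 test-confirmed cases were never coded, which matches the reported 45%. The function name `coding_agreement` and its parameter names are illustrative only.

```python
# Hypothetical sketch: agreement between acute-onset ICD-9-CM VTE codes and
# diagnostic-test-confirmed HA-VTE, using counts reported in the abstract.
# Assumes the 87 coded-and-confirmed cases are a subset of the 158 confirmed cases.


def coding_agreement(coded: int, confirmed: int, both: int) -> dict:
    """Return PPV, sensitivity, and related counts for administrative coding.

    coded     -- cases carrying an acute-onset VTE code
    confirmed -- cases confirmed by a positive diagnostic test report
    both      -- coded cases that were also test-confirmed (true positives)
    """
    if not (0 <= both <= min(coded, confirmed)):
        raise ValueError("overlap cannot exceed either group")
    unconfirmed_coded = coded - both      # coded, but no confirming test report
    uncoded_confirmed = confirmed - both  # test-confirmed, but never coded
    return {
        "ppv": both / coded,                          # share of coded cases confirmed
        "sensitivity": both / confirmed,              # share of confirmed cases coded
        "unconfirmed_coded": unconfirmed_coded,
        "uncoded_confirmed": uncoded_confirmed,
        "miss_rate": uncoded_confirmed / confirmed,   # equals 1 - sensitivity
    }


if __name__ == "__main__":
    stats = coding_agreement(coded=219, confirmed=158, both=87)
    for name, value in stats.items():
        # Print proportions to three decimals, counts as integers.
        print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
    # ppv: 0.397
    # sensitivity: 0.551
    # unconfirmed_coded: 132
    # uncoded_confirmed: 71
    # miss_rate: 0.449
```

Specificity is deliberately left out: computing it would require further assumptions about how patients with non-acute VTE codes and patients who were never tested should be classified.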
Keywords: Administrative coding; Big data analytics; Electronic health record data; Machine learning; Phenotyping; Venous thromboembolism.
© 2021. This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply.
Similar articles
- Development and evaluation of a national administrative code-based system for estimation of hospital-acquired venous thromboembolism in Ireland. BMJ Open. 2025 Feb 20;15(2):e084951. doi: 10.1136/bmjopen-2024-084951. PMID: 39979043. Free PMC article.
- Administrative codes inaccurately identify recurrent venous thromboembolism: The CVRN VTE study. Thromb Res. 2020 May;189:112-118. doi: 10.1016/j.thromres.2020.02.023. Epub 2020 Mar 5. PMID: 32199174. Free PMC article.
- Validity of Using Inpatient and Outpatient Administrative Codes to Identify Acute Venous Thromboembolism: The CVRN VTE Study. Med Care. 2017 Dec;55(12):e137-e143. doi: 10.1097/MLR.0000000000000524. PMID: 29135777. Free PMC article.
- [Clinical features of patients with venous thromboembolism: 177 case analysis in 10 years]. Zhonghua Wei Zhong Bing Ji Jiu Yi Xue. 2019 Apr;31(4):453-457. doi: 10.3760/cma.j.issn.2095-4352.2019.04.016. PMID: 31109420. Review. Article in Chinese.
- Machine Learning-Based Predictive Models for Patients with Venous Thromboembolism: A Systematic Review. Thromb Haemost. 2024 Nov;124(11):1040-1052. doi: 10.1055/a-2299-4758. Epub 2024 Apr 4. PMID: 38574756.
Cited by
- Artificial intelligence for venous thromboembolism prophylaxis: Clinician perspectives. Res Pract Thromb Haemost. 2023 Nov 23;7(8):102272. doi: 10.1016/j.rpth.2023.102272. eCollection 2023 Nov. PMID: 38169996. Free PMC article.
- Developing and optimizing a computable phenotype for incident venous thromboembolism in a longitudinal cohort of patients with cancer. Res Pract Thromb Haemost. 2022 May 25;6(4):e12733. doi: 10.1002/rth2.12733. eCollection 2022 May. PMID: 35647478. Free PMC article.
- Diabetes status and other factors as correlates of risk for thrombotic and thromboembolic events during SARS-CoV-2 infection: A nationwide retrospective case-control study using Cerner Real-World Data™. BMJ Open. 2023 Jul 9;13(7):e071475. doi: 10.1136/bmjopen-2022-071475. PMID: 37423628. Free PMC article.
- Accuracy of efficient data methods to determine the incidence of hospital-acquired thrombosis and major bleeding in medical and surgical inpatients: a multicentre observational cohort study in four UK hospitals. BMJ Open. 2023 Feb 6;13(2):e069244. doi: 10.1136/bmjopen-2022-069244. PMID: 36746545. Free PMC article.
- Real-world Health Data and Precision for the Diagnosis of Acute Kidney Injury, Acute-on-Chronic Kidney Disease, and Chronic Kidney Disease: Observational Study. JMIR Med Inform. 2022 Jan 25;10(1):e31356. doi: 10.2196/31356. PMID: 35076410. Free PMC article.