Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2022 Apr;36(2):397-405.
doi: 10.1007/s10877-021-00664-6. Epub 2021 Feb 8.

Accuracy of identifying hospital acquired venous thromboembolism by administrative coding: implications for big data and machine learning research

Affiliations

Accuracy of identifying hospital acquired venous thromboembolism by administrative coding: implications for big data and machine learning research

Tiffany Pellathy et al. J Clin Monit Comput. 2022 Apr.

Abstract

Big data analytics research using heterogeneous electronic health record (EHR) data requires accurate identification of disease phenotype cases and controls. Overreliance on ground truth determination based on administrative data can lead to biased and inaccurate findings. Hospital-acquired venous thromboembolism (HA-VTE) is challenging to identify due to its temporal evolution and variable EHR documentation. To establish ground truth for machine learning modeling, we compared accuracy of HA-VTE diagnoses made by administrative coding to manual review of gold standard diagnostic test results. We performed retrospective analysis of EHR data on 3680 adult stepdown unit patients identifying HA-VTE. International Classification of Diseases, Ninth Revision (ICD-9-CM) codes for VTE were identified. 4544 radiology reports associated with VTE diagnostic tests were screened using terminology extraction and then manually reviewed by a clinical expert to confirm diagnosis. Of 415 cases with ICD-9-CM codes for VTE, 219 were identified with acute onset type codes. Test report review identified 158 new-onset HA-VTE cases. Only 40% of ICD-9-CM coded cases (n = 87) were confirmed by a positive diagnostic test report, leaving the majority of administratively coded cases unsubstantiated by confirmatory diagnostic test. Additionally, 45% of diagnostic test confirmed HA-VTE cases lacked corresponding ICD codes. ICD-9-CM coding missed diagnostic test-confirmed HA-VTE cases and inaccurately assigned cases without confirmed VTE, suggesting dependence on administrative coding leads to inaccurate HA-VTE phenotyping. Alternative methods to develop more sensitive and specific VTE phenotype solutions portable across EHR vendor data are needed to support case-finding in big-data analytics.

Keywords: Administrative coding; Big data analytics; Electronic health record data; Machine learning; Phenotyping; Venous thromboembolism.

PubMed Disclaimer

Conflict of interest statement

Conflict of interest The authors have no commercial conflicts of interest with this work.

Figures

Fig. 1
Fig. 1
New-onset, hospital acquired venous thromboembolism (HA-VTE) case identification process

Similar articles

Cited by

References

    1. Manyika J Big data: the next frontier for innovation, competition, and productivity. http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/.... 2011.
    1. Bowton E, et al. Biobanks and electronic medical records: enabling cost-effective research. Sci Transl Med. 2014;6(234):234cm3. - PMC - PubMed
    1. Pinsky MR, Dubrawski A. Gleaning knowledge from data in the intensive care unit. Am J Respir Crit Care Med. 2014;190(6):606–10. - PMC - PubMed
    1. Shmueli G To explain or to predict? Stat Sci. 2010;25(3):289–310.
    1. Basile AO, Ritchie MD. Informatics and machine learning to define the phenotype. Expert Rev Mol Diagn. 2018;18(3):219–26. - PMC - PubMed

Publication types

LinkOut - more resources